This post was originally published here.
Recently, we encountered some problems with SCOM 2007 R2 monitoring of a BizTalk 2010 environment. The SQL Agent Job “Backup BizTalk Server (BizTalkMgmtDb)” failed, but we didn’t receive a SCOM alert for it. Our first reflex was to look at the BizTalk Server Management Pack.
BizTalk Server Management Pack
The BizTalk Server Management Pack contains a rule, called “CRITICAL ERROR: A SQL Server agent job failed – Backup BizTalk Server”, which should alert us when the backup job fails. The rule is disabled by default, but we had already enabled it with an override.
Why didn’t we receive an alert? We took a closer look at the default rule and found that it identifies a backup job failure as follows:
- It’s a rule that subscribes to the event log [TypeID=”Windows!Microsoft.Windows.EventProvider”]
- It looks for event log entries with ID = 208 [EventDisplayNumber equals 208] in the Application log
- The computer name is set to $Target/Property[Type=”Microsoft.BizTalk.Server.2010.ServerRole”]/ComputerName$
When the error occurred, we did find event log entries with ID 208, but on the SQL server. The rule, however, subscribes to the event log of the BizTalk server. Conclusion: the default rule only works on a single-box installation, where SQL Server and BizTalk Server share the same event log.
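The mismatch can be illustrated with a small model of the rule's criteria. This is only a sketch of the matching logic, not the Management Pack's actual implementation; the `EventRecord` type, the `rule_matches` function and the host names `SQL01` and `BTS01` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    computer: str   # machine whose event log holds the entry
    log_name: str   # e.g. "Application"
    event_id: int


def rule_matches(event, watched_computer):
    """Mimic the rule's criteria: Application log, event ID 208, on the watched machine."""
    return (
        event.computer.lower() == watched_computer.lower()
        and event.log_name == "Application"
        and event.event_id == 208
    )


# Hypothetical host names. The failure is logged on the SQL server.
failure = EventRecord(computer="SQL01", log_name="Application", event_id=208)

# Multi-server setup: the rule watches the BizTalk server, so it never sees the event.
multi_server = rule_matches(failure, watched_computer="BTS01")

# Single-box setup: SQL Server and BizTalk share one machine and one event log.
single_box = rule_matches(failure, watched_computer="SQL01")
```

In this model the event only matches when the watched machine is the one that logged the failure, which is the single-box case.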
To solve this issue, we created a custom rule based on the default one. The custom rule had exactly the same configuration, except for the computer name, which we changed to:
Now the rule subscribed to the event log of the SQL Server (running the ManagementDb), and we received an alert when the backup job failed. This worked fine until we deployed the rule against a multi-server BizTalk environment containing a SQL Server cluster. Suddenly we received SCOM errors complaining that the SCOM Agent couldn’t access the event log of the MgmtDbServer. The cause is that the BizTalk Management Pack discovers the virtual SQL Server cluster name. As this is not a physical server name, it’s expected that the SCOM Agent can’t access its event log.
To overcome this issue, we could have created a new custom rule that uses the discovered MgmtDbServer and queries the [msdb].[dbo].[sysjobhistory] system table. The sysjob tables contain all the information needed to detect SQL Agent Job failures. Because this approach is fairly complex, we decided to have a look at the SQL Server Management Pack instead.
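For reference, such a query against the sysjob tables could look roughly like the sketch below. It is not the rule we built (we never built it). It relies on the documented meaning of two sysjobhistory columns: run_status 0 means failed, and step_id 0 is the row holding the overall job outcome. The function names, the `prefix` parameter and the in-memory SQLite stand-in for msdb are assumptions made so the example can run without a SQL Server.

```python
import sqlite3

JOB_NAME = "Backup BizTalk Server (BizTalkMgmtDb)"
FAILED = 0            # sysjobhistory.run_status: 0 = failed, 1 = succeeded
JOB_OUTCOME_STEP = 0  # sysjobhistory.step_id 0 holds the overall job outcome


def failed_runs_query(prefix="msdb.dbo."):
    """Build the query; `prefix` is a hypothetical knob so the demo can drop the schema."""
    return (
        "SELECT j.name, h.run_date, h.run_time, h.message "
        f"FROM {prefix}sysjobhistory AS h "
        f"JOIN {prefix}sysjobs AS j ON j.job_id = h.job_id "
        "WHERE j.name = ? AND h.step_id = ? AND h.run_status = ? "
        "ORDER BY h.run_date DESC, h.run_time DESC"
    )


def find_failed_runs(conn, job_name, prefix="msdb.dbo."):
    """Return (name, run_date, run_time, message) rows for failed runs of one job."""
    cur = conn.cursor()
    cur.execute(failed_runs_query(prefix), (job_name, JOB_OUTCOME_STEP, FAILED))
    return cur.fetchall()


def demo_connection():
    """In-memory SQLite stand-in with only the columns the query needs (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sysjobs (job_id TEXT, name TEXT)")
    conn.execute(
        "CREATE TABLE sysjobhistory (job_id TEXT, step_id INTEGER, "
        "run_status INTEGER, run_date INTEGER, run_time INTEGER, message TEXT)"
    )
    conn.execute("INSERT INTO sysjobs VALUES ('job-1', ?)", (JOB_NAME,))
    conn.executemany(
        "INSERT INTO sysjobhistory VALUES ('job-1', ?, ?, ?, ?, ?)",
        [
            (0, 1, 20240101, 230000, "The job succeeded."),
            (0, 0, 20240102, 230000, "The job failed."),
            (1, 0, 20240102, 230000, "Step 1 failed."),
        ],
    )
    return conn


failed = find_failed_runs(demo_connection(), JOB_NAME, prefix="")
```

Against a real SQL Server, the same function should work with the default prefix and any DB-API connection that uses `?` parameter markers. A monitoring rule would still need scheduling, credentials and alert suppression around it, which is where the complexity comes from.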
SQL Server Management Pack
As it is actually a SQL process that fails, the SQL Server Management Pack should be responsible for alerting us. But why didn’t that happen? There are actually two reasons for this:
- The discovery of SQL Agent Jobs is disabled by default.
- The alerting for SQL Agent Job Failures is disabled by default.
We enabled the discovery and alerting, tested the solution and everything went smoothly.
SQL Agent Jobs should be monitored by the SQL Server Management Pack. This Management Pack should always be installed in a BizTalk environment, as SQL Server is the core of BizTalk. Please keep in mind that both the discovery of and the alerting for SQL Agent Jobs are disabled by default.
It’s actually strange that Microsoft tried to include the monitoring of the SQL Agent Jobs in the BizTalk Server Management Pack, especially because the default implementation only works for a single-box installation. On MSDN you can find a vague description of this subject.