Check SCOM 2012 Agent Health Programmatically

SCOM alerts are only as good as the SCOM agents behind them. From time to time agents go “dark” and stop reporting. Repairing or reinstalling the agent in question may help.

Unfortunately, this tends to happen with the SCOM RMS/MS agent too. One way to cause this condition is to restart the SQL server that runs the OperationsManager (SCOM) database. Even during patching runs, it helps to keep this dependency in mind and make sure the SCOM SQL server is serviced/rebooted first, before the SCOM RMS/MS server.

So what do we do if the server responsible for monitoring stops working without anyone noticing? Most interestingly, when this happens the SCOM console may continue to work (at least at a basic launch/navigate level).

Some folks set up multiple monitoring servers with the intention of having them cross-monitor each other, but in some cases it may be better to just schedule a small program or script that checks all SCOM agents and takes some action if any of them are not healthy.

Checking Agent Status

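        ' Assumes references to the SCOM SDK assemblies and these imports at the top of the file:
        '   Imports System.Collections.ObjectModel
        '   Imports Microsoft.EnterpriseManagement
        '   Imports Microsoft.EnterpriseManagement.Administration
        Dim agentCriteria As AgentManagedComputerCriteria
        Dim agents As ReadOnlyCollection(Of AgentManagedComputer)

        Try
            ' mg is declared by the Using statement itself; a separate "Dim mg"
            ' in the same method would clash with it at compile time
            Using mg As New ManagementGroup(My.Settings.ManagementGroup)
                ' A date far in the past makes the criteria match every agent
                agentCriteria = New AgentManagedComputerCriteria("LastModified >= '" & New DateTime(2000, 1, 1).ToString("G") & "'")
                agents = mg.Administration.GetAgentManagedComputers(agentCriteria)

                For Each agent As AgentManagedComputer In agents
                    ' 0 = not monitored, 1 = healthy; anything higher is monitored but unhealthy
                    If agent.HealthState > 1 Then
                        Log(agent.Name & " is monitored but not healthy. Status: " & agent.HealthState, EventLogEntryType.Warning, 390)
                        'do something to notify about the issue
                    End If
                Next

            End Using

        Catch ex As Exception
            'handle any unexpected errors here (do not leave this empty, or a failed check goes unnoticed)

        End Try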

This code snippet is pretty self-explanatory. The HealthState property of AgentManagedComputer can take the following values:

  1. Agent health state 0 = not monitored
  2. Agent health state 1 = healthy
  3. Agent health state 3 = critical

I suspect that state 2 would be the “dark” agent state, though I have not seen that condition yet since putting in this code. Either way, anything with a status above 1 is being monitored and is not healthy, and is therefore worth a notification.
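
The numeric values listed above could be turned into readable labels for log messages. This is only a minimal sketch: DescribeHealthState is a made-up helper name (it is not part of the SCOM SDK), and it assumes nothing beyond the three values observed in this post, so state 2 is reported as unconfirmed rather than guessed at.

        ' Hypothetical helper, not part of the SCOM SDK.
        ' Maps the numeric health states observed in this post to labels.
        Function DescribeHealthState(ByVal state As Integer) As String
            Select Case state
                Case 0
                    Return "not monitored"
                Case 1
                    Return "healthy"
                Case 3
                    Return "critical"
                Case Else
                    ' Includes state 2, which has not been observed yet
                    Return "unconfirmed state " & state.ToString()
            End Select
        End Function

It could be called as DescribeHealthState(CInt(agent.HealthState)) when building the message text.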

The “Log” function in the code above should be replaced with whatever action you want the script/program to take. In my case Log calls a function that writes an event to the Event Log, and I left the line in to show the usage of the agent.Name and agent.HealthState properties.
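
For reference, a Log function along those lines might look roughly like the sketch below. It is an illustration rather than the exact function used here: the module name and the “SCOM Agent Check” event source are invented, and writing to the Application log is an assumption. It relies only on the standard System.Diagnostics.EventLog class.

        Imports System.Diagnostics

        Module AgentHealthLogging
            ' "SCOM Agent Check" is a made-up event source name; pick your own.
            Private Const EventSourceName As String = "SCOM Agent Check"

            ' Writes one entry to the Application event log.
            Sub Log(ByVal message As String, ByVal entryType As EventLogEntryType, ByVal eventId As Integer)
                ' Checking for and registering a source needs administrative rights,
                ' so in practice it is best done once, at install time.
                If Not EventLog.SourceExists(EventSourceName) Then
                    EventLog.CreateEventSource(EventSourceName, "Application")
                End If
                EventLog.WriteEntry(EventSourceName, message, entryType, eventId)
            End Sub
        End Module

With the source registered, the call in the loop above would produce a Warning event with ID 390 for every unhealthy agent.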

The “Using” block opens a new connection to the management group. The “My.Settings.ManagementGroup” value comes from the VB.NET project settings; note that the ManagementGroup constructor expects the name of a management server to connect to, so that is what the setting should hold. The “Using” block automatically releases all resources used by the connection when execution leaves the block, which is quite convenient.

You can view the other properties and methods of the AgentManagedComputer class on MSDN.

 
