Monitor and Recover Stopped Automatic Services with Log Analytics

Update: This can now be accomplished using the Change Tracking and Inventory solution as well (see here).

I was working with a customer recently, and one of the asks was to configure Log Analytics to monitor for stopped automatic services on servers throughout the environment.  Since I first posted this blog, updates have been made to the Change Tracking and Inventory solution which allow for 1 minute collection intervals, making Log Analytics an option for a simpler configuration to accomplish this task.  The following query can be used for a simple service-stopped alert:

ConfigurationData
| where SvcName =~ "W3svc"
| project Computer, SvcName, SvcDisplayName, SvcState, TimeGenerated
| where SvcState != "Running"

Once this query has been saved, it can be targeted in Azure Monitor Alerts.

The downside to using the Log Analytics solution is that if the agent is not sending data to the workspace, the alert will not fire.  Alternatively, we can add more granular logic using a script.  Both methods work depending on the use case.

In this particular case we needed something a bit more granular.  Time for some fun with PowerShell, Azure Automation and the Data Collector API!

Step 1 was to create a PowerShell script to poll for services set to automatic and in a stopped state.  The first part of the script simply gets the necessary credential assets and variables and logs into Azure.  Nothing fancy here, but you will need to follow the prerequisites at the bottom of this post to ensure that your variables and credential assets are in place.

Write-Output "Getting Azure credentials...." 

#Get Creds
$AzureCred = Get-AutomationPSCredential -Name $AzureUser 

Write-Output $AzureCred
Write-Output "Logging into Azure...." 

#Login to Azure Subscription
Login-AzureRmAccount -Credential $AzureCred
Select-AzureRmSubscription -SubscriptionName "Microsoft Azure Sponsorship" 

Write-Output "Getting Local credentials...."
#Get Domain Creds to run local workflows
$DomainCred = Get-AutomationPSCredential -Name $DomainUser
Write-Output $DomainCred 

#Update customer Id to your OMS workspace ID
$CustomerID = Get-AutomationVariable -Name 'OMSWSID' 

#For shared key use either the primary or secondary Connected Sources client authentication key
$SharedKey = Get-AutomationVariable -Name 'OMSWSPK' 

#Get Workspace name and Resourcegroup name for OMS Search API function
$WorkSpaceName = Get-AutomationVariable -Name 'OMSWSName'
$ResourceGroupName = Get-AutomationVariable -Name 'OMSResourceGroup'

The next part of the script utilizes the Search API to collect a list of Log Analytics managed computers to poll for stopped services.  This allows me to avoid using text files or querying AD for a list of computers and avoids collecting data from non-production servers.

#Query OMS for computers registering heartbeats in the last 1 hour
Import-Module AzureRm.OperationalInsights
$dynamicQuery = 'Type=Heartbeat TimeGenerated>NOW-1HOUR | Measure count() by Computer | select Computer'
$Result = Get-AzureRmOperationalInsightsSearchResults `
	-ResourceGroupName $ResourceGroupName `
	-WorkspaceName $WorkspaceName `
	-Query $dynamicQuery
$OMSComputers = $Result.Value | ConvertFrom-Json
$OMSComputers | out-null

Now that I have my list of computers, I can loop through and query each computer using WMI for services that are both stopped and set to automatic.  I’ve also added logic to exclude services set to “Automatic Delayed” to avoid false alarms.  Any services that meet these criteria are then passed to the next section of the script, where a custom PSObject is created for each service and sent to the Data Collector API using the Send-OMSAPIIngestionFile PowerShell module.

Note:  For larger environments I’ve provided an example using the PowerShell Workflow as we can utilize the ForEach -Parallel construction to iterate through a collection of objects in parallel rather than waiting for each loop to finish before moving on to the next.  This can save quite a bit of execution time.  We could of course use jobs as well, but during my testing jobs and Workflow took the same amount of time so I will provide the Workflow version as an example for those that haven’t used PowerShell Workflow in the past.  See the link at the bottom of the post for both runbook examples.
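For those who haven’t used the construct before, here is a minimal sketch of the ForEach -Parallel pattern.  The computer names and the work inside the loop are placeholders for illustration only:

```powershell
workflow Get-StoppedServicesParallel
{
    # Placeholder list; in the real runbook this comes from the
    # Log Analytics Search API query shown earlier.
    $OMSComputers = @("Server01", "Server02", "Server03")

    # Each iteration of a ForEach -Parallel loop runs concurrently
    # instead of waiting for the previous iteration to finish.
    ForEach -Parallel ($Computer in $OMSComputers)
    {
        InlineScript {
            Write-Output "Getting services on $Using:Computer..."
            # ...the WMI query and Data Collector API call go here...
        }
    }
}
```

Note that variables from the workflow scope must be referenced with the $Using: prefix inside InlineScript blocks.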


#Define the custom log type and timestamp field for the Data Collector API
$LogType = "StoppedSvc"   #Records will appear in Log Analytics as StoppedSvc_CL
$Timestampfield = ""      #Blank uses the ingestion time as the timestamp

ForEach ($Computer in $OMSComputers)
{
    #Exclude the Hybrid Worker
    If ($Computer.Computer -ne "AAHybrid01.demo.local")
    {
        Try
        {
            $ComputerName = $Computer.Computer
            Write-Output "Getting services on $ComputerName..."

            $Array = @()
            $StoppedSvcs = @()

            #Query for services that are stopped but set to Automatic
            $Services = Get-WmiObject -Class Win32_Service `
                -Filter "State != 'Running' and StartMode = 'Auto'" `
                -Credential $DomainCred -ComputerName $ComputerName -ErrorAction Continue

            #Exclude delayed start services
            ForEach ($Service in $Services)
            {
                $DelayCheckSvc = $Service.Name

                #Check the remote registry for the DelayedAutoStart flag (requires PowerShell remoting)
                $DelayCheck = Invoke-Command -ComputerName $ComputerName -Credential $DomainCred -ScriptBlock {
                    Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\$Using:DelayCheckSvc" -ErrorAction SilentlyContinue |
                        Where-Object {$_.Start -eq 2 -and $_.DelayedAutoStart -eq 1}
                }

                If (!$DelayCheck)
                {
                    $StoppedSvcs += $Service
                }
            }

            If ($StoppedSvcs)
            {
                ForEach ($Svc in $StoppedSvcs)
                {
                    #Format record for the OMS schema
                    $sx = New-Object PSObject ([ordered]@{
                        Computer       = $ComputerName
                        SvcName        = $Svc.Name
                        SvcDisplayName = $Svc.DisplayName
                        SvcState       = $Svc.State
                        SvcStartMode   = $Svc.StartMode
                    })
                    $Array += $sx
                }

                $jsonTable = ConvertTo-Json -InputObject $Array

                Send-OMSAPIIngestionFile -customerId $CustomerID -sharedKey $SharedKey `
                    -body $jsonTable -logType $LogType -TimeStampField $Timestampfield
            }
        }
        Catch
        {
            $ErrorMessage = "Exception Message: $($_.Exception.Message)"
            Write-Output "Exceptions...." $ErrorMessage
        }
    }
}

Once the code was fully tested (this is sample code and should be tested thoroughly before using in a production environment), the next step was to copy the code into a new Azure Automation PowerShell runbook called Get-StoppedServices.  Once we validated the functionality, the runbook was published and ready to go!  When executing the runbook in the Azure Automation Test Pane, the output should look similar to below:


And now to see if the data is showing up in Log Analytics…


Looking good!  The last step is to schedule the runbook so that we are collecting this data regularly.  I am running the Get-StoppedServices runbook every 10 minutes, but you can schedule the frequency for what works best in your environment.
Note:  To schedule runbooks at intervals less than 1 hour using Azure Scheduler see my post here.  Additional options include configuring a runbook to schedule intervals or even creating an hourly recurring schedule for each minute interval (see below).
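As an example of the last option, the following sketch creates six hourly schedules offset by 10 minutes each, giving an effective 10-minute interval.  The resource group and automation account names are placeholders, and this assumes the AzureRM.Automation module:

```powershell
#Placeholder names - substitute your own resource group and automation account
$Params = @{
    ResourceGroupName     = "MyResourceGroup"
    AutomationAccountName = "MyAutomationAccount"
}

#One hourly schedule per 10-minute offset
For ($i = 0; $i -lt 60; $i += 10) {
    $Schedule = New-AzureRmAutomationSchedule @Params `
        -Name "Get-StoppedServices-$i" `
        -StartTime (Get-Date).AddHours(1).AddMinutes($i) `
        -HourInterval 1

    #Link each schedule to the runbook
    Register-AzureRmAutomationScheduledRunbook @Params `
        -RunbookName "Get-StoppedServices" `
        -ScheduleName $Schedule.Name
}
```

Note that Azure Automation requires a schedule start time at least a few minutes in the future, hence the AddHours(1).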


Now that we have our runbook scheduled and the stopped services data is populating in OMS, we can create queries, alerts, and even use the data in custom solutions and views. Let’s take a look at what an alert might look like.


Notice that I’ve filtered my query to only alert when specific services are returned. Because this alert is tied to a remediation runbook, you may want to filter the services to avoid restarting services that are not critical or should not be started.  Another option would be to filter these services in the script.
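For example, assuming the custom log type name StoppedSvc (Log Analytics appends _CL to custom log types and _s to string fields; your names may differ), a query filtered to specific critical services might look like this:

```
StoppedSvc_CL
| where SvcName_s in ("W3SVC", "Spooler", "WinRM")
| project Computer, SvcName_s, SvcDisplayName_s, SvcState_s, TimeGenerated
```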

We can also use the data collected to create a stopped services blade in View Designer or My Dashboard.  The blade below is a reproduction of part of an application monitoring solution that I am working on with a customer.


Additionally, you may have noticed that I’ve linked the alert to a runbook called Restart-StoppedServices.  The Restart-StoppedServices runbook will be the topic of part 2 of this blog mini-series, which I will be releasing soon.  Until then, happy testing!

NOTE:  The code provided is for testing purposes only and should not be used in production without thorough testing.

Get-StoppedServices Prerequisites:

  • Configure a Hybrid Runbook Worker (for on-premises servers) – if one does not already exist.
  • Configure an Azure Automation Account and link the account to the OMS workspace where the data will be collected.
  • Import the Get-StoppedServices runbook to Azure Automation.
  • Create a Variable Asset for the OMS Workspace ID called ‘OMSWSID’.
  • Create a Variable Asset for the OMS Primary Key called ‘OMSWSPK’.
  • Create a Credential Asset called ‘AzureCred’ with rights to log into Azure and write to OMS.
  • Create a Credential Asset called ‘DomainCred’ with rights to execute Get-WMIObject queries against the on-premises servers.

Get the sample Get-StoppedServices runbooks here

