The proper way to alert on guest OS metrics for Azure-hosted VMs in Azure Monitor is a topic that comes up repeatedly when our engineers and developers begin configuring alerting for their applications. In most cases, Log Analytics is enabled for each subscription (DevOps model), in addition to the base metrics available at the resource level for each VM. There is also the option to enable guest OS diagnostics for extended performance metrics. Without some background on how these capabilities work under the hood, it can be quite confusing to figure out which tool generates which metrics, and which metrics to use when configuring alerts in Azure Monitor Alerts.
Let’s start with the basics:
When Log Analytics is configured for performance data collection, that data is consumed as log data — either from the Log Analytics workspace directly or via Azure Monitor Logs at the Azure Monitor or resource level. Several Azure Monitor Insights solutions, including Azure Monitor for VMs, also draw on Log Analytics. Log data is not consumed via Azure Monitor Metrics.
One caveat to the above statement is the ability to target a Log Analytics workspace from Azure Monitor Metrics when an alert has been created using “Metric alerts for logs” (more on this below). When you create an alert targeting a Log Analytics workspace performance metric, that metric is routed into the metrics data store. This allows the alert to be “near real time”, since the data now travels the quicker of the two pipelines (metrics vs. logs). Once the alert is configured, the metric is not only evaluated by the alert rule and stored in Metrics; it is also available for viewing in metrics explorer.
To view metrics by targeting a Log Analytics workspace, choose the Log Analytics workspace as the “Resource” and select the performance metric for which you created the alert. In this case, I created an alert for % Free Space.
The default view will show the workspace total rather than individual VMs. To see individual VMs, choose the “Apply splitting” option and select “Computer”.
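For comparison, the same per-computer view can be reproduced against the raw log data. A minimal sketch, assuming the standard Perf table schema and that the % Free Space counter is being collected in the workspace:

```kusto
// Average % Free Space per VM in 5-minute bins,
// queried from the Log Analytics Perf table.
Perf
| where ObjectName == "LogicalDisk"
    and CounterName == "% Free Space"
    and InstanceName == "_Total"
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
```

Note that this query reads from the log pipeline, so it is subject to the ingestion latency discussed below, whereas the metrics explorer view is fed from the metrics data store.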
Because these metrics only appear once an alert targeting a “Metric alerts for logs” metric has been created, this method can be a bit more complicated than necessary. That said, if guest OS diagnostics are ruled out due to redundancy and cost, it can be an option for viewing “near real time” metrics for data collected by Log Analytics when log ingestion latency becomes an issue for certain metrics.
Guest OS Diagnostic Logs:
When guest OS diagnostics are enabled, those metrics are consumed via Azure Monitor Metrics, whether or not Log Analytics is enabled. When performance data is accessed via Azure Monitor Logs, a Log Analytics workspace, or Azure Monitor for VMs, it is log data that is being consumed; guest OS diagnostics are not involved. It’s important to note that some base metrics are available by default in the portal when a VM is configured (CPU, Network, Disk bytes, Disk Operations). For all other guest OS performance metrics, diagnostics must be enabled before the metrics are available from Azure Monitor Metrics.
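As an illustration of what enabling extended counters looks like, the Windows diagnostics extension (WAD) declares them under WadCfg > DiagnosticMonitorConfiguration. This is an abbreviated sketch, not a complete WadCfg; the counter specifiers and periods shown are just examples:

```json
"PerformanceCounters": {
  "scheduledTransferPeriod": "PT1M",
  "PerformanceCounterConfiguration": [
    {
      "counterSpecifier": "\\LogicalDisk(_Total)\\% Free Space",
      "sampleRate": "PT60S",
      "unit": "Percent"
    },
    {
      "counterSpecifier": "\\Memory\\Available Bytes",
      "sampleRate": "PT60S",
      "unit": "Bytes"
    }
  ]
}
```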
Update 7/16/2019: I’ve just noticed while working in the portal that the option to write VM guest OS diagnostics directly to the Azure Monitor sink via the Azure portal has been added in preview. The feature is not yet enabled for us, but this is great news for simplifying the management of metrics and alerts when guest OS diagnostics are enabled.
One thing to keep in mind regarding guest OS diagnostic logs is that, although they are consumable via Azure Monitor Metrics, they are not available by default for alerting via Azure Monitor Alerts. This is a point of confusion, because for most other Azure resources the opposite is true.
It is not currently possible to manually configure guest OS diagnostics to write to the Azure Monitor sink via the Azure portal, which is where the data must be sent to enable alerting from Azure Monitor Alerts. You CAN do this programmatically (e.g., via an ARM template), but in our case that does not suffice: we have thousands of existing VMs, and the teams lack the resources to retroactively update each one. To work around this, we use the aforementioned “Metric alerts for logs” capability, which allows “near real time” alerting via Azure Monitor Alerts using Log Analytics performance data. As stated above, these logs are sent to the metrics data store prior to ingestion into Log Analytics, which removes the ingestion latency incurred by log alerts. In short: even if you enable guest OS diagnostics, you will still need to enable Log Analytics performance and event data collection and use either “Metric alerts for logs” or log alerts to alert on VM performance metrics and events (unless you use a template to add the Azure Monitor sink to each VM).
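For reference, the programmatic route amounts to declaring an Azure Monitor sink in the diagnostics extension settings and pointing the counters at it. A rough sketch of the relevant WadCfg fragment is below — “AzMonSink” is an arbitrary name I chose for illustration, and this is not a full template:

```json
"WadCfg": {
  "DiagnosticMonitorConfiguration": {
    "PerformanceCounters": {
      "scheduledTransferPeriod": "PT1M",
      "sinks": "AzMonSink",
      "PerformanceCounterConfiguration": [
        {
          "counterSpecifier": "\\LogicalDisk(_Total)\\% Free Space",
          "sampleRate": "PT60S"
        }
      ]
    }
  },
  "SinksConfig": {
    "Sink": [
      {
        "name": "AzMonSink",
        "AzureMonitor": {}
      }
    ]
  }
}
```

With the sink in place, the counters routed through it land in the Azure Monitor metrics store and become eligible for standard metric alerts.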
The good news is that Microsoft will eventually add the ability to send VM guest OS diagnostic logs directly to Azure Monitor via the portal. Until then, we are using “Metric alerts for logs” for low-latency performance metric and event-based alerts, with log alerts configured for metrics not in the supported base OS metrics listed here. For SQL Server or other technologies running on VMs, the only option is log-based alerts until the Azure Monitor sink configuration option is added, as there are no supported “Metric alerts for logs” metrics for them. Once that happens, we will reevaluate based on latency requirements, but for now Log Analytics with “Metric alerts for logs” meets our base requirements. For all non-VM resources (Azure SQL, App Service, etc.), diagnostic logs are available for alerting by default via Azure Monitor Alerts.
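To illustrate the log-alert route for a counter with no “Metric alerts for logs” equivalent, a metric-measurement query might look like the sketch below. Available MBytes is just an example counter, and the query assumes the standard Perf table schema; the alert rule then applies its threshold to AggregatedValue per Computer:

```kusto
// Metric-measurement query for a log alert:
// average available memory per VM in 5-minute bins.
Perf
| where ObjectName == "Memory" and CounterName == "Available MBytes"
| summarize AggregatedValue = avg(CounterValue)
    by Computer, bin(TimeGenerated, 5m)
```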