Using Checkmk local checks to monitor the status of custom scripts
Checkmk is an open source monitoring tool that lets you monitor many aspects of your IT infrastructure, including servers, networks, services and a whole lot more. Getting set up with Checkmk is ridiculously easy, and deploying an agent on each system takes just a few minutes.
By default, Checkmk will discover a host of services on each system it is deployed on and monitor these for you (storage usage, memory, CPU utilisation etc). This monitoring can be configured as you desire, and alerts can be forwarded to systems such as Opsgenie, SMS and email, or sent through webhooks to the likes of Slack channels.
Checkmk can also integrate with other services, either through the existing plugin library or by querying custom-built ‘local checks’ deployed on each server. These local checks can be written in whatever language you want, so long as they can be executed on the host they are deployed on. All Checkmk needs to see is the output from the local check in a format it can interpret.
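As a rough sketch of that format: the agent expects a local check to print one line per service, made up of a status code, a service name, a metrics field (or `-` when there are none) and a free-text detail. The Python below is a minimal illustration only. The service name and message are made up, and the directory mentioned in the comment is the usual default for Linux agents, so verify it on your own install.

```python
#!/usr/bin/env python3
# Minimal local check sketch. On Linux agents, executable files placed in
# /usr/lib/check_mk_agent/local/ are normally run each time the agent is
# queried (assumed default path; check your own agent).


def local_check_line(status: int, service: str, detail: str, metrics: str = "-") -> str:
    """Build one output line: <status> <service_name> <metrics> <detail>."""
    # The service name is kept free of spaces here, which is the safest
    # choice across Checkmk versions.
    return f"{status} {service} {metrics} {detail}"


if __name__ == "__main__":
    # Hypothetical service name and message, for illustration only.
    print(local_check_line(0, "Daily_tweet", "Tweet sent today"))
```

Run by hand, the script prints a single line that the agent would pass on to the Checkmk server, where it can be discovered as a new service.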
As an example, I have created a local check on one of my VMs, the one that pushes out tweets to the @COVID19DataIE Twitter account. The purpose of the check is to query whether or not a tweet has been sent that day, along with some basic tests to determine:
The output from each of these tests varies, and the status returned by the local check corresponds to a criticality rating in the Checkmk monitoring dashboard (and thus to the alerts sent via Slack).
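To make that mapping concrete, local checks report their state as a number: 0 for OK, 1 for WARN, 2 for CRIT and 3 for UNKNOWN. The sketch below emits one line per test. The test names and messages are hypothetical placeholders, not the actual tests my check runs.

```python
from enum import IntEnum


class State(IntEnum):
    """Status codes a local check can report to Checkmk."""
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


def report(results):
    """Return one local check line per (service, state, detail) tuple."""
    # "-" in the third field means the check reports no metrics.
    return [f"{int(state)} {service} - {detail}" for service, state, detail in results]


# Hypothetical test results, for illustration only.
for line in report([
    ("Tweet_sent_today", State.OK, "Tweet posted"),
    ("Tweet_data_source", State.WARN, "Source data not yet updated"),
]):
    print(line)
```

Each printed line becomes its own service in the dashboard, so one script can feed several independent alerts.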
An example of an alert that would appear as OK in Checkmk for one of the tests above might be:
And another, a WARN alert for when the service may be affected and might require troubleshooting:
Checkmk can send notifications through several services for alerts that appear on the monitoring dashboard. In my case, I have created a simple webhook on Slack and provided it to Checkmk, which supports Slack natively. The two alerts highlighted above can be seen posted below in a Slack channel.
The diagram below outlines a high-level view of how the custom local check I wrote determines which status to return. This is obviously based on my needs for this specific purpose, but the example is simple enough to give you an idea of how you could create your own.
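For a feel of how such a decision might look in code, here is a small sketch. It is not the logic of my actual check: the `tweet_status` function, the `deadline_hour` cut-off and the `Daily_tweet` service name are all assumptions made up for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple


def tweet_status(last_tweet: Optional[datetime], now: datetime,
                 deadline_hour: int = 20) -> Tuple[int, str]:
    """Return (status, detail). The thresholds are illustrative assumptions."""
    # OK: a tweet has been recorded on today's date.
    if last_tweet is not None and last_tweet.date() == now.date():
        return 0, f"Tweet sent today at {last_tweet:%H:%M}"
    # WARN: nothing sent yet, but there is still time before the cut-off.
    if now.hour < deadline_hour:
        return 1, "No tweet sent yet today"
    # CRIT: the cut-off has passed without a tweet.
    return 2, f"No tweet sent today and it is past {deadline_hour}:00"


if __name__ == "__main__":
    # Hypothetical input: no tweet has been recorded at all.
    status, detail = tweet_status(None, datetime.now())
    print(f"{status} Daily_tweet - {detail}")
```

Keeping the decision in a function that takes the current time as an argument makes it easy to test each branch without waiting for the clock.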