TL;DR: Monitor your apt-based packages with Prometheus.
Why do we need to monitor software packages?
When you operate a large number of servers, VMs or containers, you need to keep the software on them up to date, if only for security reasons. You most likely won't do this manually, but will rely on helpers such as unattended upgrades on Debian/apt-based systems.
But, as always, automation that nobody actively observes may silently stop working. Unattended upgrades may fail, for example, when user input is required.
Therefore, I built the “pkg-exporter”, which exports data about packages to Prometheus, which in turn can alert you if something has failed.
Exporting upgradable and installed packages to Prometheus
For infrastructure monitoring, Prometheus with one node exporter per node is a common setup. In my setup, every node has a Prometheus node exporter installed, managed by my SaltStack configuration management system.
Prometheus then checks collected metrics against a few rules, and, in combination with the Alertmanager, notifies me about any problems found.
This exporter exposes not only the number of upgradable packages, but also other metrics such as the number of installed, auto-removable and broken packages, and whether the node requires a reboot.
The pkg-exporter textfile exporter
Getting Started
The pkg-exporter Python package provides a script that exports the metrics into a text file in the Prometheus/OpenMetrics format.
This text file is meant to be collected by the node exporter. The text file directory must be configured in the node exporter using the --collector.textfile.directory parameter. By default, the pkg-exporter writes to /var/prometheus/pkg-exporter.prom, corresponding to the node exporter argument --collector.textfile.directory=/var/prometheus/. This path may be changed using the PKG_EXPORTER_FILE environment variable.
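To illustrate the path handling described above, here is a minimal sketch of how a script could resolve the output file and write it safely. It is not the pkg-exporter's actual code: the helper names resolve_output_path and write_atomically are hypothetical, and only the default path and the PKG_EXPORTER_FILE variable come from the text above.

```python
import os
import tempfile

# Default taken from the text above; PKG_EXPORTER_FILE overrides it.
DEFAULT_PATH = "/var/prometheus/pkg-exporter.prom"


def resolve_output_path(environ=os.environ):
    """Return the metrics file path, honouring PKG_EXPORTER_FILE if set."""
    return environ.get("PKG_EXPORTER_FILE", DEFAULT_PATH)


def write_atomically(path, content):
    """Write content to a temp file in the target directory, then rename it.

    The rename keeps the node exporter from reading a half-written file.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        # mkstemp creates the file with mode 0600; make it world-readable
        # so a node exporter running as another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing to a temporary name first matters because the node exporter may read the directory at any moment; since the temporary file does not end in .prom, the textfile collector ignores it until the rename.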
The exporter is meant to be called every time the metrics should be refreshed. At the moment, I use a 5-minute interval in cron:
*/5 * * * * /usr/local/bin/pkg-exporter
A systemd timer may also be used as an alternative to cron.
You may ask why the metrics are collected asynchronously rather than being triggered by a Prometheus request, as with many other exporters.
I purposely designed it this way, as the package metrics do not need to be as granular as other metrics (e.g. interface statistics), and a single collection run may take relatively long (well over 1 s). Also, with the node exporter usually already available on the systems of interest, you do not need another daemon running, with another target configuration, another open port, etc.
The pkg-exporter itself is meant to be installed as a Python package: pip3 install pkg-exporter
Also, python3-apt needs to be installed to provide the necessary APIs: apt install python3-apt
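To give an idea of what the python3-apt APIs make possible, the following sketch counts package states from the apt cache. It is an illustration only, not the pkg-exporter's implementation: the function names are hypothetical, and unlike the real exporter it does not split the counts per repository/source. The attributes used (is_installed, is_upgradable, is_auto_removable, is_now_broken) are provided by python-apt's Package objects.

```python
def count_package_states(packages):
    """Count installed, upgradable, auto-removable and broken packages.

    `packages` is any iterable of objects exposing the boolean attributes
    used below, as python-apt's Package objects do.
    """
    counts = {"installed": 0, "upgradable": 0, "auto_removable": 0, "broken": 0}
    for pkg in packages:
        if not pkg.is_installed:
            continue  # only installed packages are of interest here
        counts["installed"] += 1
        if pkg.is_upgradable:
            counts["upgradable"] += 1
        if pkg.is_auto_removable:
            counts["auto_removable"] += 1
        if pkg.is_now_broken:
            counts["broken"] += 1
    return counts


def count_local_packages():
    """Count package states on this machine (requires python3-apt)."""
    import apt  # imported lazily so the counting logic works without it

    return count_package_states(apt.Cache())
```

On a Debian-based system with python3-apt installed, calling count_local_packages() returns a dictionary with the four counters.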
Metrics Exported
At the time of writing, the reboot_required metric is exported, as well as the number of installed, upgradable, broken and auto-removable packages, each split up per repository/source:
# HELP pkg_reboot_required Node Requires an Reboot
# TYPE pkg_reboot_required gauge
pkg_reboot_required 0.0
# HELP pkg_installed Installed packages per origin
# TYPE pkg_installed gauge
pkg_installed{archive="oldstable",component="main",label="Debian",origin="Debian",site="deb.debian.org",trusted="True"} 539.0
# HELP pkg_upgradable Upgradable packages per origin
# TYPE pkg_upgradable gauge
pkg_upgradable{archive="oldstable",component="main",label="Debian",origin="Debian",site="deb.debian.org",trusted="True"} 0.0
# HELP pkg_auto_removable Auto-removable packages per origin
# TYPE pkg_auto_removable gauge
pkg_auto_removable{archive="oldstable",component="main",label="Debian",origin="Debian",site="deb.debian.org",trusted="True"} 0.0
# HELP pkg_broken Broken packages per origin
# TYPE pkg_broken gauge
pkg_broken{archive="oldstable",component="main",label="Debian",origin="Debian",site="deb.debian.org",trusted="True"} 0.0
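Because each metric is split per origin, getting a total for a node means summing over all label sets. In Prometheus you would do this with a query, but the following sketch shows the same idea on the raw text file. The parser is deliberately simplistic (it skips comment lines and does not handle timestamps or every escape sequence), and the function names are hypothetical.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # line is outside what this simple parser understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def total(samples, metric):
    """Sum the values of one metric across all of its label sets."""
    return sum(value for name, _, value in samples if name == metric)
```

For example, total(parse_samples(open(path).read()), "pkg_upgradable") gives the number of upgradable packages on the node regardless of origin, where path is wherever the exporter wrote its file.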
Timing
The exporter needs to be triggered externally at an appropriate interval. In my setup, the exporter is triggered every 5 minutes using cron:
*/5 * * * * /usr/local/bin/pkg-exporter
Alternatively, a systemd timer may be used.
When choosing the interval, keep in mind that a finer resolution of package data does not necessarily add value, while generating the metrics very frequently, e.g. every 30 s, puts more CPU load on every system running the exporter. Therefore, an interval somewhere between 5 and 60 minutes seems reasonable.
The upgradable metric can obviously only be accurate if the package sources are refreshed, which this exporter does not handle itself. This may be solved by running apt-get update or unattended upgrades periodically. The success of refreshing the package sources is not (yet) monitored by the pkg-exporter, but may be in the future.
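Until the exporter monitors this itself, one possible way to check the freshness of the package sources is to look at the modification time of a stamp file. The sketch below assumes such a file exists and is touched on every successful update; the path used as the default is an assumption and depends on the distribution and its apt hooks, so verify it on your systems first. The function name is hypothetical.

```python
import os
import time

# Assumption: a stamp file whose mtime is bumped after a successful
# package list update. Whether this file exists depends on the system.
DEFAULT_STAMP = "/var/lib/apt/periodic/update-success-stamp"


def seconds_since_update(stamp_path=DEFAULT_STAMP, now=None):
    """Return the age of the stamp file in seconds, or None if it is missing."""
    try:
        mtime = os.path.getmtime(stamp_path)
    except OSError:
        return None
    current = time.time() if now is None else now
    return max(0.0, current - mtime)
```

A result of None or a value far above the expected update interval would indicate that the upgradable metric may be based on outdated package lists.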
Internals
The pkg-exporter internally uses the prometheus-client Python library, which handles formatting and writing the metrics file. This would also make it possible to extend the pkg-exporter into a standalone exporter that does not need the node exporter's textfile collector.
Currently, the package is shipped as a Python setuptools package. It may also be provided in other formats in the future, e.g. as a dpkg package.
It would also be possible to add support for other package managers, extending the list of supported operating systems.
Alerting Rules
I have the following alerting rules defined for the pkg exporter:
groups:
  - name: updates
    rules:
      - alert: RebootRequired
        expr: pkg_reboot_required{job="node"} > 0
        labels:
          severity: high
        annotations:
          description: 'The instance requires an reboot'
          summary: 'Instance {{ $labels.instance }} running an outdated kernel'
      - alert: UpdatesPending
        expr: min_over_time(pkg_upgradable[1d]) > 0
        labels:
          severity: info
        annotations:
          description: 'The instance has updates pending'
          summary: 'Instance {{ $labels.instance }} has {{ $value }} updates pending'
      - alert: UpdatesPendingCritical
        expr: min_over_time(pkg_upgradable{label=~".*Security.*"}[4h]) > 0
        labels:
          severity: critical
        annotations:
          description: 'The instance has critical updates pending'
          summary: 'Instance {{ $labels.instance }} has {{ $value }} critical updates pending'
In general, an alert is triggered when a reboot is necessary, or when packages have been upgradable for longer than 24h (or 4h for critical upgrades). These time periods give unattended upgrades enough time to handle upgrades before the alert is triggered, so that an alert only fires if unattended upgrades have failed or a critical upgrade is necessary.
I also suggest monitoring the successful execution of the pkg-exporter itself by checking the modification time of the exported text files:
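The effect of min_over_time in these rules can be made concrete with a small simulation. This is a simplified model of the PromQL function, not Prometheus code: it treats the series as a list of (timestamp, value) pairs and ignores details such as the exact handling of window boundaries. The function names are chosen for illustration.

```python
def min_over_time(samples, now, window):
    """Return the smallest value among samples within `window` seconds before `now`.

    `samples` is an iterable of (timestamp, value) pairs. Returns None if
    no sample falls into the window.
    """
    in_window = [value for ts, value in samples if now - window < ts <= now]
    return min(in_window) if in_window else None


def should_alert(samples, now, window):
    """Alert only if every sample in the window reported pending upgrades."""
    lowest = min_over_time(samples, now, window)
    return lowest is not None and lowest > 0
```

If the upgradable count dropped to 0 at any point inside the window, the minimum is 0 and no alert fires. Note that only existing samples are considered, so a series that is just an hour old is judged on that one hour alone.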
- alert: NodeExporterTextfileStale
  expr: time() - node_textfile_mtime_seconds >= 86400
  labels:
    severity: warning
  annotations:
    description: 'Node exporter textfile has gone stale.'
    summary: 'Instance {{ $labels.instance }}: Node exporter textfile {{ $labels.file }} has gone stale.'
Outlook
Several additions to the exporter are conceivable. The design is open to supporting package managers other than apt.
A gauge/counter exporting the time of the last (successful) package cache/source update would be very helpful for monitoring the successful operation of unattended upgrades or apt update.
Also, improved distribution methods (e.g. a dpkg package) are planned.
Feel free to open Issues or Pull Requests in the Repository!