Send alerts from systemd to Slack - N | Robots Science Gamedev

Published: October 14, 2021Updated: July 31, 2022

Table of contents

Preface
Install logagent
Configure logagent
Send systemd logs to logagent
Mockup service
Fixing bugs in logagent
Add filters
Send logs to Slack
Results

I found logagent from a google search. It should solve a very simple thing: get logs from systemd, focus on one service, format a message and send to Slack with webhooks. After six months I have found some time to make another effort and make it finally work.

Preface

For example you have a very small software company. You are using only free services and you do not have resources to write custom solutions.

When you work with real users, anything can happen. You have a system that requires constant monitoring. Normally one will setup alerts, but you do not know what alerts you need.

Then you decide to keep an eye on it, set a simple service redirecting important log messages into Slack, that way your managers can spot a problem and later formulate what alerts they need.

Install logagent

First thing first, get the code

git clone https://github.com/sematext/logagent-js.git
git rev-parse HEAD
af1661efcb69812bc495524cbc07a1c55e0767ec

Commit af1661efcb69812bc495524cbc07a1c55e0767ec is the place of my investigation. Also

node --version
v16.0.0
npm --version
7.11.2

install modules from npm

npm install

Here is my first error.

Flag --force makes it swallow the pill

npm install --force

Let's ignore deprecated warnings, we will fix it later. Now it's time to check the "binary":

node ./bin/logagent -h

Great! After this we can continue with the official tutorial on how to forward systemd logs to sematext (output from sematext will be configured in the second stage).

The howto has good explanations that I'm not going to repeat. I will only list here the steps you need to make everything work.

Configure logagent

Create example config file

node ./bin/logagent --writeConfig logagent.yml

It is huge! It has 1085 lines. Okay, just make a new one, we keep it simple:

input:
   journal-upload:
    module: input-journald-upload
    port: 9090
    worker: 0
    systemdUnitFilter: 
      include: !!js/regexp /.*/i
    removeFields:
      - __CURSOR
      - _TRANSPORT
      - _BOOT_ID
      - _UID
      - _GID
      - _CAP_EFFECTIVE
      - _MACHINE_ID
      - _HOSTNAME
      - _SYSTEMD_SLICE
      - _STREAM_ID
      - _PID
      - _COMM
      - _EXE
      - _CMDLINE
      - _SYSTEMD_CGROUP
      - _SYSTEMD_INVOCATION_ID
      - SYSLOG_FACILITY
      - PRIORITY
      - SYSLOG_IDENTIFIER

output: 
  stdout: yaml

More about configuration options of input-journald-upload in input plugin spec: https://sematext.com/docs/logagent/input-plugin-journald-upload/

Run the logagent. This will start a web server on our localhost on port 9090 waiting for journal info to come. Keep this terminal open.

node ./bin/logagent --config logagent.yml

Send systemd logs to logagent

There is no such package as systemd-journal-remote on Arch. This thing is already in the system:

pacman -Qo /usr/lib/systemd/system/systemd-journal-upload.service
/usr/lib/systemd/system/systemd-journal-upload.service is owned by systemd 247.6-1.0

So just add this URL URL=http://127.0.0.1:9090/00000000-0000-0000-0000-000000000000 to /etc/systemd/journal-upload.conf

sudo vim /etc/systemd/journal-upload.conf

Be carefull with instructions in the howto, their URL has a comma... and missing a token (which is some UUID, but we do only local work, so no need to create free account, because any UUID should be fine - I use zeroes). I figured that out from the source code (somewhere in these lines when I searched for "invalid logs token in url" message).

To edit systemd services I will use the righteous way of doing it. This way edits will survive Noah's Flood and the system update/upgrade.

sudo systemctl edit systemd-journal-upload

Add this content

[Unit]
StartLimitIntervalSec=5

[Service]
WatchdogSec=0
Restart=always
TimeoutStartSec=1
TimeoutStopSec=1
StartLimitBurst=1000

Settings from the main file /usr/lib/systemd/system/systemd-journal-upload.service are used, except those that we just have overwritten.

Be aware of "recent" systemd updates not reflected in the howto:

StartLimitInterval was moved from [Service] to [Unit] section in 6bf0f408e4833152197fb38fb10a9989c89f3a59, but the old location was still accepted for compatibility. The new StartLimitIntervalSec name is valid only in [Unit].

Reference: https://lists.freedesktop.org/archives/systemd-devel/2017-July/039255.html

Fire it up.

sudo systemctl enable systemd-journal-upload.service
sudo systemctl start systemd-journal-upload.service

At this point your terminal with logagent should start spitting out all logs from systemd into the air. Stop it. Ctrl+C.

Now it is time to tame the output with filters.

Mockup service

As I said earlier, I only want to receive logs from one specific service. Like this simple Python service: https://gist.github.com/mikolasan/917455a42152eeab24a0bb3fcb549647.

sudo mkdir /opt/my-python-project
sudo chown $USER:$USER -R /opt/my-python-project
vim /opt/my-python-project/main.py
chmod +x /opt/my-python-project/main.py

sudo vim /etc/systemd/system/my-python-project.service
sudo systemctl start my-python-project

Fixing bugs in logagent

Instead of include: !!js/regexp /.*/i, should be one specific service, right? Like include: !!js/regexp /my-python-project/i.

I actually see many many audit messages. They are not from my service. The thing about these messages - they do not have _SYSTEMD_UNIT attribute, hence it cannot be sorted out by any include/exclude logic.

Next, if you are curious, I will give you a hint where the bug is in journald-upload.js:

const _SYSTEMD_UNIT = '_systemd_unit'
console.log(log[_SYSTEMD_UNIT])
console.log(log._SYSTEMD_UNIT)

I hope you already have that gut feeling.

And how about unused function parseLine? I have started "blaming". And first thing I see:

+  // fastest loop is counting down
+  for (let i = lines.length; i >= 0 ; --i) {

fastest loop is counting down

In JavaScript. I laughed.

It was true on assembler level for Intel 8086. But current processors have parallel conveyers. I doubt that it is true in simple loops for modern compilers with -O3.

At this point you may need to wait until all prevoius messages will be ignored. Also I did restart the systemd-journal-upload several times, I do not know if that helped. Also try to make a break, get a cup of tea and it will work eventually. That's my boss' favourite word: eventually.

Add filters

This part must be really straight forward. Just look at the doc. But no. The journald-upload input plugin is not producing "raw input". It produces objects. Which means instead of input filters we must use otput filters. It is perfectly explained in ... comments in the source code.

DATA_RAW events are emitted by input-plugins, producing text lines, and must be handled by text based input-filters and parser.

DATA_OBJECT events are emitted by input plugins, producing structured data, with no need to be parsed. Such data needs to be processed by output-filters and output plugins. Skipping text based input filters and parser, and continue with directly output filters improves performance by saving serialisation to JSON and back to JS objects.

input:
   journal-upload:
    module: input-journald-upload
    port: 9090
    worker: 0
    systemdUnitFilter: 
      include: !!js/regexp /my-python-project/i
    removeFields:
      - __CURSOR
      - _TRANSPORT
      - _BOOT_ID
      - _UID
      - _GID
      - _CAP_EFFECTIVE
      - _MACHINE_ID
      - _HOSTNAME
      - _SYSTEMD_SLICE
      - _STREAM_ID
      - _PID
      - _COMM
      - _EXE
      - _CMDLINE
      - _SYSTEMD_CGROUP
      - _SYSTEMD_INVOCATION_ID
      - SYSLOG_FACILITY
      - PRIORITY
      - SYSLOG_IDENTIFIER

outputFilter:
  remove-fields:
    module: remove-fields
    matchSource: !!js/regexp .*
    fields:
      - _systemd_unit
  dropEvents:
    module: drop-events
    debug: false
    filters:
      message: 
        exclude: !!js/regexp /bad/

output: 
  stdout: yaml

When we add new drop-events filter, one can notice that developers like to debug with console.log and usually forget to remove it from production. I have a fix for that as well.

Send logs to Slack

This part must be really straight forward. Just look at the doc. But no. Slack has new API for their fancy Block Kit framework.

For the sake of a good example I will show how to print a timestamp from logs. It is not that easy as you think.

Time from journald-upload is received in microseconds, it is not a Unix timestamp, but Slack only understands Unix timestamps. Somewhere in logagent we have to divide our value by 1000. So how to transform values in logagent? It doesn't seem hard to add a new plugin, but I will add transform function to slack-webhook.js and define it in logagent.yml.