Decoding binary data in logstash

By default, logstash stores the "message" field as a unicode string (this example starts at byte 11 so that no '\x0a' newline byte is included, which keeps the pipeline configuration shorter):

% python -c "print(''.join(chr(i) for i in range(11, 256)))" | logstash -e 'input { stdin { codec => plain { charset => "ISO-8859-1" } } } output { stdout { codec => rubydebug { metadata => true } } }'
{
    "@timestamp" => 2017-03-11T23:28:10.859Z,
      "@version" => "1",
          "host" => "48de04e8584c",
       "message" => "\v\f\r\u000E\u000F\u0010\u0011...øùúûüýþÿ"
}
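The transcoding that the charset option performs can be reproduced in plain Ruby. This is a sketch of the presumed behavior (convert input bytes from the declared charset to UTF-8), not logstash's actual implementation:

```ruby
# Generate the same bytes as the python one-liner: 0x0B through 0xFF.
raw = (11...256).map(&:chr).join          # BINARY (ASCII-8BIT) string

# Presumed codec behavior: label the bytes ISO-8859-1, transcode to UTF-8.
msg = raw.force_encoding('ISO-8859-1').encode('UTF-8')

msg.length     # => 245 characters, unchanged
msg.bytesize   # => 373 (each byte >= 0x80 becomes a two-byte UTF-8 sequence)
msg[-1]        # => "ÿ" (U+00FF, stored as the two bytes 0xC3 0xBF)
```

This is why the rubydebug output above shows escapes like "\u000E" rather than raw bytes: the field is now a UTF-8 string of codepoints, not the original byte sequence.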

It's possible to use the ruby filter to convert this back to a binary string:

% python -c "print(''.join(chr(i) for i in range(11, 256)))" | logstash -e "input { stdin { codec => plain { charset => 'ISO-8859-1' } } } output { stdout { codec => rubydebug { metadata => true } } }     filter { ruby { code => \"event.set('binary', event.get('message').encode('iso-8859-1'))\" } }"
{
    "binary" => "\v\f\r\x0E\x0F\x10...\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
}
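The filter's core operation can be seen in isolation: transcoding the UTF-8 string back to ISO-8859-1 restores one byte per character, because every codepoint in U+0000..U+00FF maps to a single Latin-1 byte.

```ruby
# Two characters that each occupy two bytes in UTF-8.
utf8 = "\u00FF\u00FE"
utf8.bytesize        # => 4

# The same encode call used in the ruby filter above.
binary = utf8.encode('ISO-8859-1')
binary.bytes         # => [255, 254] -- one byte per original character
```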

However, most (if not all) of the current plugins serialize events through code paths that convert field values back to UTF-8. So even though it is possible to get binary data into logstash, it doesn't seem to be possible to get it back out cleanly.
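The underlying problem can be demonstrated outside of logstash: a BINARY (ASCII-8BIT) string holding a byte like 0xFF has no defined conversion to UTF-8, so any serializer that insists on UTF-8 must either raise or substitute a replacement character. A sketch:

```ruby
blob = "\xFF".b    # an ASCII-8BIT string holding the single byte 0xFF

# A strict conversion raises, because 0xFF is undefined in UTF-8.
strict_failed = begin
  blob.encode('UTF-8')
  false
rescue Encoding::UndefinedConversionError
  true
end

# The lossy fallback substitutes U+FFFD -- the original byte is gone.
lossy = blob.encode('UTF-8', invalid: :replace, undef: :replace)
lossy    # => "\uFFFD"
```

Either way, the binary payload does not survive the trip.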

If your application isn't sensitive to the overhead of the extra bytes, you could use Base64 encoding to armor the binary data before it passes through Event.java:

% python -c "print(''.join(chr(i) for i in range(11, 256)))" | logstash -e "input { stdin { codec => plain { charset => 'ISO-8859-1' } } } output { stdout { codec => rubydebug { metadata => true } } }     filter { ruby { code => \"require 'base64'; event.set('base64', Base64.encode64(event.get('message').encode('iso-8859-1')))\" } }"
{
    "base64" => "CwwNDg8QERITFBUWFxgZGh...x8vP09fb3+Pn6+/z9/v8=\n",
}
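The reason this works is that Base64 output is pure ASCII, which survives any UTF-8 transcoding unchanged; the downstream consumer then decodes it to recover the exact original bytes. A round-trip sketch:

```ruby
require 'base64'

# The same 0x0B..0xFF payload as above, as a BINARY string.
blob = (11...256).map(&:chr).join.force_encoding('BINARY')

armored = Base64.encode64(blob)
armored.ascii_only?               # => true, so UTF-8 transcoding is lossless

# The consumer recovers the exact original bytes.
Base64.decode64(armored) == blob  # => true
```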

Subscribe to All Posts - Wesley Tanaka