Giix

elk

04/01/17 cakephp , elk-stack , logging , php # , , , , , ,

CakePHP 3 out-of-the-box logging to Logstash and Elasticsearch using Syslog

CakePHP 3 out-of-the-box logging to Logstash and Elasticsearch using Syslog

Logging, like testing is never a developers first priority… until you wasted enough time debugging errors in production (or any other environment) re-actively; User reports an error or malfunctioning, in our case CakePHP 3, application, ops guys getting asked for webserver and application logs, mailing you the logs, you looking at the logs and trying to reproduce it. Let’s be honest, that sucks and is no way of (developer) life. It doesn’t work when you have one application you maintain and it makes you professional (and personal) life a living hell if you have dozens of applications or application instances (e.g. containers).

So lets fix it. In this post I will walk you through my experiences using Syslog to ship CakePHP application logs to Elasticsearch via Logstash.

All code examples are located here.

Meet the team

The components used in this post.

The components used in this post.

First lets define the components we use to make it work. For the sake of simplicity I won’t go into operating systems, webservers, clusters, storage cetera, just the bare essentials. Examples and file locations will be Ubuntu based and for those wondering, this example was created on two Ubuntu 16.04 VM’s though it can work on or inside containers or other distributions just as easy.

As you can see in the figure on the side we have, in this example, two servers. Server X containing the CakePHP webapplication and syslog (default on e.g. Ubuntu) and a server Y containing Logstash and Elasticsearch.

Caveat: I am not covering the K in the famous ELK-stack, Kibana. Just to keep this post as small and focused as I can.

 

Configuring CakePHP 3

The first component to configure is CakePHP. The CakePHP cookbook is pretty straight-forward about using Syslog for CakePHP logging instead of the default FileLog engine.

So first apply the instructions from the cookbook with some extra configuration keys, add this block to your  config/bootstrap.php :

Below the keys that aren’t super obvious are explained:

'flag' => LOG_ODELAY | LOG_PERROR means PHP should delay opening the connection tot syslog until the first message is logged AND it should print log message also to standard error (nice when using containers for example). Take a look at the PHP docs for openlog.

'facility' => LOG_LOCAL7  uses local7 as a dedicated syslog facilty for this app so it doesn’t get mixed up with other (server) log messages.

This simple addition to your bootstrap.php makes all CakePHP logs be pushed tot syslog.

Tip: When having trouble configuring the steps above, just try to implement the example as stated in the cookbook (no flag, no facility) and you should see the CakePHP log messages among all other logs in  /var/log/syslog

Setting up syslog

Next up: configure syslog to ship the logmessages to Logstash && write them /var/log/my_app.log. To do so, add a syslog configuration to  /etc/rsyslog.d/ called  my_app.conf containing the following (adjust to your situation):

These two lines simply state: Send all incoming messages on local7 to host logstash.mydomain.com on port 5140 over TCP (@@, UDP is one @)

Restart syslog after you added the file ( $ sudo service rsyslog restart )

Your /var/log/my_app.log  file will now be filled with log messages looking like this:

 

Making Logstash ready to receive

Now we are going to configure Logstash to listen for syslog formatted messages on port 5140 and send them to Elasticsearch.

Extracting data from the messages will follow in the next chapter.

Ok, so lets create a new Logstash config, assuming a default install, create a file called my_app.conf  in  /etc/logstash/conf.d/ and add this content:

Lets talk about the three sections in this Logstash configuration file, first the input section; This defines the protocol (tcp) and port (5140) to listen on as you can see in the comments. The next phase of the Logstash pipeline is the filter, here we can manipulate the incoming event in many different ways, we will discuss this in the next chapter as it is a subject on its own. Finally there is the output section, here we can define multiple output plugins, in our case we use the elasticsearch plugin to send data to ES (and a default stdout debug plugin).

Grok it like it’s hot

So, now we receive every line of syslog as an event in Logstash and Logstash will send this line to elasticsearch. *Did he say every line?*. Yes I did, and yes that sucks, since (sys)logging is a single line event and CakePHP logs error and context like stack traces etc the log message as a single entity is logged as individual lines to syslog and thus every line as a single event to Logstash en Elasticsearch. In our example log (shown earlier) we would have 14 different total meaningless events in Elasticsearch since no line on its own means anything (ok.. except for the first line.. BUT I WANT IT ALL!).

Now I am not the only one with that issue so Logstash has just what we need: the multiline codec, codecs are ‘translators’ or ‘encoders / decoders’ we can use in the input section, the multiline codec will collapse multiline messages and merge them into a single event. The original goal of this codec was to allow joining of multiline messages from files into a single event and thats what er are going to use it for:

You can see we define a codec in the input section called multiline and configure it with three keys:

  1. pattern => "\[%{GREEDYDATA}\]" The pattern should match what you believe to be an indicator that the field is part of a multi-line event, in our case the  [Cake\Network\Exception\InternalErrorException] in our log message is the type of exception, present in every CakePHP exception message. We match it by using the  %{GREEDYDATA} grok pattern between [ and ].
  2. negate => "true" The negate can be true or false (defaults to false). If true, a message not matching the pattern will constitute a match of the multiline filter and the what will be applied. (vice-versa is also true)
  3. what => "previous" The what must be previous or next and indicates the relation to the multi-line event.

This simple codec will thus make sure the different syslog messages will again be merged into one string / event.

Now that we have our original message again (with a lot of extra syslog date, facility, etc. prefixing) we want to extract all the different interesting pieces of information we want to aggregate in elasticsearch. How we are going to do that?

Grok

Yes, grok. If I had to define grok without reading any descriptions of it I would define it as easy-to-use regex that non-developers can understand and apply <insert ops joke here>. And basically thats what is is, predefined (regex) statements, for example %{LOGLEVEL}  that matches the severity levels of log messages.

And you have to say,  %{LOGLEVEL} is a lot more understandable than the corresponding regex:  ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)

Expend the filter with the following code to match the usefull information:

Its important to understand the grok match statement makes you define the order of in which the message is formatted, using  %{GREEDYDATA} as a wildcard for crap you dont need. Examine the match statement above in combination with the available grok patterns and the example log message and it should make sense.

In the example above the  %{LOGLEVEL:loglevel} syntax means: extract the log level and add it to an event property called ‘loglevel’.

You can play with grok on several websites, creating the match above I used http://grokdebug.herokuapp.com/.

Cleaning up

The final part of our journey is cleaning up the crap, and by that I mean stripping the  Jan 4 15:50:01 ubuntu MY_APP: error: from the stacktrace capture (final part of our grok->match).

For this we use the mutate filter to mostly strip of as many garbage as we can so we get a nice and clean stacktrace. For this we add the final part to our filter:

Result

With this final part the complete Logstash config wil look like this:

Restart Logstash to activate this config, generate errors and check Elasticsearch for the result.

Final words

First: There are many ways to get your (Cake)PHP logs to something like Logstash, this is only one of them. There are some reasons I picked this for our applications:

  • I wanted to keep my application independent of Logstash so using Syslog as middle man makes it in dependable of Logstash and its service availability/
  • I want both syslog / logstash logging as direct logging to stderr. The syslog options to send the logs to both a file (OPS likes this) and Logstash makes it I can log to three different places; file, stderr and logstash.
  • Using monolog is a serious and valid option you might want to consider, for the sake of simplicity I used a default CakePHP function instead.

Second and finaly, this is only one part of the feedback loop, using Kibana and e.g. the Elasticsearch watcher (alerting) functions to visualize / dashboard and react on log messages is a logical next step.

Thank you for reading.

0 likes no responses