How many posts have you seen about logging? And how many of them have you actually read? Recently logging became cool again; nowadays everybody talks about logstash, elasticsearch and kibana, and it feels like everybody is playing with these tools. If you are not yet among them, then this blog post is for you. I am going to help you get started with logstash: getting familiar with the configuration and setting up the input as well as the output. Once you are familiar with the concepts and know how to play around with logstash, I move on to storing things in elasticsearch; there are some interesting steps to take there as well. When you have a way to put data into elasticsearch, we move on to looking at the data. Before you can understand the power of Kibana, you have to create some queries on your own, and I’ll help you there as well. In the end we will also have a look at Kibana.


Logstash

Logstash comes with a number of different components. You can run them all using the executable jar, but logstash is very pluggable, so you can also replace the internal components with other tools. Logstash contains the following components:

  • Shipper – sends events to logstash
  • Broker/indexer – sends events to an output, elasticsearch for instance
  • Search/storage – provides search capabilities using an internal elasticsearch
  • Web interface – provides a GUI using a version of kibana

Logstash is written in JRuby, so you need a JVM to run it. Once you have the executable jar, all you need to do is create a basic config file and you can start experimenting. The config file consists of three main parts:

  • Input – the way we receive messages or events
  • Filters – how we leave out or convert messages
  • Output – the way to send out messages

The next code block gives the most basic config: it uses the standard input from the terminal where you run logstash and outputs the messages to the same console.

input {
  stdin { }
}
output {
  stdout { }
}

Time to run logstash using this config:

java -jar logstash-1.2.2-flatjar.jar agent -v -f basic.conf

Then, when typing Hello World!!, we get the following (I removed some debug info):

2013-11-08T21:58:13.178+0000 jettro.gridshore.nl Hello World!!

With the 1.2.2 release it is still annoying that you cannot just stop the agent using ctrl+c; I have to kill it with the -9 option.
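A sketch of that workaround, assuming pgrep and GNU xargs are available:

```shell
# ctrl+c does not stop the 1.2.2 agent, so force-kill it by jar name.
# pgrep -f matches against the full command line; the [l] trick keeps the
# pattern from matching this command itself, and xargs -r skips the kill
# entirely when no agent is running.
pgrep -f '[l]ogstash-1.2.2-flatjar.jar' | xargs -r kill -9
```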

It is important to understand that the input and output blocks are made up of plugins. We can add other plugins to handle additional input sources, as well as plugins for outputting data. One of them, the elasticsearch output, we will see later on.

Elasticsearch

I am not going to explain how to install elasticsearch; there are plenty of resources available online, especially on the elasticsearch website. So I take it you have a running elasticsearch installation. Now we are going to update the logstash configuration to send all events to elasticsearch. The following code block shows the config for sending events entered on standard in to elasticsearch.

input {
  stdin { }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}

Now, when we have elasticsearch running with auto discovery enabled and the cluster name set to logstash, we can start logstash again. The output should resemble the following.

[~/javalibs/logstash-indexer]$ java -jar logstash-1.2.2-flatjar.jar agent -v -f basic.conf 
Pipeline started {:level=>:info}
New ElasticSearch output {:cluster=>"logstash", :host=>nil, :port=>"9300-9305", :embedded=>false, :level=>:info}
log4j, [2013-11-09T17:51:30.005]  INFO: org.elasticsearch.node: [Nuklo] version[0.90.3], pid[32519], build[5c38d60/2013-08-06T13:18:31Z]
log4j, [2013-11-09T17:51:30.006]  INFO: org.elasticsearch.node: [Nuklo] initializing ...
log4j, [2013-11-09T17:51:30.011]  INFO: org.elasticsearch.plugins: [Nuklo] loaded [], sites []
log4j, [2013-11-09T17:51:31.897]  INFO: org.elasticsearch.node: [Nuklo] initialized
log4j, [2013-11-09T17:51:31.897]  INFO: org.elasticsearch.node: [Nuklo] starting ...
log4j, [2013-11-09T17:51:31.987]  INFO: org.elasticsearch.transport: [Nuklo] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.178.38:9301]}
log4j, [2013-11-09T17:51:35.052]  INFO: org.elasticsearch.cluster.service: [Nuklo] detected_master [jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]], added {[jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]],}, reason: zen-disco-receive(from master [[jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]]])
log4j, [2013-11-09T17:51:35.056]  INFO: org.elasticsearch.discovery: [Nuklo] logstash/5pzXIeDpQNuFqQasY6jFyw
log4j, [2013-11-09T17:51:35.056]  INFO: org.elasticsearch.node: [Nuklo] started

Now, when typing Hello World !! in the terminal, the following is logged to standard out.

output received {:event=>#<LogStash::Event:0x... @data={"message"=>"Hello World !!", "@timestamp"=>"2013-11-09T16:55:19.180Z", "@version"=>"1", "host"=>"jettrocadiesmbp.fritz.box"}>, :level=>:info}
2013-11-09T16:55:19.180+0000 jettrocadiesmbp.fritz.box Hello World !!

This time, however, the same thing is also sent to elasticsearch. We can check that by executing the following query to determine which index was created.

curl -XGET "http://localhost:9200/_mapping?pretty"

The response is the mapping. The index name contains today's date, there is a type called logs, and it has the fields that were also written to the console.

{
  "logstash-2013.11.09" : {
    "logs" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "@version" : {
          "type" : "string"
        },
        "host" : {
          "type" : "string"
        },
        "message" : {
          "type" : "string"
        }
      }
    }
  }
}

Now that we know the name of the index, we can create a query to see if our message got through.

curl -XGET "http://localhost:9200/logstash-2013.11.09/logs/_search?q=message:hello&pretty"

The response for this query now is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "logstash-2013.11.09",
      "_type" : "logs",
      "_id" : "9l8N2tIZSuuLaQhGrLhg7A",
      "_score" : 0.19178301, "_source" : {"message":"Hello World !!","@timestamp":"2013-11-09T16:55:19.180Z","@version":"1","host":"jettrocadiesmbp.fritz.box"}
    } ]
  }
}

There you go: we can enter messages in the console where logstash is running and query elasticsearch to see that the messages are actually in the system. Not sure if this is useful yet, but at least you have seen the steps. The next step is to have a look at our data using a tool called kibana.

Kibana

There are multiple ways to install kibana, and depending on your environment one is easier than the other. I like to install kibana as a plugin in elasticsearch on a development machine. So in the plugins folder, create the folder kibana/_site and store all the content of the downloaded kibana tar in there. Now browse to http://localhost:9200/_plugin/kibana. On the first screen, look for the logstash dashboard. When you open the dashboard it will look a bit different from mine; I made some changes to make it easier to present on screen. Later on I will show how to create your own dashboard and panels. The following screen shows Kibana.

[Screenshot: the Kibana logstash dashboard, 2013-11-09]

Logstash also comes with an option to run kibana from the logstash executable. I personally prefer to have it as a separate install; that way you can always use the latest and greatest version.

Using tomcat access logs

This is all nice, but we are not implementing a system like this just to enter a few messages by hand, so we want to attach another input source to logstash. I am going to give an example using tomcat access logs. If you want tomcat to produce access logs, you need to add a valve to the configured host in server.xml.

<Valve className="org.apache.catalina.valves.AccessLogValve" 
       directory="/Users/jcoenradie/temp/logs/" 
       prefix="localhost_access_log" 
       suffix=".txt" 
       renameOnRotate="true"
       pattern="%h %t %S &quot;%r&quot; %s %b" />

An example of the resulting log output is shown below; the list underneath explains what each element of the pattern means.

0:0:0:0:0:0:0:1 [2013-11-10T16:28:00.580+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/partials/dashboard.html HTTP/1.1" 200 988
0:0:0:0:0:0:0:1 [2013-11-10T16:28:00.580+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/api/settings HTTP/1.1" 200 90
0:0:0:0:0:0:0:1 [2013-11-10T16:28:02.753+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/partials/users.html HTTP/1.1" 200 7160
0:0:0:0:0:0:0:1 [2013-11-10T16:28:02.753+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/api/users HTTP/1.1" 200 1332
  • %h – remote host
  • %t – timestamp
  • %S – session id
  • %r – first line of the request
  • %s – http status code of the response
  • %b – bytes sent

If you want more information about the logging options, check the tomcat configuration reference.

The first step is to get the contents of this file into logstash, so we have to change the config to add an input coming from a file.

input {
  stdin { }
  file {
    type => "tomcat-access"
    path => ["/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"]
  }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}

The debug output now becomes:

output received {:event=>#<LogStash::Event:0x... @data={"message"=>"0:0:0:0:0:0:0:1 [2013-11-10T17:15:11.028+0100] 9394CB826328D32FEB5FE1F510FD8F22 \"GET /static/js/mediaOverview.js HTTP/1.1\" 304 -", "@timestamp"=>"2013-11-10T16:15:20.554Z", "@version"=>"1", "type"=>"tomcat-access", "host"=>"jettro-coenradies-macbook-pro.fritz.box", "path"=>"/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"}>, :level=>:info}

Now we have stuff in elasticsearch, but each event is still just one string: the message. We know there is more interesting data hidden inside that message, so let us move on to the next component of logstash, filtering.

Logstash filtering

You can use filters to enhance the received events. The following configuration shows how to extract client, timestamp, session id, method, uri path, uri param, protocol, status code and bytes. As you can see we use grok to match these fields from the input.

input {
  stdin { }
  file {
    type => "tomcat-access"
    path => ["/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"]
  }
}
filter {
  if [type] == "tomcat-access" {
    grok {
      match => ["message","%{IP:client} \[%{TIMESTAMP_ISO8601:timestamp}\] (%{WORD:session_id}|-) \"%{WORD:method} %{URIPATH:uri_path}(?:%{URIPARAM:uri_param})? %{DATA:protocol}\" %{NUMBER:code} (%{NUMBER:bytes}|-)"]
    }
  }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}
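To sanity-check a pattern like this before feeding it to logstash, you can approximate it with a plain regular expression. The following sketch hand-expands the grok patterns into an ERE for grep (a rough simplification; grok's real IP and URIPATH definitions are much stricter) and tests it against one of the access log lines shown earlier:

```shell
# One of the access log lines shown earlier.
line='0:0:0:0:0:0:0:1 [2013-11-10T16:28:00.580+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/partials/dashboard.html HTTP/1.1" 200 988'

# Hand-expanded approximation of the grok match: client, [timestamp],
# session id (or -), "method path[?params] protocol", status code,
# bytes (or -).
echo "$line" | grep -E '^[0-9a-fA-F:.]+ \[[^]]+\] ([A-Z0-9]+|-) "[A-Z]+ [^?" ]+(\?[^ ]*)? [^"]+" [0-9]+ ([0-9]+|-)$'
```

grep prints the line when the pattern matches and exits with a non-zero status when it does not, which makes this easy to use in a quick test loop over a real log file.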

Now compare the new output.

output received {:event=>#<LogStash::Event:0x... @data={"message"=>"0:0:0:0:0:0:0:1 [2013-11-10T17:46:19.000+0100] 9394CB826328D32FEB5FE1F510FD8F22 \"GET /static/img/delete.png HTTP/1.1\" 304 -", "@timestamp"=>"2013-11-10T16:46:22.112Z", "@version"=>"1", "type"=>"tomcat-access", "host"=>"jettro-coenradies-macbook-pro.fritz.box", "path"=>"/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt", "client"=>"0:0:0:0:0:0:0:1", "timestamp"=>"2013-11-10T17:46:19.000+0100", "session_id"=>"9394CB826328D32FEB5FE1F510FD8F22", "method"=>"GET", "uri_path"=>"/static/img/delete.png", "protocol"=>"HTTP/1.1", "code"=>"304"}>, :level=>:info}

Now if we go back to kibana, we can see that we have more fields; the single message is now replaced with the separate fields we extracted. So now we can easily filter on, for instance, session_id. The following image shows that we can select the new fields.

[Screenshot: selecting the new fields in Kibana, 2013-11-10]

That is it for now, later on I’ll blog about more logstash options and creating dashboards with kibana.