Introducing Metriks

I was very inspired by Coda Hale’s “Metrics, Metrics Everywhere” talk at CodeConf 2011 and have spent a lot of time over the past year thinking about it. After seeing rack-statsd and how it kept important process stats in the proctitle, I wanted the same thing for the background tasks that run Papertrail.

I hadn’t been able to find a metrics library for Ruby that provided the calculations I was looking for, so I decided to experiment with creating one myself.

The end result was an API for measurement that I’m really proud of. For example, to calculate how much work a process is doing, use a meter to measure how many times an event has happened by calling the mark() method on the meter:

def perform(job)
  Metriks.meter('tasks').mark
  # the work
end

The meter provides methods that give 1, 5 and 15 minute average rates per second. The process title can be updated from another thread with the rate returned by one_minute_rate():

meter = Metriks.meter('tasks')
loop do
  $0 = "worker: #{meter.one_minute_rate.to_i} tasks/sec"
  sleep 5
end

…which gives you output from ps ax that looks like this:

22665 ?        S     17:09 worker: 273 tasks/sec
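
The update loop shown earlier never returns, so in a real worker it would run in its own thread alongside the work. Here is a minimal sketch of that arrangement. The helper names (worker_title, start_title_updater) are made up for illustration, and a lambda stands in for the meter so the example runs without the gem:

```ruby
# Build the process title from a rate (events per second).
def worker_title(rate)
  "worker: #{rate.to_i} tasks/sec"
end

# Start a background thread that refreshes $0 every `interval` seconds.
# `rate_source` is any callable returning the current rate; with Metriks it
# would wrap the meter's one-minute rate (an assumption for this sketch).
def start_title_updater(rate_source, interval = 5)
  Thread.new do
    loop do
      $0 = worker_title(rate_source.call)
      sleep interval
    end
  end
end

rate_source = -> { 273.3 }          # hypothetical stand-in for a meter
updater = start_title_updater(rate_source, 0.01)
sleep 0.05                          # the worker would do its real work here
updater.kill                        # stop the updater on shutdown
puts worker_title(rate_source.call)
```

Keeping the formatting in its own method makes it easy to check the title without touching the real process name.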

Once you are tracking those metrics in-process it becomes very easy to start sending them to remote services in all sorts of ways.

The library is called Metriks, an experiment in creating a Ruby metrics library with a simple interface and the ability to send the metrics to a number of services.

What sort of stuff are we tracking?

Error rates, database insertion times, cache hits vs misses, messages-per-second processed by workers.

Overview

The main components of the library are the metrics and reporters.

  • Metrics are responsible for doing a specific kind of measurement
  • Reporters are responsible for sending metrics to a specific destination

Today you can send to Graphite, Librato Metrics and a log file.

Here’s a quick review of what it’s like to use it and what the important pieces of the library are. It isn’t a huge amount of code, so please check it out.

If you have any questions or comments, feel free to say hi on Twitter.

What does it look like to use it?

Using a meter to track web requests

If you want to track the number of requests per second a Rack app is handling but don’t care about timing information, use a Metriks::Meter, which can be created by calling Metriks.meter('name').

Here’s a very simple example of Rack middleware that does this:

class MetriksMiddleware
  def initialize(app)
    @app = app
  end
  def call(env)
    Metriks.meter('rack.requests').mark
    @app.call(env)
  end
end

Sending that metric to Librato Metrics could get you a pretty graph looking like this:

[Graph: Librato Metrics Meter]

Using a timer to measure how long a method takes to run

To track how long a method takes to run, use a Metriks::Timer, which can be created with Metriks.timer('name').

To time a fib() method, all it takes is:

def fib(n)
  n < 2 ? n : fib(n-1) + fib(n-2)
end

Metriks.timer('fib.time').time do
  puts fib(10)
end

Sending that metric to Librato Metrics could get you a pretty graph looking like this:

[Graph: Librato Metrics Timer]

Try it

To install the gem just add this to your Gemfile:

gem 'metriks'

The source is available on GitHub.

Metrics supported

  • Meter: used to measure the rate at which something is happening (for example, the number of times per second a method is called).
  • Timer: used to measure how long it takes to perform something. Also contains a meter to track how many times it’s happening.
  • Counter: used to keep track of how many times something has happened since the process started. This is mostly used by the other metrics and isn’t often used directly.

Meter

Meters are used to keep track of the rate of an action (how many times per second it happens).

To mark when an action is performed:

Metriks.meter('calls').mark
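
To give a feel for how marks can turn into a 1, 5 or 15 minute rate, here is a simplified sketch of an exponentially weighted moving average, the approach used by the meters in Coda Hale’s Java library. It is not the Metriks source: the class name EwmaRate, the 5 second tick interval and the single-threaded design are all assumptions made for illustration.

```ruby
# Simplified, single-threaded sketch of an exponentially weighted moving
# average (EWMA) rate. Hypothetical class, not the Metriks implementation.
class EwmaRate
  TICK_INTERVAL = 5.0 # seconds between ticks (assumed for this sketch)

  attr_reader :rate # smoothed events per second

  def initialize(window_seconds)
    # Smoothing factor: a longer window reacts more slowly to changes.
    @alpha = 1 - Math.exp(-TICK_INTERVAL / window_seconds)
    @uncounted = 0
    @rate = 0.0
    @initialized = false
  end

  # Record that `n` events happened since the last tick.
  def mark(n = 1)
    @uncounted += n
  end

  # Fold the events seen during the last interval into the average.
  # Something external must call this once per TICK_INTERVAL.
  def tick
    instant_rate = @uncounted / TICK_INTERVAL
    @uncounted = 0
    if @initialized
      @rate += @alpha * (instant_rate - @rate)
    else
      @rate = instant_rate
      @initialized = true
    end
  end
end

one_minute = EwmaRate.new(60)
one_minute.mark(500)  # 500 events during the first 5 seconds
one_minute.tick
puts one_minute.rate  # 100.0 events/sec
one_minute.tick       # a quiet interval pulls the average down
puts one_minute.rate
```

The first tick seeds the average with the observed rate, and every later tick moves it a fraction of the way toward the newest reading, so short bursts are smoothed out.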

Timer

Timers are used to keep track of how long an action takes. A timer also contains a Meter to track how often the action happens.

To measure how long an action takes:

Metriks.timer('fib.duration').time do
  fib(10)
end

It’s also possible to use it without a block:

timer = Metriks.timer('fib.duration').time
fib(10)
timer.stop
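
To show how the block and non-block styles can share one code path, here is a rough, self-contained sketch. SimpleTimer and its Context are hypothetical names, not the Metriks classes, and the sketch uses Ruby’s monotonic clock and keeps only a count and a total.

```ruby
# Hypothetical timer sketch supporting both block and non-block usage.
class SimpleTimer
  # Handle returned by the non-block form; call stop to record the duration.
  class Context
    def initialize(timer)
      @timer = timer
      @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end

    def stop
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at
      @timer.update(elapsed)
      elapsed
    end
  end

  attr_reader :count, :total

  def initialize
    @count = 0
    @total = 0.0
  end

  # With a block: time it and return the block's value.
  # Without a block: return a Context to stop later.
  def time
    context = Context.new(self)
    return context unless block_given?

    begin
      yield
    ensure
      context.stop # record the duration even if the block raises
    end
  end

  # Record one measured duration, in seconds.
  def update(duration)
    @count += 1
    @total += duration
  end

  def mean
    @count.zero? ? 0.0 : @total / @count
  end
end

timer = SimpleTimer.new
sum = timer.time { (1..1000).reduce(:+) } # block form
context = timer.time                      # non-block form
sleep 0.01
context.stop
puts "#{timer.count} timings, mean #{timer.mean} seconds"
```

The ensure clause means a timing is still recorded when the timed block raises, which matters for measuring failing calls.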

Counter

Counters are used to keep track of an absolute number. They can be incremented and decremented. This metric is generally used as the basis for other metrics instead of being one that would be used directly.

To increment:

Metriks.counter('calls').increment
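
As an illustration of what a thread-safe counter has to do, here is a minimal sketch that guards the count with a Mutex. SimpleCounter is a hypothetical class, not the Metriks implementation, which combines mutexes with the atomic gem.

```ruby
# Hypothetical thread-safe counter guarded by a Mutex.
class SimpleCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment(by = 1)
    @lock.synchronize { @count += by }
  end

  def decrement(by = 1)
    @lock.synchronize { @count -= by }
  end

  def count
    @lock.synchronize { @count }
  end
end

counter = SimpleCounter.new
threads = 4.times.map do
  Thread.new { 1000.times { counter.increment } }
end
threads.each(&:join)
counter.decrement(10)
puts counter.count # 3990
```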

Reporters

Reporters take a Registry and report the metrics to a remote store.

A detailed overview of the reporter API is available on the wiki.
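
To make the split between collecting and reporting concrete, here is a rough sketch of the general shape a reporter can take: it holds a registry, wakes up on an interval, and writes each metric somewhere. Everything here is an assumption made for illustration rather than the actual reporter API: the IoReporter name, the plain Hash used as a registry, and the lambdas standing in for metrics.

```ruby
require 'stringio'

# Hypothetical reporter: periodically writes every metric in a registry
# to an IO object. The registry is a Hash of name => callable returning
# the metric's current value.
class IoReporter
  def initialize(registry, io, interval = 60)
    @registry = registry
    @io = io
    @interval = interval
    @thread = nil
  end

  # Begin reporting in a background thread.
  def start
    @thread ||= Thread.new do
      loop do
        sleep @interval
        write
      end
    end
    self
  end

  # Stop the background thread, if one is running.
  def stop
    @thread&.kill
    @thread = nil
  end

  # Write one line per metric; also handy for a final flush on shutdown.
  def write
    @registry.each do |name, metric|
      @io.puts "#{name} count=#{metric.call}"
    end
  end
end

registry = { 'rack.requests' => -> { 42 } }
out = StringIO.new
reporter = IoReporter.new(registry, out)
reporter.write
puts out.string # rack.requests count=42
```

Because the instrumented code only touches the registry, swapping this reporter for one that talks to a remote service would not change any of the instrumentation.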

Metriks::Reporter::Graphite

Sends metrics to Graphite on a set interval. It takes the host and port of the carbon agent as required arguments. Example:

Metriks::Reporter::Graphite.new('localhost', 3309).start

Metriks::Reporter::LibratoMetrics

Sends metrics to Librato Metrics on a specified interval. It takes the API credentials as two required arguments: email and token. Example:

Metriks::Reporter::LibratoMetrics.new('user@metriks.local', '186dbe1cf215').start

Metriks::Reporter::Logger

Sends metrics to a logger on a specified interval. Example:

logger = Logger.new('log/metrics.log')
Metriks::Reporter::Logger.new(:logger => logger).start

The main reason behind this reporter is that I wanted to aggregate the metrics collected by multiple processes on the same system before sending them to Librato Metrics, similar to what StatsD does. To facilitate that, I created a Papertrail webhook receiver that takes the logs, parses the metrics, and submits them to Librato Metrics. The result is a Sinatra app I’ve posted on GitHub as metriks_log_webhook.

Metriks::Reporter::ProcTitle

Inspired by rack-statsd, I realized there are many metrics deep inside my processes that would be very useful for keeping track of how my workers are running.

This reporter isn’t really like the others. It reports metrics by updating your proctitle so you can see select metrics when you run ps ax.

Because space in the process title is limited, it requires configuration to specify what metrics are reported.

Example usage:

reporter = Metriks::Reporter::ProcTitle.new
reporter.add 'reqs', 'sec' do
  Metriks.meter('rack.requests').one_minute_rate
end
reporter.start

It would allow you to see the metric when you run ps ax:

22665 ?        S     17:09 thin reqs: 273.3/sec

Implementation Details

Metriks is thread-safe. It uses a combination of mutexes and the atomic gem. Using atomic reduces the need for mutexes without sacrificing thread safety.

It uses the hitimes gem to get high granularity timing data without having to call Time.now.to_f frequently.

Metriks doesn’t tie the gathering of metrics to how they are reported. This allows for swapping out where metrics are reported to without having to change any of the instrumentation.

The metrics classes themselves are mostly ports of the metrics from Coda Hale’s Java metrics library.

Thanks

Thanks to Troy Davis, Joe Ruscio and Mathias Meyer for their help reviewing drafts of this post.

Posted Tuesday, March 6 2012.

written by Eric Lindvall

I also appear on the internet on GitHub and Twitter as @lindvall and work hard to make Papertrail awesome.
