vondrak's soapbox a tech blog

Writing a Tracing Framework, Part 1: Structured Events

Disclaimer: this blog and the minitrace gem are my personal works. They are not affiliated with my employer, Honeycomb. Views expressed are my own.

A while back, I wrote a talk about the path from logs to traces explaining how flat, human-readable log files can be transformed into rich, machine-parseable data structures. It’s an important conceptual overview, but leveraging traces requires the proper tooling. So I want to go one step further by building a tracing library from the ground up.

Specifically, we’ll look at the development of minitrace, a minimalist tracing framework written as a Ruby gem. The aim isn’t to be minimal in functionality, but in implementation. We’ll get to the same features found in “big” libraries like the Honeycomb Beeline or OpenTelemetry, but we’ll do it with less code and make it easier to understand. Don’t worry too much about knowing Ruby; the lessons are broadly applicable to most major object-oriented programming languages.

Events

As covered in the talk, the basic unit of data we work with is a structured event. At its core, an event is composed of fields that map arbitrary string names to arbitrarily complex values. Developers are probably most familiar with this structure in the form of JSON, but in principle we could serialize an event any number of ways: XML, Protobuf, URI query strings, whatever. As long as we can map names to values, we’re said to have “structure”. In contrast, log files are composed of lines that have no inherent structure, since a line is just a single arbitrary string—usually a message in a natural language like English.

The idea is that your running code will dynamically add fields to an ongoing event. The game of tracing, then, is to develop conventions around which fields we add to which events.

To represent events in Ruby, we’ll make a class named Minitrace::Event that maintains a good ol’ fashioned hash table. It’s Ruby, so types are dynamic, but it’s understood that the hash is from String (the field name) to Object (i.e., any type). While you could manipulate this hash of fields directly, it’s good practice to define a standard interface. So to start, we have the following class definition:

class Minitrace::Event
  attr_reader :fields

  def initialize
    @fields = {}
  end

  def add_field(field, value)
    @fields[field] = value
    self
  end

  def add_fields(fields)
    @fields.merge!(fields)
    self
  end
end

An event object is initialized with an empty hash table of fields, which are exposed using the #fields attribute:

event = Minitrace::Event.new
event.fields #=> {}

Fields are added one at a time by #add_field:

event.add_field("name", "value")
event.fields #=> { "name" => "value" }

The method returns the current Minitrace::Event instance so that you can easily chain calls:

event = Minitrace::Event.new
event.add_field("a", 1).add_field("b", 2)
event.fields #=> { "a" => 1, "b" => 2 }

You can also merge in another hash of name-value pairs in one shot with #add_fields:

event = Minitrace::Event.new
event.add_fields("a" => 1, "b" => 2)
event.fields #=> { "a" => 1, "b" => 2 }

Backends

Of course, if all we needed was a hash table, we would have little use for the Minitrace::Event class. The point of events is to fire them to some backend once they’re finished. The most useful sort of backend will process each event’s data and store it for later querying. Such backends include services like Honeycomb, LightStep, AWS X-Ray, and several others.

But the mechanics of firing events will differ between vendors, protocols, etc. For a generic tracing library, we don’t want to be tied down to anything specific. So we’ll define an abstract base class called Minitrace::Backend:

class Minitrace::Backend
  def process(event)
    raise NotImplementedError
  end
end

Its sole interface is the #process method, which takes in a Minitrace::Event instance and does something (anything!) with it. The base class doesn’t actually implement anything—it just raises an error. Concrete backends inherit from the base class and provide their own implementations of the #process method (which they have to do, or else they’ll inherit the error-raising version).

For now, I don’t want to worry about sending data to a service. However, we need a concrete backend, and it’ll be useful for unit tests to “spy” on the events we try to process. So let’s add a backend that just keeps an array of events it has already seen:

class Minitrace::Backends::Spy < Minitrace::Backend
  def processed
    @processed ||= []
  end

  def process(event)
    processed << event
  end
end

Here, the #processed method returns the value of an instance variable, which is initially an empty array (using ||= like this is a common Rubyism). The #process implementation just pushes the event onto the end of that array.

We’ll then configure the minitrace library with a global backend:

module Minitrace
  class << self
    attr_accessor :backend
  end
end

This defines the Minitrace.backend attribute, which we can assign to any instance of a Minitrace::Backend. Since we only have the one concrete class right now, we’d go with this:

Minitrace.backend = Minitrace::Backends::Spy.new

Finally, we can hook events up to the backend by way of a Minitrace::Event#fire method:

class Minitrace::Event
  def fire
    return if @fired
    Minitrace.backend.process(self)
    @fired = true
  end
end

This takes the current event instance and processes it with the global minitrace backend. For our spy backend, this just means the event gets pushed onto the #processed array:

event = Minitrace::Event.new
event.add_field("test", "event")
event.fire

Minitrace.backend.processed
#=> [#<Minitrace::Event:0x00007fe951969118 @fields={"test"=>"event"}, @fired=true>]

Also notice that there’s an extra guard in the #fire method to prevent double-firing. This keeps our implementation from accidentally processing the same event twice:

100.times { event.fire }
Minitrace.backend.processed.count #=> 1

If necessary, individual backends might incorporate their own retry logic to send the event data again. But this would be a specialized case handled outside the general event-firing interface.

Up Next

In the future, we’ll build more elaborate backends and get our events into a service like Honeycomb (with which I’m most familiar). But before we get to that, we have enough here to start fleshing out how to turn plain events into rich traces. In the next installment, we’ll devise a friendlier API for creating & manipulating events.

You can find the code from this post—including tests—in the following commits on GitHub:

  • a578df8 Add Minitrace::Event
  • 478dab8 Sketch out the structure for backends
  • 924c072 Fire events to backend