Wednesday, July 20, 2011

Zarkov is an event logger

Over the past few weeks I've been working on a service in Python that I'm calling, in the tradition of naming projects after characters in Flash Gordon, Zarkov. So what exactly is Zarkov? Well, Zarkov is many things (and may grow to more):

  • Zarkov is an event logger
  • Zarkov is a lightweight map-reduce framework
  • Zarkov is an aggregation service
  • Zarkov is a webservice

In the next few posts, I'll be going over each of the components of Zarkov and how they work together. Today, I'll focus on Zarkov as an event logger.



Technologies

So there are just a few prerequisite technologies you should know something about before working with Zarkov. I'll give a brief overview of these here.

  • ZeroMQ: Zarkov uses ZeroMQ throughout as its wire and buffering protocol. Generally you'll use PUSH sockets to send data and events to Zarkov, and REQ sockets to talk to the Zarkov map-reduce router.
  • MongoDB: Zarkov uses MongoDB to store events and aggregates, so you should have a MongoDB server handy if you'll be doing anything with Zarkov. We also use Ming, an object-document mapper developed at SourceForge, to do most of our interfacing with MongoDB.
  • Gevent: Internally, Zarkov uses gevent's "green threads" to keep things nice and lightweight. If you're just using Zarkov, you probably don't need to know a lot about gevent, but if you start hacking on the source code, you'll find it everywhere, along with special ZeroMQ and Ming adapters for gevent. So it's probably good to have at least a passing familiarity.

Installation

To install Zarkov, you'll need to be able to install ZeroMQ and gevent, which probably means installing the zeromq and libevent development libs. On Ubuntu, I had to install zeromq2-1 from source (which isn't too tough):

$ wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
$ tar xzf zeromq-2.1.7.tar.gz
$ cd zeromq-2.1.7
$ ./configure --prefix=/usr/local && make
$ sudo make install
$ # if you're on Ubuntu, this next line will install libevent:
$ sudo apt-get install libevent-dev
$ # otherwise you need to build libevent from source:
$ wget http://monkey.org/~provos/libevent-1.4.13-stable.tar.gz
$ tar xzf libevent-1.4.13-stable.tar.gz
$ cd libevent-1.4.13-stable
$ ./configure --prefix=/usr/local && make
$ sudo make install

Now you should be able to do a regular pip install to get everything else:

$ virtualenv zarkov
$ source zarkov/bin/activate
(zarkov) $ pip install Zarkov

Next, you should customize your development.yaml file. Here's a convenient example we use in testing:

bson_bind_address: tcp://0.0.0.0:6543 
json_bind_address: tcp://0.0.0.0:6544 
web_port: 8081 
backdoor: 127.0.0.1:6545 
mongo_uri: mongodb://localhost:27017 
mongo_database: zarkov 
verbose: true 
incremental: 0 
zmr: 
        req_uri: tcp://127.0.0.1:5555 
        req_bind: tcp://0.0.0.0:5555 
        worker_uri: tcp://0.0.0.0 
        local_workers: 2 
        job_root: /tmp/zmr 
        map_page_size: 250000000 
        map_job_size: 10000 
        outstanding_maps: 16 
        outstanding_reduces: 16 
        request_greenlets: 16 
        compress: 0 # compression level 
        src_port: 0 # choose a random port 
        sink_port: 0 # choose a random port 
        processes_per_worker: null # default == # of cpus 
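
Note that the bind addresses above listen on every interface (0.0.0.0), while a client needs a concrete host to connect to. As a minimal sketch, assuming the client runs on the same machine as the server, the helper below turns a bind address into a connect address using only the standard library. The name connect_address is hypothetical and not part of Zarkov:

```python
from urllib.parse import urlparse


def connect_address(bind_address, host="localhost"):
    """Turn a wildcard bind address into one a client can connect to.

    Hypothetical helper for illustration; not part of Zarkov.
    """
    parsed = urlparse(bind_address)
    # 0.0.0.0 (or ZeroMQ's "*") means "all interfaces" and is only
    # meaningful when binding, so swap in a concrete host name.
    if parsed.hostname in ("0.0.0.0", "*"):
        return "%s://%s:%d" % (parsed.scheme, host, parsed.port)
    return bind_address


print(connect_address("tcp://0.0.0.0:6543"))
```

For the bson_bind_address in the example config, this yields tcp://localhost:6543, which is the address to hand to a client on the same host.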

Zarkov defines a format for an event stream which tries to be fairly generic (though our main use-case is logging SourceForge events for later aggregation). A Zarkov event is a BSON object containing the following data:

  • timestamp (datetime) : when did the event occur?
  • type (str): what is the type of event?
  • context (object): in what context did the event occur? On SourceForge, this includes the project context, the user logged in, the IP address, etc.
  • extra (whatever): this is purely up to the event generator. It might be a string, integer, object, array, whatever. (It should be supported by BSON, of course.)
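
To make the shape concrete, here is a minimal sketch of an event as a plain Python dict, before any BSON encoding. The make_event helper, the "commit" event type, and the context keys are all made up for illustration, and stamping events in UTC is an assumption rather than something Zarkov requires:

```python
from datetime import datetime, timezone


def make_event(type, context=None, extra=None, timestamp=None):
    """Build a dict with the four event fields described above.

    Hypothetical helper for illustration; not part of Zarkov.
    """
    return {
        # Assumption: events are stamped in UTC.
        "timestamp": timestamp or datetime.now(timezone.utc),
        "type": type,
        "context": context or {},
        "extra": extra,  # anything BSON can represent
    }


event = make_event(
    "commit",  # hypothetical event type
    context={"project": "example", "user": "alice", "ip": "192.0.2.1"},
    extra={"files_changed": 3},
)
print(sorted(event))
```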

Zarkov stores its events in a MongoDB database (again with the Flash Gordon references). Assuming you've already installed Zarkov, run the server with the following command:

(zarkov) $ zcmd -y development.yaml serve

Now to test, you can use the file zsend.py (included with Zarkov) to send a message to the server:

(zarkov) $ echo '{"type":"nop"}' | zsend.py tcp://localhost:6543

To confirm it got there, you can use the 'shell' subcommand from zcmd:

(zarkov) $ zcmd -y development.yaml shell

Then, in the shell you're given, execute the following commands:

In [1]: ZM.event.m.find().all() 
Out[1]: 
[{'_id': ObjectId('4e2723eeb240217416000001'), 
  'aggregates': [], 
  'context': {}, 
  'jobs': [], 
  'timestamp': datetime.datetime(2011, 7, 20, 18, 52, 30, 272000), 
  'type': u'nop'}] 

(Your _id value will probably be different.) To use Zarkov as an event logger in practice, you'll probably want to send the ZeroMQ messages yourself. Zarkov includes a client to do just that. From the zcmd shell:

In [1]: from zarkov import client 
In [2]: conn = client.ZarkovClient('tcp://localhost:6543') 
In [3]: conn.event('nop', {'sample_context_key': 'sample_context_val'}) 
In [4]: ZM.event.m.find().all() 
Out[4]: 
[{'_id': ObjectId('4e2723eeb240217416000001'), 
  'aggregates': [], 
  'context': {}, 
  'jobs': [], 
  'timestamp': datetime.datetime(2011, 7, 20, 18, 52, 30, 272000), 
  'type': u'nop'}, 
 {'_id': ObjectId('4e2725a8b240217483000001'), 
  'aggregates': [], 
  'context': {u'sample_context_key': u'sample_context_val'}, 
  'extra': None, 
  'jobs': [], 
  'timestamp': datetime.datetime(2011, 7, 20, 18, 59, 52, 756000), 
  'type': u'nop'}]

If you want to customize things further, the ZarkovClient code is actually quite short:

'''Python client for zarkov.''' 
import zmq 
 
import bson 
 
class ZarkovClient(object): 
 
    def __init__(self, addr): 
        context = zmq.Context.instance() 
        self._sock = context.socket(zmq.PUSH) 
        self._sock.connect(addr) 
 
    def event(self, type, context, extra=None): 
        obj = dict( 
            type=type, context=context, extra=extra) 
        self._sock.send(bson.BSON.encode(obj)) 
 
    def event_noval(self, type, context, extra=None): 
        from zarkov import model 
        obj = model.event.make(dict( 
                type=type, 
                context=context, 
                extra=extra)) 
        obj['$command'] = 'event_noval' 
        self._sock.send(bson.BSON.encode(obj)) 
 
    def _command(self, cmd, **kw): 
        d = dict(kw) 
        d['$command'] = cmd 
        self._sock.send(bson.BSON.encode(d)) 
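
One way to customize things without touching the client itself is to wrap it. The sketch below assumes only that the wrapped object has an event(type, context, extra) method like the one above. ContextClient and FakeClient are hypothetical names for illustration, and FakeClient stands in for a real ZarkovClient so the example runs without a server:

```python
class ContextClient(object):
    """Wrap a client and merge a default context into every event."""

    def __init__(self, client, **default_context):
        self._client = client
        self._default_context = default_context

    def event(self, type, context=None, extra=None):
        merged = dict(self._default_context)
        merged.update(context or {})  # per-event keys win over defaults
        self._client.event(type, merged, extra)


class FakeClient(object):
    """Stand-in for ZarkovClient that records events instead of sending."""

    def __init__(self):
        self.sent = []

    def event(self, type, context, extra=None):
        self.sent.append((type, context, extra))


fake = FakeClient()
conn = ContextClient(fake, project="example")
conn.event("nop", {"user": "alice"})
print(fake.sent)
```

Against a running server, you would pass client.ZarkovClient('tcp://localhost:6543') in place of FakeClient().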

4 comments:

  1. "... a service in Python that I'm calling, in the tradition of naming projects after characters in Flash Gordon, Zarkov."

    I was not aware of this tradition. Can you elaborate?

  2. It's just my own tradition. Since Mongo is the planet ruled by the emperor Ming the Merciless in Flash Gordon, I named my MongoDB library Ming. Zarkov is the scientist that helps Flash and Dale get to Mongo, so I figured I'd name an event logger that puts stuff into Mongo "Zarkov".

  3. Install Zarkov's dependencies with Homebrew on a Mac:

    $ brew install zeromq
    $ brew install libevent

    After that you'll need to create a journal directory before starting up zarkov:

    (zarkov) $ mkdir journal
    (zarkov) $ zcmd -y development.yaml serve

  4. A helpful reader pointed out that you can get into trouble installing ZeroMQ in /usr/local and trying to just 'pip install Zarkov'. Here's the solution I was able to come up with to get this to work. Just before the 'pip install Zarkov' command, you'll need to install pyzmq using the following command line:

    pip install pyzmq --install-option='--zmq=/usr/local'

    Hopefully that gets you around any installation hiccups you run into.
