17:53, June 12, 2012
Recently, I’ve been playing with message queue-based interprocess communication in Python. I’ve got an application idea which uses a client / worker-pool distributed processing concept, and wanted to test out a few of the options available.
Tested libraries/services include:
Yes, I know that ZooKeeper isn’t really an MQ, but it’s pretty easy to implement one with it.
0MQ is exactly what they say: “sockets on steroids”. There’s no 0MQ “daemon”, no extra processes, just sockets and a library to talk via them. This is both good and bad.
Without a daemon, it means that there’s no extra point of failure to rely upon, you get full control over your queues and how they’re used. But also, you don’t get clustering unless you do it yourself, you don’t get any administration tools unless you do them yourself, no access control unless you do it yourself, &c.
Working with 0MQ
It took me a while to figure out how many queues I needed for my concept, and how to route them. I designed separate client and worker processes, with a single broker to manage the worker pool. This setup was a bit complex, with client having an output to the broker’s input queue, and an input queue for responses from workers. The broker has a single input queue, with various outputs to each of the workers in the pool. The workers each had an input queue.
Actual code for 0MQ is reasonably simple. I ended up making a few objects to wrap the low-level stuff, but it’s not like 0MQ is overly verbose. There’s a lot of different queue options, all of which do things slightly differently, I ended up using PULL for the worker and client input, PUSH for client -> broker, and PULL into the broker and PUSH out of the broker.
Decent. A bit limited. Not very robust unless you want to do your own heartbeat stuff, it’s a little hard to tell that you’ve actually connected and sent a message and that message was definitely received. I probably won’t use it for this.
Celery is a task/job queue built on message passing, which uses one of a handful of MQ systems. I used RabbitMQ via amqp, because I had it installed already.
Working with Celery
Celery is a very high-level library, which uses decorators to indicate task procedures. You can do a lot with very minimal code.
I ended up stopping work on Celery, I just didn’t like how the system works. For one, I’m not really a fan of decorators which do a lot behind the scenes, because it’s not really obvious what is going on at a glance.
Also, there’s no clear way to have a callback when a task is complete, you can only check on the status of jobs from the caller. You can chain tasks, but the subtasks will run in the worker process, not the caller, which just isn’t going to be useful for me.
As I said, ZooKeeper isn’t really a MQ system, it’s more of a shared storage system for maintaining configuration, naming, synchronization, and the like, but it’s easy to make a message queue. You just make a node for the queue, and therein make a sub-node for each message, letting ZooKeeper name them sequentially. Your worker can process them in order.
ZooKeeper is designed from the start to be robust and distributed, which allows the application developer to not really worry about these things.
Working with ZooKeeper
The python library included in the ZooKeeper source is a bit low-level, so I wrote a basic wrapper library to provide object- oriented access to ZNodes, then built a “ZQueue” object on top of my ZNode class. I found that treating the ZooKeeper tree as a tree of objects works very well when building higher level applications.
My method of making a ZNode for the queue and then ZNodes for each message means ZooKeeper acts as the broker, and there’s no need to write a middle layer.
I like ZooKeeper.
It’s not really a MQ, so I probably won’t use it as such, but I’m glad I tried it out, I can think of lots of other uses for it.
Pika and RabbitMQ
Pika is a pure-Python implementation of AMQP 0.9.1, the standard MQ protocol used by various MQ brokers, including my choice here, RabbitMQ.
RabbitMQ is a robust message broker with a management interface and bindings for many languages.
Pika is reasonably flexible, not too low-level but not too high. RabbitMQ is quite full-featured, including a full management interface which lets you inspect queues, and a full access-control system which is entirely optional.
Working with Pika 0.9.5
The Pika library is a little lower level than say, Celery, but it’s still reasonably easy to work with. Doing blocking work is rather easy, but doing anything asynchronous or CPS is a little more complex, you have to define mutliple callbacks. I just created objects for both my client and worker.
With RabbitMQ acting as the broker, there’s no need to build a broker layer like we did with 0MQ.
Pika has lots of nice features. You can create named, permanent queues, temporary unnamed queues, full message exchanges, tag messages with types and filter on those types, require or discard ACKs, &c. I see lots of possibilites there.
RabbitMQ servers can be clustered to form a single logical broker, and you can mirror queues across machines in a cluster to provide high availability.
In fact, Pika/Rabbit seems to have far more features than I’ll actually need for the project I have in mind - which is good, really.
So far, as should be obvious, I’m going to use Pika and RabbitMQ. I’ve only been playing with the combo for a day, so expect more to come here.