PyCassa vs Lazyboy (updated)

Update

As Hans points out in the comment below, it appears pycassa natively supports authentication with org.apache.cassandra.auth.SimpleAuthenticator. Lazyboy on the other hand doesn’t by default.

It’s not too hard to do it though. Intuitively, we could do something like this.

NB: Untested code!! I might create a patch for this when I get the time, so this is just an outline.

# Add this to lazyboy's connection package
from cassandra.ttypes import AuthenticationRequest

And in lazyboy’s _connect() function, add another parameter called logins, that is a dict of keyspaces and credentials which looks like the following.

# logins format
{'Keyspace1' : {'username':'myuser', 'password':'mypass'}}

def _connect(self, logins):
"""Connect to Cassandra if not connected."""

    client = self._get_server()
    if client.transport.isOpen() and self._recycle:
        if (client.connect_time + self._recycle) > time.time():
            return client
        else:
            client.transport.close()
    
    elif client.transport.isOpen():
        return client
    
    try:
        client.transport.open()
        # Login code 
        # Remember that client is an instance of Cassandra.Client(protocol)
        if logins is not None:
            for keyspace, credentials in logins.iteritems():
                request = AuthenticationRequest(credentials=credentials)
            client.login(keyspace, request)
    
        client.connect_time = time.time()
    except thrift.transport.TTransport.TTransportException, ex:
        client.transport.close()
        raise exc.ErrorThriftMessage(
            ex.message, self._servers[self._current_server])

Original Post
I’ve been looking to answer which Python library is currently more fully featured to use to communicate with Cassandra.

From Reddit,

API-wise, both look like they are pretty much basic wrappers around the Cassandra Thrift bindings. I’d prefer lazyboy over pycassa though, given that firstly, it’s being used in production right now at Digg, and because it looks like lazyboy’s connection code is more featured than pycassa.

and

The connection code (Lazyboy) seems to be much more suited for use in production (use of auto pooling, auto load balancing, integrated failover/retry, etc.) (than PyCassa)

Thanks to GitHub, I was able to do some analysis of their traffic and commits,

Traffic Data


LazyBoy


Pycassa

Commit Data


LazyBoy


Pycassa

A larger number of people know about LazyBoy but code commits on it are currently on a stand still. Pycassa on the other hand seems to be growing at a pretty fast rate.

It looks like LazyBoy is probably a better library to start with, for now. I’ll talk about my experiences with both in another post.