Moving on from the question of which NoSQL database to choose: after reading these excellent posts from Digg and Twitter, I recently asked a question on Stack Overflow about the pros and cons of moving from MySQL to Cassandra.
The Stack Overflow question is here: [http://stackoverflow.com/questions/2332113/switching-from-mysql-to-cassandra-pros-cons]
I got some excellent insight and feedback, primarily from Jonathan Ellis, one of the maintainers of Cassandra, and a systems architect at Rackspace.
He’s also written a post on the Rackspace blog today as a follow-up to the question.
I wanted to highlight a great tip he passes along from Ian Eure of Digg (also the creator of a Python Cassandra library called LazyBoy), given at the recent PyCon ’10:
Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.
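To make that rule of thumb concrete, here is a minimal sketch of the cache-on-top-of-database pattern it describes. Everything in it is an assumption for illustration: plain dicts stand in for memcached and MySQL, and `get_user` and `update_user` are hypothetical helper names, not anything from Digg's code or from LazyBoy.

```python
# Hypothetical stand-ins: plain dicts play the roles of memcached and MySQL.
cache = {}
database = {"user:1": {"name": "alice", "followers": 10}}
stats = {"db_reads": 0}  # counts how often we fall through to the database


def get_user(key):
    """Try the cache first; on a miss, read the database and fill the cache."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = database.get(key)
    if row is not None:
        cache[key] = dict(row)  # store a copy so the cache and database can diverge
    return row


def update_user(key, **fields):
    """Write to the database, then invalidate the cached copy by hand."""
    database[key].update(fields)
    cache.pop(key, None)  # forget this line anywhere and readers see stale data
```

Every write path in the application has to remember that invalidation step, and that is roughly the "ad-hoc, difficult to maintain" part: the consistency rules live in your application code rather than in the data store.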
Also mentioned are a couple of general caveats about using NoSQL versus relational databases:
The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the upcoming 0.6 release (in beta now) offers Hadoop map/reduce integration, but for high volume, low-latency queries you will still need to design your app around denormalization.
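As a rough illustration of what designing around denormalization can mean, here is a small sketch in which the write path copies each post into every follower's timeline, so a read becomes a single lookup by key rather than a join. This is not Cassandra's API: the dicts only stand in for rows keyed by user, and `followers`, `timelines`, `post`, and `read_timeline` are names I made up for the example.

```python
from collections import defaultdict

# Hypothetical data: who follows whom (author -> list of readers).
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}

# One pre-built timeline per reader; this stands in for a denormalized row.
timelines = defaultdict(list)


def post(author, text):
    """Fan out on write: copy the entry into every follower's timeline."""
    entry = (author, text)
    for reader in followers.get(author, []):
        timelines[reader].append(entry)


def read_timeline(reader):
    """A read is a single lookup by key, with no join and no ad-hoc query."""
    return list(timelines[reader])
```

The trade-off is the one the quote describes: the same data is stored several times and every write does more work, and in exchange the high-volume read path stays a simple key lookup.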
Looks like the Cassandra 0.6 beta is coming out tomorrow, and can already be built from repositories in case anyone’s interested in doing so (and telling me about their experiences!).