I had the pleasure of attending Riptano’s very well-executed Cassandra Summit on Tuesday. I’ve always been interested in big databases, not to mention my stint as a committer on HBase, so I’ve long been roughly aware that Cassandra exists, but I never managed to learn about it in any depth. As such, the Summit was a great opportunity to learn what’s going on in this apparently quite vibrant community.
Here are some of the highlights from my experience:
Jonathan Ellis’s “state of the union” talk was interesting for a variety of reasons. The struggles they’re having with hinted handoff seem to be a classic symptom of trying to build a project around someone else’s whitepaper – it’s good to hear that they’re starting to overcome the difficulties and actually get a good feature into play. I was also really pleased to see that we’d managed to take care of two out of three of Cassandra’s chief complaints about Thrift. (Cassandra’s near-switch to Avro has me nervous.) Not to mention the fact that I got a shout-out for contributing Thrift performance enhancements (1, 2) that should give Cassandra servers a 20% boost. It’s always nice to be recognized.
Kelvin Kakugawa from Digg gave a really cool talk on how he’s implementing distributed counters on Cassandra. It was a very tech-heavy discussion with a lot of material to digest, but he clearly knows his stuff. One of the best things about the talk was that it gave me the chance to talk to Twitter’s Ryan King about how they’re already using this technology to handle up to 100,000 (!) counter updates per second for an as-yet unannounced application. It’s incredibly attractive to have a system with that kind of capacity available – it will definitely be on my mind as a tool for future use.
Finally, self-identified troublemaker Cliff Moon gave a mildly controversial talk (to paraphrase one commenter, “I think that you’re wrong, and it’s because your premises are wrong.”) about what he sees as the failings of the Cassandra API, which happens to be implemented in Thrift. He had some fairly insightful things to say about making sure that the API doesn’t “hide the capabilities” of the underlying service, which I think is a great point. He also contrasted Cassandra’s API with HBase’s, which is made up of the native Java API plus cross-platform (REST, Thrift) gateway components. I thought this was a pretty unique take on the situation, especially considering that, as the original author of the REST gateway for HBase, I only went that route because there was no sane cross-language RPC layer. The conversation it sparked was good, and showed that people have pretty strong feelings about the whole API situation.
The most useful part of this latter talk, though, was the discussion I managed to trigger about Cassandra’s use of Thrift. It turns out that Cassandra’s users and developers are fairly frustrated with Thrift’s community. They feel turned off by the tepid and occasionally combative responses to bug reports and patches, and rightly pointed out that some of the libraries (PHP in particular) are fairly poorly supported. I think there is truth to some of these criticisms. Thrift is currently treading water – we’re struggling with uninterested user and developer communities while also trying to work our way through the bureaucratic Apache process and out of the Incubator.
Despite the somewhat critical tone, I think I managed to convey that the key thing Thrift needs is more people coming in and helping out. One person I talked to, Thomas Gideon, asked a really simple question that I realized I had never answered: “How can people help Thrift?” At the very least, open a ticket. Better would be submitting a patch with a fix, or even just a patch with a test that exercises the bug you’ve found. I’d also be really happy to see you apply, test, or review patches that others have submitted, so that I know what the community thinks should be committed. (I would say that you can help out with the documentation on the wiki too, but I don’t want to sound greedy.)
All in all, I thought that the Summit was a great experience and I gained a lot. Great work to all the organizers and presenters!
Follow Bryan on Twitter: @bryanduxbury