Ruby and HBase

Lately, I’ve been investigating HBase, the Bigtable-like massively parallel database that runs on top of the Hadoop distributed file system, as an alternative to the traditional MySQL. We have some very high write throughput applications that just plain suffer under MySQL’s master-slave replication, since that model still requires the data to be written to the entire cluster eventually. In contrast, a write to HBase only requires a disk write to a number of machines equal to the replication factor – usually three – a significant improvement in large clusters.
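To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model: every node in a MySQL master-slave cluster eventually applies each write, while HBase writes to a fixed number of replicas. The method names and cluster sizes are made up for illustration.

```ruby
# Simplified model of the disk writes caused by ONE logical write.

# Assumption: with master-slave replication, every node in the cluster
# eventually applies the write.
def mysql_disk_writes(cluster_size)
  cluster_size
end

# Assumption: HBase writes to `replication_factor` machines (usually three),
# and can never write to more machines than the cluster has.
def hbase_disk_writes(cluster_size, replication_factor = 3)
  [cluster_size, replication_factor].min
end

[3, 10, 50].each do |nodes|
  puts "#{nodes} nodes: MySQL #{mysql_disk_writes(nodes)} writes, " \
       "HBase #{hbase_disk_writes(nodes)} writes"
end
```

Under this model the MySQL cost grows with the cluster, while the HBase cost stays flat once the cluster is larger than the replication factor.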

With all the benefits of HBase, it’s nearly a no-brainer to try to make use of it in our applications. Aside from the maturity of the project occasionally being an issue, one big problem was the lack of an API that’s accessible from non-Java languages. HBase, and Hadoop in general, uses an internal Java-to-Java RPC framework that relies on too much internal Java magic to be reverse engineered for use by other languages.

So, after a bit of discussion with the current maintainers over at Powerset, Michael Stack and Jim Kellerman, I built a tiny Java REST servlet that maps resource URIs onto the HBase client API. It’s pretty short, only a few hundred lines of Java. (Many thanks to Stack for the framework he wrote for me.) The result is that in HBase trunk, you can start up the REST servlet external to the other running HBase servers and interact with your cluster from any language! Victory!
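The idea of mapping resource URIs onto client calls can be sketched in a few lines of Ruby. This is only an illustration of the concept, not the servlet's code: the `/api/<table>/row/<key>` path layout, the verb table, and the `route` helper are all assumptions made up for this example.

```ruby
# Hypothetical mapping from HTTP verbs to client-style operations.
VERB_TO_OPERATION = {
  "GET"    => :get,
  "PUT"    => :put,
  "DELETE" => :delete
}.freeze

# Assumed path layout for this sketch: /api/<table>/row/<row key>
ROW_PATH = %r{\A/api/([^/]+)/row/([^/]+)\z}

# Translate a verb and a resource path into a description of the call to make.
# Returns nil when the path or the verb is not recognized.
def route(verb, path)
  match = ROW_PATH.match(path)
  return nil unless match

  operation = VERB_TO_OPERATION[verb.upcase]
  return nil unless operation

  { operation: operation, table: match[1], row: match[2] }
end

p route("GET", "/api/test_table/row/bryan_test")
```

Because the mapping depends only on the verb and the path, any language with an HTTP client can drive it.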

I’ve actually taken it one step further. I created a new gem called ruby-hbase that wraps the REST interface with a class structure for easier access. You get the HBase::HTable Ruby object, which has a number of methods that mimic the HBase Java client API, but Rubyized, of course. The documentation is currently fairly lacking, but you can take a look at the specs to see some use cases. Here’s a teensy bit of sample code:

require 'rubygems'
require 'ruby-hbase'

# instantiate a table with its full URL
@table = HBase::HTable.new("http://rs1:60050/api/test_table")
# put a row into HBase
@table.put("bryan_test", {"name:first" => "bryan"})
# get an entire row out of HBase
@table.get("bryan_test") # => {"name:first" => "bryan"}
# completely delete a row
@table.delete("bryan_test")

This is very much a beta version of the gem, as there are things that aren’t implemented and probably more than a few edge cases that haven’t been addressed. I welcome the input of others as we continue to evolve this gem. A further step I’d like to see in the future of Ruby and HBase integration is a very simple ActiveRecord-like base class so you can build your models against HBase instead of a SQL database. We’ll see how that one develops.
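To give a feel for what such a base class might look like, here is a minimal sketch. Nothing in it is part of ruby-hbase: `MemoryTable` is an in-memory stand-in that only imitates the put/get/delete calls from the sample above so the example runs without a cluster, and `HBaseRecord` and `Person` are hypothetical names.

```ruby
# In-memory stand-in for a table object (NOT part of ruby-hbase).
# It only imitates the put/get/delete calls shown in the sample code.
class MemoryTable
  def initialize
    @rows = {}
  end

  def put(key, columns)
    (@rows[key] ||= {}).merge!(columns)
  end

  def get(key)
    @rows[key]
  end

  def delete(key)
    @rows.delete(key)
  end
end

# Hypothetical ActiveRecord-like base class: one object per row,
# backed by whatever table object the subclass is configured with.
class HBaseRecord
  class << self
    attr_accessor :table

    # Load a row by key, or return nil when the row does not exist.
    def find(key)
      columns = table.get(key)
      columns && new(key, columns)
    end
  end

  attr_reader :key, :columns

  def initialize(key, columns = {})
    @key = key
    @columns = columns.dup
  end

  def [](column)
    @columns[column]
  end

  def []=(column, value)
    @columns[column] = value
  end

  # Write all columns of this object to the backing table.
  def save
    self.class.table.put(key, columns)
    self
  end

  # Remove the whole row from the backing table.
  def destroy
    self.class.table.delete(key)
  end
end

class Person < HBaseRecord
  self.table = MemoryTable.new
end

person = Person.new("bryan_test", "name:first" => "bryan")
person.save
puts Person.find("bryan_test")["name:first"] # prints "bryan"
Person.find("bryan_test").destroy
```

Keeping the table behind a class-level accessor means a real table object could, in principle, be swapped in for the in-memory one, provided it responds to the same three calls.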

(I recently gave a brief talk at the HBase Meetup sponsored by Rapleaf and Powerset about this topic. Here are the slides.)


8 Comments

  1. Posted January 2, 2008 at 10:21 pm | Permalink

    Very cool. Thanks for taking that on. I’m looking at switching from MySQL to HBase for http://www.reintegrate.us. Have you made any progress on adapting ActiveRecord::Base to (perhaps) ActiveRecord::HBase? I’d be interested in helping you out on that project or taking it on if you don’t feel you’ll have time for it.

  2. Posted January 3, 2008 at 11:00 pm | Permalink

    Thanks! Using the gem now. By the way, in your code example, it should be HBase instead of Hbase in “Hbase::HTable.new.”

  3. Posted January 4, 2008 at 11:26 am | Permalink

    @Peter: We’re definitely interested in an ORM-style wrapper on top of ruby-hbase, though it hasn’t become a priority for us just yet. Lately I’ve been more focused on contributing to HBase itself, trying to make it faster and more stable.

    If you start to put something together, by all means let me know.

  4. Posted January 4, 2008 at 5:47 pm | Permalink

    @bryan: That’s understandable. It looks like Quinn Slack and I will be tackling the Ruby ORM mapping, which should be very useful for a new backend for reintegrate.us, which is effectively a large feed-fetching system (exactly what a system like HBase is good at).

    Would it be possible for us all to get access to the ruby-hbase source via SVN or CVS? I looked around a bit but couldn’t find any repository links.

  5. Paul
    Posted February 3, 2009 at 9:35 am | Permalink

    The last update to this thread was over a year ago. Since then, Rhino has been released, but it is FAR from usable IMHO. Care to give us a status update on the ORM project?

  6. Posted April 18, 2009 at 11:02 pm | Permalink

    This gem no longer works with the latest HBase server.

    For people who need a working HBase Ruby client, I recommend sishen’s hbase-ruby:
    http://github.com/sishen/hbase-ruby/tree/master

  7. Dude
    Posted April 23, 2009 at 5:09 am | Permalink

    Won’t the servlet become a bottleneck in a high-write environment? It can be scaled with suitable hardware and an app server, but that would mean one more headache to take care of.
    Did you try jruby + hbase?

  8. Posted August 26, 2009 at 12:49 pm | Permalink

    I’m working on a project called BigRecord that’s attempting to bridge the gap between HBase and RoR. You can find it on github: http://github.com/openplaces/bigrecord/tree/master

    BigRecord is an OM (not really relational anymore) with a similar API and codebase to ActiveRecord. The changes that needed to be made to ActiveRecord in order to work with HBase were non-trivial, so it wasn’t just a matter of creating a new adapter for AR.

    BigRecord has been used internally at openplaces.org for nearly 2 years now, and when I joined the company, one of my tasks was to open source the tool. It’s lacking any useful documentation, but for the most part it works, and it comes with a spec suite.

    Something worth warning about is the stack required to get it all working. We don’t use the Thrift or REST interface of HBase, but instead went with a JRuby/DRb approach. This is because we found that using JRuby/DRb to communicate natively via the Java API was the fastest method to communicate with HBase. (Disclaimer: don’t ask me to provide any real data proving this… I don’t have it :P )

    Feel free to e-mail me greg.lu [at] gmail.com or find me on github: http://www.github.com/enell with any questions.

2 Trackbacks

  1. [...] the Bigtable-like distributed storage system from Apache Hadoop, had no Ruby API. That changed when Bryan Duxbury released ruby-hbase, an interface to the REST API he wrote for HBase. But there’s no Ruby ORM for HBase [...]

  2. By Engineering Rapleaf - Ruby-HBase ORM Already? on January 4, 2008 at 3:19 pm

    [...] only took a few weeks for someone to decide to start working on an ORM layer to go over the ruby-hbase gem. Here’s Quinn Slack’s announcement for Rhino. It looks pretty new, and I’m [...]
