Author Archives: Bryan Duxbury

Dealing with skewed key sizes in Cascading

Rapleaf indexes data from a wide variety of sources about all sorts of people. As a result, some of the people we analyze end up having a lot more data about them stored in our systems than others. For instance, your average person has around a hundred friends, whereas Ashton Kutcher’s Twitter account [...]

Posted in Cascading, MapReduce | 2 Comments

The Wrath of DrWho, or Unpredictable Hadoop Memory Usage

A while back we encountered a really peculiar problem with one of our Hadoop apps. The app did a bunch of HDFS operations before launching a small Map/Reduce job and going on to do a bunch of memory-intensive operations. It would run very happily for a few days to a week at a time, [...]

Posted in Hadoop, HDFS | 7 Comments

Thrift Unions Part II, or How I Reduced Memory Usage by 95%

In a previous post, we discussed the Thrift Union pattern of struct definition. To quickly summarize, the benefits are simplicity, flexibility, and low disk usage, but the downside (at least in Java) is high memory usage. Well, the downside is soon to be a downside no more. Over on THRIFT-409, I have been working on [...]

Posted in Hadoop, Thrift | 1 Comment

Thrift Union Pattern

At Rapleaf, we use Thrift structs as the cornerstone of many of our processes. I won’t go into great detail about Thrift in general, but suffice it to say that Thrift is easy and flexible enough to be used as the primary means of storing data and communicating between our various components. One [...]

Posted in Miscellaneous, Thrift | 1 Comment

Graceful shutdown, Hadoop, and black magic

Recently, while working on the Collector, I noticed that we had an issue with graceful shutdown of our servers. The Collector uses a JVM shutdown hook to catch the SIGTERM and take some cleanup actions before allowing the exit to go on. However, every time I would try to gracefully shut down a server, I’d [...]

Posted in Hadoop | 2 Comments

Rent or Own: Amazon EC2 vs. Colocation Comparison for Hadoop Clusters

For some time now, Rapleaf has been hard at work converting a critical portion of our infrastructure from a MySQL-based system to a Hadoop-based one. We see it as a much more obvious path to linear scalability of our processing pipeline. Since scalability is our goal, a technology that has obviously found its way into [...]

Posted in Hadoop | 28 Comments

Hadoop Meetup Presentation Videos

Here are links to the videos of all the presentations we had last week.

The Collector – Multi-Writer Appends into HDFS by Bryan Duxbury, Software Engineer at Rapleaf
http://www.vimeo.com/2084824

Katta – Distributed Lucene Index in Production by Stefan Groschupf, Founder/CTO at 101tec Inc. and Co-Founder at Scale Unlimited Inc.
http://www.vimeo.com/2085140

Debugging and Tuning Map-Reduce Applications [...]

Posted in Hadoop | Leave a comment

The Collector

Last night at the Rapleaf-hosted Hadoop meetup, I talked about a project we’ve created here at Rapleaf called the Collector. Basically, Rapleaf is starting to Hadoopify our workflow, and like a lot of people out there, we’ve found the need to manage many processes writing to HDFS so that our data can be processed by [...]

Posted in Hadoop | 4 Comments

Thrift TextMate Bundle

Rapleaf intern Kevin Ballard has been hard at work this summer making Thrift more pleasant to work with for those of us who are Ruby-inclined. Between sweeping API refactoring and calls to deprecate!, he’s found the time to create a TextMate bundle for the Thrift IDL! It supports syntax highlighting and tab-completion snippets for a [...]

Posted in Thrift | Leave a comment

HBase Interview on InfoQ

Jim Kellerman, Michael Stack, and I recently responded to an email interview about HBase and related topics. You can find the result here.

Posted in HBase | Leave a comment