Author Archives: Bryan Duxbury

More Compact Than CompactProtocol: TupleProtocol

Rapleaf makes extensive use of Thrift’s CompactProtocol to save space for long-term data storage and for communicating between services. However, this summer, star Rapleaf intern Armaan Sarkar took us to a new level of compactness with his work on the new TupleProtocol. While we are completely happy with the CompactProtocol for permanent data storage and [...]

Posted in Uncategorized

Google Calendar + Arduino = The Roominator

Last week was our quarterly Hackleaf, and this time around a handful of us set out to solve a slightly more physical problem than we usually tackle: conference room abuse. We have nine conference rooms in our office these days, and even though we regularly use Google Calendar to schedule meetings, we still struggle with [...]

Posted in Uncategorized

Java Performance: synchronized() vs Lock

Yesterday, I noticed that one of our systems was using a Lock where a plain old synchronized() block would suffice, and I thought to myself, does this matter? Since the Lock was already fulfilling the same role, the only real question was performance. My gut told me that there should be a performance difference between [...]

Posted in java
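For the synchronized() vs Lock comparison, here is a minimal sketch of the two styles side by side, assuming a simple shared counter. The class names `CounterComparison`, `SynchronizedCounter`, and `LockCounter` are hypothetical and not taken from the post. It only shows that both forms guard the same critical section; it is not a benchmark and makes no claim about which one is faster.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CounterComparison {
    // Hypothetical counter guarded by the object's intrinsic monitor.
    static class SynchronizedCounter {
        private long count = 0;

        public void increment() {
            synchronized (this) {
                count++;
            }
        }

        public long get() {
            synchronized (this) {
                return count;
            }
        }
    }

    // Hypothetical counter guarded by an explicit ReentrantLock.
    static class LockCounter {
        private final Lock lock = new ReentrantLock();
        private long count = 0;

        public void increment() {
            lock.lock();
            try {
                count++;
            } finally {
                lock.unlock(); // always release, even if the body throws
            }
        }

        public long get() {
            lock.lock();
            try {
                return count;
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter s = new SynchronizedCounter();
        LockCounter l = new LockCounter();
        // Two threads hammer both counters; both should end at exactly 200000.
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                s.increment();
                l.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(s.get() + " " + l.get()); // prints "200000 200000"
    }
}
```

One design point worth noting: with an explicit `Lock`, the `unlock()` call has to sit in a `finally` block, whereas a `synchronized` block releases its monitor automatically when control leaves it, including via an exception.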

Lightweight Trie

One of the most interesting things that we do at Rapleaf is use our existing data to deduce or infer new data. For example, a person’s name is often highly correlated with a specific gender. After doing lots and lots of regression, we usually end up with a simple HashMap loaded from a file that [...]

Posted in Uncategorized

Bringing Ruby’s ActiveRecord to Java

Rapleaf started out using Ruby and Rails extensively to build out our systems. We loved the flexibility that it gave us to quickly put together a functional application. Tools like ActiveRecord are huge productivity boosters, saving us the trouble of hand-coding database interaction and letting us focus directly on our application. However, we evolved to [...]

Posted in Uncategorized

Announcing Hank: A Fast, Open-Source, Batch-Updatable, Distributed Key-Value Store

We’re really excited to announce the open-source debut of a cool piece of Rapleaf’s internal infrastructure, a distributed database project we call Hank. Our use case is very particular: we have tons of data that needs to get processed, producing a lot of data points for individual people, which then need to be made randomly [...]

Posted in Uncategorized

Memory-efficient sparse bitsets

A bitset is a data structure designed to store a vector of boolean values very compactly – one bit per value. In practice, they’re a really handy way to save memory. However, we had a situation in one of our extremely memory-intensive applications where a simple bitset wouldn’t cut it. We have over 2500 variables [...]

Posted in Miscellaneous
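To make the “one bit per value” idea from the sparse-bitsets excerpt concrete, here is a minimal sketch of a plain (dense, non-sparse) bitset that packs booleans into 64-bit `long` words. The class `SimpleBitset` and its methods are hypothetical illustrations; this is the simple structure the excerpt says wasn’t sufficient, not the sparse design the post goes on to describe.

```java
public class SimpleBitset {
    private final long[] words; // each long holds 64 boolean values
    private final int size;

    public SimpleBitset(int size) {
        if (size < 0) throw new IllegalArgumentException("size must be non-negative");
        this.size = size;
        this.words = new long[(int) ((size + 63L) >>> 6)]; // ceil(size / 64) words
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
    }

    public void set(int i, boolean value) {
        checkIndex(i);
        long mask = 1L << (i & 63); // bit position within word i / 64
        if (value) {
            words[i >>> 6] |= mask;
        } else {
            words[i >>> 6] &= ~mask;
        }
    }

    public boolean get(int i) {
        checkIndex(i);
        return (words[i >>> 6] & (1L << (i & 63))) != 0;
    }

    // Number of bits currently set to true.
    public int cardinality() {
        int total = 0;
        for (long w : words) total += Long.bitCount(w);
        return total;
    }

    // Bytes of payload storage, regardless of how many bits are set.
    public int bytesUsed() {
        return words.length * 8;
    }

    public static void main(String[] args) {
        SimpleBitset bits = new SimpleBitset(2500);
        bits.set(3, true);
        bits.set(2499, true);
        // prints "true false 2 320"
        System.out.println(bits.get(3) + " " + bits.get(4) + " " + bits.cardinality() + " " + bits.bytesUsed());
    }
}
```

Sized for 2500 variables, this uses 40 `long` words (320 bytes) per instance no matter how few bits are set, which is the kind of fixed cost a sparse representation is meant to reduce when most values are false. The standard library’s `java.util.BitSet` provides the same dense packing.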

Striving for zero copies with Thrift 0.5

“Zero copies” is a common optimization principle used in high-performance applications. The gist of the technique is to make no more byte array copies than the server actually needs to perform its task. Byte array copies are one of those insidious time-wasters that are hard to understand or even detect until you start looking [...]

Posted in Miscellaneous
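As a generic illustration of the zero-copies idea (not Thrift 0.5’s actual internals, which the excerpt doesn’t show), here is a minimal Java sketch contrasting a copying extraction with a view that shares the original byte array via `java.nio.ByteBuffer`. The class `ZeroCopyDemo` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ZeroCopyDemo {
    // Copying approach: allocates a new array and copies the bytes into it.
    static byte[] extractCopy(byte[] frame, int offset, int length) {
        return Arrays.copyOfRange(frame, offset, offset + length);
    }

    // Zero-copy approach: returns a view over the same backing array; no bytes are copied.
    static ByteBuffer extractView(byte[] frame, int offset, int length) {
        return ByteBuffer.wrap(frame, offset, length).slice();
    }

    public static void main(String[] args) {
        byte[] frame = {10, 20, 30, 40, 50};
        byte[] copy = extractCopy(frame, 1, 3);     // {20, 30, 40}, independent storage
        ByteBuffer view = extractView(frame, 1, 3); // shares frame[1..3]

        frame[2] = 99; // mutate the original after extraction

        // The copy still sees the old value; the view sees the new one.
        System.out.println(copy[1] + " " + view.get(1)); // prints "30 99"
    }
}
```

Because the view shares storage with `frame`, a change to the array after extraction shows up through the view but not through the copy. The trade-off is that the caller must not reuse or overwrite the underlying array while the view is still in use.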

Analyzing some interesting networks for Map/Reduce clusters

In a previous post, I described how Rapleaf had built a conceptual model for determining the average aggregate peak throughput (which we’ll call T) that a given network architecture could support. This post applies that model to a variety of network topologies you might consider for your cluster. Just as a brief refresher, T represents [...]

Posted in Miscellaneous

Analyzing network load in Map/Reduce

Hadoop Map/Reduce can take a heavy toll on your network. Just how heavy, though, isn’t obvious. This is an especially important consideration when you are expanding your cluster. Rapleaf recently encountered this situation, and in the process we devised a neat theoretical model for analyzing how network topology affects Map/Reduce. When does Hadoop put the [...]

Posted in Hadoop