Author Archives: Bryan Duxbury

Thrift 0.4.0 Released

During the time it took for us to get everything sorted out for the 0.3.0 release, we accumulated more than enough changes to justify 0.4.0. There are a ton more changes in this release. (You can find the full summary here.) In addition to the usual bug fixes and performance improvements, there are two really [...]

Posted in Miscellaneous

Cassandra Summit

I had the pleasure of attending Riptano’s very well-executed Cassandra Summit on Tuesday. Having always been interested in big databases, not to mention my stint as a committer on HBase, I’ve long been roughly aware that Cassandra exists, but never managed to learn about it very deeply. As such, the Summit provided some really interesting opportunities [...]

Posted in Miscellaneous

Reading files quickly in Java

I came across a really interesting, well-done blog post today about the quickest way to do high-performance file IO in Java. It does a really good job of breaking down the alternatives of how to get bytes into memory, covering both traditional and NIO options in a good amount of detail. It’s a must-read for [...]

Posted in Miscellaneous

Thrift 0.3.0 Released

After seven separate release candidates, Thrift 0.3.0 is finally released! This version includes many, many fixes over Thrift 0.2 in areas of stability, features, and performance. If you’ve been holding off on upgrading, then now is the perfect opportunity. You can find the distribution here.

Posted in Miscellaneous

Fully async Thrift client in Java

Thrift has had an asynchronous server implementation for Java for quite some time, but users have been asking for a way to have an asynchronous client since the very beginning. The motivation behind this style of client is usually performance. Imagine you take a bunch of time and make a highly optimized web application that [...]

Posted in Thrift | 8 Comments

Avoiding Java varargs snafus

Since Java 1.5, Java has allowed you to take advantage of “varargs”, a usability feature that many other languages support. It lets you write really clean code and support some pretty cool use cases. However, there is at least one possible pitfall of using varargs. Consider the method below: public boolean filter() { (do some [...]

Posted in Miscellaneous | 1 Comment

Parallelized bloom filter creation with Map/Reduce

As we’ve mentioned in the past, bloom filters are an important part of our workflow. They allow us to quickly skip a large portion of the records that we’re not interested in, thinning out the amount of data that has to be CoGrouped in our Cascading flows. Up until recently, we’ve just been creating our [...]

Posted in Miscellaneous | 2 Comments

Faster string to UTF-8 encoding in Java

Update: After further investigation, the performance improvements didn’t hold up once newly uncovered correctness bugs forced some code changes. The patch was rolled back, so we’re stuck with the same old encoding mechanism. Sigh. I’ve spent a lot of time profiling Thrift serialization and deserialization, and one thing that has always [...]

Posted in Thrift | 3 Comments

Thrift and pseudo-RDF schemas

BackType’s Nathan Marz recently wrote a really great post about using Thrift and an RDF-like schema to get type-safe, extensible, high-performance schemas for use in Hadoop environments. He really hit the nail on the head describing the use pattern and the positives and negatives. A variation on this approach is something we’ve been doing at [...]

Posted in Hadoop, Thrift | 1 Comment

Accelerate your test suite with Cascading 1.1

One big downside of using Cascading for our applications has been the runtime of our regression test suite. We test with quantities of data nowhere near our regular production volume, but we still end up running lots of jobs. In our experience, this ends up making our tests take a long time (in the tens [...]

Posted in Cascading | 1 Comment