Author Archives: bryan

Analyzing some interesting networks for Map/Reduce clusters

In a previous post, I described how Rapleaf had built a conceptual model for determining the average aggregate peak throughput (which we’ll call T) that a given network architecture could support. This post applies that model to a variety of network topologies you might consider for your cluster. Just as a brief refresher, T represents the [...]
Posted in Miscellaneous | 2 Comments

Analyzing network load in Map/Reduce

Hadoop Map/Reduce can take a heavy toll on your network. Just how heavy, though, isn’t obvious. This is an especially important consideration when you are expanding your cluster. Rapleaf recently encountered this situation, and in the process we devised a neat theoretical model for analyzing how network topology affects Map/Reduce. When does Hadoop put the most [...]
Posted in Hadoop

Thrift 0.4.0 Released

During the time it took for us to get everything sorted out for the 0.3.0 release, we accumulated more than enough changes to justify 0.4.0. There are a ton more changes in this release. (You can find the full summary here.) In addition to the usual bug fixes and performance improvements, there are two really [...]
Posted in Miscellaneous

Cassandra Summit

I had the pleasure of attending Riptano’s very well-executed Cassandra Summit on Tuesday. Having always been interested in big databases, not to mention my stint as a committer on HBase, I’ve long been roughly aware that Cassandra exists, but never managed to learn about it very deeply. As such, the Summit provided some really interesting opportunities [...]
Posted in Miscellaneous

Reading files quickly in Java

I came across a really interesting, well-done blog post today about the quickest way to do high-performance file IO in Java. It does a really good job of breaking down the alternatives of how to get bytes into memory, covering both traditional and NIO options in a good amount of detail. It’s a must-read for [...]
Posted in Miscellaneous

Thrift 0.3.0 Released

After seven separate release candidates, Thrift 0.3.0 is finally released! This version includes many, many fixes over Thrift 0.2 in areas of stability, features, and performance. If you’ve been holding off on upgrading, then now is the perfect opportunity. You can find the distribution here.
Posted in Miscellaneous

Fully async Thrift client in Java

Thrift has had an asynchronous server implementation for Java for quite some time, but users have been asking for a way to have an asynchronous client since the very beginning. The motivation behind this style of client is usually performance. Imagine you take a bunch of time and make a highly optimized web application that makes [...]
Posted in Thrift | 4 Comments

Avoiding Java varargs snafus

Since Java 1.5, Java has allowed you to take advantage of “varargs”, a usability feature that many other languages support. It lets you write really clean code and support some pretty cool use cases. However, there is at least one possible pitfall of using varargs. Consider the method below: public boolean filter() { (do some filtering) } filter(); Let’s say you [...]
Posted in Miscellaneous

Parallelized bloom filter creation with Map/Reduce

As we’ve mentioned in the past, bloom filters are an important part of our workflow. They allow us to quickly skip a large portion of the records that we’re not interested in, thinning out the amount of data that has to be CoGrouped in our Cascading flows. Up until recently, we’ve just been creating our bloom [...]
Posted in Miscellaneous | 2 Comments

Faster string to UTF-8 encoding in Java

Update: After further investigation, it turns out that the performance improvements didn’t hold up once newly uncovered correctness bugs forced some code changes. The patch was rolled back, so we’re stuck with the same old encoding mechanism. Sigh. I’ve spent a lot of time profiling Thrift serialization and deserialization, and one thing that has always stood [...]
Posted in Thrift | 3 Comments