Tag Archives: Hadoop

Command-line auto completion for Hadoop DFS commands

We like to keep things simple here at Rapleaf. One small tweak we made right after installing Hadoop was to alias ‘hadoop dfs’ to ‘hdfs’. It rolls off the fingers nicely. We are also constantly typing ‘hdfs -ls this’ or ‘hdfs -du that’. If we are not sure what this/that is, we type ‘hdfs [...]

Posted in bash, Hadoop, HDFS | 16 Comments

Dead Simple MapReduce Workflow Configuration

If you use MapReduce for any real-world application, chances are your workflow consists of more than one MapReduce job. Rapleaf has workflows of over one hundred jobs. Often, you need to configure the workflow as a whole, with settings that should apply to every job. For example, you may want each job to run [...]

Posted in Hadoop, MapReduce

Getting the serial terminal to work over IPMI on a Dell R410

As avid readers of the blog know, we use Hadoop a lot and talk about it quite a bit. We are in the process of expanding our Hadoop cluster and decided to go with the new Dell R410 1U machines. From talking with other Hadoop users, we learned that the sweet spot is one spindle (drive) for every 2 [...]

Posted in Hadoop

A Glance at the Hadoop Failure Model

Hadoop is designed to be a fault-tolerant system: jobs should be resilient to nodes going down and to other random failures. Hadoop isn’t perfect, however; I still see jobs fail from random causes every now and then. I decided to investigate how much each of the different factors contributes to a job failing. [...]

Posted in Hadoop, MapReduce | 3 Comments