Tag Archives: Hadoop
Dead Simple MapReduce Workflow Configuration
If you use MapReduce for any real-world application, chances are your workflow consists of more than one MapReduce job. Rapleaf has workflows consisting of over one hundred jobs. Often, you need to make configuration changes that apply to every job in the workflow. For example, you may want each job to run [...]
Getting the serial terminal to work over IPMI on a Dell R410
As avid readers of the blog know, we use Hadoop a lot and talk about it quite a bit. We are in the process of expanding our Hadoop cluster and decided to go with the new Dell R410 1U machines. From talks with other Hadoop users, the sweet spot is one spindle (drive) for every [...]
A Glance at the Hadoop Failure Model
Hadoop is designed to be a fault-tolerant system. Jobs should be resilient to nodes going down and other random failures. Hadoop isn’t perfect, however; I still see jobs fail from seemingly random causes every now and again. I decided to investigate how much each of the different factors contributes to a job failing.
A [...]

Command-line auto completion for Hadoop DFS commands