The Wrath of DrWho, or Unpredictable Hadoop Memory Usage

A while back we encountered a really peculiar problem with one of our Hadoop apps. The app did a bunch of HDFS operations before launching a small Map/Reduce job and going on to do a bunch of memory-intensive operations. It would run happily for a few days to a week at a time, and then suddenly start failing with exceptions like this:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=EXECUTE, inode="data":rapleaf:supergroup:rwxr-x---

Who is this curious DrWho user, and why is he trying to access our data? A little digging revealed that DrWho was the user that Hadoop picks when it can’t figure out who you’re supposed to be. Not the most helpful username to pick for that scenario, to say the least.

All the info we found on the issue indicated that when Hadoop can’t figure out user/group info, it’s because of some issue with the machine’s configuration, the “whoami” command, or things along those lines. We checked everything we could think of: our LDAP config, NFS mounts, network connectivity, other HDFS commands on the same machine. None of it solved the issue. The only way we found to make the app start working again was to literally reboot the machine. So Windows 95, isn’t it?

Finally, with the help of Cloudera, we were able to figure it out. To get user info, the Hadoop code would shell out to the “whoami” command, which meant forking the entire JVM process for an instant. As mentioned above, the app in question was memory-intensive, and it was configured to pre-allocate about 95% of the machine’s physical memory. Most of the time, when the machine was fresh, this worked with no issue. After a few days of running, however, I’m guessing other things would accumulate in memory, so that when the fork call happened, the OS would fail to spawn the shell command’s process because there wasn’t enough memory for it. With no answer from “whoami”, Hadoop fell back to DrWho.
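
To make the failure mode concrete, here is a minimal Java sketch of looking up the current user by shelling out. It is not Hadoop's actual code: the class name, the method name, and the fallback handling are hypothetical, and only the “whoami” command and the DrWho placeholder come from the story above. The point is that starting the child process can itself fail, and the caller is then left holding the placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the shell-out pattern; not Hadoop's implementation.
final class ShellUserLookup {

    // Runs the command and returns the first line it prints.
    // Returns the fallback if the process cannot be started (for example,
    // because the OS refuses to create it), exits non-zero, or prints nothing.
    static String resolveUser(List<String> command, String fallback) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            String first;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                first = r.readLine();
                // Drain any remaining output so the child can always exit.
                while (r.readLine() != null) { /* discard */ }
            }
            int exit = p.waitFor();
            if (exit == 0 && first != null && !first.trim().isEmpty()) {
                return first.trim();
            }
            return fallback;
        } catch (IOException e) {
            // Thrown when the child process could not be launched at all.
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveUser(Arrays.asList("whoami"), "DrWho"));
    }
}
```

On Linux, a launch that fails for lack of memory typically surfaces in Java as an IOException whose message mentions error=12 (“Cannot allocate memory”). A fallback like the one sketched here swallows that detail, which is why the only visible symptom is a permissions error for a user nobody recognizes.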

Ultimately, we were able to fix this problem by pre-allocating less of the machine’s physical memory at JVM startup – more like 75%. Since then we’ve not had the problem.
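
As a rough illustration of the sizing change, the sketch below turns a fraction of physical memory into heap flags. It assumes the pre-allocation was done through the standard -Xms/-Xmx JVM flags, which may not match how the app was actually configured; the helper name and the 16 GiB machine size are made up, and only the 95% and 75% figures come from the text above.

```java
// Hypothetical helper; only the percentages are taken from the post.
final class HeapSizing {

    // Returns JVM flags that fix the heap at `percent` of physical memory,
    // rounded down to whole mebibytes.
    static String heapFlags(long physicalBytes, int percent) {
        if (physicalBytes < 0 || percent < 0 || percent > 100) {
            throw new IllegalArgumentException("invalid size or percentage");
        }
        long physicalMib = physicalBytes / (1024L * 1024L);
        long heapMib = physicalMib * percent / 100;
        return "-Xms" + heapMib + "m -Xmx" + heapMib + "m";
    }

    public static void main(String[] args) {
        long exampleMachine = 16L * 1024 * 1024 * 1024; // assumed size, not from the post
        System.out.println("before: " + heapFlags(exampleMachine, 95));
        System.out.println("after:  " + heapFlags(exampleMachine, 75));
    }
}
```

The smaller setting leaves a quarter of the machine for everything else, including whatever the OS needs to account for when the JVM forks.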

However, we have run into a related class of problems. In another memory-intensive application, again running near the machine’s physical memory limit, we found that we’d very consistently get OutOfMemoryErrors whenever a number of concurrent copyToLocal calls were running. Again, after some frustration, we tracked the problem down to a shell command, in this case “chmod”, that was being called at the end of copyToLocal. This time we were able to reorganize our code to alleviate the issue.
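
For comparison, here is a minimal sketch of changing file permissions from inside the JVM, with no child process involved. It uses the java.nio.file API from Java 7 and later, which did not exist when this was written, and it assumes a POSIX file system. It is neither what copyToLocal did nor the change we made in our app; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical illustration; not part of Hadoop.
final class InProcessChmod {

    // Applies permissions given in "rwxr-x---" notation without starting a
    // child process, then returns the permissions actually set on the file.
    // Throws UnsupportedOperationException on non-POSIX file systems.
    static Set<PosixFilePermission> setPermissions(Path file, String rwx) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(rwx);
        Files.setPosixFilePermissions(file, perms);
        return Files.getPosixFilePermissions(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chmod-sketch", ".dat");
        try {
            Set<PosixFilePermission> applied = setPermissions(tmp, "rw-r-----");
            System.out.println(PosixFilePermissions.toString(applied));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because nothing is forked, a call like this does not depend on how much memory the JVM is already holding.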

The bottom line is that Hadoop can have some unpredictable behavior when it makes shell calls, so beware.


6 Comments

  1. Posted January 6, 2010 at 6:09 am

    This is interesting. Shouldn’t copy-on-write semantics of fork() prevent this problem?

  2. Posted January 6, 2010 at 9:00 am

    @harish: it definitely should, and most of the time, it does. However, I think there are some oddities around what happens when you ask the OS for 2x the physical memory of the machine. There’s this “overcommit” parameter that governs this behavior which I must admit to not fully understanding. Additionally, I have a suspicion that when there’s a reasonable amount of concurrency, the garbage collector could somehow interfere with copy on write.

  3. Charles Glommen
    Posted January 6, 2010 at 4:31 pm

    I am experiencing a similar, if not identical, issue with the OutOfMemoryError. Can you provide more details on how you fixed this?

  4. Posted January 6, 2010 at 4:40 pm

    @Charles: Sadly, it’s fairly application-specific. In our instance, we just made sure that we did all of our copyToLocal calls before we started doing the memory-intensive side of the application. This way, the shell calls happened before the JVM had accumulated a large heap size. I’d recommend trying this, and if that won’t work, you could consider not using copyToLocal (that is, streaming the files directly).

  5. Charles Glommen
    Posted January 6, 2010 at 4:54 pm

    Ah, I see. I’m not explicitly using copyToLocal, but have noticed that call in the stack trace when this occurs. It seems to occur when I provide tens of thousands of input files for a single map/reduce.

  6. Posted January 10, 2010 at 9:47 pm

    Here’s a bit of further explanation (actually mostly copy-pasted from our response to Rapleaf’s support ticket):

    Since Hadoop uses fork and not vfork, it actually needs to make another copy of the memory space of your process. Fork uses copy-on-write, so it’s still fairly efficient, but if the process that’s forking has a large memory footprint, it can get ENOMEM if there isn’t a ton of swap available. Basically, Linux will overcommit memory up to a certain configurable amount, but if the total vmem footprint of your processes exceeds a certain threshold, it will stop allowing it.

    One way to confirm that this is the issue is to temporarily enable complete overcommitting of memory by tweaking /proc/sys/vm/overcommit_memory to the value 1 (“always overcommit”). I cannot recommend running with this setting – the default heuristic (value 0) is much safer.

    Here are some short-term workarounds:

    The first is to simply configure this node with a ton of swap space. This will figure into the memory overcommit calculations so that it won’t return this error. Alternatively, you could fiddle with the value of /proc/sys/vm/overcommit_ratio – perhaps setting it to 100 or 200 instead of the default 50 would solve the issue.

    The second workaround would be to force the UserGroupInformation to get cached earlier in your process, before you allocate large amounts of memory. You should be able to do this by calling UserGroupInformation.login(conf) at the top of your application.

    Long term, this is an acknowledged bug/issue with Hadoop. http://issues.apache.org/jira/browse/HADOOP-4998 is the JIRA that will hopefully solve it – it proposes to implement a JNI library that will allow access to these unixy things without having to fork separate processes. Eli from Cloudera has started working on this ticket and we hope to include it in CDH once it’s finished and committed.

    -Todd from Cloudera
