Command-line auto completion for Hadoop DFS commands

We like to keep things simple here at Rapleaf. One small tweak we made right after installing Hadoop was to alias 'hadoop dfs' to 'hdfs', which rolls off the fingers nicely. We are also constantly typing 'hdfs -ls this' or 'hdfs -du that'. When we aren't sure what this/that is, we type 'hdfs -ls /this/what', then 'hdfs -ls /this/what/ever', followed by a copy and paste or two. Thanks to our recent HackLeaf day and Nathan’s great idea, we no longer have to go through all of that: just type 'hdfs -ls [tab]' and HDFS paths complete just like regular bash command-line completion.
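
The post doesn't show the alias itself. A minimal version is a one-liner, assuming it lives in ~/.bashrc (that location is an assumption, not something the post states). Because bash looks up completion rules by the command word as typed, the complete -F _hdfs hdfs line in the completion section attaches directly to this alias.

```shell
#!/usr/bin/env bash
# Assumed location: ~/.bashrc. Makes "hdfs" shorthand for "hadoop dfs".
alias hdfs='hadoop dfs'

# Show the definition bash now has on record.
alias hdfs
```

One caveat: later Hadoop releases ship their own hdfs command (used as 'hdfs dfs ...'), which an alias with this name would shadow.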

This was easy to implement once I found Ian Macdonald's programmable completion package, bash_completion. I just added the following section to the bash_completion script:

# hdfs(1) completion: "hdfs" is our alias for "hadoop dfs"
#
have hadoop &&
_hdfs()
{
  local cur prev

  COMPREPLY=()
  cur=${COMP_WORDS[COMP_CWORD]}
  prev=${COMP_WORDS[COMP_CWORD-1]}

  case "$prev" in
    hdfs)
      # Complete the dfs command itself (-ls, -du, ...).
      COMPREPLY=( $( compgen -W '-ls -lsr -du -dus -count -mv -cp -rm \
        -rmr -expunge -put -copyFromLocal -moveToLocal -cat -mkdir -setrep \
        -touchz -test -stat -tail -chmod -chown -chgrp -help' \
        -- "$cur" ) )
      ;;
    -ls|-lsr|-du|-dus|-cat|-mkdir|-put|-rm|-rmr|-tail|-cp)
      # Complete an HDFS path. Call "hadoop dfs" directly: the hdfs alias
      # may not be defined yet when this file is sourced.
      local target
      if [[ -z "$cur" ]]; then
        target=/            # nothing typed: list the root, not the home dir
      elif [[ "$cur" == */ ]]; then
        target=$cur         # a full directory: list its contents
      else
        target="$cur*"      # quoted so HDFS, not the local shell, expands it
      fi
      COMPREPLY=( $( compgen -W "$( hadoop dfs -ls "$target" 2>/dev/null \
        | grep -v '^Found' | awk '{print $8}' )" -- "$cur" ) )
      ;;
  esac
} &&
complete -F _hdfs hdfs
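
The path branch can be tried without a cluster. The sketch below replaces hadoop with a stub shell function that prints made-up listings in the eight-column -ls layout that awk '{print $8}' depends on; the stub, its sample paths, and the helper names _hdfs_list and complete_path are all hypothetical. It then applies the same empty / trailing-slash / partial-name logic and fills COMPREPLY the way the completion function does.

```shell
#!/usr/bin/env bash
# Stub for "hadoop dfs -ls PATH" only. The listings are made-up sample data
# in the eight-column "-ls" layout; this is not a model of any Hadoop version.
hadoop() {
  [[ "$1" == dfs && "$2" == -ls ]] || return 1
  case "$3" in
    /)
      echo "Found 2 items"
      echo "drwxr-xr-x   - alice supergroup          0 2009-11-18 10:00 /tmp"
      echo "drwxr-xr-x   - alice supergroup          0 2009-11-18 10:00 /user"
      ;;
    /user/ | "/u*")
      # "/u*" stands in for HDFS expanding the glob to /user and listing it.
      echo "Found 2 items"
      echo "drwxr-xr-x   - alice supergroup          0 2009-11-18 10:00 /user/alice"
      echo "drwxr-xr-x   - bob   supergroup          0 2009-11-18 10:00 /user/bob"
      ;;
    *)
      echo "stub: no such path: $3" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: the listing pipeline, minus the "Found N items" header.
_hdfs_list() {
  hadoop dfs -ls "$1" 2>/dev/null | grep -v '^Found' | awk '{print $8}'
}

# Hypothetical helper: what the completion does once prev takes a path.
complete_path() {
  local cur=$1 target
  if [[ -z "$cur" ]]; then
    target=/            # nothing typed yet: list the root
  elif [[ "$cur" == */ ]]; then
    target=$cur         # complete directory: list its contents
  else
    target="$cur*"      # partial name: pass the glob through unexpanded
  fi
  COMPREPLY=( $(compgen -W "$(_hdfs_list "$target")" -- "$cur") )
}

for typed in "" /user/ /u /nope; do
  complete_path "$typed"
  echo "'$typed' -> ${COMPREPLY[*]}"
done
```

To exercise the real function instead, one could source the script, set COMP_WORDS and COMP_CWORD by hand, call _hdfs, and print COMPREPLY.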

I’m sure there are more elegant ways to write this function, but it is called HackLeaf, after all. It relies on helpers defined elsewhere in bash_completion (have, for instance, only lets the completion be registered if hadoop is installed), but the basic idea is simple. cur holds the word you are currently typing, which is either a partial command or a partial path, and prev holds the word before it. If the previous word is hdfs, we offer the valid hdfs commands. If it is -ls (or any other command that takes a path), we offer the HDFS paths that match what has been typed so far. With no path, HDFS defaults to the user’s home directory, so in that case we list “/” instead. Anything hadoop prints to stderr is thrown away so that error messages don’t clutter the prompt mid-completion. In each case compgen -W keeps only the candidates that begin with cur, and the result is stored in COMPREPLY, the array bash reads to offer completions on the command line.
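
The cur/prev/compgen mechanics can be seen in isolation. The sketch below fakes the variables bash would set when the user has typed 'hdfs -l' and pressed Tab (in real use bash fills COMP_WORDS and COMP_CWORD itself), then runs the same command-name branch against a shortened word list.

```shell
#!/usr/bin/env bash
# Fake the state bash hands a completion function for "hdfs -l<Tab>".
COMP_WORDS=(hdfs -l)
COMP_CWORD=1

cur=${COMP_WORDS[COMP_CWORD]}      # "-l": the word being completed
prev=${COMP_WORDS[COMP_CWORD-1]}   # "hdfs": the word before it

COMPREPLY=()
if [[ "$prev" == hdfs ]]; then
  # compgen -W keeps only the words that start with $cur; "--" stops
  # compgen from reading "-l" as one of its own options.
  COMPREPLY=( $(compgen -W '-ls -lsr -du -dus -count -mv' -- "$cur") )
fi

printf 'prev=%s cur=%s candidates:' "$prev" "$cur"
printf ' %s' "${COMPREPLY[@]}"
echo
```

Only -ls and -lsr survive the prefix filter; bash then either inserts their common prefix or lists both when Tab is pressed again.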

Be sure to check out some of the other completions bash_completion provides, particularly the ones for ssh and chkconfig.

This entry was posted in HDFS, Hadoop, bash.

4 Comments

  1. Posted November 18, 2009 at 11:42 pm

    I never considered how much time I waste typing complete hadoop commands until seeing this. Now I can only imagine how much time I’ll save!

    Grabbing this immediately! Thanks for sharing!

  2. Drew Farris
    Posted November 20, 2009 at 3:17 pm

    Thanks for the great tip! It is something I’ve always wished for, but never managed to put together.

    One thing I noticed: it appears your blog software might have eaten an ampersand, because lines like the following:

    COMPREPLY=( $( compgen -W "$( hdfs -ls / 2>-|grep -v ^Found|awk '{print $8}' )" -- "$cur" ) )

    seem to need 2>&- (2>ampersand-) to properly close stderr. Without the ampersand, the shell seems to get confused (at least in my case, on Ubuntu 9.04 running bash 3.2.48(1)).

  3. Meng Mao
    Posted November 21, 2009 at 2:42 am

    I was shortcutting things as:
    hals
    hacat
    harm (actually I was too chicken to make this one)

    But your approach is way cooler.

  4. Posted November 23, 2009 at 12:44 pm

    @Drew: I don’t think the blog ate my ‘&’, but you’re right. The proper way to close STDERR is 2>&-. You could also do 2> /dev/null. We’re running CentOS and 2>- works, but I’m not sure why. Good catch!
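
For reference, the redirections in this exchange do different things in bash. 2>/dev/null sends stderr to the null device and 2>&- closes it, but 2>- has no special meaning: it writes stderr to an ordinary file named "-" in the current directory, which would explain why it seemed to work on CentOS. The sketch below runs all three in a scratch directory; noisy is a made-up stand-in command.

```shell
#!/usr/bin/env bash
# Compares the stderr redirections discussed above. Runs in a scratch
# directory because one of them creates a stray file.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Hypothetical stand-in for a command that writes to both streams.
noisy() { echo "out"; echo "err" >&2; }

out_null=$(noisy 2>/dev/null)    # stderr discarded
out_closed=$(noisy 2>&-)         # stderr closed; the write to fd 2 just fails
out_dash=$(noisy 2>-)            # stderr written to a regular file named "-"

dash_file=""
[[ -f ./- ]] && dash_file=$(< ./-)
echo "stdout each time: $out_null / $out_closed / $out_dash"
echo "contents of the file named '-': $dash_file"

cd / && rm -rf "$scratch"
```

Of the two intended forms, 2>/dev/null is usually the safer one for an external program such as hadoop: a process started with stderr closed can have a file it opens later land on descriptor 2, so its error output ends up in that file.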
