Command-line auto completion for Hadoop DFS commands

We like to keep things simple here at Rapleaf. One small tweak we made right after we installed Hadoop was to alias 'hdfs' to 'hadoop dfs'. It rolls off the fingers nicely. We are also constantly typing 'hdfs -ls this' or 'hdfs -du that'. If we are not sure what this/that is, we type 'hdfs -ls /this/what', then 'hdfs -ls /this/what/ever', followed by a copy and paste or two. Thanks to our recent HackLeaf day and Nathan’s great idea, we no longer have to go through all of that. Just type 'hdfs -ls [tab]' and it works just like bash command-line completion.
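
For reference, the alias itself is a one-liner. This is a minimal sketch that assumes it lives in ~/.bashrc; the post does not say where we actually defined it.

```bash
# Assumed location: ~/.bashrc. Makes "hdfs" shorthand for "hadoop dfs".
alias hdfs='hadoop dfs'
```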

This was easy to implement once I found the programmable completion tool by Ian Macdonald. I just added the following section to bash_completion:

[cc lang="bash" tab_size="2"]
# hdfs(1) completion
#
have hadoop &&
_hdfs()
{
  local cur prev

  COMPREPLY=()
  cur=${COMP_WORDS[COMP_CWORD]}
  prev=${COMP_WORDS[COMP_CWORD-1]}

  # Directly after "hdfs": offer the dfs subcommands.
  if [[ "$prev" == hdfs ]]; then
    COMPREPLY=( $( compgen -W '-ls -lsr -du -dus -count -mv -cp -rm
      -rmr -expunge -put -copyFromLocal -moveToLocal -mkdir -setrep
      -touchz -test -stat -tail -chmod -chown -chgrp -help' -- "$cur" ) )
  fi

  # After a subcommand that takes a path: offer HDFS paths.
  if [[ "$prev" == -ls ]] || [[ "$prev" == -lsr ]] ||
     [[ "$prev" == -du ]] || [[ "$prev" == -dus ]] ||
     [[ "$prev" == -cat ]] || [[ "$prev" == -mkdir ]] ||
     [[ "$prev" == -put ]] || [[ "$prev" == -rm ]] ||
     [[ "$prev" == -rmr ]] || [[ "$prev" == -tail ]] ||
     [[ "$prev" == -cp ]]; then
    if [[ -z "$cur" ]]; then
      # Nothing typed yet: list "/" instead of the HDFS home directory.
      COMPREPLY=( $( compgen -W "$( hdfs -ls / 2>/dev/null | grep -v ^Found | awk '{print $8}' )" -- "$cur" ) )
    elif [[ "$cur" == */ ]]; then
      # A complete directory was typed: list its contents.
      COMPREPLY=( $( compgen -W "$( hdfs -ls "$cur" 2>/dev/null | grep -v ^Found | awk '{print $8}' )" -- "$cur" ) )
    else
      # Partial name: let HDFS expand the glob (quoted so the local shell does not).
      COMPREPLY=( $( compgen -W "$( hdfs -ls "$cur*" 2>/dev/null | grep -v ^Found | awk '{print $8}' )" -- "$cur" ) )
    fi
  fi
} &&
complete -F _hdfs hdfs
[/cc]

I’m sure there are some ways to make the code more elegant, but it is called HackLeaf, after all. This bit of code builds on top of other functions in the script, but the basic idea is pretty simple. cur contains the current word you are typing, so this would be a partial command or partial path. prev contains the previous word. If the previous word is hdfs, then we present the user with valid arguments to hdfs. If the previous word is -ls (or any other command where you want a path/file), then present the user with the possibilities for that path or partial path. HDFS defaults to the user’s home directory if no path is provided, so we override that by presenting the user with the possibilities under “/”. Finally, COMPREPLY returns the possibilities to the user on the command-line.
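
To watch the cur/prev/COMPREPLY mechanics without a Hadoop cluster, here is a minimal sketch. The names _demo and fake_hdfs_ls are made up for illustration, and the listing lines are invented sample data shaped like 'hadoop dfs -ls' output (path in the eighth column). The script simulates a tab press by setting COMP_WORDS and COMP_CWORD by hand, which is roughly what bash does before calling a completion function.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for `hadoop dfs -ls /`; the entries are invented sample data.
fake_hdfs_ls() {
  cat <<'EOF'
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2009-11-17 10:00 /data
drwxr-xr-x   - hadoop supergroup          0 2009-11-17 10:00 /tmp
drwxr-xr-x   - hadoop supergroup          0 2009-11-17 10:00 /user
EOF
}

# Hypothetical completion function with the same shape as _hdfs.
_demo() {
  local cur prev
  COMPREPLY=()
  cur=${COMP_WORDS[COMP_CWORD]}       # word being typed
  prev=${COMP_WORDS[COMP_CWORD-1]}    # word before it

  if [[ "$prev" == demo ]]; then
    # Complete the subcommand itself.
    COMPREPLY=( $( compgen -W '-ls -lsr -du' -- "$cur" ) )
  elif [[ "$prev" == -ls ]]; then
    # Drop the "Found N items" header, keep column 8 (the path),
    # then let compgen keep only the paths that start with $cur.
    COMPREPLY=( $( compgen -W "$( fake_hdfs_ls | grep -v '^Found' | awk '{print $8}' )" -- "$cur" ) )
  fi
}

# Simulate typing: demo -l<TAB>
COMP_WORDS=(demo -l); COMP_CWORD=1
_demo
printf 'candidates for "-l": %s\n' "${COMPREPLY[*]}"

# Simulate typing: demo -ls /t<TAB>
COMP_WORDS=(demo -ls /t); COMP_CWORD=2
_demo
printf 'candidates for "/t": %s\n' "${COMPREPLY[*]}"
```

The "--" before "$cur" matters here: without it, compgen would try to parse a partial word like -l as one of its own options.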

Be sure to check out some of the other features of bash_completion, particularly ssh and chkconfig.

This entry was posted in bash, Hadoop, HDFS.

15 Comments

  1. Posted November 18, 2009 at 11:42 pm | Permalink

    I never considered how much time I waste typing complete hadoop commands until seeing this. Now, I can only imagine how much time I’ll save!

    Grabbing this immediately! Thanks for sharing!

  2. Drew Farris
    Posted November 20, 2009 at 3:17 pm | Permalink

    Thanks for the great tip! It is something I’ve always wished for, but never managed to put together.

    One thing I noticed: it appears your blog software might have eaten an ampersand because the lines like the following:


    COMPREPLY=( $( compgen -W "$( hdfs -ls / 2>-|grep -v ^Found|awk '{print $8}' )" -- "$cur" ) )

    seem to need 2>&- (2>ampersand-) to properly close stderr. Without the ampersand the shell seems to get confused (at least in my case on Ubuntu 9.04 running bash 3.2.48(1)).

  3. Meng Mao
    Posted November 21, 2009 at 2:42 am | Permalink

    I was shortcutting things as:
    hals
    hacat
    harm (actually I was too chicken to make this one)

    But your approach is way cooler.

  4. Posted November 23, 2009 at 12:44 pm | Permalink

    @Drew: I don’t think the blog ate my ‘&’, but you’re right. The proper way to close STDERR is 2>&-. You could also do 2> /dev/null. We’re running CentOS and 2>- works, but I’m not sure why. Good catch!
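
A quick way to see the difference between the three forms, sketched with a throwaway function named warn (hypothetical, not part of the completion script). In bash, 2>- without the ampersand is an ordinary redirect to a file literally named "-" in the current directory, which would explain why it appears to work wherever that directory is writable.

```bash
#!/usr/bin/env bash

# warn is a hypothetical helper that writes one line to stderr.
warn() { echo "oops" >&2; }

scratch=$(mktemp -d)        # scratch directory so the demo leaves nothing behind elsewhere
cd "$scratch" || exit 1

warn 2>-                    # no ampersand: stderr is written to a file named "-"
cat ./-                     # the message ended up in that file

warn 2>/dev/null            # stderr is discarded
warn 2>&- || true           # stderr is closed; warn fails because it has nowhere to write
```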

  5. Amol
    Posted August 25, 2011 at 2:00 pm | Permalink

    For some reason, it’s not working for me. When I press tab after the hdfs -ls command, it says permission denied.

    How do I fix this? I chmodded the file to 777 and changed the ownership to the file directory name.

    Thanks..

    • ckline
      Posted August 25, 2011 at 2:57 pm | Permalink

      Is the permission denied coming from bash or hadoop? Can you type ‘hdfs -ls’ without a problem?

      • Amol
        Posted August 25, 2011 at 5:35 pm | Permalink

        Yes, I can type hdfs -ls, and it gives me the list of hadoop commands. However, when I press tab after hdfs -ls, it shows “-bash: permission denied”.

        • Amol
          Posted August 25, 2011 at 5:36 pm | Permalink

          I also forgot to mention that I am running it on Ubuntu with Hadoop 0.20.

        • ckline
          Posted August 25, 2011 at 8:04 pm | Permalink

          ‘hdfs -ls’ should list out the contents of your home directory on hdfs, not give you a list of hadoop commands. That may actually be the problem. hdfs is also an alias, so we should eliminate that variable. Try typing ‘hadoop dfs -ls’. This should give you a listing of your home directory. If it doesn’t, I would guess that something is wrong with the hadoop configuration.

          • Amol
            Posted August 26, 2011 at 5:32 am | Permalink

            Sorry, I understand that hdfs is an alias for hadoop fs.

            I think I unwittingly wrote that it spits out the list of commands. It does not. When I type hdfs -l and then tab, it shows -ls and -lsr.

            Note that it’s after I type hdfs -ls and then press tab that it should be accessing the contents inside the hadoop directory.

            The script written above does not actually work at all. I found a script, taken from here, on this website:

            http://code.google.com/p/hadoopenv/source/browse/hdenv.sh

            ## Bash Autocompletion for HDFS
            # hdfs(1) completion
            # taken from: http://blog.rapleaf.com/dev/2009/11/17/command-line-auto-completion-for-hadoop-dfs-commands/
            have()
            {
            unset -v have
            PATH=$PATH:/sbin:/usr/sbin:/usr/local/sbin type $1 &>/dev/null &&
            have="yes"
            }
            have hadoop &&
            _hdfs()
            {
            local cur prev

            COMPREPLY=()
            cur=${COMP_WORDS[COMP_CWORD]}
            prev=${COMP_WORDS[COMP_CWORD-1]}

            if [[ "$prev" == hdfs ]]; then
            COMPREPLY=( $( compgen -W '-ls -lsr -du -dus -count -mv -cp -rm \
            -rmr -expunge -put -copyFromLocal -moveToLocal -mkdir -setrep \
            -touchz -test -stat -tail -chmod -chown -chgrp -help' -- $cur ) )
            fi

            if [[ "$prev" == -ls ]] || [[ "$prev" == -lsr ]] || \
            [[ "$prev" == -du ]] || [[ "$prev" == -dus ]] || \
            [[ "$prev" == -cat ]] || [[ "$prev" == -mkdir ]] || \
            [[ "$prev" == -put ]] || [[ "$prev" == -rm ]] || \
            [[ "$prev" == -rmr ]] || [[ "$prev" == -tail ]] || \
            [[ "$prev" == -cp ]]; then
            if [[ -z "$cur" ]]; then
            COMPREPLY=( $( compgen -W "$( hdfs -ls / 2>-|grep -v ^Found|awk '{print $8}' )" -- "$cur" ) )
            elif [[ `echo $cur | grep \/$` ]]; then
            COMPREPLY=( $( compgen -W "$( hdfs -ls $cur 2>-|grep -v ^Found|awk '{print $8}' )" -- "$cur" ) )
            else
            COMPREPLY=( $( compgen -W "$( hdfs -ls $cur* 2>-|grep -v ^Found|awk '{print $8}' )" -- "$cur" ) )
            fi
            fi
            } &&
            complete -F _hdfs hdfs
            unset have

          • ckline
            Posted August 26, 2011 at 6:25 am | Permalink

            Glad you found a script that works for you. Not sure what’s going on, but it may be that the code above got messed up by a WordPress upgrade. For example, it looks like it’s missing the original backslashes.

            Thanks for posting the code URL from Google. People should definitely use that link going forward. I found it ironic that the code that worked for you actually did originally come from this blog post (note the “taken from” line in the code you posted).

            Happy auto-completing @Amol.

  6. Posted January 11, 2012 at 12:17 am | Permalink

    Excellent article… I got inspired and did more changes on it…
    checkout here – http://cloudblog.8kmiles.com/2012/01/09/hadoop-autocomplete-for-hadoop-dfs-commands/

    • ckline
      Posted January 11, 2012 at 10:40 am | Permalink

      Wow, looks great @dhamu. Definitely an upgrade in functionality.

  7. Posted January 14, 2012 at 1:07 am | Permalink

    Thanks ckline. I have published another bash completion script for hadoop commands at http://cloudblog.8kmiles.com/2012/01/14/hadoop-autocomplete-for-hadoop-commands/

One Trackback

  1. [...] What I needed was an bash auto completion for those commands. A simple google search led me to rapleaf, but I wanted more… so here it is.. I made hadoop dfs autocompletion scripts.Plans: 1) To [...]
