Making sure Ruby Daemons die

We use the Daemons Ruby Gem for a variety of applications. It has served us well, but we found ourselves wrapping the “stop” command in a shell script that makes sure the process actually dies. Our deploy scripts restart daemons, so they need that guarantee. Thanks to the magic of Ruby, we were able to eliminate these extra scripts with a simple Daemons extension.

Before extension: stop command returns immediately, pid file is deleted and we have no clue if the process is dead.
After extension: stop command blocks until the process is dead, giving feedback along the way.

To make sure the stop command doesn’t hang indefinitely, we send the TERM signal, then send the KILL signal (kill -9) if the process hasn’t died after a configurable amount of time. To use this extension, specify :force_kill_wait in seconds as part of the Daemons options hash:

require 'daemons_extension'
Daemons.run_proc('dantalion', :force_kill_wait => 30) { ...

The implementation starts by making sure the pid files match what the UNIX ‘ps’ command reports. Let’s crack open the Daemons::ApplicationGroup class and redefine the find_applications method:

# We want to redefine find_applications to not rely on
# pidfiles (e.g. find application if pidfile is gone)
# We recreate the pid files if they're not there.
def find_applications(dir)
  # Find pid_files, like original implementation
  pid_files = PidFile.find_files(dir, app_name)
  @monitor = Monitor.find(dir, app_name + '_monitor')
  pid_files.reject! {|f| f =~ /_monitor\.pid$/}
 
  # Find the missing pids based on the UNIX pids
  pidfile_pids = pid_files.map {|pf| PidFile.existing(pf).pid}
  missing_pids = unix_pids - pidfile_pids

  # Create pidfiles that are gone
  if missing_pids.size > 0
    puts "[daemons_ext]: #{missing_pids.size} missing pidfiles: " +
          "#{missing_pids.inspect}... creating pid file(s)."
    missing_pids.each do |pid|
      pidfile = PidFile.new(dir, app_name, multiple)
      pidfile.pid = pid # Doesn't seem to matter if it's a string or Fixnum
    end
  end

  # Now get all the pid files again
  pid_files = PidFile.find_files(dir, app_name)

  return pid_files.map {|f|
    app = Application.new(self, {}, PidFile.existing(f))
    setup_app(app)
    app
  }
end
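
The unix_pids helper used above is not shown in this post. As a rough illustration only, here is one way such a helper could be written; it is not the code from the attached file. The name pids_matching, the app_name argument and the exact ps flags are all assumptions, and ps options differ between platforms:

```ruby
# Hypothetical helper: pull the PIDs of processes whose command line
# mentions app_name out of ps-style output ("PID COMMAND" per line).
def pids_matching(ps_output, app_name, own_pid = Process.pid)
  ps_output.each_line.map { |line|
    pid, command = line.strip.split(/\s+/, 2)
    next unless pid.to_s =~ /\A\d+\z/ && command  # skips the header and blank lines
    next if pid.to_i == own_pid                   # never report ourselves
    pid.to_i if command.include?(app_name)
  }.compact
end

# Assumed invocation: list every process with its pid and full command line.
def unix_pids(app_name)
  pids_matching(`ps -A -o pid,args`, app_name)
end
```

With a list like this in hand, unix_pids - pidfile_pids is plain array difference: every pid that ps reports but no pid file records.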

Now we can be sure that we’ll send a signal to our process even if the pid file was initially missing. You can reference the attached code to see how unix_pids is implemented. Next, we redefine Daemons::ApplicationGroup.stop_all:

# Specify :force_kill_wait => (seconds to wait) and this method will
# block until the process is dead.  It first sends a TERM signal, then
# a KILL signal (-9) if the process hasn't died after the wait time.
# Timeout comes from Ruby's standard library (require 'timeout').
def stop_all(force = false)
  @monitor.stop if @monitor
 
  wait = options[:force_kill_wait].to_i
  if wait > 0
    puts "[daemons_ext]: Killing #{app_name} with force after #{wait} secs."

    # Send term first, don't delete PID files.
    @applications.each {|a| a.send_sig('TERM')}

    begin
      started_at = Time.now
      Timeout::timeout(wait) do
        num_pids = unix_pids.size
        while num_pids > 0
          time_left = wait - (Time.now - started_at)
          puts "[daemons_ext]: Waiting #{time_left.round} secs on " +
                "#{num_pids} #{app_name}(s)..."
          sleep 1
          num_pids = unix_pids.size
        end
      end
    rescue Timeout::Error
      @applications.each {|a| a.send_sig('KILL')}
    ensure
      # Delete Pidfiles
      @applications.each {|a| a.zap!}
    end

    puts "[daemons_ext]: All #{app_name}(s) dead."
  else
    @applications.each {|a|
      if force
        begin; a.stop; rescue ::Exception; end
      else
        a.stop
      end
    }
  end
end

Now we can be sure that our process is dead when the stop command returns… at least as sure as kill -9 will kill a process (I have seen a case where kill -9 didn’t kill a process, but that was 8 years ago on SunOS). The extension also works with :multiple => true. Note that because it relies on UNIX system commands such as ps, this extension will not work on all operating systems… also a good reason not to patch Daemons.
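
The TERM-then-KILL escalation can also be illustrated outside of Daemons. The following is a minimal sketch, not the extension's code: alive? and stop_with_escalation are hypothetical names, and it swaps the Timeout block for a monotonic-clock deadline and checks liveness with signal 0 instead of ps.

```ruby
# True while a signal could still be delivered to pid.
# Signal 0 performs the existence check without sending anything.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Hypothetical helper: send TERM, poll until the process is gone, and
# escalate to KILL once wait_secs have passed.
# Returns :terminated, :killed, or :gone (nothing left to signal).
def stop_with_escalation(pid, wait_secs, poll_interval = 0.1)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + wait_secs
  while alive?(pid)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill('KILL', pid)
      return :killed
    end
    sleep poll_interval
  end
  :terminated
rescue Errno::ESRCH
  :gone
end
```

If the pid belongs to a child of the calling process, the child has to be reaped (for example with Process.detach(pid)), otherwise the signal 0 check keeps succeeding on the zombie. A stop script signalling an unrelated daemon does not have that problem.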

Download daemons_extension.rb
Download New Daemons Extension (tested with 1.0.10 and 1.0.8)
Download CHANGELOG


18 Comments

  1. Posted May 29, 2008 at 3:00 am | Permalink

    Just what I was looking for! The extensions work great. Thanks!

  2. Posted June 9, 2008 at 2:47 pm | Permalink

    hey guys,
    I am having this *exact* same issue… monit sends stop, then starts, blows away the pid file, and the daemons gem does not clean up after itself.

    This looks perfect!

    btw, have you submitted this to the maintainers of the gem?
    Adam

  3. Michael Russo
    Posted June 16, 2008 at 2:55 pm | Permalink

    Hi Chris,

    Thanks very much for this!

    I was still having a problem with some daemons not dying — hopefully this will help anyone with the same issue.

    I poked around and noticed that by the time the call is made to send the KILL signal to any remaining processes, all of the pid files were already deleted, regardless of the daemon’s status.

    I fixed this by adding the following line to the beginning of the exception handler that handles Timeout::Error in stop_all:

    find_applications(pidfile_dir())

  4. chris
    Posted July 2, 2008 at 3:08 pm | Permalink

    Chris, can I ask which version of daemons you are using?

  5. Posted July 2, 2008 at 7:00 pm | Permalink

    Hi Chris/Adam – We’re currently running on version 1.0.10, but we’ve also updated the extension (works fine on 1.0.8 too). I’ll upload the new code and a changelog.

    We considered submitting a patch, but decided not to because this extension is OS dependent. Although, these are just options you can use with warning. Maybe it’s worth a shot.

  6. chris
    Posted July 3, 2008 at 11:29 am | Permalink

    @Chris : Thanks for the reply. Looking forward to seeing the updated code.

  7. limey
    Posted July 16, 2008 at 10:36 am | Permalink

    looks good, maybe it could help me out ?

    I need to run 2 cmds in the background so they do not “block” (Linux 2.6) while a 3rd cmd does its own thing, when cmd3 is complete i want a full tear down of the other runners.

    system('cmdx')

    cmd1 netstat 1 >> /netstat.out
    cmd2 dstat --output /dstat.out
    cmd3 bonnie++ /mnt/files

    The cmds run fine but I can never control “netstat” or “dstat” they always become orphaned :( even when “daemonized”

  8. Posted January 9, 2009 at 8:27 am | Permalink

    sudo gem sources -a http://gems.github.com
    sudo gem install seamusabshere-daemons

    just a quick-and-dirty github gem with Chris’s fix.

    (note: the gem version is set to 1.0.11 or higher in order to supersede 1.0.10.0, the last version available on rubyforge at time of posting)

  9. Posted January 9, 2009 at 9:13 am | Permalink

    Seamus – Thanks for posting the link. Thomas Uehlinger and I were discussing getting this fix into the next version, but that was back in July 2008. I’ve sent him an email to see what his thoughts are.

  10. Greg Hazel
    Posted February 26, 2009 at 7:14 pm | Permalink

    We have the same issue with daemons not dying properly. “stop” does not block, and deletes the pidfile before the daemon does. Starting a new daemon before the first dies can result in the first daemon deleting the second daemon’s pidfile as it exits.

    According to the ruby Process.kill docs, Process.wait should be used to wait for the process to die: http://www.ruby-doc.org/core/classes/Process.html#M003183

    So I believe the daemon module’s bug could be fixed quite easily with only one line: Process.wait

    Sending KILL after a fixed amount of time does technically solve the issue, although it’s sort of overkill (excuse the pun).

  11. Greg Hazel
    Posted February 27, 2009 at 2:14 am | Permalink

    Ok, more than one line. Process.wait doesn’t work because the pid is not a child of the process that is stopping everything. Instead you need:

    def fancy_wait pid
      begin
        while (1) do
          Process.kill(0, pid)
          sleep 0.1
        end
      rescue Errno::ESRCH, Errno::ECHILD
      end
    end

  12. Posted March 8, 2009 at 10:21 am | Permalink

    Hi Greg,
    We’re always looking for ways to simplify code, so thanks for the ideas. I have a couple of questions:

    1. What signal does ‘0’ correspond to (in ‘Process.kill(0, pid)’)? On both my Mac and CentOS, TERM is ‘15’ and KILL is ‘9’. Maybe I’m just missing something…
    2. Does this work with :multiple => true? Perhaps it would help to see where this fits into the Daemons code.

    Our use cases might be very different, but it is important for us to receive feedback via STDOUT along the way. In your solution, we could just add a ‘puts’ in the while loop, but we probably wouldn’t want that message to appear 10 times a second. That brings up another subtle difference. In your solution, a signal is sent after every sleep period. In our solution we send a TERM signal, then wait a configurable amount of time for the process to exit gracefully before sending the KILL signal.

  13. Mike Ayers
    Posted July 20, 2009 at 4:51 pm | Permalink

    Signal 0 is, as best I can tell, a meta-signal that causes an exception if the process does not exist. I saw it in some Ruby code, which I copied, then discovered that this signal is not only not POSIX, it’s not even uniformly supported on Linux! I think Greg meant “Process.kill(‘TERM’, pid)”, as signal 0 would not stop anything.

  14. Tony Payne
    Posted July 21, 2009 at 8:06 pm | Permalink

    Note that Greg’s code is merely a wait function, not a kill. It continues to loop as long as the process exists. kill -0 does not signal the process, but instead returns true if the process is alive, false otherwise. It’s a pretty standard kill value.

  15. Posted August 11, 2009 at 6:43 pm | Permalink

    Thanks for posting this, I was having the same problem.

    I ended up switching to the daemons-spawn gem instead using instructions here and that also fixed it:
    http://rwldesign.com/journals/1-solutions/posts/24-working-with-delayed-job

    One drawback is the daemons-gem script doesn’t (yet) support multiple workers with a -n option (it only launches one). Probably wouldn’t be too hard to modify it. Anyway, just wanted to post another solution.

    delayed_job should really update their daemonizing instructions on github! As evidenced by this thread, what they have posted is not stable in production.

    Thanks!
    Brian
    http://feedmailpro.com

  16. Greg Hazel
    Posted August 26, 2009 at 12:59 pm | Permalink

    As Tony said, that is just a wait function. In fact, the Daemons library already uses kill(0, pid) to implement Daemons::Pid.running?(pid), and as noted there kill(0, pid) does not send a signal; it’s just a syscall to see if a signal may be sent.

    So you can see where it fits into the Daemons code, and so there’s a gem, I’ve made a github repo out of the Daemons SVN repository:
    http://github.com/ghazel/daemons/

    Here is the relevant commit:
    http://github.com/ghazel/daemons/commit/3e91f91c5a95409bdbd54039e4163ea509d66619

  17. Posted December 15, 2009 at 4:03 pm | Permalink

    Thanks Chris and Seamus, this fix & gem was exactly what I needed :-)

  18. Posted June 20, 2010 at 5:04 pm | Permalink

    great extension , thanks

2 Trackbacks

  1. [...] This problem has thankfully been solved by the use of RapLeafs’ daemon_extension code which is basically a bundle of hacks to kill -9 a daemon that refuses to die after a certain timeout period. This isn’t perfect by any stretch of the imagination, but from a pragmatists point of view: It’ll do! [...]

  2. By links for 2010-02-09 « Bloggitation on February 9, 2010 at 11:05 pm

    [...] Making sure Ruby Daemons die (tags: ruby sysadmin) [...]
