We were recently fighting with long-running requests that hogged the single-threaded mongrel processes and caused other requests to time out. To be fair to mongrel, the only reason it is effectively single-threaded is Rails. Anyway, we were having a tough time figuring out where in our Rails code we were sitting and waiting. We were fairly certain the request thread (the thread holding the Rails mutex in mongrel) was waiting on some sort of I/O. Was it the database? Disk? Memcache? Network? All were possible, yet we could not figure out which was the cause.
After jumping through some hoops we were able to find where our request was getting locked up. Here’s what we saw: the active thread was stuck at a particular line in net/http.rb (line 560 in Ruby v1.8.6, if you want to know). What was this line? Diving into the Ruby standard library, we found it in the connect method of Net::HTTP. Here it is:
[cc lang="ruby" tab_size="2"]
…
timeout(@open_timeout) { TCPSocket.open(…) }
…
[/cc]
How could the thread sit there when there is a timeout block wrapped around the call? Skimming further through the code, we noticed that @open_timeout, an attribute on Net::HTTP, was initialized to nil. That’s right, nil. Finding that somewhat odd, we went on to ask: what is the behavior of timeout when given a nil value? Here’s what it does:
[cc lang="ruby" tab_size="2"]
def timeout(value)
return yield if value.nil? || value == 0
…
end
[/cc]
It does NOT time out! So if TCPSocket.open hangs for any reason, YOU WILL WAIT FOREVER. Another notable observation: if you use Net::HTTP.start (the class method), a Net::HTTP object is instantiated implicitly and its connect method is called before your block runs. The danger here is that a developer may wonder why the block is stalling when there is nothing wrong with the block itself; the stall happens in the implicit connect, at the call to TCPSocket.open.
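The nil behavior is easy to confirm in isolation, without Net::HTTP or a network. The following is a minimal sketch using the standard Timeout module; the sleep durations and variable names are arbitrary choices for illustration, not anything taken from net/http.rb.

```ruby
require 'timeout'

# With nil, Timeout.timeout simply yields: the block runs to completion
# no matter how long it takes.
without_limit = Timeout.timeout(nil) do
  sleep 0.2
  :finished
end

# With a real value, a block that takes too long is interrupted
# and Timeout::Error is raised.
with_limit = begin
  Timeout.timeout(0.1) do
    sleep 1
    :finished
  end
rescue Timeout::Error
  :timed_out
end

puts "nil timeout:  #{without_limit}"
puts "0.1s timeout: #{with_limit}"
```

The first block finishes normally even though it sleeps, while the second is cut short. A hung TCPSocket.open inside a nil timeout behaves like the first case, except that it never finishes.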
What’s the fix? With the wonders of Ruby, we popped open Net::HTTP and initialized @open_timeout:
[cc lang="ruby" tab_size="2"]
require 'net/http'

module Net
  class HTTP
    # Keep a reference to the original constructor so we can call it below.
    alias orig_initialize initialize

    # Override HTTP#initialize to set a default @open_timeout of 10 seconds.
    # The original initialize sets @open_timeout to nil, which makes connect
    # wait indefinitely for TCPSocket.open to return.
    def initialize(*args)
      orig_initialize(*args)
      @open_timeout ||= 10
    end
  end
end
[/cc]
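We chose 10 seconds arbitrarily; there is no special reason behind the number. Now, in this particular I/O case, we will not block forever: if the socket cannot be opened in time, the timeout block raises a Timeout::Error instead.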
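If reopening the class globally feels too heavy-handed, the same effect can be had per connection, since open_timeout and read_timeout both have accessors. This only works if you build the object with Net::HTTP.new and call the instance-level start yourself, rather than going through the Net::HTTP.start class method. A rough sketch follows; the helper name http_with_timeouts, the example.com host, and the 30-second read timeout are assumptions made up for illustration.

```ruby
require 'net/http'

# Hypothetical helper (not part of Net::HTTP): builds a connection object
# with explicit timeouts. Nothing touches the network until #start is called.
def http_with_timeouts(host, port = 80, open_timeout = 10, read_timeout = 30)
  http = Net::HTTP.new(host, port)
  http.open_timeout = open_timeout  # seconds allowed for opening the socket
  http.read_timeout = read_timeout  # seconds allowed for each read
  http
end

http = http_with_timeouts('example.com')

# Usage (performs a real request, so it is left commented out here):
# begin
#   http.start { |conn| puts conn.get('/').code }
# rescue Timeout::Error
#   warn 'could not connect within the open timeout'
# end
```

The trade-off is that every call site has to remember to do this, whereas the patch above covers code you do not control, including libraries that call Net::HTTP.start internally.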








One Comment
Hey, I was running into issues w/ these timeouts when using tor. (There’s also read_timeout, which is initialized to a sane value.) Kind of dumb that open_timeout is initialized to nil, note that both have accessors so you could potentially just set them after doing Http.new (unless as you said you’re using Http#start)
Beware though, these little guys show up on WWW::Mechanize too! Re-opening the class might fix it though.
agent = WWW::Mechanize.new
agent.open_timeout = 60
agent.read_timeout = 30
In my case I was trying to turn *off* all the timeouts, actually (setting both to nil) because I always wrap any HTTP calls with an attempt { } block (that both retries & times out.)
Fun.. hope you guys used thread dump for that one!
One Trackback
[...] Finally, even the stalwart Net/HTTP standard library appears to be a victim of this behavior. Periodically, when we attempt to connect to various web services, the call to TCPSocket.open will hang forever. The real irony of this one is that Net/HTTP even has the code for connection timeouts built into the class, but the standard #start method doesn’t give you a way to pass a value. (This particular issue is treated on in much greater detail here). [...]