FooBarWidget/daemon_controller

A library for implementing daemon management capabilities.

blog.phusion.nl/2008/08/25/daemon_controller-a-library-for-robust-daemon-management/

License

MIT license

Introduction

daemon_controller is a library for starting and stopping specific daemons programmatically in a robust, race-condition-free manner.

It's not a daemon monitoring system like God or Monit. It's also not a library for writing daemons.

It provides the following functionality:

Starting daemons. If the daemon fails to start then an exception will be raised. daemon_controller can even detect failures that occur after the daemon has already daemonized.
Starting daemons is done in a race-condition-free manner. If another process using daemon_controller is trying to start the same daemon, then daemon_controller will guarantee serialization.
daemon_controller also raises an exception if it detects that the daemon is already started.
Connecting to a daemon, starting it if it's not already started. This too is done in a race-condition-free manner. If the daemon fails to start then an exception will be raised.
Stopping daemons.
Checking whether a daemon is running.

Installation

gem install daemon_controller

What is it for?

There is a lot of software (both Rails related and unrelated) which relies on servers or daemons. To name a few, in no particular order:

Ultrasphinx, a Rails library for full-text searching. It makes use of the Sphinx search software for indexing and searching. Indexing is done by running a command, while searching is done by querying the Sphinx search server.
acts_as_ferret, another Rails library for full-text searching. It uses the Ferret search software. On production environments, it relies on the Ferret DRB server for both searching and indexing.
BackgrounDRb, a Ruby job server and scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
mongrel_cluster, which starts and stops multiple Mongrel daemons.

Relying on daemons is quite common, but not without problems. Let's go over some of them.

Starting daemons is a hassle

If you've used similar software, then you might agree that managing these daemons is a hassle. If you're using BackgrounDRb, then the daemon must be running. Starting the daemon is not hard, but it is annoying. It's also possible that the system administrator forgets to start the daemon. While configuring the system to automatically start a daemon at startup is not hard, it is an extra thing to do, and thus a hassle. We thought, why can't such daemons be automatically started? Indeed, this won't be possible if the daemon is to be run on a remote machine. But in by far the majority of use cases, the daemon runs on the same host as the Rails application. If a Rails application - or indeed, any application - is configured to contact a daemon on the local host, then why not start the daemon automatically on demand?

Daemon starting code may not be robust or efficient

We've also observed that people write daemon controlling code over and over again. Consider for example UltraSphinx, which provides a rake sphinx:daemon:start Rake task to start the daemon. The time that a daemon needs to initialize is variable, and depends on things such as the current system load. The Sphinx daemon usually needs less than a second before we can connect to it. However, the way different software handles starting of a daemon varies. We've observed that waiting a fixed amount of time is by far the most common way. For example, UltraSphinx's daemon starting code looks like this:

    system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
    sleep(4) # give daemon a chance to write the pid file
    if ultrasphinx_daemon_running?
      say "started successfully"
    else
      say "failed to start"
    end

This is in no way a slam against UltraSphinx. However, if the daemon starts in 200 milliseconds, then the user who issued the start command will be waiting for 3.8 seconds for no good reason. This is not good for usability or for the user's patience.

Startup error handling

Different software handles daemon startup errors in different ways. Some might not even handle errors at all. For example, consider 'mongrel_cluster'. If there's a typo in one of your application source files, then 'mongrel_cluster' will not report the error. Instead, you have to check its log files to see what happened. This is not good for usability: many people will be wondering why they can't connect to their Mongrel ports after issuing a mongrel_rails cluster::start - until they realize that they should read the log file. But the thing is, not everybody realizes this. And typing in an extra command to read the log file to check whether Mongrel started correctly is just a big hassle. Why can't the daemon startup code report such errors immediately?

Stale or corrupt Pid files

Suppose that you're running a Mongrel cluster, and your server suddenly powers off because of a power outage. When the server is online again, it fails to start your Mongrel cluster because the PID file that it had written still exists, and wasn't cleaned up properly (it's supposed to be cleaned up when Mongrel exits). mongrel_cluster provides the --clean option to check whether the PID file is stale, and will automatically clean it up if it is. But not all daemon controlling software supports this. Why can't all software check for stale PID files automatically?

Implementation issues

From the problem descriptions, it becomes apparent that our wishlist is as follows. Why is this wishlist often not implemented? Let's go over the items.

A daemon should be automatically started on demand, instead of requiring the user to manually start it.
The most obvious problems are related to concurrency. Suppose that your web application has a search box, and you want to start the search daemon if it isn't already started, then connect to it. Two problems will arise:
- Suppose that Rails process A is still starting the daemon. At the same time, another visitor tries to search something, and Rails process B notices that the daemon is not running. If B tries to start the daemon while it's already being started by A, then things can go wrong. A robust daemon starter must ensure that only one process at a time may start the daemon.
- It's not a good idea to wait a fixed amount of time for the daemon to start, because you don't know in advance how long it will take for it to start. For example, if you wait 2 seconds, then try to connect to the daemon, and the daemon isn't done initializing yet, then it will seem as if the daemon failed to start.
These are the most probable reasons why people don't try to write auto-starting code, and instead require the user to start the daemon manually.
These problems, as well as several less obvious problems, are closely related to the next few points.

The daemon starter must wait until the daemon is done initializing, no longer and no shorter.
It is only safe to connect to the daemon after it has fully initialized, and the user should not have to wait longer than they really have to. During startup, the daemon has to be checked continuously to see whether it's done initializing or whether an error occurred. Writing this code can be quite a hassle, which is why most people don't do it.

The daemon starter must report any startup errors.
If the daemon starting command - e.g. sphinx -c config_file.conf, apachectl start or mongrel_rails cluster::start - reports startup errors, then all is fine as long as the user is starting the command from a terminal. A problem occurs when the error occurs after the daemon has already gone into the background. Such errors are only reported to the log file. The daemon starter should also check the log file for any startup errors.
Furthermore, it should be able to raise startup errors as exceptions. This allows the application to decide what to do with the error. For less experienced system administrators, the error might be displayed in the browser, allowing the administrators to become aware of the problem without forcing them to manually check the log files. Or the error might be emailed to a system administrator's email address.

The daemon starter must be able to correct stale or corrupted PID files.
If the PID file is stale, or for some reason has been corrupted, then the daemon starter must be able to cope with that. It should check whether the PID file contains a valid PID, and whether the PID exists.
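
The "no longer and no shorter" requirement can be pictured as a polling loop with a deadline instead of a fixed sleep. The following is a minimal sketch using only Ruby's standard library; it is not daemon_controller's actual implementation, and the helper name wait_until_reachable and its timeout: and interval: parameters are hypothetical.

```ruby
require "socket"

# Hypothetical helper (not part of daemon_controller).
# Returns true as soon as a TCP connection to host:port succeeds,
# or false once `timeout` seconds have passed without success.
def wait_until_reachable(host, port, timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      TCPSocket.new(host, port).close
      return true
    rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH
      # Nothing is accepting connections yet; fall through and retry.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end
```

With this approach, a daemon that is ready after 200 milliseconds is detected after roughly 200 milliseconds, while a slow or broken daemon is given up on at the deadline rather than being waited for indefinitely.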

Introducing daemon_controller

daemon_controller is a library for managing daemons in a robust manner. It is not a tool for managing daemons. Rather, it is a library which lets you write applications that manage daemons in a robust manner. For example, 'mongrel_cluster' or UltraSphinx may be adapted to utilize this library, for more robust daemon management.

daemon_controller implements all items in the aforementioned wishlist. It provides the following functionalities:

Starting a daemon

This ensures that no two processes can start the same daemon at the same time. It also reports any startup errors, even errors that occur after the daemon has already gone into the background but before it has fully initialized. It also allows you to set a timeout, and will try to abort the daemon if it takes too long to initialize.

The start function won't return until the daemon has been fully initialized, and is responding to connections. So if the start function has returned, then the daemon is guaranteed to be usable.
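
One way to picture how errors can still be caught after the daemon has gone into the background: remember the size of the log file before starting, and afterwards read only what was appended. This is a minimal standard-library sketch of that idea, not the library's own code; log_offset and log_output_since are hypothetical names.

```ruby
# Hypothetical helpers (not part of daemon_controller).

# Current size of the log file in bytes, or 0 if it is missing or empty.
def log_offset(path)
  File.size?(path) || 0
end

# Everything that was appended to the log file after byte `offset`.
# Returns an empty string if the file does not exist.
def log_output_since(path, offset)
  File.open(path, "r") do |f|
    f.seek(offset)
    f.read
  end
rescue Errno::ENOENT
  ""
end
```

A caller would record log_offset just before running the start command, and include log_output_since in the error it raises if the daemon never becomes reachable.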

Stopping a daemon

It will stop the daemon, but only if it's already running. Any errors are reported. If the daemon isn't already running, then it will silently succeed. Just like starting a daemon, you can set a timeout for stopping the daemon.

Like the start function, the stop function won't return until the daemon is no longer running. This makes it safe to immediately start the same daemon again after having stopped it, without worrying that the previous daemon instance hasn't exited yet and might conflict with the newly started daemon instance.
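
The behaviour described above (succeed silently if nothing is running, otherwise wait until the process is really gone or a timeout expires) can be sketched as follows. This is an illustration under stated assumptions, not the library's stop flow: it assumes the daemon is stopped with SIGTERM, and stop_and_wait with its timeout: and interval: parameters is a hypothetical helper.

```ruby
# Hypothetical helper (not part of daemon_controller).
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Asks the process to terminate (assumption: SIGTERM is the stop signal)
# and waits until it no longer exists.
# Returns true once the process is gone, false if `timeout` expires first.
def stop_and_wait(pid, timeout: 5.0, interval: 0.05)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    return true # not running: nothing to stop, so succeed silently
  end

  deadline = monotonic_now + timeout
  loop do
    begin
      Process.kill(0, pid) # signal 0 only checks whether the process exists
    rescue Errno::ESRCH
      return true
    end
    return false if monotonic_now >= deadline
    sleep(interval)
  end
end
```

Because the method only returns true after the existence check fails, a caller can start a fresh instance right away without racing against the old one.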

Connecting to a daemon, starting it if it isn't running

Every daemon has to be connected to in a different way. As a developer, you tell 'daemon_controller' how to connect to the daemon. It will then attempt to do that, and if that fails, it will check whether the daemon is running. If it isn't running, then it will automatically start the daemon, and attempt to connect to the daemon again. Failures are reported.
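
The connect-then-start-then-retry flow can be sketched in a few lines. This is a simplified illustration rather than the library's implementation: it leaves out locking and the "is it running?" check, and connect_or_start and its start_daemon argument are hypothetical.

```ruby
# Hypothetical helper (not part of daemon_controller).
# Runs the given block to connect. If the block returns nil or raises
# Errno::ECONNREFUSED, calls `start_daemon` and runs the block once more.
def connect_or_start(start_daemon)
  connection = begin
    yield
  rescue Errno::ECONNREFUSED
    nil
  end
  return connection if connection

  start_daemon.call # assumed to return only once the daemon is ready
  yield             # second attempt; errors now propagate to the caller
end
```

For example, `connect_or_start(-> { start_search_daemon }) { TCPSocket.new("localhost", 1234) }` would return a connected socket either way, where start_search_daemon stands for whatever code starts the daemon and waits for it.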

Checking whether a daemon is running

This information is retrieved from the PID file. It also checks whether the PID file is stale.
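
A common way to tell a live PID file from a stale or corrupted one is to parse it and probe the process with signal 0. The sketch below shows that technique with the standard library only; it is not taken from daemon_controller, and pid_file_status and its return symbols are hypothetical.

```ruby
# Hypothetical helper (not part of daemon_controller).
# Classifies a PID file as :missing, :corrupt, :stale or :running.
def pid_file_status(path)
  pid = Integer(File.read(path).strip, 10) # raises ArgumentError on garbage
  return :corrupt unless pid > 0           # 0 and negatives address process groups

  Process.kill(0, pid) # signal 0 checks for existence without sending anything
  :running
rescue Errno::ENOENT
  :missing
rescue ArgumentError
  :corrupt
rescue Errno::ESRCH
  :stale   # the file names a process that no longer exists
rescue Errno::EPERM
  :running # the process exists but belongs to another user
end
```

Note that a PID can be reused by an unrelated process after a reboot, so a :running result from this check alone is a hint rather than proof that the daemon itself is alive.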

All failures are reported via exceptions

So that you can exactly determine how you want to handle errors.

Lots and lots of error checking

So that there are very few ways in which the system can screw up.

daemon_controller's goal is to make daemon management less of a hassle, and as automatic and straightforward as possible.

What about Monit/God?

daemon_controller is not a replacement for Monit or God. Rather, it is a solution to the following problem:

Hongli: hey Ninh, do a 'git pull', I just implemented awesome searching features in our application!
Ninh: cool. (pulls from repository)
Ninh: hey Hongli, it doesn't work.
Hongli: what do you mean, it doesn't work?
Ninh: it says "connection refused", or something
Hongli: oh I forgot to mention it, you have to run the Sphinx search daemon before it works. type "rake sphinx:daemon:start" to do that
Ninh: great. but now I get a different error. something about BackgrounDRb.
Hongli: oops, I forgot to mention this too. you need to start the BackgrounDRb server with "rake backgroundrb:start_server"
Ninh: okay, so every time I want to use this app, I have to type "rake sphinx:daemon:start", "rake backgroundrb:start_server" and "./script/server"?
Hongli: yep

Imagine the above conversation becoming just:

Hongli: hey Ninh, do a 'git pull', I just implemented awesome searching features in our application!
Ninh: cool. (pulls from repository)
Ninh: awesome, it works!

This is not something that can be achieved with Monit/God. Monit/God are for monitoring daemons, auto-restarting them when they use too many resources. daemon_controller's goal is to allow developers to implement daemon starting/stopping and daemon auto-starting code that's robust. daemon_controller is intended to be used to make daemon-dependent applications Just Work(tm) without having to start the daemons manually.

Tutorial #1: controlling Apache

Suppose that you're a Phusion Passenger developer, and you need to write tests for the Apache module. In particular, you want to test whether the different Phusion Passenger configuration directives are working as expected. Obviously, to test the Apache module, the Apache web server must be running. For every test, you will want the unit test suite to:

Write an Apache configuration file, with the relevant configuration directive set to a specific value.
Start Apache.
Send an HTTP request to Apache and check whether the HTTP response matches your expectations.
Stop Apache.

That can be done with the following code:

    require "daemon_controller"

    File.open("apache.conf", "w") do |f|
      f.write("PidFile apache.pid\n")
      f.write("LogFile apache.log\n")
      f.write("Listen 1234\n")
      f.write(... other relevant configuration options ...)
    end

    controller = DaemonController.new(
      identifier: "Apache web server",
      start_command: "apachectl -f apache.conf -k start",
      ping_command: [:tcp, "localhost", 1234],
      pid_file: "apache.pid",
      log_file: "apache.log"
    )
    controller.start

    # .... apache is now started ....
    # .... some test code here ....

    controller.stop

The File.open line is obvious: it writes the relevant Apache configuration file.

The next line is for creating a new DaemonController object. We pass a human-readable identifier for this daemon ("Apache web server") to the constructor. This is used for generating friendlier error messages. We also tell it how Apache is supposed to be started (start_command:), how to check whether it can be connected to (ping_command:), and where its PID file and log file are. If Apache fails with an error during startup, then it will be reported. If Apache fails with an error after it has gone into the background, then that will be reported too: the given log file is monitored for new error messages. A start timeout can also be configured; if Apache doesn't start within that time, then an exception will be raised.

The ping command specifies which socket to connect to in order to check whether the daemon is ready. It can also be a Proc which returns true or false. If the Proc raises Errno::ECONNREFUSED, then that's also interpreted by DaemonController as meaning that the daemon isn't responding yet.
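
To make the Proc form concrete, here is a minimal sketch of a ping Proc built with the standard library. The helper name tcp_ping_proc is hypothetical; passing its result as ping_command: is an assumption based on the description above, not a documented recipe.

```ruby
require "socket"

# Hypothetical helper (not part of daemon_controller).
# Builds a Proc that tries to open a TCP connection to host:port.
# On success it returns the socket, which is truthy and responds to #close.
# If nothing is listening, TCPSocket.new raises Errno::ECONNREFUSED.
def tcp_ping_proc(host, port)
  lambda { TCPSocket.new(host, port) }
end
```

A Proc is mainly useful when readiness means more than "the port is open", for example when the check has to send a request and inspect the reply before returning true or false.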

After controller.start has returned, we can continue with the test case. At this point, we know that Apache is done with initializing. When we're done with Apache, we stop it with controller.stop. This does not return until Apache has fully stopped.

The cautious reader might notice that a socket returned by a ping command Proc is never closed explicitly. That's fine, because DaemonController will close it automatically for us, if it notices that the ping command Proc's return value responds to #close.

From this example, it becomes apparent that for daemon_controller to work, you must know how to start the daemon, how to contact the daemon, and you must know where it will put its PID file and log file.

Tutorial #2: Sphinx indexing and search server management

We at Phusion are currently developing a web application with full-text search capabilities, and we're using Sphinx for this purpose. We want to make the lives of our developers and our system administrators as easy as possible, so that there's little room for human screw-up, and so we've developed this library. Our Sphinx search daemon is completely managed through this library and is automatically started on demand.

Our Sphinx config file is generated from an ERB template. This ERB template writes different values in the config file, depending on whether we're in development, test or production mode. We will want to regenerate this config file every time, just before we start the search daemon. But there's more. The search daemon will fail if there is no search index. If a new developer has just checked out the application's source code, then there is no search index yet. We don't want them to go through the pain of having to generate the index manually. (That said, it isn't that much of a pain, but it's just yet-another-thing to do, which can and should be automated.) So before starting the daemon, we will also want to check whether the index exists. If not, then we'll generate it, and then start the daemon. Of course, no two Rails processes may generate the config file or the index at the same time.

When querying the search server, we will want to automatically start it if it isn't running.

This can be achieved with the following code:

    require "socket"
    require "daemon_controller"

    class SearchServer
      SEARCH_SERVER_PORT = 1234

      def initialize
        @controller = DaemonController.new(
          identifier: "Sphinx search server",
          start_command: "searchd -c config/sphinx.conf",
          before_start: method(:before_start),
          ping_command: [:tcp, "localhost", SEARCH_SERVER_PORT],
          pid_file: "tmp/pids/sphinx.pid",
          log_file: "log/sphinx.log"
        )
      end

      def query(search_terms)
        socket = @controller.connect do
          TCPSocket.new("localhost", SEARCH_SERVER_PORT)
        end
        send_query(socket, search_terms)
        retrieve_results(socket)
      end

      private

      def before_start
        generate_configuration_file
        if !index_exists?
          generate_index
        end
      end

      # ...
    end

Notice the before_start: option. We pass a piece of code (here, a Method object) which is to be run just before the daemon is started. This code, along with starting the daemon, is completely serialized. That is, while it is running, it's guaranteed that no other process is running it at the same time.

The #query method is the method for querying the search server with search terms. It returns a list of results. It uses DaemonController#connect: one passes a block to that method, which contains code for connecting to the daemon. If the block returns nil, or if it raises Errno::ECONNREFUSED, then DaemonController#connect will automatically take care of auto-starting the Sphinx daemon for us.

A little bit of history

The issue of managing daemons has been a thorn in our eyes for quite some time now. Until now, we've solved this problem by equipping any daemons that we write with the ability to gracefully handle being concurrently started, the ability to initialize as much as possible before forking into the background, etc. However, equipping all this robustness into our code over and over is a lot of work. We've considered documenting a standard behavior for daemons so that they can properly support auto-starting and such.

However, we've recently realized that that's probably a futile effort. Convincing everybody to write a lot of code for a bit more robustness is probably not realistic. So we took the pragmatic approach and developed a library which adds more robustness on top of daemons' existing behavior. And thus, daemon_controller was born. It is a little bit less efficient compared to when the daemon is designed from the beginning with such abilities in mind, but it's compatible with virtually all daemons, and is easy to use.

Concurrency and compatibility notes

DaemonController uses a lock file and the Ruby File#flock API to guarantee synchronization. This has a few implications:

On most Ruby implementations, including MRI, File#flock is implemented with the flock() system call or the Windows file locking APIs. This kind of file locking works pretty much the way we expect it would. Multiple threads can safely use daemon_controller concurrently. Multiple processes can safely use daemon_controller concurrently. There will be no race conditions.
However, flock() is not implemented on Solaris. daemon_controller, if used in MRI, does not currently work on Solaris. You need to use JRuby, which does not use flock() to implement File#flock.
On JRuby, File#flock is implemented through the Java file locking API, which on Unix is implemented with the fcntl() system calls. This is a different kind of lock with very strange semantics.
- If any process/thread closes the lock file, then the lock on that file will be removed even if that process/thread never requested a lock.
- Fcntl locks are usually implemented independently from flock() locks, so if a file is already locked with flock() then fcntl() will not block.
- The JVM's file locking API only allows inter-process synchronization. It cannot be used to synchronize threads. If a thread has obtained a file lock, then another thread within the same JVM process will not block upon trying to lock the same file.
In other words, if you're on JRuby then don't concurrently access daemon_controller from multiple threads without manual locking. Also be careful with mixing MRI processes that use daemon_controller with JRuby processes that use daemon_controller.
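
For readers unfamiliar with File#flock, the following minimal sketch shows how a lock file can serialize a critical section such as "start the daemon". It illustrates the general technique on MRI and is not daemon_controller's own locking code; with_exclusive_lock is a hypothetical helper.

```ruby
# Hypothetical helper (not part of daemon_controller).
# Runs the block while holding an exclusive lock on `lock_path`.
# Other callers using the same lock file block in #flock until the
# lock is released, which happens when the file is closed.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX)
    yield
  end
end
```

A caller would wrap both the "is it running?" check and the start command in one with_exclusive_lock block, so that a second process only gets to look at the PID file after the first one has finished starting the daemon.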

API documentation

Detailed API documentation is available here:

Configuration options
Stop flow
Inline comments in lib/daemon_controller.rb.

Releases: 4. Latest: release-3.0.2 (Nov 18, 2025).
