cloudfoundry/sipid
`sipid` intends to give BOSH release authors an easier way to manage pidfiles. Pidfiles are used by Monit (and therefore BOSH) to track which process should be monitored. It is the responsibility of a BOSH job to write its process ID (PID) to the pidfile during start, and reference the same pidfile to find the process to kill during stop.
Correct pidfile management has a couple of potential pitfalls, since your scripts may be called multiple times and result in race conditions. To make this simpler, `sipid` provides simple `claim` and `kill` commands that manage the trickiest parts of pidfiles.
`sipid claim --pid PID --pid-file PID_FILE` will write the given process's PID to the PID_FILE. Its algorithm looks roughly like this:
- Ensure the directory referenced by the PID_FILE exists. Create it if it does not.
- Place a file lock (flock) on the PID_FILE. If the PID_FILE is already locked, another process is attempting to claim the PID_FILE, and the current process should abort.
- If the PID_FILE already exists, attempt to find a process with the PID in the PID_FILE. If that process is running, the current process should abort rather than claim the PID_FILE and continue with startup.
- If the process has not yet aborted, it is safe to write the given PID to the PID_FILE and give up the file lock. It is now safe to continue starting up the BOSH job.
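The steps above can be sketched in plain bash to make the ordering concrete. This is only an illustration under stated assumptions, not how `sipid` is implemented: the function name `claim_pidfile` is hypothetical, the lock is taken with the util-linux `flock` command, and "is that process running" is approximated with `kill -0`, which also fails for processes the caller is not permitted to signal.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of the claim steps; not sipid's actual implementation.
# Usage: claim_pidfile PID PID_FILE
claim_pidfile() {
  local pid="$1" pidfile="$2" existing

  # Step 1: make sure the directory that holds the pidfile exists.
  mkdir -p "$(dirname "$pidfile")" || return 1

  (
    # Step 2: take a non-blocking exclusive lock on the pidfile (fd 9).
    # If another process already holds it, abort.
    flock -n 9 || exit 1

    # Step 3: if the pidfile names a process that is still running, abort.
    existing="$(cat "$pidfile" 2>/dev/null)"
    if [ -n "$existing" ] && kill -0 "$existing" 2>/dev/null; then
      exit 1
    fi

    # Step 4: record the PID. The lock is released when the subshell exits.
    echo "$pid" > "$pidfile"
  ) 9>>"$pidfile"
}
```

A start script could then run `claim_pidfile "$$" "$PIDFILE" || exit 1` before `exec`-ing the job's binary.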
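Example start script using `sipid claim`:

```bash
#!/usr/bin/env bash

RUN_DIR="/var/vcap/sys/run/example-job"
PIDFILE="$RUN_DIR/web.pid"

mkdir -p "$RUN_DIR"

# Claim the pidfile for this shell's PID; exec below keeps the same PID.
sipid claim --pid "$$" --pid-file "$PIDFILE"

exec chpst -u vcap:vcap /var/vcap/packages/example-job/bin/web
```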
For comparison, a similar start script using `start-stop-daemon`:

```bash
#!/usr/bin/env bash

RUN_DIR="/var/vcap/sys/run/example-job"
PIDFILE="$RUN_DIR/web.pid"

mkdir -p "$RUN_DIR"

start-stop-daemon \
  --pidfile "$PIDFILE" \
  --make-pidfile \
  --chuid vcap:vcap \
  --start \
  --exec /var/vcap/packages/example-job/bin/web -- \
  --extra arguments \
  --to-your process
```
`sipid kill --pid-file PID_FILE [--show-stacks]` will kill the process given by the PID_FILE. Monit only allows a short time to stop a process, so we must kill the process aggressively if it does not clean itself up within a 20-second grace period. The algorithm looks roughly like this:
- Get the PID in the PID_FILE.
- If there is no running process with that PID, there is nothing to do, so exit.
- Send `SIGTERM` (i.e. a normal `kill "$PID"`) to the process to give it time to clean up.
- Poll the process for 20 seconds. If it has quit on its own, exit.
- If the process has not exited after 20 seconds, send a `SIGKILL` to the process to force it to exit immediately.
- Finally, remove the pidfile.
  - This is to prevent a future `claim` from failing if the PID is reused by a different process later.
If the `--show-stacks` parameter is provided to `sipid`, before sending `SIGKILL`, it will attempt to get the process to dump its stack traces by sending `SIGQUIT` (i.e. `kill -3 "$PID"`) to aid with debugging a "stuck" process. Not all processes respond to `SIGQUIT`, and if yours does not, you may wish to implement a `SIGQUIT` handler to make debugging more consistent for operators.
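The stop sequence described above can be sketched in bash as follows. This is a rough illustration, not `sipid`'s actual code. Several details are assumptions made for the sketch: the function name `kill_from_pidfile`, polling once per second, pausing one second after `SIGQUIT` (borrowed from the `TERM/20/QUIT/1/KILL` retry schedule in this document's `start-stop-daemon` stop script), and removing the pidfile on every path, including when no process was running.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of the stop steps; not sipid's actual implementation.
# Usage: kill_from_pidfile PID_FILE [GRACE_SECONDS] [SHOW_STACKS true|false]
kill_from_pidfile() {
  local pidfile="$1" grace="${2:-20}" show_stacks="${3:-false}"
  local pid waited=0

  # Read the PID; a missing or empty pidfile means there is nothing to stop.
  pid="$(cat "$pidfile" 2>/dev/null || true)"

  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    # Ask the process to shut down cleanly.
    kill -TERM "$pid" 2>/dev/null || true

    # Poll (once per second, an assumption) until it exits or the grace
    # period runs out.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
      sleep 1
      waited=$((waited + 1))
    done

    if kill -0 "$pid" 2>/dev/null; then
      if [ "$show_stacks" = "true" ]; then
        # Request a stack dump, then give the process a moment to write it.
        kill -QUIT "$pid" 2>/dev/null || true
        sleep 1
      fi
      # Force the process to exit.
      kill -KILL "$pid" 2>/dev/null || true
    fi
  fi

  # Remove the pidfile so a later claim is not blocked by a reused PID.
  rm -f "$pidfile"
}
```

For example, `kill_from_pidfile "$PIDFILE" 20 true` would approximate the behaviour described for `--show-stacks`.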
Example stop script using `sipid kill`:

```bash
#!/usr/bin/env bash

# If a command fails, exit immediately
set -e

PIDFILE="/var/vcap/sys/run/example-job/web.pid"

sipid kill --pid-file "$PIDFILE" --show-stacks
```
For comparison, a similar stop script using `start-stop-daemon`:

```bash
#!/usr/bin/env bash

# If a command fails, exit immediately
set -e

PIDFILE="/var/vcap/sys/run/example-job/web.pid"

start-stop-daemon \
  --pidfile "$PIDFILE" \
  --remove-pidfile \
  --retry TERM/20/QUIT/1/KILL \
  --oknodo \
  --stop
```
`sipid wait-until-healthy --url HEALTHCHECK_URL [--timeout DURATION (default 1m)] [--polling-frequency DURATION (default 5s)]` polls a healthcheck endpoint at the requested frequency until it returns an HTTP 200 status code or the requested timeout elapses. If the endpoint is not healthy by the timeout deadline, the command exits non-zero.
Example usage:

```bash
#!/usr/bin/env bash

# If a command fails, exit immediately
set -e

sipid wait-until-healthy --url https://127.0.0.1:58074/healthcheck --timeout 2m --polling-frequency 1s
```
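To show what such a polling loop involves, here is a minimal bash sketch built on `curl`. It is not `sipid`'s implementation. The function name `wait_until_healthy` is hypothetical, and unlike the real flags it takes the timeout and polling frequency as plain seconds rather than durations such as `2m` or `1s`.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of the polling loop; not sipid's actual implementation.
# Usage: wait_until_healthy URL [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_until_healthy() {
  local url="$1" timeout="${2:-60}" frequency="${3:-5}"
  local deadline=$((SECONDS + timeout)) status

  while true; do
    # -s: no progress output, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the HTTP status code.
    # "|| true" keeps a refused connection from aborting under set -e.
    status="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$status" = "200" ]; then
      return 0
    fi

    # Give up once the deadline has passed.
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$frequency"
  done
}
```

Under `set -e`, a start script that calls `wait_until_healthy "$URL" 120 1` would exit non-zero if the endpoint never returns 200 within two minutes.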
To see examples of `sipid` in action, look at the scripts in the example/ directory.