pt-stalk
NAME
pt-stalk - Collect forensic data about MySQL when problems occur.
SYNOPSIS
Usage
pt-stalk [OPTIONS]
pt-stalk waits for a trigger condition to occur, then collects data to help diagnose problems. The tool is designed to run as a daemon with root privileges, so that you can diagnose intermittent problems that you cannot observe directly. You can also use it to execute a custom command, or to collect data on demand without waiting for the trigger to occur.
RISKS
Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a risk to the system and the database server. Before using this tool, please:
Read the tool’s documentation
Review the tool’s known “BUGS”
Test the tool on a non-production server
Backup your production server and verify the backups
DESCRIPTION
Sometimes a problem happens infrequently and for a short time, giving you no chance to see the system when it happens. How do you solve intermittent MySQL problems when you can’t observe them? That’s why pt-stalk exists. In addition to using it when there’s a known problem on your servers, it is a good idea to run pt-stalk all the time, even when you think nothing is wrong. You will appreciate the data it collects when a problem occurs, because problems such as MySQL lockups or spikes in activity typically leave no evidence to use in root cause analysis.
pt-stalk does two things: it watches a MySQL server and waits for a trigger condition to occur, and it collects diagnostic data when that trigger occurs. To avoid false positives caused by short-lived problems, the trigger condition must be true at least --cycles times before a --collect is triggered.
To use pt-stalk effectively, you need to define a good trigger. A good trigger is sensitive enough to fire reliably when a problem occurs, so that you don’t miss a chance to solve problems. On the other hand, a good trigger isn’t prone to false positives, so you don’t gather information when the server is functioning normally.
The most reliable triggers for MySQL tend to be the number of connections to the server and the number of queries running concurrently. These are available in the SHOW GLOBAL STATUS command as Threads_connected and Threads_running. Sometimes Threads_connected is not a reliable indicator of trouble, but Threads_running usually is. Your job, as the tool’s user, is to define an appropriate trigger condition for the tool. Choose carefully, because the quality of your results will depend on the trigger you choose.
You define the trigger with the --function, --variable, --threshold, and --cycles options. The default values for these options define a reasonable trigger, but you should adjust or change them to suit your particular system and needs.
By default, pt-stalk watches MySQL forever until the trigger occurs, then collects diagnostic data for a while, and sleeps afterwards to avoid repeatedly collecting data if the trigger remains true. The general order of operations is:
    while true; do
      if --variable from --function > --threshold; then
        cycles_true++
        if cycles_true >= --cycles; then
          --notify-by-email
          if --collect; then
            if --disk-bytes-free and --disk-pct-free ok; then
              (--collect for --run-time seconds) &
            fi
            rm files in --dest older than --retention-time
          fi
          iter++
          cycles_true=0
        fi
        if iter < --iterations; then
          sleep --sleep seconds
        else
          break
        fi
      else
        if iter < --iterations; then
          sleep --interval seconds
        else
          break
        fi
      fi
    done
    rm old --dest files older than --retention-time
    if --collect processes are still running; then
      wait up to --run-time * 3 seconds
      kill any remaining --collect processes
    fi
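The following minimal sketch makes the trigger-counting part of the loop concrete. It rests on several simplifying assumptions. The values that --function would report at each check are passed in as arguments. Collection is reduced to an echo, and the sleeps are left out. The function name `simulate_stalk` and its argument order are hypothetical, so this is not pt-stalk's source. The sketch follows the pseudocode as written, which does not reset cycles_true when a check fails.

```shell
# Hypothetical simulation of the trigger loop above; not pt-stalk's source.
# Usage: simulate_stalk THRESHOLD CYCLES ITERATIONS VALUE...
#   VALUE...    one reading of --variable per check (what --function would report)
#   ITERATIONS  collections before stopping; 0 means "use up all values"
simulate_stalk() {
    local threshold="$1" cycles="$2" iterations="$3"
    local cycles_true=0 iter=0 sample=0 value
    shift 3
    for value in "$@"; do
        sample=$((sample + 1))
        if [ "$value" -gt "$threshold" ]; then
            cycles_true=$((cycles_true + 1))
            if [ "$cycles_true" -ge "$cycles" ]; then
                # pt-stalk would start the --collect processes in the background here.
                echo "collect at check $sample (value $value)"
                iter=$((iter + 1))
                cycles_true=0
            fi
        fi
        # The real loop sleeps --sleep or --interval seconds here; omitted.
        if [ "$iterations" -gt 0 ] && [ "$iter" -ge "$iterations" ]; then
            break
        fi
    done
}

# Threshold 25, 3 cycles, stop after 1 collection.
simulate_stalk 25 3 1 30 10 40 50 60 5 70   # prints: collect at check 4 (value 50)
```

Note that the value 10 at check 2 does not clear the count in this sketch. The fourth check is therefore the third one above the threshold, and it fires.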
The diagnostic data is written to files whose names begin with a timestamp, so you can distinguish samples from each other in case the tool collects data multiple times. The pt-sift tool is designed to help you browse and analyze the resulting data samples.
Although this sounds simple enough, in practice there are a number of subtleties, such as detecting when the disk is beginning to fill up so that the tool doesn’t cause the server to run out of disk space. This tool handles these types of potential problems, so it’s a good idea to use this tool instead of writing something from scratch and possibly experiencing some of the hazards this tool is designed to avoid.
CONFIGURING
You can use standard Percona Toolkit configuration files to set command line options.
You will probably want to run the tool as a daemon and customize at least the --threshold. Here’s a sample configuration file for triggering when there are more than 20 queries running at once:
    daemonize
    threshold=20
If you don’t run the tool as root, then you will need to specify several options, such as --pid, --log, and --dest, else the tool will probably fail to start.
OPTIONS
- --ask-pass
Prompt for a password when connecting to MySQL.
- --collect
default: yes; negatable: yes
Collect diagnostic data when the trigger occurs. Specify --no-collect to make the tool watch the system but not collect data. See also --stalk.
- --collect-gdb
Collect GDB stack traces. This is achieved by attaching to MySQL and printing stack traces from all threads. This will freeze the server for some period of time, ranging from a second or so to much longer on very busy systems with a lot of memory and many threads in the server. For this reason, it is disabled by default. However, if you are trying to diagnose a server stall or lockup, freezing the server causes no additional harm, and the stack traces can be vital for diagnosis.
In addition to freezing the server, there is also some risk of the server crashing or performing badly after GDB detaches from it.
- --collect-oprofile
Collect oprofile data. This is achieved by starting an oprofile session, letting it run for the collection time, and then stopping and saving the resulting profile data in the system’s default location. Please read your system’s oprofile documentation to learn more about this.
- --collect-strace
Collect strace data. This is achieved by attaching strace to the server, which will make it run very slowly until strace detaches. The same cautions apply as those listed in --collect-gdb. You should not enable this option together with --collect-gdb, because GDB and strace can’t attach to the server process simultaneously.
- --collect-tcpdump
Collect tcpdump data. This option causes tcpdump to capture all traffic on all interfaces for the port on which MySQL is listening. You can later use pt-query-digest to decode the MySQL protocol and extract a log of query traffic from it.
- --config
type: string
Read this comma-separated list of config files. If specified, this must be the first option on the command line.
- --cycles
type: int; default: 5
How many times --variable must be greater than --threshold before triggering --collect. This helps prevent false positives, and makes the trigger condition less likely to fire when the problem recovers quickly.
- --daemonize
Daemonize the tool. This causes the tool to fork into the background and log its output as specified in --log.
- --defaults-file
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
- --dest
type: string; default: /var/lib/pt-stalk
Where to save diagnostic data from --collect. Each time the tool collects data, it writes to a new set of files, which are named with the current system timestamp.
- --disk-bytes-free
type: size; default: 100M
Do not --collect if the disk has less than this much free space. This prevents the tool from filling up the disk with diagnostic data. If the --dest directory contains a previously captured sample of data, the tool will measure its size and use that as an estimate of how much data is likely to be gathered this time, too. It will then be even more pessimistic, and will refuse to collect data unless the disk has enough free space to hold the sample and still have the desired amount of free space. For example, if you’d like 100MB of free space and the previous diagnostic sample consumed 100MB, the tool won’t collect any data unless the disk has 200MB free. Valid size value suffixes are k, M, G, and T.
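The 100MB/200MB example above comes down to simple arithmetic: the space required is the --disk-bytes-free margin plus the size of the previous sample. Below is a minimal sketch of that rule. `enough_space` is a hypothetical helper, not part of pt-stalk. All sizes must be given in the same unit, which is megabytes in these examples.

```shell
# Hypothetical helper illustrating the --disk-bytes-free rule; not pt-stalk's code.
# Usage: enough_space FREE MARGIN [PREVIOUS_SAMPLE_SIZE]   (all in one unit)
# Succeeds only if FREE can hold another sample as large as the previous one
# and still leave MARGIN free.
enough_space() {
    local free="$1" margin="$2" prev="${3:-0}"
    [ "$free" -ge $((margin + prev)) ]
}

# The example from the text: 100MB margin, previous sample of 100MB.
enough_space 150 100 100 || echo "150MB free: not enough, skipping collection"
enough_space 200 100 100 && echo "200MB free: collecting"
```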
- --disk-pct-free
type: int; default: 5
Do not --collect if the disk has less than this percent free space. This prevents the tool from filling up the disk with diagnostic data. This option works similarly to --disk-bytes-free but specifies a percentage margin of safety instead of a bytes margin of safety. The tool honors both options, and will not collect any data unless both margins are satisfied.
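Here is a similar sketch for the percentage margin. It is a hypothetical helper, not the tool's code. It reads the filesystem that holds a directory with `df -P`, which uses the POSIX output format with sizes in 1024-byte blocks. It then compares the integer percentage of available space with a --disk-pct-free value. As the text says, the tool also requires the --disk-bytes-free check to pass.

```shell
# Hypothetical helper: print the integer percentage of free space on the
# filesystem that holds DIR, from POSIX-format df output (1024-byte blocks):
#   Filesystem 1024-blocks Used Available Capacity Mounted on
pct_free() {
    df -P "$1" | awk 'NR == 2 { if ($2 > 0) printf "%d\n", $4 * 100 / $2; else print 0 }'
}

# Succeeds if DIR's filesystem has at least MIN_PCT percent free.
# Usage: enough_pct_free DIR MIN_PCT
enough_pct_free() {
    [ "$(pct_free "$1")" -ge "$2" ]
}

# 5 is the --disk-pct-free default.
if enough_pct_free . 5; then
    echo "at least 5% free: collection allowed by this check"
else
    echo "less than 5% free: collection skipped"
fi
```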
- --function
type: string; default: status
What to watch for the trigger. The default value watches SHOW GLOBAL STATUS, but you can also watch SHOW PROCESSLIST and specify a file with your own custom code. This function supplies the value of --variable, which is then compared against --threshold to see if the trigger condition is met. Additional options may be required as well; see below. Possible values are:
status
Watch SHOW GLOBAL STATUS for the trigger. The value of --variable then defines which status counter is the trigger.
processlist
Watch SHOW FULL PROCESSLIST for the trigger. The trigger value is the count of processes whose --variable column matches the --match option. For example, to trigger --collect when more than 10 processes are in the “statistics” state, specify:
    --function processlist \
    --variable State \
    --match statistics \
    --threshold 10
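The following sketch is a rough illustration of how a processlist trigger value can be derived: it counts the rows whose named column matches a pattern. It assumes tab-separated input with a header row, which is what `mysql --batch -e "SHOW FULL PROCESSLIST"` prints. It also treats the pattern as an awk regular expression, which is an assumption. `count_matching` and `sample_processlist` are hypothetical names, the rows are made up, and pt-stalk may parse the processlist differently.

```shell
# Hypothetical helper: count rows whose COLUMN matches PATTERN.
# Usage: ... | count_matching COLUMN PATTERN   (like --variable and --match)
count_matching() {
    awk -F '\t' -v col="$1" -v pat="$2" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        idx && $idx ~ pat { n++ }
        END { print n + 0 }
    '
}

# Made-up processlist rows in mysql --batch format.
sample_processlist() {
    printf 'Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n'
    printf '1\tapp\tlocalhost\tshop\tQuery\t3\tstatistics\tSELECT 1\n'
    printf '2\tapp\tlocalhost\tshop\tQuery\t5\tstatistics\tSELECT 2\n'
    printf '3\tapp\tlocalhost\tshop\tSleep\t9\t\tNULL\n'
}

sample_processlist | count_matching State statistics   # prints 2
# Against a live server (not run here):
#   mysql --batch -e "SHOW FULL PROCESSLIST" | count_matching State statistics
```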
In addition, you can specify a file that contains your custom trigger function, written in Unix shell script. This can be a wrapper that executes anything you wish. If the argument to --function is a file, then it takes precedence over built-in functions, so if there is a file in the working directory named “status” or “processlist”, the tool will use that file even though those are valid built-in values.
The file works by providing a function called trg_plugin, and the tool simply sources the file and executes the function. For example, the file might contain:
    trg_plugin() {
       mysql $EXT_ARGV -e "SHOW ENGINE INNODB STATUS" \
          | grep -c "has waited at"
    }
This snippet will count the number of mutex waits inside InnoDB. It illustrates the general principle: the function must output a number, which is then compared to --threshold as usual. The $EXT_ARGV variable contains the MySQL options mentioned in the “SYNOPSIS” above.
The file should not alter the tool’s existing global variables. Prefix any file-specific global variables with PLUGIN_ or make them local.
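You can try out the mechanism described above offline before you hand the file to pt-stalk. The tool sources the file, calls trg_plugin, and compares the output with --threshold. The harness below is a hypothetical sketch, not part of the tool. `check_trigger_file` sources the file in a subshell so that the file's definitions do not leak into the caller. As an assumption of this sketch, it treats anything other than a non-negative integer as an error.

```shell
# Hypothetical offline harness for a custom --function file; not part of pt-stalk.
# Usage: check_trigger_file FILE THRESHOLD
# Exit status: 0 if the value is greater than THRESHOLD, 1 if not, 2 on error.
check_trigger_file() {
    local file="$1" threshold="$2" value
    # Source in a subshell so the file's functions and variables do not leak.
    value=$( . "$file" && trg_plugin ) || return 2
    case $value in
        ''|*[!0-9]*) echo "not a number: '$value'" >&2; return 2 ;;
    esac
    echo "value=$value threshold=$threshold"
    [ "$value" -gt "$threshold" ]
}

# A made-up trigger file that always reports 30.
tmp=$(mktemp)
printf 'trg_plugin() {\n    echo 30\n}\n' > "$tmp"
check_trigger_file "$tmp" 25 && echo "would trigger"
rm -f "$tmp"
```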
- --help
Print help and exit.
- --host
short form: -h; type: string
Host to connect to.
- --interval
type: int; default: 1
How often to check if the trigger is true, in seconds.
- --iterations
type: int
How many times to --collect diagnostic data. By default, the tool runs forever and collects data every time the trigger occurs. Specify --iterations to collect data a limited number of times. This option is also useful with --no-stalk to collect data once and exit, for example.
- --log
type: string; default: /var/log/pt-stalk.log
Print all output to this file when daemonized.
- --match
type: string
The pattern to use when watching SHOW PROCESSLIST. See --function for details.
- --password
short form: -p; type: string
Password to use when connecting. If the password contains commas, they must be escaped with a backslash: “exam\,ple”
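If you build the password value in a script, a small helper can add the backslashes for you. This is a hypothetical sketch using sed. As the text describes, it only escapes commas and does not handle any other special characters.

```shell
# Hypothetical helper: escape every comma in a password with a backslash.
escape_commas() {
    printf '%s\n' "$1" | sed 's/,/\\,/g'
}

escape_commas 'exam,ple'   # prints exam\,ple
```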
- --pid
type: string; default: /var/run/pt-stalk.pid
Create the given PID file. The tool won’t start if the PID file already exists and the PID it contains is different than the current PID. However, if the PID file exists and the PID it contains is no longer running, the tool will overwrite the PID file with the current PID. The PID file is removed automatically when the tool exits.
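The PID-file rules above fit in a few lines of shell. `claim_pid_file` is a hypothetical name, and this is not the tool's code. It uses `kill -0` to decide whether the recorded PID is still running, and that command only checks whether a signal could be sent. As a limitation of this sketch, a PID owned by another user may therefore look "not running".

```shell
# Hypothetical sketch of the --pid rules: refuse if FILE names a different,
# still-running process; otherwise (re)write FILE with MY_PID.
# Usage: claim_pid_file FILE MY_PID
claim_pid_file() {
    local file="$1" mypid="$2" oldpid
    if [ -f "$file" ]; then
        oldpid=$(cat "$file")
        if [ -n "$oldpid" ] && [ "$oldpid" != "$mypid" ] && kill -0 "$oldpid" 2>/dev/null; then
            echo "PID file $file exists and PID $oldpid is running" >&2
            return 1
        fi
    fi
    echo "$mypid" > "$file"
}

pidfile=$(mktemp)
claim_pid_file "$pidfile" $$ && echo "PID file now contains $(cat "$pidfile")"
rm -f "$pidfile"   # the tool removes its PID file on exit
```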
- --plugin
type: string
Load a plugin to hook into the tool and extend its functionality. The specified file does not need to be executable, nor does its first line need to be a shebang line. It only needs to define one or more of these Bash functions:
before_stalk
Called before stalking.
before_collect
Called when the trigger occurs, before running a --collect subprocess in the background.
after_collect
Called after running a collector process. The PID of the collector process is passed as the first argument. This hook is called before after_collect_sleep.
after_collect_sleep
Called after sleeping --sleep seconds for the collector process to finish. This hook is called after after_collect.
after_interval_sleep
Called after sleeping --interval seconds after each trigger check.
after_stalk
Called after stalking. Since pt-stalk stalks forever by default, this hook is only called if --iterations is specified.
For example, a very simple plugin that touches a file when --collect is triggered:
    before_collect() {
       touch /tmp/foo
    }
Since the plugin is completely sourced (imported) into the tool’s namespace, be careful not to define other functions or global variables that already exist in the tool. You should prefix all plugin-specific functions and global variables with plugin_ or PLUGIN_.
Plugins have access to all command line options, but they should not modify them. Each option is a global variable like $OPT_DEST, which corresponds to --dest. Therefore, the global variable for each command line option is OPT_ plus the option name in all caps with hyphens replaced by underscores.
Plugins can stop the tool by setting the global variable OKTORUN to 0. In this case, the global variable EXIT_REASON should also be set to indicate why the tool was stopped.
Plugin writers should keep in mind that the file destination prefix currently in use should be accessed through the $prefix variable, rather than $OPT_PREFIX.
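Putting those rules together, a plugin file might look like the sketch below. The following pieces come from the text above: the hook names before_collect and after_collect, the $OPT_DEST and $prefix variables, and the PID argument passed to after_collect. The file name `plugin-notes.sh`, the `plugin_note` function, and the `PLUGIN_NOTE_FILE` variable are hypothetical.

```shell
# plugin-notes.sh: hypothetical pt-stalk plugin (load with --plugin plugin-notes.sh).
# It only reads OPT_DEST and prefix, and it uses plugin_/PLUGIN_ names so it
# cannot clash with the tool's own functions and globals.

# Append one timestamped line to a notes file next to the collected samples.
plugin_note() {
    PLUGIN_NOTE_FILE="$OPT_DEST/$prefix-plugin-notes"
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$PLUGIN_NOTE_FILE"
}

before_collect() {
    plugin_note "trigger fired, starting collection"
}

after_collect() {
    # The tool passes the collector's PID as the first argument.
    plugin_note "after_collect: collector PID $1"
}
```

To try the hooks without pt-stalk, set OPT_DEST and prefix by hand, because the tool would normally provide them. Then call before_collect and after_collect with a made-up PID, and inspect the notes file.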
- --mysql-only
Trigger only MySQL-related captures, ignoring all others. The only non-MySQL value collected is the disk space, because it is needed to calculate the available free disk space for writing the result files. This option is useful for RDS instances.
- --port
short form: -P; type: int
Port number to use for connection.
- --prefix
type: string
The filename prefix for diagnostic samples. By default, all files created by the same --collect instance have a timestamp prefix based on the current local time, like 2011_12_06_14_02_02, which is December 6, 2011 at 14:02:02.
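The default prefix shape shown above corresponds to the following `date` format string. The sketch is for generating or reading such names in your own scripts. It is not taken from the tool's source, and both function names are hypothetical.

```shell
# Build a prefix in the YYYY_MM_DD_HH_MM_SS shape shown above (local time).
make_prefix() {
    date '+%Y_%m_%d_%H_%M_%S'
}

# Turn such a prefix back into a readable timestamp.
prefix_to_time() {
    echo "$1" | sed 's/^\([0-9]\{4\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)$/\1-\2-\3 \4:\5:\6/'
}

prefix_to_time 2011_12_06_14_02_02   # prints 2011-12-06 14:02:02
```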
- --retention-count
type: int; default: 0
Keep the data for the last N runs. If N > 0, the program will keep the data for the last N runs and will delete the older data.
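Sample files start with a sortable timestamp prefix. A keep-the-last-N rule can therefore be implemented by sorting the distinct prefixes and listing all but the newest N. This hypothetical helper only prints the prefixes that would be removed. It assumes the default timestamp prefix is followed by a hyphen, as in names like 2011_12_06_14_02_02-..., and it is not the tool's implementation.

```shell
# Hypothetical sketch of --retention-count: print the sample prefixes in DIR
# that fall outside the newest N runs.
# Usage: prefixes_to_purge DIR N
prefixes_to_purge() {
    local dir="$1" keep="$2"
    [ "$keep" -gt 0 ] || return 0   # N = 0 disables this rule
    ls "$dir" | sed -n 's/^\([0-9_]\{19\}\)-.*/\1/p' | sort -u |
        awk -v keep="$keep" '{ p[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print p[i] }'
}
```

To delete instead of listing, a caller could run `rm -f "$dir/$p"-*` for each printed prefix `p`.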
- --retention-size
type: int; default: 0
Keep up to --retention-size MB of data. It will keep at least 1 run even if its size is bigger than the value specified by this parameter.
- --retention-time
type: int; default: 30
Number of days to retain collected samples. Any samples that are older will be purged.
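Purging by age maps naturally onto `find -mtime`. The following sketch assumes that the samples are plain files under the --dest directory. It prints candidates rather than deleting them, and the function name is hypothetical. Note that find truncates ages to whole days, so `-mtime +30` matches files that are at least 31 days old.

```shell
# Hypothetical sketch of --retention-time: list files under DIR last modified
# more than DAYS whole days ago. Replace -print with -exec rm -f {} + to delete.
# Usage: old_samples DIR DAYS
old_samples() {
    find "$1" -type f -mtime +"$2" -print
}
```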
- --run-time
type: int; default: 30
How long to --collect diagnostic data when the trigger occurs. The value is in seconds and should not be longer than --sleep. It is usually not necessary to change this; if the default 30 seconds doesn’t collect enough data, running longer is not likely to help because the system or MySQL server is probably too busy to respond. In fact, in many cases a shorter collection period is appropriate.
This value is used two other times. After collecting, the collect subprocess will wait another --run-time seconds for its commands to finish. Some commands can take a while if the system is running very slowly (which is likely the case, given that a collection was triggered). Since empty files are deleted, the extra wait gives commands time to finish and write their data. The value is potentially used again just before the tool exits to wait again for any collect subprocesses to finish. In most cases this won’t happen because of the aforementioned extra wait. If it happens, the tool will log “Waiting up to N seconds for subprocesses to finish…” where N is three times --run-time. In both cases, after waiting, the tool kills all of its subprocesses.
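The wait-then-kill behaviour can be implemented by polling with `kill -0`. With the default --run-time of 30, the final wait would be up to 3 * 30 = 90 seconds. This illustration is not pt-stalk's code and rests on stated assumptions. `wait_then_kill` is a hypothetical name. It polls once per second and sends the default TERM signal to anything still running when the time is up.

```shell
# Hypothetical sketch: give PIDs up to TIMEOUT seconds to finish, then kill
# any that still answer kill -0.
# Usage: wait_then_kill TIMEOUT PID...
wait_then_kill() {
    local timeout="$1" waited=0 pid running
    shift
    while [ "$waited" -lt "$timeout" ]; do
        running=0
        for pid in "$@"; do
            kill -0 "$pid" 2>/dev/null && running=1
        done
        [ "$running" -eq 0 ] && return 0
        sleep 1
        waited=$((waited + 1))
    done
    for pid in "$@"; do
        kill "$pid" 2>/dev/null
    done
    return 0
}

run_time=30
echo "final wait: up to $((run_time * 3)) seconds"
```

The shell might not have reaped an exited child yet. In that case the child still answers `kill -0`, so it is treated as running until the timeout. Signalling it afterwards does no harm.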
- --sleep
type: int; default: 300
How long to sleep after --collect. This prevents the tool from triggering continuously, which might be a problem if the collection process is intrusive. It also prevents filling up the disk or gathering too much data to analyze reasonably.
- --sleep-collect
type: int; default: 1
How long to sleep between collection loop cycles. This is useful with --no-stalk to do long collections. For example, to collect data every minute for an hour, specify: --no-stalk --run-time 3600 --sleep-collect 60.
- --socket
short form: -S; type: string
Socket file to use for connection.
- --stalk
default: yes; negatable: yes
Watch the server and wait for the trigger to occur. Specify --no-stalk to collect diagnostic data immediately, that is, without waiting for the trigger to occur. You probably also want to specify values for --interval, --iterations, and --sleep. For example, to immediately collect data for 1 minute then exit, specify:
    --no-stalk --run-time 60 --iterations 1
--cycles, --daemonize, --log and --pid have no effect with --no-stalk. Safeguard options, like --disk-bytes-free and --disk-pct-free, are still respected.
See also --collect.
- --system-only
Trigger only operating system related captures, ignoring all others.
- --threshold
type: int; default: 25
The maximum acceptable value for --variable. --collect is triggered when the value of --variable is greater than --threshold for --cycles many times. Currently, there is no way to define a lower threshold to check for a --variable value that is too low.
See also --function.
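One possible workaround for the missing lower threshold (not an official feature) is a custom --function file that does the comparison itself. The file prints 1 when the value is too low and 0 otherwise. You then run the tool with --threshold 0, so that any 1 fires the trigger. In the sketch below, PLUGIN_LOW_WATERMARK and plugin_read_value are hypothetical names. Reading Threads_connected is only an example, so substitute whatever value you need to watch.

```shell
# Hypothetical --function file: fire when a value drops BELOW a watermark.
# Use it with: --function /path/to/this/file --threshold 0
PLUGIN_LOW_WATERMARK=${PLUGIN_LOW_WATERMARK:-5}

# Read the watched value. $EXT_ARGV is left unquoted on purpose so that the
# MySQL options it holds split into separate arguments.
plugin_read_value() {
    mysql $EXT_ARGV -N -B -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'" | awk '{ print $2 }'
}

trg_plugin() {
    local value
    value=$(plugin_read_value)
    # Print 1 (greater than --threshold 0) when the value is too low, else 0.
    if [ "$value" -lt "$PLUGIN_LOW_WATERMARK" ]; then
        echo 1
    else
        echo 0
    fi
}
```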
- --user
short form: -u; type: string
User for login if not current user.
- --variable
type: string; default: Threads_running
The variable to compare against --threshold. See also --function.
- --verbose
type: int; default: 2
Print more or less information while running. Since the tool is designed to be a long-running daemon, the default verbosity level only prints the most important information. If you run the tool interactively, you may want to use a higher verbosity level.
    LEVEL PRINTS
    ===== =====================================
    0     Errors
    1     Warnings
    2     Matching triggers and collection info
    3     Non-matching triggers
- --version
Print tool’s version and exit.
ENVIRONMENT
This tool does not require any environment variables for configuration, although it can be influenced to work differently through several variables. Keep in mind that these are expert settings, and should not be used in most cases.
Specifically, the variables that can be set are:
CMD_GDB
CMD_IOSTAT
CMD_MPSTAT
CMD_MYSQL
CMD_MYSQLADMIN
CMD_OPCONTROL
CMD_OPREPORT
CMD_PMAP
CMD_STRACE
CMD_SYSCTL
CMD_TCPDUMP
CMD_VMSTAT
For example, during collection iostat is called with a -dx argument, but because you have an NFS partition, you also need the -n flag there. Instead of editing the source, you can call pt-stalk as
    CMD_IOSTAT="iostat -n" pt-stalk ...
which will do exactly what you need. Combined with the plugin hooks, this gives you fine-grained control of what the tool does.
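For a value such as "iostat -n" to work, the variable must be split into words when it is expanded, ahead of the arguments the tool adds. The sketch below shows that general pattern. It uses a hypothetical function that prints the resulting command line instead of running it, and pt-stalk's internals may differ in detail.

```shell
# Hypothetical sketch of the CMD_* override pattern: use the environment value
# if set, otherwise the plain command name, then append the tool's arguments.
iostat_cmdline() {
    local cmd="${CMD_IOSTAT:-iostat}"
    # $cmd is deliberately unquoted so "iostat -n" splits into two words.
    set -- $cmd -dx
    echo "$@"
}

iostat_cmdline                           # prints iostat -dx when CMD_IOSTAT is unset
CMD_IOSTAT="iostat -n" iostat_cmdline    # prints iostat -n -dx
```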
It is possible to enable debug mode in mysqladmin by specifying:
    CMD_MYSQLADMIN='mysqladmin debug' pt-stalk params...
SYSTEM REQUIREMENTS
This tool requires Bash v3 or newer. Certain options require other programs:
--collect-gdb requires gdb
--collect-oprofile requires opcontrol and opreport
--collect-strace requires strace
--collect-tcpdump requires tcpdump
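Before enabling one of these options, you can check that the programs it needs are on the PATH. Here is a minimal sketch using the POSIX `command -v`. The function name is hypothetical.

```shell
# Hypothetical pre-flight check: report missing programs on stderr.
# Usage: require_programs PROGRAM...   (succeeds only if all are found)
require_programs() {
    local missing=0 prog
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing: $prog" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Programs needed by --collect-oprofile, per the list above.
require_programs opcontrol opreport || echo "--collect-oprofile will not work on this host"
```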
BUGS
For a list of known bugs, see https://jira.percona.com/projects/PT/issues.
Please report bugs at https://jira.percona.com/projects/PT. Include the following information in your bug report:
Complete command-line used to run the tool
Tool --version
MySQL version of all servers involved
Output from the tool including STDERR
Input files (log/dump/config files, etc.)
If possible, include debugging output by running the tool with PTDEBUG; see “ENVIRONMENT”.
ATTENTION
Using <PTDEBUG> might expose passwords. When debug is enabled, all command line parameters are shown in the output.
DOWNLOADING
Visit http://www.percona.com/software/percona-toolkit/ to download the latest release of Percona Toolkit. Or, get the latest release from the command line:
    wget percona.com/get/percona-toolkit.tar.gz
    wget percona.com/get/percona-toolkit.rpm
    wget percona.com/get/percona-toolkit.deb
You can also get individual tools from the latest release:
    wget percona.com/get/TOOL
Replace TOOL with the name of any tool.
AUTHORS
Baron Schwartz, Justin Swanhart, Fernando Ipar, Daniel Nichter, and Brian Fraser
ABOUT PERCONA TOOLKIT
This tool is part of Percona Toolkit, a collection of advanced command-line tools for MySQL developed by Percona. Percona Toolkit was forked from two projects in June, 2011: Maatkit and Aspersa. Those projects were created by Baron Schwartz and primarily developed by him and Daniel Nichter. Visit http://www.percona.com/software/ to learn about other free, open-source software from Percona.
COPYRIGHT, LICENSE, AND WARRANTY
This program is copyright 2011-2024 Percona LLC and/or its affiliates, 2010-2011 Baron Schwartz.
THIS PROGRAM IS PROVIDED “AS IS” AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl’ or `man perlartistic’ to read these licenses.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
VERSION
pt-stalk 3.7.0