
pgbadger

pgbadger — rapidly analyze Postgres Pro logs, producing detailed reports and graphs

Synopsis

pgbadger [connection-option...] [option...] [logfile...]

Description

pgbadger is a Postgres Pro/PostgreSQL log analyzer, which rapidly provides detailed reports based on your log files. pgbadger is provided with Postgres Pro as a standalone Perl script.

logfile can be a single log file, a list of files, or a shell command that returns a list of files. To get log content from the standard input, pass - as logfile.
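
For instance, the shell-command and standard-input forms can be combined with ordinary tools. These are usage sketches; the paths are hypothetical:

```shell
# Pass a shell command whose output is the list of files to parse
pgbadger `find /var/lib/pgpro/std-11/data/log -name "postgresql*.log" -mtime -2`

# Feed log content through standard input, with "-" as logfile
gunzip -c /var/lib/pgpro/std-11/data/log/postgres.log.1.gz | pgbadger -
```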

pgbadger can parse huge log files and compressed files. It can autodetect your log file format (syslog, stderr, csvlog or jsonlog) if the file is long enough. Supported compressed formats are gzip, bzip2, lz4, xz, zip and zstd. For the xz format, you must have an xz version higher than 5.05, which supports the --robot option. For pgbadger to determine the uncompressed file size for the lz4 format, the file must be compressed with the --content-size option.

pgbadger supports any format of log-line prefixes that can be specified through the log_line_prefix configuration setting of your postgresql.conf configuration file, provided that at least %t and %p are specified.

pgbouncer log files can also be parsed.

To speed up log parsing, you can use either of these multiprocessing modes: one core per log file or multiple cores per file. These modes can be combined.

pgbadger can also parse remote log files fetched using a passwordless SSH connection. This mode can be used with compressed files and even supports multiprocessing with multiple cores per file.

Examples of reports can be found at https://pgbadger.darold.net/#reports.

Limitations

pgbadger currently has the following limitations:

  • Multiprocessing is not supported for compressed log files and CSV files, or on Windows.

  • CSV format of log files cannot be parsed remotely.

  • csvlog logs cannot be passed from the standard input.

Setup and Configuration

Once you have pgbadger installed, complete the setup described in the sections below.

[Optional] Set up Parsing of Specific Log Formats

If you plan to parse CSV log files, install the Text::CSV_XS Perl module.
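
On most distributions, this module is available as a package. These commands are a sketch; the package names are the common ones and may differ on your system:

```shell
# Debian-based system (package name assumed):
sudo apt-get install libtext-csv-xs-perl

# RPM-based system (package name assumed):
sudo yum install perl-Text-CSV_XS
```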

[Optional] Set up Export of Statistics

If you want to export statistics as a JSON file, install the JSON::XS Perl module:

  • On a Debian-based system, run:

    sudo apt-get install libjson-xs-perl

  • On an RPM system, run:

    sudo yum install perl-JSON-XS

[Optional] Set up Parsing Compressed Log Files

By default, pgbadger autodetects the compressed log file format from the file extension and uses decompression utilities accordingly:

  • zcat for gz

  • bzcat for bz2

  • lz4cat for lz4

  • zstdcat for zst

  • unzip or xz for zip or xz

If the needed utility is outside of your PATH directories, use the --zcat command-line option to specify the path to the decompression utility. For example:

--zcat="/usr/local/bin/gunzip -c"
--zcat="/usr/local/bin/bzip2 -dc"
--zcat="C:\tools\unzip -p"

Note

With the default autodetection of the compressed file format, you can mix gz, bz2, lz4, xz, zip and zstd log files. Once you specify a custom value of --zcat, mixing compressed files of different formats is no longer possible.

Configure Your Postgres Pro Server

Set the values of certain configuration parameters in your postgresql.conf:

  • Set up logging of SQL queries.

    To enable SQL query logging and have the query statistics include actual query strings, set

    log_min_duration_statement = 0

    On a busy server, you may want to increase this value to log only queries with a longer duration.

    If you just want to report the duration and number of queries and do not want details of queries, set log_min_duration_statement to -1, which disables logging statement durations, and enable log_duration.

    Enabling log_min_duration_statement will add reports about slowest queries and queries that took the most time. Note that if you set log_statement to all, the setting of log_min_duration_statement will have no effect.

    Warning

    Avoid setting log_min_duration_statement to a non-positive value together with enabling log_duration and log_statement, as this will result in wrong counter values and drastically increase the size of your log. Always prefer setting log_min_duration_statement.

  • Set the log line prefix string in log_line_prefix.

    It must at least include a time escape sequence (%t, %m or %n) and a process-related escape sequence (%p or %c). For example, for stderr logs, the setting must be at least

    log_line_prefix = '%t [%p]: '

    The log line prefix can also specify the user, database name, application name and client IP address. For example,

    for stderr logs:

    log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '

    or

    log_line_prefix = '%t [%p]: db=%d,user=%u,app=%a,client=%h '

    and for syslog logs:

    log_line_prefix = 'user=%u,db=%d,app=%a,client=%h '

    or

    log_line_prefix = 'db=%d,user=%u,app=%a,client=%h '

  • To get more information from your log files, set certain configuration parameters as follows:

    log_checkpoints = on
    log_connections = on
    log_disconnections = on
    log_lock_waits = on
    log_temp_files = 0
    log_autovacuum_min_duration = 0
    log_error_verbosity = default

    To benefit from these settings, do not enable log_statement, as pgbadger does not parse the corresponding log format.

  • Set the language in which messages are displayed; messages must be in English, with or without locale support:

    lc_messages='C'

    or

    lc_messages='en_US.UTF-8'

    Locales for other languages, such as ru_RU.utf8, are not supported.

Usage

The following are simple examples to illustrate miscellaneous pgbadger usage details.

pgbadger /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log

pgbadger /var/lib/pgpro/std-11/data/log/postgres.log.2.gz /var/lib/pgpro/std-11/data/log/postgres.log.1.gz /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log

pgbadger /var/lib/pgpro/std-11/data/log/postgresql/postgresql-2022-01-*

pgbadger --exclude-query="^(COPY|COMMIT)" /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log

pgbadger -b "2022-01-25 10:56:11" -e "2022-01-25 10:59:11" /var/lib/pgpro/std-11/data/log/postgresql-2022-01-25-0000.log

cat /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log | pgbadger -

# Log line prefix with stderr log output
pgbadger --prefix '%t [%p]: user=%u,db=%d,client=%h' /var/lib/pgpro/std-11/data/log/postgresql-2022-08-21*
pgbadger --prefix '%m %u@%d %p %r %a : ' /var/lib/pgpro/std-11/data/log/postgresql-2022-08-21-0000.log

# Log line prefix with syslog log output
pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' /var/lib/pgpro/std-11/data/log/postgresql-2022-08-21*

# Use 8 CPUs to parse 10GB file faster
pgbadger -j 8 /var/lib/pgpro/std-11/data/log/postgresql-2022-08-21-0000.log

# Use a cron job to report errors weekly
30 23 * * 1 /usr/bin/pgbadger -q -w /var/lib/pgpro/std-11/data/log/postgresql-2022-01*.log -o /var/www/pg_reports/pg_errors.html

Specifying Remote Log Files

Specify remote log files to parse using a URI. Supported protocols are HTTP[S] and [S]FTP. The curl command will be used to download the file, and the file will be parsed during download. The SSH protocol is also supported and will use the ssh command to get log files, as when the --remote-host option is used.

Use these URI notations for the remote log file:

pgbadger http://172.12.110.1//var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log
pgbadger ftp://username@172.12.110.14/postgresql-2022-01-14_000000.log
pgbadger ssh://username@172.12.110.14:2222//var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log*

You can parse a local Postgres Pro log and a remote pgbouncer log file together:

pgbadger /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log ssh://username@172.12.110.14/pgbouncer.log

Parallel Processing

To enable parallel processing, specify the -j N option, where N is the number of cores to use.

Parallel processing in pgbadger follows the algorithm below:

For each log file
  chunk size = int(file size / N)
  look at start/end offsets of these chunks
  fork N processes and seek to the start offset of each chunk
    each process will terminate when the parser reaches the end offset of its chunk
    each process writes stats into a binary temporary file
  wait for all child processes to terminate

All binary temporary files generated will then be read and loaded into memory to build the HTML output.

With this method, at the start/end of chunks pgbadger may truncate or omit a maximum of N queries per log file, which is an insignificant gap if you have millions of queries in your log file. The chance that the query you were looking for is lost is near zero, so this gap can be considered acceptable. Most of the time the query is counted twice, but truncated.
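
The chunk arithmetic described above can be sketched as follows. This is not pgbadger's actual code; the file size and core count are made-up values, and the last worker is assumed to absorb the division remainder:

```shell
filesize=1000000   # hypothetical log size in bytes
N=4                # number of cores, as passed with -j 4
chunk=$((filesize / N))
for i in $(seq 0 $((N - 1))); do
  start=$((i * chunk))
  if [ "$i" -eq $((N - 1)) ]; then
    end=$filesize                 # last worker takes the remainder
  else
    end=$(((i + 1) * chunk))
  fi
  echo "worker $i parses bytes $start..$end"
done
```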

When you have many small log files and many CPUs, it is faster to dedicate one core to one log file at a time. To enable this behavior, specify the -J N option instead. Using this method, you can be sure not to lose any queries in the reports. With 200 log files of 10 MB each, the -J option starts being really efficient with 8 cores.
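
For example, to dedicate one core to each of several rotated log files (paths hypothetical):

```shell
pgbadger -J 8 /var/lib/pgpro/std-11/data/log/postgresql-2022-08-*.log
```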

Here is a benchmark done on a server with 8 CPUs and a single file of 9.5 GB.

 Option |  1 CPU  | 2 CPU | 4 CPU | 8 CPU
--------+---------+-------+-------+------
   -j   | 1h41m18 | 50m25 | 25m39 | 15m58
   -J   | 1h41m18 | 54m28 | 41m16 | 34m45

With 200 log files of 10 MB each, which is 2 GB in total, the results are slightly different:

 Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU
--------+-------+-------+-------+------
   -j   | 20m15 |  9m56 |  5m20 | 4m20
   -J   | 20m15 |  9m49 |  5m00 | 2m40

So it is recommended to use the -j option unless you have hundreds of small log files and can use at least 8 CPUs.

Important

During parallel parsing, pgbadger generates a lot of temporary files named tmp_pgbadgerXXXX.bin in the /tmp directory and removes them at the end.

Building Incremental Reports

The following sample cron job builds a report every week with the incremental behavior, assuming that your log file and HTML report are also rotated every week:

0 4 * * 1 /usr/bin/pgbadger -q `find /var/lib/pgpro/std-11/data/log/ -mtime -7 -name "postgresql.log*"` -o /var/www/pg_reports/pg_errors-`date +\%F`.html -l /var/reports/pgbadger_incremental_file.dat

However, it is better to turn on pgbadger's automatic building of incremental reports by specifying the -I/--incremental option. In this mode, pgbadger builds one report per day and a cumulative report per week. The output is first built in the binary format and saved to the output directory, specified by the -O/--outdir option, and then daily and weekly reports are built in the HTML format with the main index file. The main index file shows a dropdown menu per week with a link to the week's report and links to daily reports for that week. For example, run pgbadger as follows with the file rotated daily:

0 4 * * * /usr/bin/pgbadger -I -q /var/lib/pgpro/std-11/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

You will have all daily and weekly reports. In this mode, pgbadger will automatically create an incremental file in the output directory, so you do not have to use the -l option unless you want to change the path of that file. This means that you can run pgbadger in this mode every day on a log file rotated every week, and it will not count the log entries twice. To save disk space, you may want to use the -X/--extra-files command-line option to force pgbadger to write CSS and JavaScript files to the output directory as separate files. The resources will then be loaded using script and link tags.

In the incremental mode, you can also specify the number of weeks to keep in the report by using the -R/--retention option:

/usr/bin/pgbadger --retention 2 -I -q /var/lib/pgpro/std-11/data/log/postgresql/postgresql.log.1 -O /var/www/pg_reports/

If pg_dump is scheduled to run at 23:00 and 13:00 every day, you can exclude these periods from the report as follows:

pgbadger --exclude-time "2013-09-.* (23|13):.*" postgresql.log

This will help avoid having COPY statements generated by pg_dump on top of the list of slowest queries. Alternatively, you can use --exclude-appname "pg_dump" to solve this problem in a simpler way.
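
That simpler form looks like this (log file name hypothetical):

```shell
pgbadger --exclude-appname "pg_dump" /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log
```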

Rebuilding Reports

To update all HTML reports after fixing a pgbadger report or adding a new feature to it, you can rebuild incremental reports. To rebuild all reports in case the binary data files are still available, run:

rm /path/to/reports/*.js
rm /path/to/reports/*.css
pgbadger -X -I -O /var/www/pg_reports/ --rebuild

This will also update all the resource files (JavaScript and CSS). Use the -E/--explode option if the reports were built with this option.

Building Monthly Reports

By default, in the incremental mode pgbadger only computes daily and weekly reports. To have monthly cumulative reports, you will need to use a separate command to specify the report to build. For example, to build a report for August 2021, run:

pgbadger -X --month-report 2021-08 /var/www/pg_reports/

This will add a link with the month name to the calendar view of incremental reports, pointing to the monthly report. The report for the current month can be run every day; it is entirely rebuilt each time. The monthly report is not built by default because it could take too long. As with rebuilding reports, if reports were built with the per-database option (-E/--explode), it must also be used to build the monthly report:

pgbadger -E -X --month-report 2021-08 /var/www/pg_reports/

Choosing the Report File Format

The pgbadger report file format is determined by the extension of the file passed to the -o/--outfile option.

Use the binary format (-o *.bin) to create custom incremental and cumulative reports.

For example, to refresh a pgbadger report every hour from a daily log file, you can run the following commands every hour:

# Generate incremental data files in the binary format
pgbadger --last-parsed .pgbadger_last_state_file -o sunday/hourX.bin /var/lib/pgpro/std-11/data/log/postgresql-Sun.log

# Build a fresh HTML report from the generated binary files
pgbadger sunday/*.bin

For another example, assume that you generate one log file per hour. To have reports rebuilt each time the log file is rotated, run:

pgbadger -o day1/hour01.bin /var/lib/pgpro/std-11/data/log/postgresql-2022-01-23_10.log
pgbadger -o day1/hour02.bin /var/lib/pgpro/std-11/data/log/postgresql-2022-01-23_11.log
pgbadger -o day1/hour03.bin /var/lib/pgpro/std-11/data/log/postgresql-2022-01-23_12.log
...

Then, to refresh the HTML report each time a new binary file is generated, run:

pgbadger -o day1_report.html day1/*.bin

Adjust the commands to your particular needs.

Use the JSON format (-o *.json) to share data with other languages and to facilitate integration of pgbadger output with other monitoring tools, such as Cacti or Graphite.

Select other output formats to meet your particular needs. For example, this command will generate a Tsung sessions XML file for SELECT queries only:

  pgbadger -S -o sessions.tsung --prefix '%t [%p]: user=%u,db=%d ' /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log

Options

This section describes pgbadger command-line options.

-a minutes
--average minutes

Specifies the number of minutes for which to build average graphs of queries and connections.

Default: 5.

-A minutes
--histo-average minutes

Specifies the number of minutes for which to build histogram graphs of queries.

Default: 60.

-b datetime
--begin datetime

Specifies the start date/time for the data to be parsed in logs.

-c host
--dbclient host

Only report on log entries for the specified client host.

-C
--nocomment

Remove /* ... */ comments from queries.

-d name
--dbname name

Only report on log entries for the specified database.

-D
--dns-resolv

Replace client IP addresses with their DNS names.

Warning

This can considerably slow down pgbadger.

-e datetime
--end datetime

Specifies the end date/time for the data to be parsed in logs.

-E
--explode

Build one report per database. Global information not related to any database gets added to the postgres database report.

-f logtype
--format logtype

Specifies the log type.

Possible values: syslog, syslog2, stderr, jsonlog, csv, pgbouncer, logplex, rds and redshift. Use when pgbadger cannot detect the log format.

-G
--nograph

Disables graphs in HTML output.

-h
--help

Show detailed information about pgbadger options and exit.

-H path
--html-outdir path

Specifies the path to the directory where the HTML report must be written in the incremental mode. Note that binary files remain in the directory specified by-O/--outdir.

-i name
--ident name

Specifies the program name used to identify Postgres Pro messages in syslog logs.

Default: postgres.

-I
--incremental

Use the incremental mode, where reports will be generated by days in a separate directory specified by the -O/--outdir option.

-j number
--jobs number

Specifies the number of jobs to run at the same time. When working with csvlog logs, pgbadger always runs as a single job.

Default: 1.

-J number
--Jobs number

Specifies the number of log files to parse in parallel. When working with csvlog logs, one log at a time is processed.

Default: 1.

-l filename
--last-parsed filename

Specifies the file where the last datetime and line parsed are registered to allow incremental log parsing. Useful to watch errors since the last run or to get one report per day with the log rotated weekly.

-L filename
--logfile-list filename

Specifies the file containing the list of log files to parse.

-m size
--maxlength size

Specifies the maximum length of a query in reports. Longer queries will be truncated.

Default: 100000.

-M
--no-multiline

Turns off collecting multiline statements to avoid reporting excessive information, especially on errors that generate a huge report.

-N name
--appname name

Only report on log entries for the specified application.

-o filename
--outfile filename

Specifies the filename for the output and determines the report file format. Can be used multiple times to output several formats. For the json output, ensure that the Perl module JSON::XS is installed. To dump the output to stdout, use - as the filename.

Default: out.html, out.txt, out.bin, out.json or out.tsung, depending on the output format.
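
For example, to produce an HTML and a JSON report in a single pass (file names hypothetical):

```shell
pgbadger -o report.html -o report.json /var/lib/pgpro/std-11/data/log/postgresql-2022-01-14_000000.log
```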

-O path
--outdir path

Specifies the directory where the output file must be saved.

-p string
--prefix string

Specifies the value of your custom log_line_prefix string, as defined in your postgresql.conf. Only use if your log line prefix is different from the standard log_line_prefix strings, for example, if your prefix includes additional variables, such as client IP or application name.

-P
--no-prettify

Disables SQL query code prettifier.

-q
--quiet

Disables printing anything to stdout, even the progress bar.

-Q
--query-numbering

Add numbering of queries to the output when used together with --dump-all-queries or --normalized-only.

-r address
--remote-host address

Specifies the remote host on which to execute the cat command for the log file, so that the file can be parsed locally.

-R number
--retention number

Specifies the number of weeks for which to keep reports in the output directory in the incremental mode. The directories for older weeks and days are automatically removed.

Default: 0 (no reports are removed).

-s number
--sample number

Specifies the number of query samples to store.

Default: 3.

-S
--select-only

Only report on SELECT queries.

-t number
--top number

Specifies the number of queries to store/display.

Default: 20.

-T string
--title string

Specifies the title of the HTML report page.

-u username
--dbuser username

Only report on log entries for the specified username.

-U username
--exclude-user username

Excludes log entries for the specified username from the report. Can be used multiple times.

-v
--verbose

Enables the verbose or debug mode.

Default: off.

-V
--version

Show pgbadger version and exit.

-w
--watch-mode

Only report errors, just like Logwatch can do.

-W
--wide-char

Encode HTML output of queries in UTF-8 to avoid Perl messages Wide character in print.

-x format
--extension format

Specifies the output format. Possible values: text, html, bin, json or tsung.

Default: html.

-X
--extra-files

In the incremental mode, write CSS and JavaScript resources to the output directory as separate files.

-z command
--zcat command

Specifies the full command to run the zcat program. Use if zcat, bzcat or unzip is outside of your PATH directories.

-Z +/-XX
--timezone +/-XX

Specifies the number of hours from GMT for the timezone. Use to adjust date/time in JavaScript graphs.

--pie-limit number

Specifies the limit for pie charts: instead of showing pie data lower than this number individually, their sum is shown.

--exclude-query regexp

Specifies the regular expression such that matching queries will be excluded from the report. For example: ^(VACUUM|COMMIT). Can be used multiple times.

--exclude-file filename

Specifies the path to the file that contains regular expressions to use to exclude matching queries from the report, one expression per line.

--include-query regexp

Specifies the regular expression such that only matching queries will be included in the report. Can be used multiple times. For example: (tbl1|tbl2).

--include-file filename

Specifies the path to the file that contains regular expressions to use to include matching queries in the report, one expression per line.

--disable-error

Turns off generation of an error report.

--disable-hourly

Turns off generation of an hourly report.

--disable-type

Turns off generation of a report on queries by type, database and user.

--disable-query

Turns off generation of query reports, such as slowest or most frequent queries, queries by users, by database and so on.

--disable-session

Turns off generation of a session report.

--disable-connection

Turns off generation of a connection report.

--disable-lock

Turns off generation of a lock report.

--disable-temporary

Turns off generation of a report on temporary files.

--disable-checkpoint

Turns off generation of checkpoint/restartpoint reports.

--disable-autovacuum

Turns off generation of an autovacuum report.

--charset name

Specifies the HTML charset to be used.

Default: utf-8.

--csv-separator char

Specifies the CSV field separator.

Default: ,.

--exclude-time regexp

Specifies the regular expression such that log entries for any matching timestamp will be excluded from the report. For example: 2013-04-12 .*. Can be used multiple times.

--include-time regexp

Specifies the regular expression such that log entries for any matching timestamp will be included in the report. For example: 2013-04-12 .*. Can be used multiple times.

--exclude-db name

Specifies the name of the database to exclude related log entries from the report. For example: outdated_db. Can be used multiple times.

--exclude-appname name

Specifies the name of the application to exclude related log entries from the report. For example: pg_dump. Can be used multiple times.

--exclude-line regexp

Specifies the regular expression such that any matching log entry will be excluded from the report. Can be used multiple times.

--exclude-client address

Specifies the client IP/name to exclude related log entries from the report. Can be used multiple times.

--anonymize

Obscure all literals in queries. Useful to hide confidential data.

--noreport

Prevents generation of reports in the incremental mode.

--log-duration

Associate log entries generated by log_duration = on and log_statement = all.

--enable-checksum

Add the MD5 sum under each query report.

--journalctl command

Specifies the command to produce information similar to what the Postgres Pro log file contains, usually something like: journalctl -u postgrespro-std-11.

--pid-dir path

Specifies the path to store the PID file.

Default: /tmp.

--pid-file filename

Specifies the name of the PID file to manage concurrent execution of pgbadger.

Default: pgbadger.pid.

--rebuild

Rebuild all HTML reports in incremental output directories that contain binary data files.

--pgbouncer-only

Only show the pgbouncer-related menu in the header.

--start-monday

In the incremental mode, start calendar weeks on Monday. By default, they start on Sunday.

--iso-week-number

In the incremental mode, start calendar weeks on Monday with ISO 8601 week numbering: 01 to 53, where week 1 is the first week of a year that has at least 4 days.

--normalized-only

Only dump all normalized queries to out.txt.

--log-timezone +/-XX

Specifies the number of hours from GMT for the timezone to adjust date/time read from the log file before parsing. The use of this option makes log search with date/time more complicated.

--prettify-json

Prettify JSON output.

--month-report YYYY-MM

Specifies the month (YYYY-MM) to create a cumulative HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.

--day-report YYYY-MM-DD

Specifies the day (YYYY-MM-DD) to create an HTML report for. Requires incremental output directories to be set and all the necessary binary data files available.

--noexplain

Avoid processing log lines generated by auto_explain.

--command cmd

Specifies the command to run to retrieve log entries on stdin. pgbadger will open a pipe to this command and parse the log entries that it generates.

--no-week

Avoid building weekly reports in the incremental mode. Use if building weekly reports takes too long.

--explain-url URL

Specifies the URL to override the URL of the graphical explain tool.

Default: http://explain.depesz.com/?is_public=0&is_anon=0&plan=

--tempdir path

Specifies the directory for temporary files.

Default: File::Spec->tmpdir() || '/tmp'.

--no-process-info

Disables changing the pgbadger process title to help identify this process. Useful for systems where changing process titles is not supported.

--dump-all-queries

Dump all queries found in the log file to a text file, replacing bind parameters in the queries at their respective placeholder positions.

--keep-comments

Retains comments in normalized queries. Useful to distinguish between otherwise identical normalized queries.

--no-progressbar

Disables displaying the progress bar.

Remote Log Connection Options

pgbadger can parse a remote log file fetched using a passwordless SSH connection. Use the -r/--remote-host option to set the IP address or name of the target host. Additional options to define SSH connection parameters are as follows:

--ssh-program ssh

Specifies the path to the SSH program to use.

Default: ssh.

--ssh-port port

Specifies the SSH port for the connection.

Default: 22.

--ssh-user username

Specifies the username for the connection.

Default: user running pgbadger.

--ssh-identity filename

Specifies the path to the identity file.

--ssh-timeout seconds

Specifies the timeout in seconds in case of SSH connection failure.

Default: 10.

--ssh-option options

Specifies the list of options to define SSH connection parameters. The following options are always used:

-o ConnectTimeout=$ssh_timeout
-o PreferredAuthentications=hostbased,publickey

Author

Gilles Darold <gilles@darold.net>

