postgres-ai/postgres-checkup

Postgres Health Check and SQL Performance Analysis. 👉 THIS IS A MIRROR OF https://gitlab.com/postgres-ai/postgres-checkup
Postgres Checkup (postgres-checkup) is a new-generation diagnostics tool that allows users to collect a deep analysis of the health of a Postgres database. It aims to detect and describe all current and potential issues in the fields of database performance, scalability, and security, providing advice on how to resolve or prevent them.
Compared to a monitoring system, postgres-checkup goes deeper into the analysis of the database system and environment. It combines numerous internal characteristics of the database with data about resources and the OS, producing multiple comprehensive reports. These reports use formats that are easily readable both by humans and machines and are strongly oriented toward DBA problem-solving. Monitoring systems constantly collect telemetry, help to react to issues more quickly, and are useful for post-mortem analyses. At the same time, checkups are needed for a different purpose: detecting issues at a very early stage and advising on how to prevent them. This procedure is to be done on a regular basis — weekly, monthly, or quarterly. Additionally, it is recommended to run it immediately before and after any major change in the database server.
The three key principles behind postgres-checkup:
Unobtrusiveness: postgres-checkup’s impact on the observed system is close to zero. It does not use any heavy queries, keeping resource usage very low and avoiding the “observer effect.”
Zero install (on observed machines): it is able to analyze any Linux machine (including virtual machines), as well as cloud Postgres instances (such as Amazon RDS or Google Cloud SQL), without requiring any additional setup or any changes. It does, however, require privileged access (which a DBA usually has anyway).
Complex analysis: unlike most monitoring tools, which provide raw data, postgres-checkup combines data from various parts of the system (e.g., internal Postgres stats are combined with knowledge about system resources in the analysis of autovacuum settings and behavior). Also, it analyzes the master database server together with all its replicas (e.g., to build the list of unused indexes).
Postgres-checkup produces two kinds of reports for every check:
JSON reports (*.json) — can be consumed by any program or service, or stored in a database.
Markdown reports (*.md) — the main format for humans; may contain lists, tables, and pictures. Being a native format for GitLab and GitHub, such reports are ready to be used, for instance, in their issue trackers, simplifying the workflow. Markdown reports are derived from JSON reports.
Markdown reports can be converted to different formats such as HTML or PDF.
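For example, the consolidated Markdown report could be converted to HTML with a general-purpose converter such as pandoc (the tool choice and file name here are illustrative, not something postgres-checkup provides itself):

```bash
# Convert a generated Markdown report to HTML for sharing; pandoc is one possible tool
pandoc e1_full_report.md -o e1_full_report.html
```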
Each report consists of three sections:
- "Observations": automatically collected data. This is to be consumed byan expert DBA.
- "Conclusions": what we conclude from the Observations—what is good, whatis bad (right now, it is to be manually filled for most checks).
- "Recommendations": action items, what to do to fix the discovered issues.Both "Conclusions" and "Recommendations" are to be consumed by engineers whowill make decisions what, how and when to optimize, and how to react to thefindings.
Supported operating systems for the observer machine (the one from which the tool is executed):
- Linux (modern RHEL/CentOS or Debian/Ubuntu; others should work as well, butare not yet tested);
- macOS.
The following programs must be installed on the observer machine:
- bash
- psql
- coreutils
- jq >= 1.5
- golang >= 1.8
- awk
- sed
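A minimal sketch for verifying these prerequisites on the observer machine (it assumes the Go toolchain is exposed as the go command and that everything is on PATH):

```bash
# Check that the required tools are present; the printed versions are informational
for cmd in bash psql jq go awk sed; do
  command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done
jq --version   # expect 1.5 or newer
go version     # expect 1.8 or newer
```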
Nothing special has to be installed on the observed machines. However, these machines must run Linux (again: modern RHEL/CentOS or Debian/Ubuntu; others should work as well, but are not yet tested).
Use git clone. This is the only installation method currently supported.
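For example, cloning from the GitLab repository referenced above (the directory name is just the repository default):

```bash
git clone https://gitlab.com/postgres-ai/postgres-checkup.git
cd postgres-checkup
```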
Let's make a report for a project named prod1. The cluster slony contains two servers, db1.vpn.local and db2.vpn.local. Postgres-checkup automatically detects which one is the master:
./checkup -h db1.vpn.local -p 5432 --username postgres --dbname postgres --project prod1
./checkup -h db2.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 1
Which literally means: "connect to the server with the given credentials, save data into the prod1 project directory as check epoch 1". An epoch is an integer label of the current check iteration. For example, in half a year we can switch to epoch number 2.
On the first run we can skip -e 1 because the default epoch is 1, but on the second run the -e argument must be present: we don't want to overwrite historical results.
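For instance, a later check iteration could be stored under epoch 2, reusing the same options as above (a sketch, not output from an actual run):

```bash
./checkup -h db1.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 2
./checkup -h db2.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 2
```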
As a result of running postgres-checkup we get two directories, one with .json files and one with .md files:

./artifacts/prod1/json_reports/1_2018_12_06T14_12_36_+0300/
./artifacts/prod1/md_reports/1_2018_12_06T14_12_36_+0300/
Each generated file contains information about "what we check" and the data collected for all instances of the Postgres cluster in the prod1 project.
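Since the JSON reports are plain files, any JSON tooling can consume them; for instance, a quick look at the top-level keys of each report with jq (the directory is from the example above, file names depend on which checks ran):

```bash
# Print the top-level keys of every JSON report from this check iteration
jq 'keys' ./artifacts/prod1/json_reports/1_2018_12_06T14_12_36_+0300/*.json
```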
A human-readable report can be found at:
./artifacts/prod1/e1_full_report.md
Open it with your favorite Markdown viewer or just upload it to a service such as gist.github.com.
- A001 System, CPU, RAM, disks, virtualization #6, #56, #57, #86
- A002 PostgreSQL Versions (Simple) #68, #21, #86
- A003 Collect pg_settings #15, #167, #86
- A004 General cluster info #7, #58, #59, #86, #162
- A005 Extensions #8, #60, #61, #86, #167
- A006 Config diff #9, #62, #63, #86
- A007 Alter system vs postgresql.conf #18, #86
- A008 Disk usage and file system type #19, #20
- A010 Data checksums are not enabled + wal_log_hints #22
- A011 Connection pooling. PgBouncer #23
- A012 Anti crash checks #177
- B001 SLO/SLA, RPO, RTO #24
- B002 File system, mount flags #25
- B003 Full backups / incremental #26
- B004 WAL archiving (GB/day?) - #27
- B005 Restore checks, monitoring, alerting #28
- C001 SLO/SLA #29
- C002 Sync/async, Streaming / wal transfer; logical decoding #30
- C003 SPOFs; “-1 datacenter”, standby with traffic #31
- C004 Failover #32
- C005 Switchover #33
- C006 Delayed replica (replay of 1 day of WALs) - #34
- D001 Logging (syslog?), log_*** #35
- D002 Useful Linux tools #36
- D003 List of monitoring metrics #37
- D004 pg_stat_statements, tuning opts, pg_stat_kcache #38
- D005 track_io_timing, …, auto_explain #39
- D006 Postgres_dba / other toolset - recommend #40
- D007 Postgres-specific tools for troubleshooting #137
- E001 WAL/checkpoint settings, IO #41
- E002 Bgwriter, IO #42
- F001 < F003 Current autovacuum-related settings #108, #164
- F002 < F007 Transaction wraparound check #16, #171
- F003 < F006 Autovacuum dead tuples #164
- F004 < F001 Heap Bloat estimation #87, #122
- F005 < F002 Index bloat estimation #88
- F006 < F004 Precise heap bloat analysis
- F007 < F005 Precise index bloat analysis
- F008 < F008 Resource usage (CPU, Memory, disk IO) #44
- G001 Memory-related settings #45, #190
- G002 Connections #46
- G003 Timeouts, locks, deadlocks (amount) #47
- G004 Query planner (diff) #48
- G005 I/O settings #49
- G006 Default_statistics_target (per table?) #50
- H001 Indexes: invalid #192, #51
- H002 < H001 Indexes: unused, redundant #51, #180, #170, #168
- H003 < H002 Missing FK indexes #52, #142, #173
- J001 Capacity planning - #54
- K001 Globally aggregated query metrics #158, #178, #182, #184
- K002 Workload type ("first word" analysis) #159, #178, #179, #182, #184
- K003 Top-50 queries by total_time #160, #172, #174, #178, #179, #182, #184, #193
- L001 (was: H003) Current sizes of DB objects (tables, indexes, mat. views) #163
- L002 (was: H004) Data types being used #53
- L003 Integer (int2, int4) out-of-range risks in PKs (calculate remaining capacity; optional: predict when capacity will be fully used) https://gitlab.com/postgres-ai-team/postgres-checkup/issues/237
- DB schema, DDL, DB schema migrations
- analyze all FKs and check if data types of referencing column and referenced one match (same thing for multi-column FKs)
- tables w/o PKs? tables not having even unique index?
- ready to archive WAL files (count) (need FS access) on master
- standby lag in seconds
- FS settings (mount command parsing)
- meltdown/spectre patches
- swap settings
- memory pressure settings
- overcommit settings
- NUMA enabled?
- Huge pages?
- Transparent huge pages?