NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commit2bb6b98

committed

Add a section about the planner's statistics, including a description

of the new pg_stats view.

1 parentdaf72bb commit2bb6b98Copy full SHA for 2bb6b98

File tree

2 files changed

+199

-3

lines changed

doc/src/sgml
- perform.sgml
- ref
  - analyze.sgml

2 files changed

+199

-3

lines changed

`‎doc/src/sgml/perform.sgml`

Lines changed: 195 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -1,5 +1,5 @@`
`1`	`1`	`<!--`
`2`		`-$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.12 2001/10/12 23:32:34 momjian Exp $`
	`2`	`+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.13 2001/10/16 01:13:44 tgl Exp $`
`3`	`3`	`-->`
`4`	`4`
`5`	`5`	`<chapter id="performance-tips">`
`@@ -315,6 +315,200 @@ Total runtime: 30.67 msec`
`315`	`315`	`</para>`
`316`	`316`	`</sect1>`
`317`	`317`
	`318`	`+ <sect1 id="planner-stats">`
	`319`	`+ <title>Statistics used by the Planner</title>`
	`320`	`+`
	`321`	`+ <para>`
	`322`	`+ As we saw in the previous section, the query planner needs to estimate`
	`323`	`+ the number of rows retrieved by a query in order to make good choices`
	`324`	`+ of query plans. This section provides a quick look at the statistics`
	`325`	`+ that the system uses for these estimates.`
	`326`	`+ </para>`
	`327`	`+`
	`328`	`+ <para>`
	`329`	`+ One component of the statistics is the total number of entries in each`
	`330`	`+ table and index, as well as the number of disk blocks occupied by each`
	`331`	`+ table and index. This information is kept in`
	`332`	`+ <structname>pg_class</structname>'s <structfield>reltuples</structfield>`
	`333`	`+ and <structfield>relpages</structfield> columns. We can look at it`
	`334`	`+ with queries similar to this one:`
	`335`	`+`
	`336`	`+<screen>`
	`337`	`+regression=# select relname, relkind, reltuples, relpages from pg_class`
	`338`	`+regression-# where relname like 'tenk1%';`
	`339`	`+ relname \| relkind \| reltuples \| relpages`
	`340`	`+---------------+---------+-----------+----------`
	`341`	`+ tenk1 \| r \| 10000 \| 233`
	`342`	`+ tenk1_hundred \| i \| 10000 \| 30`
	`343`	`+ tenk1_unique1 \| i \| 10000 \| 30`
	`344`	`+ tenk1_unique2 \| i \| 10000 \| 30`
	`345`	`+(4 rows)`
	`346`	`+</screen>`
	`347`	`+`
	`348`	`+ Here we can see that <structname>tenk1</structname> contains 10000`
	`349`	`+ rows, as do its indexes, but the indexes are (unsurprisingly) much`
	`350`	`+ smaller than the table.`
	`351`	`+ </para>`
	`352`	`+`
	`353`	`+ <para>`
	`354`	`+ For efficiency reasons, <structfield>reltuples</structfield>`
	`355`	`+ and <structfield>relpages</structfield> are not updated on-the-fly,`
	`356`	`+ and so they usually contain only approximate values (which is good`
	`357`	`+ enough for the planner's purposes). They are initialized with dummy`
	`358`	`+ values (presently 1000 and 10 respectively) when a table is created.`
	`359`	`+ They are updated by certain commands, presently <command>VACUUM</>,`
	`360`	`+ <command>ANALYZE</>, and <command>CREATE INDEX</>. A stand-alone`
	`361`	`+ <command>ANALYZE</>, that is one not part of <command>VACUUM</>,`
	`362`	`+ generates an approximate <structfield>reltuples</structfield> value`
	`363`	`+ since it does not read every row of the table.`
	`364`	`+ </para>`
	`365`	`+`
	`366`	`+ <para>`
	`367`	`+ Most queries retrieve only a fraction of the rows in a table, due`
	`368`	`+ to having WHERE clauses that restrict the rows to be examined.`
	`369`	`+ The planner thus needs to make an estimate of the`
	`370`	`+ <firstterm>selectivity</> of WHERE clauses, that is, the fraction of`
	`371`	`+ rows that match each clause of the WHERE condition. The information`
	`372`	`+ used for this task is stored in the <structname>pg_statistic</structname>`
	`373`	`+ system catalog. Entries in <structname>pg_statistic</structname> are`
	`374`	`+ updated by <command>ANALYZE</> and <command>VACUUM ANALYZE</> commands,`
	`375`	`+ and are always approximate even when freshly updated.`
	`376`	`+ </para>`
	`377`	`+`
	`378`	`+ <para>`
	`379`	`+ Rather than look at <structname>pg_statistic</structname> directly,`
	`380`	`+ it's better to look at its view <structname>pg_stats</structname>`
	`381`	`+ when examining the statistics manually. <structname>pg_stats</structname>`
	`382`	`+ is designed to be more easily readable. Furthermore,`
	`383`	`+ <structname>pg_stats</structname> is readable by all, whereas`
	`384`	`+ <structname>pg_statistic</structname> is only readable by the superuser.`
	`385`	`+ (This prevents unprivileged users from learning something about`
	`386`	`+ the contents of other people's tables from the statistics. The`
	`387`	`+ <structname>pg_stats</structname> view is restricted to show only`
	`388`	`+ rows about tables that the current user can read.)`
	`389`	`+ For example, we might do:`
	`390`	`+`
	`391`	`+<screen>`
	`392`	`+regression=# select attname, n_distinct, most_common_vals from pg_stats where tablename = 'road';`
	`393`	`+ attname \| n_distinct \| most_common_vals`
	`394`	`+---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------`
	`395`	`+ name \| -0.467008 \| {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St ","5th St ","Mission Blvd","I- 880 "}`
	`396`	`+ thepath \| 20 \| {"[(-122.089,37.71),(-122.0886,37.711)]"}`
	`397`	`+(2 rows)`
	`398`	`+regression=#`
	`399`	`+</screen>`
	`400`	`+`
	`401`	`+ As of <productname>Postgres</productname> 7.2 the following columns exist`
	`402`	`+ in <structname>pg_stats</structname>:`
	`403`	`+ </para>`
	`404`	`+`
	`405`	`+ <table>`
	`406`	`+ <title>pg_stats Columns</title>`
	`407`	`+`
	`408`	`+ <tgroup cols=3>`
	`409`	`+ <thead>`
	`410`	`+ <row>`
	`411`	`+ <entry>Name</entry>`
	`412`	`+ <entry>Type</entry>`
	`413`	`+ <entry>Description</entry>`
	`414`	`+ </row>`
	`415`	`+ </thead>`
	`416`	`+`
	`417`	`+ <tbody>`
	`418`	`+ <row>`
	`419`	`+ <entry>tablename</entry>`
	`420`	`+ <entry><type>name</type></entry>`
	`421`	`+ <entry>Name of table containing column</entry>`
	`422`	`+ </row>`
	`423`	`+`
	`424`	`+ <row>`
	`425`	`+ <entry>attname</entry>`
	`426`	`+ <entry><type>name</type></entry>`
	`427`	`+ <entry>Column described by this row</entry>`
	`428`	`+ </row>`
	`429`	`+`
	`430`	`+ <row>`
	`431`	`+ <entry>null_frac</entry>`
	`432`	`+ <entry><type>real</type></entry>`
	`433`	`+ <entry>Fraction of column's entries that are NULL</entry>`
	`434`	`+ </row>`
	`435`	`+`
	`436`	`+ <row>`
	`437`	`+ <entry>avg_width</entry>`
	`438`	`+ <entry><type>integer</type></entry>`
	`439`	`+ <entry>Average width in bytes of column's entries</entry>`
	`440`	`+ </row>`
	`441`	`+`
	`442`	`+ <row>`
	`443`	`+ <entry>n_distinct</entry>`
	`444`	`+ <entry><type>real</type></entry>`
	`445`	`+ <entry>If greater than zero, the estimated number of distinct values`
	`446`	`+ in the column. If less than zero, the negative of the number of`
	`447`	`+ distinct values divided by the number of rows. (The negated form`
	`448`	`+ is used when ANALYZE believes that the number of distinct values`
	`449`	`+ is likely to increase as the table grows; the positive form is used`
	`450`	`+ when the column seems to have a fixed number of possible values.)`
	`451`	`+ For example, -1 indicates a unique column in which the number of`
	`452`	`+ distinct values is the same as the number of rows.`
	`453`	`+ </entry>`
	`454`	`+ </row>`
	`455`	`+`
	`456`	`+ <row>`
	`457`	`+ <entry>most_common_vals</entry>`
	`458`	`+ <entry><type>text[]</type></entry>`
	`459`	`+ <entry>A list of the most common values in the column. (Omitted if`
	`460`	`+ no values seem to be more common than any others.)</entry>`
	`461`	`+ </row>`
	`462`	`+`
	`463`	`+ <row>`
	`464`	`+ <entry>most_common_freqs</entry>`
	`465`	`+ <entry><type>real[]</type></entry>`
	`466`	`+ <entry>A list of the frequencies of the most common values,`
	`467`	`+ ie, number of occurrences of each divided by total number of rows.`
	`468`	`+ </entry>`
	`469`	`+ </row>`
	`470`	`+`
	`471`	`+ <row>`
	`472`	`+ <entry>histogram_bounds</entry>`
	`473`	`+ <entry><type>text[]</type></entry>`
	`474`	`+ <entry>A list of values that divide the column's values into`
	`475`	`+ groups of approximately equal population. The`
	`476`	`+ <structfield>most_common_vals</>, if present, are omitted from the`
	`477`	`+ histogram calculation. (Omitted if column datatype does not have a`
	`478`	`+ <literal><</> operator, or if the <structfield>most_common_vals</>`
	`479`	`+ list accounts for the entire population.)`
	`480`	`+ </entry>`
	`481`	`+ </row>`
	`482`	`+`
	`483`	`+ <row>`
	`484`	`+ <entry>correlation</entry>`
	`485`	`+ <entry><type>real</type></entry>`
	`486`	`+ <entry>Statistical correlation between physical row ordering and`
	`487`	`+ logical ordering of the column values. This ranges from -1 to +1.`
	`488`	`+ When the value is near -1 or +1, an indexscan on the column will`
	`489`	`+ be estimated to be cheaper than when it is near zero, due to reduction`
	`490`	`+ of random access to the disk. (Omitted if column datatype does`
	`491`	`+ not have a <literal><</> operator.)`
	`492`	`+ </entry>`
	`493`	`+ </row>`
	`494`	`+ </tbody>`
	`495`	`+ </tgroup>`
	`496`	`+ </table>`
	`497`	`+`
	`498`	`+ <para>`
	`499`	`+ The maximum number of entries in the <structfield>most_common_vals</>`
	`500`	`+ and <structfield>histogram_bounds</> arrays can be set on a`
	`501`	`+ column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>`
	`502`	`+ command. The default limit is presently 10 entries. Raising the limit`
	`503`	`+ may allow more accurate planner estimates to be made, particularly for`
	`504`	`+ columns with irregular data distributions, at the price of consuming`
	`505`	`+ more space in <structname>pg_statistic</structname> and slightly more`
	`506`	`+ time to compute the estimates. Conversely, a lower limit may be`
	`507`	`+ appropriate for columns with simple data distributions.`
	`508`	`+ </para>`
	`509`	`+`
	`510`	`+ </sect1>`
	`511`	`+`
`318`	`512`	`<sect1 id="explicit-joins">`
`319`	`513`	`<title>Controlling the Planner with Explicit JOINs</title>`
`320`	`514`

`‎doc/src/sgml/ref/analyze.sgml`

Lines changed: 4 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,5 +1,5 @@`
`1`	`1`	`<!--`
`2`		`-$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.2 2001/09/03 12:57:49 petere Exp $`
	`2`	`+$Header: /cvsroot/pgsql/doc/src/sgml/ref/analyze.sgml,v 1.3 2001/10/16 01:13:44 tgl Exp $`
`3`	`3`	`Postgres documentation`
`4`	`4`	`-->`
`5`	`5`
`@@ -152,7 +152,9 @@ ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ (<rep`
`152`	`152`	`distribution in each column. One or both of these may be omitted if`
`153`	`153`	`<command>ANALYZE</command> deems them uninteresting (for example, in`
`154`	`154`	`a unique-key column, there are no common values) or if the column`
`155`		`- datatype does not support the appropriate operators.`
	`155`	`+ datatype does not support the appropriate operators. There is more`
	`156`	`+ information about the statistics in the <citetitle>User's`
	`157`	`+ Guide</citetitle>.`
`156`	`158`	`</para>`
`157`	`159`
`158`	`160`	`<para>`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit2bb6b98

File tree

2 files changed

2 files changed

`‎doc/src/sgml/perform.sgml`

`‎doc/src/sgml/ref/analyze.sgml`

0 commit comments