Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitbbd5d65

Browse files
committed
Update detail for new todo items.
1 parent7bbe216 commitbbd5d65

File tree

1 file changed

+252
-1
lines changed

1 file changed

+252
-1
lines changed

‎doc/TODO.detail/optimizer

Lines changed: 252 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1059,7 +1059,7 @@ From owner-pgsql-hackers@hub.org Thu Jan 20 18:45:32 2000
10591059
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
10601060
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00672
10611061
for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 19:45:30 -0500 (EST)
1062-
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.15 $) with ESMTP id TAA01989 for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 19:39:15 -0500 (EST)
1062+
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.16 $) with ESMTP id TAA01989 for <pgman@candle.pha.pa.us>; Thu, 20 Jan 2000 19:39:15 -0500 (EST)
10631063
Received: from localhost (majordom@localhost)
10641064
by hub.org (8.9.3/8.9.3) with SMTP id TAA00957;
10651065
Thu, 20 Jan 2000 19:35:19 -0500 (EST)
@@ -1586,3 +1586,254 @@ support a couple gigs of RAM now.
15861586

15871587
************
15881588

1589+
From pgsql-hackers-owner+M6019@hub.org Mon Aug 21 11:47:56 2000
1590+
Received: from hub.org (root@hub.org [216.126.84.1])
1591+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA07289
1592+
for <pgman@candle.pha.pa.us>; Mon, 21 Aug 2000 11:47:55 -0400 (EDT)
1593+
Received: from hub.org (majordom@localhost [127.0.0.1])
1594+
by hub.org (8.10.1/8.10.1) with SMTP id e7LFlpT03383;
1595+
Mon, 21 Aug 2000 11:47:51 -0400 (EDT)
1596+
Received: from mail.fct.unl.pt (fct1.si.fct.unl.pt [193.136.120.1])
1597+
by hub.org (8.10.1/8.10.1) with SMTP id e7LFlaT03243
1598+
for <pgsql-hackers@postgresql.org>; Mon, 21 Aug 2000 11:47:37 -0400 (EDT)
1599+
Received: (qmail 7416 invoked by alias); 21 Aug 2000 15:54:33 -0000
1600+
Received: (qmail 7410 invoked from network); 21 Aug 2000 15:54:32 -0000
1601+
Received: from eros.si.fct.unl.pt (193.136.120.112)
1602+
by fct1.si.fct.unl.pt with SMTP; 21 Aug 2000 15:54:32 -0000
1603+
Date: Mon, 21 Aug 2000 16:48:08 +0100 (WEST)
1604+
From: =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra@fct.unl.pt>
1605+
X-Sender: tiago@eros.si.fct.unl.pt
1606+
To: Tom Lane <tgl@sss.pgh.pa.us>
1607+
cc: pgsql-hackers@postgresql.org
1608+
Subject: Re: [HACKERS] Optimisation deficiency: currval('seq')-->seq scan,
1609+
constant-->index scan
1610+
In-Reply-To: <1731.966868649@sss.pgh.pa.us>
1611+
Message-ID: <Pine.LNX.4.21.0008211626250.25226-100000@eros.si.fct.unl.pt>
1612+
MIME-Version: 1.0
1613+
Content-Type: TEXT/PLAIN; charset=US-ASCII
1614+
X-Mailing-List: pgsql-hackers@postgresql.org
1615+
Precedence: bulk
1616+
Sender: pgsql-hackers-owner@hub.org
1617+
Status: ORr
1618+
1619+
On Mon, 21 Aug 2000, Tom Lane wrote:
1620+
1621+
> > One thing it might be interesting (please tell me if you think
1622+
> > otherwise) would be to improve pg with better statistical information, by
1623+
> > using, for example, histograms.
1624+
>
1625+
> Yes, that's been on the todo list for a while.
1626+
1627+
If it's ok and nobody is working on that, I'll look on that subject.
1628+
I'll start by looking at the analize portion of vacuum. I'm thinking in
1629+
using arrays for the histogram (I've never used the array data type of
1630+
postgres).
1631+
Should I use 7.0.2 or the cvs version?
1632+
1633+
1634+
> Interesting article. We do most of what she talks about, but we don't
1635+
> have anything like the ClusterRatio statistic. We need it --- that was
1636+
> just being discussed a few days ago in another thread. Do you have any
1637+
> reference on exactly how DB2 defines that stat?
1638+
1639+
1640+
I don't remember seeing that information spefically. From what I've
1641+
read I can speculate:
1642+
1643+
1. They have clusterratios for both indexes and the relation itself.
1644+
2. They might use an index even if there is no "order by" if the table
1645+
has a low clusterratio: just to get the RIDs, then sort the RIDs and
1646+
fetch.
1647+
3. One possible way to calculate this ratio:
1648+
a) for tables
1649+
SeqScan
1650+
if tuple points to a next tuple on the same page then its
1651+
"good"
1652+
ratio = # good tuples / # all tuples
1653+
b) for indexes (high speculation ratio here)
1654+
foreach pointed RID in index
1655+
if RID is in same page of next RID in index than mark as
1656+
"good"
1657+
1658+
I suspect that if a tuple size is big (relative to page size) than the
1659+
cluster ratio is always low.
1660+
1661+
A tuple might also be "good" if it pointed to the next page.
1662+
1663+
Tiago
1664+
1665+
1666+
From pgsql-hackers-owner+M6152@hub.org Wed Aug 23 13:00:33 2000
1667+
Received: from hub.org (root@hub.org [216.126.84.1])
1668+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA10259
1669+
for <pgman@candle.pha.pa.us>; Wed, 23 Aug 2000 13:00:33 -0400 (EDT)
1670+
Received: from hub.org (majordom@localhost [127.0.0.1])
1671+
by hub.org (8.10.1/8.10.1) with SMTP id e7NGsPN83008;
1672+
Wed, 23 Aug 2000 12:54:25 -0400 (EDT)
1673+
Received: from mail.fct.unl.pt (fct1.si.fct.unl.pt [193.136.120.1])
1674+
by hub.org (8.10.1/8.10.1) with SMTP id e7NGniN81749
1675+
for <pgsql-hackers@postgresql.org>; Wed, 23 Aug 2000 12:49:44 -0400 (EDT)
1676+
Received: (qmail 9869 invoked by alias); 23 Aug 2000 15:10:04 -0000
1677+
Received: (qmail 9860 invoked from network); 23 Aug 2000 15:10:04 -0000
1678+
Received: from eros.si.fct.unl.pt (193.136.120.112)
1679+
by fct1.si.fct.unl.pt with SMTP; 23 Aug 2000 15:10:04 -0000
1680+
Date: Wed, 23 Aug 2000 16:03:42 +0100 (WEST)
1681+
From: =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra@fct.unl.pt>
1682+
X-Sender: tiago@eros.si.fct.unl.pt
1683+
To: Tom Lane <tgl@sss.pgh.pa.us>
1684+
cc: Jules Bean <jules@jellybean.co.uk>, pgsql-hackers@postgresql.org
1685+
Subject: Re: [HACKERS] Optimisation deficiency: currval('seq')-->seq scan,
1686+
constant-->index scan
1687+
In-Reply-To: <27971.967041030@sss.pgh.pa.us>
1688+
Message-ID: <Pine.LNX.4.21.0008231543340.4273-100000@eros.si.fct.unl.pt>
1689+
MIME-Version: 1.0
1690+
Content-Type: TEXT/PLAIN; charset=US-ASCII
1691+
X-Mailing-List: pgsql-hackers@postgresql.org
1692+
Precedence: bulk
1693+
Sender: pgsql-hackers-owner@hub.org
1694+
Status: ORr
1695+
1696+
Hi!
1697+
1698+
On Wed, 23 Aug 2000, Tom Lane wrote:
1699+
1700+
> Yes, we know about that one. We have stats about the most common value
1701+
> in a column, but no information about how the less-common values are
1702+
> distributed. We definitely need stats about several top values not just
1703+
> one, because this phenomenon of a badly skewed distribution is pretty
1704+
> common.
1705+
1706+
1707+
An end-biased histogram has stats on top values and also on the least
1708+
frequent values. So if a there is a selection on a value that is well
1709+
bellow average, the selectivity estimation will be more acurate. On some
1710+
research papers I've read, it's refered that this is a better approach
1711+
than equi-width histograms (which are said to be the "industry" standard).
1712+
1713+
I not sure whether to use a table or a array attribute on pg_stat for
1714+
the histogram, the problem is what could be expected from the size of the
1715+
attribute (being a text). I'm very affraid of the cost of going through
1716+
several tuples on a table (pg_histogram?) during the optimization phase.
1717+
1718+
One other idea would be to only have better statistics for special
1719+
attributes requested by the user... something like "analyze special
1720+
table(column)".
1721+
1722+
Best Regards,
1723+
Tiago
1724+
1725+
1726+
1727+
From pgsql-hackers-owner+M6160@hub.org Thu Aug 24 00:21:39 2000
1728+
Received: from hub.org (root@hub.org [216.126.84.1])
1729+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA27662
1730+
for <pgman@candle.pha.pa.us>; Thu, 24 Aug 2000 00:21:38 -0400 (EDT)
1731+
Received: from hub.org (majordom@localhost [127.0.0.1])
1732+
by hub.org (8.10.1/8.10.1) with SMTP id e7O46w585951;
1733+
Thu, 24 Aug 2000 00:06:58 -0400 (EDT)
1734+
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
1735+
by hub.org (8.10.1/8.10.1) with ESMTP id e7O3uv583775
1736+
for <pgsql-hackers@postgresql.org>; Wed, 23 Aug 2000 23:56:57 -0400 (EDT)
1737+
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
1738+
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA20973;
1739+
Wed, 23 Aug 2000 23:56:35 -0400 (EDT)
1740+
To: =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra@fct.unl.pt>
1741+
cc: Jules Bean <jules@jellybean.co.uk>, pgsql-hackers@postgresql.org
1742+
Subject: Re: [HACKERS] Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
1743+
In-reply-to: <Pine.LNX.4.21.0008231543340.4273-100000@eros.si.fct.unl.pt>
1744+
References: <Pine.LNX.4.21.0008231543340.4273-100000@eros.si.fct.unl.pt>
1745+
Comments: In-reply-to =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra@fct.unl.pt>
1746+
message dated "Wed, 23 Aug 2000 16:03:42 +0100"
1747+
Date: Wed, 23 Aug 2000 23:56:35 -0400
1748+
Message-ID: <20970.967089395@sss.pgh.pa.us>
1749+
From: Tom Lane <tgl@sss.pgh.pa.us>
1750+
X-Mailing-List: pgsql-hackers@postgresql.org
1751+
Precedence: bulk
1752+
Sender: pgsql-hackers-owner@hub.org
1753+
Status: OR
1754+
1755+
=?iso-8859-1?Q?Tiago_Ant=E3o?= <tra@fct.unl.pt> writes:
1756+
> One other idea would be to only have better statistics for special
1757+
> attributes requested by the user... something like "analyze special
1758+
> table(column)".
1759+
1760+
This might actually fall out "for free" from the cheapest way of
1761+
implementing the stats. We've talked before about scanning btree
1762+
indexes directly to obtain data values in sorted order, which makes
1763+
it very easy to find the most common values. If you do that, you
1764+
get good stats for exactly those columns that the user has created
1765+
indexes on. A tad indirect but I bet it'd be effective...
1766+
1767+
regards, tom lane
1768+
1769+
From pgsql-hackers-owner+M6165@hub.org Thu Aug 24 05:33:02 2000
1770+
Received: from hub.org (root@hub.org [216.126.84.1])
1771+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA14309
1772+
for <pgman@candle.pha.pa.us>; Thu, 24 Aug 2000 05:33:01 -0400 (EDT)
1773+
Received: from hub.org (majordom@localhost [127.0.0.1])
1774+
by hub.org (8.10.1/8.10.1) with SMTP id e7O9X0584670;
1775+
Thu, 24 Aug 2000 05:33:00 -0400 (EDT)
1776+
Received: from athena.office.vi.net (office-gwb.fulham.vi.net [194.88.77.158])
1777+
by hub.org (8.10.1/8.10.1) with ESMTP id e7O9Ix581216
1778+
for <pgsql-hackers@postgresql.org>; Thu, 24 Aug 2000 05:19:03 -0400 (EDT)
1779+
Received: from grommit.office.vi.net [192.168.1.200] (mail)
1780+
by athena.office.vi.net with esmtp (Exim 3.12 #1 (Debian))
1781+
id 13Rt2Y-00073I-00; Thu, 24 Aug 2000 10:11:14 +0100
1782+
Received: from jules by grommit.office.vi.net with local (Exim 3.12 #1 (Debian))
1783+
id 13Rt2Y-0005GV-00; Thu, 24 Aug 2000 10:11:14 +0100
1784+
Date: Thu, 24 Aug 2000 10:11:14 +0100
1785+
From: Jules Bean <jules@jellybean.co.uk>
1786+
To: Tom Lane <tgl@sss.pgh.pa.us>
1787+
Cc: Tiago Ant?o <tra@fct.unl.pt>, pgsql-hackers@postgresql.org
1788+
Subject: Re: [HACKERS] Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
1789+
Message-ID: <20000824101113.N17510@grommit.office.vi.net>
1790+
References: <1731.966868649@sss.pgh.pa.us> <Pine.LNX.4.21.0008211626250.25226-100000@eros.si.fct.unl.pt> <20000823133418.F17510@grommit.office.vi.net> <27971.967041030@sss.pgh.pa.us>
1791+
Mime-Version: 1.0
1792+
Content-Type: text/plain; charset=us-ascii
1793+
Content-Disposition: inline
1794+
User-Agent: Mutt/1.2i
1795+
In-Reply-To: <27971.967041030@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Wed, Aug 23, 2000 at 10:30:30AM -0400
1796+
X-Mailing-List: pgsql-hackers@postgresql.org
1797+
Precedence: bulk
1798+
Sender: pgsql-hackers-owner@hub.org
1799+
Status: OR
1800+
1801+
On Wed, Aug 23, 2000 at 10:30:30AM -0400, Tom Lane wrote:
1802+
> Jules Bean <jules@jellybean.co.uk> writes:
1803+
> > I have in a table a 'category' column which takes a small number of
1804+
> > (basically fixed) values. Here by 'small', I mean ~1000, while the
1805+
> > table itself has ~10 000 000 rows. Some categories have many, many
1806+
> > more rows than others. In particular, there's one category which hits
1807+
> > over half the rows. Because of this (AIUI) postgresql assumes
1808+
> > that the query
1809+
> >select ... from thistable where category='something'
1810+
> > is best served by a seqscan, even though there is an index on
1811+
> > category.
1812+
>
1813+
> Yes, we know about that one. We have stats about the most common value
1814+
> in a column, but no information about how the less-common values are
1815+
> distributed. We definitely need stats about several top values not just
1816+
> one, because this phenomenon of a badly skewed distribution is pretty
1817+
> common.
1818+
1819+
ISTM that that might be enough, in fact.
1820+
1821+
If you have stats telling you that the most popular value is 'xyz',
1822+
and that it constitutes 50% of the rows (i.e. 5 000 000) then you can
1823+
conclude that, on average, other entries constitute a mere 5 000
1824+
000/999 ~~ 5000 entries, and it would be definitely be enough.
1825+
(That's assuming you store the number of distinct values somewhere).
1826+
1827+
1828+
> BTW, if your highly-popular value is actually a dummy value ('UNKNOWN'
1829+
> or something like that), a fairly effective workaround is to replace the
1830+
> dummy entries with NULL. The system does account for NULLs separately
1831+
> from real values, so you'd then get stats based on the most common
1832+
> non-dummy value.
1833+
1834+
I can't really do that. Even if I could, the distribution is very
1835+
skewed -- so the next most common makes up a very high proportion of
1836+
what's left. I forget the figures exactly.
1837+
1838+
Jules
1839+

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp