Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit7868590

Browse files
committed
While making the seq_page_cost changes, I was struck by the fact that
cost_nonsequential_access() is really totally inappropriate for its onlyremaining use, namely estimating I/O costs in cost_sort(). The routinewas designed on the assumption that disk caching might eliminate the needfor some re-reads on a random basis, but there's nothing very random inthat sense about sort's access pattern --- it'll always be picking up theoldest outputs. If we had a good fix on the effective cache size wemight consider charging zero for I/O unless the sort temp file sizeexceeds it, but that's probably putting much too much faith in theparameter. Instead just drop the logic in favor of a fixed compromisebetween seq_page_cost and random_page_cost per page of sort I/O.
1 parentb7af62e commit7868590

File tree

1 file changed

+5
-57
lines changed

1 file changed

+5
-57
lines changed

‎src/backend/optimizer/path/costsize.c

Lines changed: 5 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@
5454
* Portions Copyright (c) 1994, Regents of the University of California
5555
*
5656
* IDENTIFICATION
57-
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.156 2006/06/0502:49:58 tgl Exp $
57+
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.157 2006/06/0520:56:33 tgl Exp $
5858
*
5959
*-------------------------------------------------------------------------
6060
*/
@@ -175,55 +175,6 @@ cost_seqscan(Path *path, PlannerInfo *root,
175175
path->total_cost=startup_cost+run_cost;
176176
}
177177

178-
/*
179-
* cost_nonsequential_access
180-
* Estimate the cost of accessing one page at random from a relation
181-
* (or sort temp file) of the given size in pages.
182-
*
183-
* The simplistic model that the cost is random_page_cost is what we want
184-
* to use for large relations; but for small ones that is a serious
185-
* overestimate because of the effects of caching.This routine tries to
186-
* account for that.
187-
*
188-
* Unfortunately we don't have any good way of estimating the effective cache
189-
* size we are working with --- we know that Postgres itself has NBuffers
190-
* internal buffers, but the size of the kernel's disk cache is uncertain,
191-
* and how much of it we get to use is even less certain. We punt the problem
192-
* for now by assuming we are given an effective_cache_size parameter.
193-
*
194-
* Given a guesstimated cache size, we estimate the actual I/O cost per page
195-
* with the entirely ad-hoc equations (writing relsize for
196-
* relpages/effective_cache_size):
197-
*if relsize >= 1:
198-
*random_page_cost - (random_page_cost-seq_page_cost)/2 * (1/relsize)
199-
*if relsize < 1:
200-
*seq_page_cost + ((random_page_cost-seq_page_cost)/2) * relsize ** 2
201-
* These give the right asymptotic behavior (=> seq_page_cost as relpages
202-
* becomes small, => random_page_cost as it becomes large) and meet in the
203-
* middle with the estimate that the cache is about 50% effective for a
204-
* relation of the same size as effective_cache_size. (XXX this is probably
205-
* all wrong, but I haven't been able to find any theory about how effective
206-
* a disk cache should be presumed to be.)
207-
*/
208-
staticCost
209-
cost_nonsequential_access(doublerelpages)
210-
{
211-
doublerelsize;
212-
doublerandom_delta;
213-
214-
/* don't crash on bad input data */
215-
if (relpages <=0.0||effective_cache_size <=0.0)
216-
returnrandom_page_cost;
217-
218-
relsize=relpages /effective_cache_size;
219-
220-
random_delta= (random_page_cost-seq_page_cost)*0.5;
221-
if (relsize >=1.0)
222-
returnrandom_page_cost-random_delta /relsize;
223-
else
224-
returnseq_page_cost+random_delta*relsize*relsize;
225-
}
226-
227178
/*
228179
* cost_index
229180
* Determines and returns the cost of scanning a relation using an index.
@@ -371,10 +322,7 @@ cost_index(IndexPath *path, PlannerInfo *root,
371322

372323
/*
373324
* min_IO_cost corresponds to the perfectly correlated case (csquared=1),
374-
* max_IO_cost to the perfectly uncorrelated case (csquared=0). Note that
375-
* we just charge random_page_cost per page in the uncorrelated case,
376-
* rather than using cost_nonsequential_access, since we've already
377-
* accounted for caching effects by using the Mackert model.
325+
* max_IO_cost to the perfectly uncorrelated case (csquared=0).
378326
*/
379327
min_IO_cost=ceil(indexSelectivity*T)*seq_page_cost;
380328
max_IO_cost=pages_fetched*random_page_cost;
@@ -778,7 +726,7 @@ cost_functionscan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
778726
*disk traffic = 2 * relsize * ceil(logM(p / (2*work_mem)))
779727
*cpu = comparison_cost * t * log2(t)
780728
*
781-
* The disk traffic is assumed to behalf sequential andhalf random
729+
* The disk traffic is assumed to be3/4ths sequential and1/4th random
782730
* accesses (XXX can't we refine that guess?)
783731
*
784732
* We charge two operator evals per tuple comparison, which should be in
@@ -838,9 +786,9 @@ cost_sort(Path *path, PlannerInfo *root,
838786
else
839787
log_runs=1.0;
840788
npageaccesses=2.0*npages*log_runs;
841-
/* Assumehalfare sequential,half are not */
789+
/* Assume3/4ths of accessesare sequential,1/4th are not */
842790
startup_cost+=npageaccesses*
843-
(seq_page_cost+cost_nonsequential_access(npages))*0.5;
791+
(seq_page_cost*0.75+random_page_cost*0.25);
844792
}
845793

846794
/*

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp