Commit 6fdd71c

Add to mmap discussion.
1 parent 29c18bc, commit 6fdd71c

1 file changed

+392
-0
lines changed

‎doc/TODO.detail/mmap

Lines changed: 392 additions & 0 deletions
@@ -2014,3 +2014,395 @@ KwvG7YLsJ+xpsTUS67KD+4M=
2014 2014

2015 2015
--HjNkcEWJ4DMx36DP--
2016 2016

2017+
From pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 01:09:07 2003
2018+
Return-path: <pgsql-performance-owner+M1354=pgman=candle.pha.pa.us@postgresql.org>
2019+
Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
2020+
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27693604295
2021+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:05 -0500 (EST)
2022+
Received: from postgresql.org (postgresql.org [64.49.215.8])
2023+
by relay2.pgsql.com (Postfix) with ESMTP id 95CD2EDFD3B
2024+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 01:09:03 -0500 (EST)
2025+
X-Original-To: pgsql-performance@postgresql.org
2026+
Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
2027+
by postgresql.org (Postfix) with ESMTP id F16034768E2
2028+
for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 01:04:33 -0500 (EST)
2029+
Received: by perrin.int.nxad.com (Postfix, from userid 1001)
2030+
id 7969A21065; Thu, 6 Mar 2003 22:04:12 -0800 (PST)
2031+
Date: Thu, 6 Mar 2003 22:04:12 -0800
2032+
From: Sean Chittenden <sean@chittenden.org>
2033+
To: Neil Conway <neilc@samurai.com>
2034+
cc: Tom Lane <tgl@sss.pgh.pa.us>,
2035+
Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
2036+
PostgreSQL Performance <pgsql-performance@postgresql.org>
2037+
Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
2038+
Message-ID: <20030307060412.GA19138@perrin.int.nxad.com>
2039+
References: <20030306031656.1876F4762E0@postgresql.org> <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo>
2040+
MIME-Version: 1.0
2041+
Content-Type: multipart/signed; micalg=pgp-sha1;
2042+
protocol="application/pgp-signature"; boundary="KsGdsel6WgEHnImy"
2043+
Content-Disposition: inline
2044+
In-Reply-To: <1046998072.10527.67.camel@tokyo>
2045+
User-Agent: Mutt/1.4i
2046+
X-PGP-Key: finger seanc@FreeBSD.org
2047+
X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
2048+
X-Web-Homepage: http://sean.chittenden.org/
2049+
Precedence: bulk
2050+
Sender: pgsql-performance-owner@postgresql.org
2051+
Status: OR
2052+
2053+
--KsGdsel6WgEHnImy
2054+
Content-Type: text/plain; charset=us-ascii
2055+
Content-Disposition: inline
2056+
Content-Transfer-Encoding: quoted-printable
2057+
2058+
> > I don't have my copy of Stevens handy (it's some 700mi away atm
2059+
> > otherwise I'd cite it), but if Tom or someone else has it handy, look
2060+
> > up the example re: the performance gain from read()'ing an mmap()'ed
2061+
> > file versus a non-mmap()'ed file. The difference is non-trivial and
2062+
> > _WELL_ worth the time given the speed increase.
2063+
>
2064+
> Can anyone confirm this? If so, one easy step we could take in this
2065+
> direction would be adapting COPY FROM to use mmap().
2066+
2067+
Weeee! Alright, so I got to have some fun writing out some simple
2068+
tests with mmap() and friends tonight. Are the results interesting?
2069+
Absolutely! Is this a simple benchmark? Yup. Do I think it
2070+
simulates PostgreSQL? Eh, not particularly. Does it demonstrate that
2071+
mmap() is a win and something worth implementing? I sure hope so. Is
2072+
this a test program to demonstrate the ideal use of mmap() in
2073+
PostgreSQL? No. Is it a place to start a factual discussion? I hope
2074+
so.
2075+
2076+
I have here four tests that are conditionalized by cpp.
2077+
2078+
# The first one uses read() and write() but with the buffer size set
2079+
# to the same size as the file.
2080+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o test-mmap test-mmap.c
2082+
/usr/bin/time ./test-mmap > /dev/null
2083+
Beginning tests with file: services
2084+
2085+
Page size: 4096
2086+
File read size is the same as the file size
2087+
Number of iterations: 100000
2088+
Start time: 1047013002.412516
2089+
Time: 82.88178
2090+
2091+
Completed tests
2092+
82.09 real 2.13 user 68.98 sys
2093+
2094+
# The second one uses read() and write() with the default buffer size:
2095+
# 65536
2096+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o test-mmap test-mmap.c
2098+
/usr/bin/time ./test-mmap > /dev/null
2099+
Beginning tests with file: services
2100+
2101+
Page size: 4096
2102+
File read size is default read size: 65536
2103+
Number of iterations: 100000
2104+
Start time: 1047013085.16204
2105+
Time: 18.155511
2106+
2107+
Completed tests
2108+
18.16 real 0.90 user 14.79 sys
2109+
# Please note this is significantly faster, but that's expected
2110+
2111+
# The third test uses mmap() + madvise() + write()
2112+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o test-mmap test-mmap.c
2114+
/usr/bin/time ./test-mmap > /dev/null
2115+
Beginning tests with file: services
2116+
2117+
Page size: 4096
2118+
File read size is the same as the file size
2119+
Number of iterations: 100000
2120+
Start time: 1047013103.859818
2121+
Time: 8.4294203644
2122+
2123+
Completed tests
2124+
7.24 real 0.41 user 5.92 sys
2125+
# Faster still, and more than twice as fast as the default-buffer read() case
2126+
2127+
# The last test only calls mmap()'s once when the file is opened and
2128+
# only msync()'s, munmap()'s, close()'s the file once at exit.
2129+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o test-mmap test-mmap.c
2131+
/usr/bin/time ./test-mmap > /dev/null
2132+
Beginning tests with file: services
2133+
2134+
Page size: 4096
2135+
File read size is the same as the file size
2136+
Number of iterations: 100000
2137+
Start time: 1047013111.623712
2138+
Time: 1.174076
2139+
2140+
Completed tests
2141+
1.18 real 0.09 user 0.92 sys
2142+
# Substantially faster
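
For readers without access to the test program, the read()/write() and mmap() + madvise() + write() cases above can be sketched roughly as follows. This is a minimal illustration, not the code from test-mmap.c: the function names (write_all, copy_with_read, copy_with_mmap) are hypothetical, and the choice of MADV_SEQUENTIAL as the hint is an assumption. The "mmap once" case amounts to hoisting the mmap()/munmap() calls out of the iteration loop.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes madvise() on glibc in strict modes */
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes, retrying on short writes. Returns 0, or -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy path to out_fd with read()/write() through a caller-supplied
 * buffer. Returns the number of bytes copied, or -1 on error. */
static long copy_with_read(const char *path, int out_fd,
                           char *buf, size_t bufsize)
{
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, bufsize)) > 0) {
        if (write_all(out_fd, buf, (size_t)n) < 0) {
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}

/* Copy path to out_fd by mapping the whole file, hinting sequential
 * access, and writing straight from the mapping. Returns the number
 * of bytes copied, or -1 on error. */
static long copy_with_mmap(const char *path, int out_fd)
{
    struct stat st;
    size_t len;
    void *p;
    int rc;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    len = (size_t)st.st_size;
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    madvise(p, len, MADV_SEQUENTIAL);   /* advisory only; failure ignored */
    rc = write_all(out_fd, p, len);
    munmap(p, len);
    return rc < 0 ? -1 : (long)len;
}
```

Calling either function in a loop with out_fd pointing at /dev/null approximates the shape of the runs above; absolute timings will depend on the OS and hardware.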
2143+
2144+
2145+
Obviously this isn't perfect, but reading and writing data is faster
2146+
(specifically moving pages through the VM/OS). Doing partial writes
2147+
from mmap()'ed data should be faster along with scanning through
2148+
mmap()'ed portions of - or completely mmap()'ed - files because the
2149+
pages are already loaded in the VM. PostgreSQL's LRU file descriptor
2150+
cache could easily be adjusted to add mmap()'ing of frequently
2151+
accessed files (specifically, system catalogs come to mind). It's not
2152+
hard to figure out how often particular files are accessed and to
2153+
either _avoid_ mmap()'ing a file that isn't accessed often, or to
2154+
mmap() files that _are_ accessed often. mmap() does have a cost, but
2155+
I'd wager that mmap()'ing the same file a second or third time from a
2156+
different process would be more efficient. The speedup of searching
2157+
through an mmap()'ed file may be worth it, however, to mmap() all
2158+
files if the system is under a tunable resource limit
2159+
(max_mmaped_bytes?).
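
One way the "map only hot files, and only while under a limit" idea could look, as a rough sketch. Everything here is hypothetical: max_mmaped_bytes is the tunable name floated above (it is not an existing PostgreSQL setting), min_hits_to_map is an invented threshold, and note_access()/unmap_file() are illustrative helpers rather than anything in the PostgreSQL file descriptor cache.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical tunables, for illustration only. */
static size_t max_mmaped_bytes = 64 * 1024 * 1024; /* total mapping budget */
static unsigned min_hits_to_map = 3;   /* accesses before a file is mapped */
static size_t mmaped_bytes = 0;        /* bytes currently mapped */

struct mapped_file {
    void    *addr;   /* NULL while the file is not mapped */
    size_t   len;
    unsigned hits;   /* how many times the file has been accessed */
};

/* Record one access to path. Returns 1 if the file is mapped and the
 * caller may read from f->addr, or 0 if it should fall back to read(). */
static int note_access(struct mapped_file *f, const char *path)
{
    struct stat st;
    void *p;
    int fd;

    f->hits++;
    if (f->addr != NULL)
        return 1;                      /* already mapped */
    if (f->hits < min_hits_to_map)
        return 0;                      /* not accessed often enough yet */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    /* Skip empty files and anything that would exceed the budget. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (size_t)st.st_size > max_mmaped_bytes - mmaped_bytes) {
        close(fd);
        return 0;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return 0;
    f->addr = p;
    f->len = (size_t)st.st_size;
    mmaped_bytes += f->len;
    return 1;
}

/* Drop a mapping and return its bytes to the budget. */
static void unmap_file(struct mapped_file *f)
{
    if (f->addr == NULL)
        return;
    munmap(f->addr, f->len);
    mmaped_bytes -= f->len;
    f->addr = NULL;
    f->len = 0;
}
```

The sketch keeps the read() path as the fallback, so a file that is rarely touched or too large for the remaining budget costs nothing extra.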
2160+
2161+
If someone is so inclined or there's enough interest, I can reverse
2162+
this test case so that data is written to an mmap()'ed file, but the
2163+
same performance difference should hold true (assuming this isn't a
2164+
write to a tape drive ::grin::).
2165+
2166+
The URL for the program used to generate the above tests is at:
2167+
2168+
http://people.freebsd.org/~seanc/mmap_test/
2169+
2170+
2171+
Please ask if you have questions. -sc
2172+
2173+
-- 
2174+
Sean Chittenden
2175+
2176+
--KsGdsel6WgEHnImy
2177+
Content-Type: application/pgp-signature
2178+
Content-Disposition: inline
2179+
2180+
-----BEGIN PGP SIGNATURE-----
2181+
Comment: Sean Chittenden <sean@chittenden.org>
2182+
2183+
iD8DBQE+aDZc3ZnjH7yEs0ERAid6AJ9/TAYMUx2+ZcD2680OlKJBj5FzrACgquIG
2184+
PBNCzM0OegBXrPROJ/uIKDM=
2185+
=y7O6
2186+
-----END PGP SIGNATURE-----
2187+
2188+
--KsGdsel6WgEHnImy--
2189+
2190+
From pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org Fri Mar 7 16:47:38 2003
2191+
Return-path: <pgsql-performance-owner+M1358=pgman=candle.pha.pa.us@postgresql.org>
2192+
Received: from relay2.pgsql.com (relay2.pgsql.com [64.49.215.143])
2193+
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id h27LlX429809
2194+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:35 -0500 (EST)
2195+
Received: from postgresql.org (postgresql.org [64.49.215.8])
2196+
by relay2.pgsql.com (Postfix) with ESMTP id D40CBEDFE05
2197+
for <pgman@candle.pha.pa.us>; Fri, 7 Mar 2003 16:47:32 -0500 (EST)
2198+
X-Original-To: pgsql-performance@postgresql.org
2199+
Received: from perrin.int.nxad.com (internal.ext.nxad.com [69.1.70.251])
2200+
by postgresql.org (Postfix) with ESMTP id 913B5474E44
2201+
for <pgsql-performance@postgresql.org>; Fri, 7 Mar 2003 16:46:50 -0500 (EST)
2202+
Received: by perrin.int.nxad.com (Postfix, from userid 1001)
2203+
id A55392105B; Fri, 7 Mar 2003 13:46:30 -0800 (PST)
2204+
Date: Fri, 7 Mar 2003 13:46:30 -0800
2205+
From: Sean Chittenden <sean@chittenden.org>
2206+
To: Tom Lane <tgl@sss.pgh.pa.us>
2207+
cc: Neil Conway <neilc@samurai.com>,
2208+
Christopher Kings-Lynne <chriskl@familyhealth.com.au>,
2209+
PostgreSQL Performance <pgsql-performance@postgresql.org>
2210+
Subject: Re: [PERFORM] [COMMITTERS] pgsql-server/ /configure /configure.in rc/incl ...
2211+
Message-ID: <20030307214630.GI79234@perrin.int.nxad.com>
2212+
References: <032f01c2e390$b1842b20$6500a8c0@fhp.internal> <11077.1046921667@sss.pgh.pa.us> <033f01c2e392$71476570$6500a8c0@fhp.internal> <12228.1046922471@sss.pgh.pa.us> <20030306094117.GA79234@perrin.int.nxad.com> <15071.1046964336@sss.pgh.pa.us> <20030307003640.GF79234@perrin.int.nxad.com> <1046998072.10527.67.camel@tokyo> <20030307060412.GA19138@perrin.int.nxad.com> <29933.1047047386@sss.pgh.pa.us>
2213+
MIME-Version: 1.0
2214+
Content-Type: multipart/signed; micalg=pgp-sha1;
2215+
protocol="application/pgp-signature"; boundary="TALVG7vV++YnpwZG"
2216+
Content-Disposition: inline
2217+
In-Reply-To: <29933.1047047386@sss.pgh.pa.us>
2218+
User-Agent: Mutt/1.4i
2219+
X-PGP-Key: finger seanc@FreeBSD.org
2220+
X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341
2221+
X-Web-Homepage: http://sean.chittenden.org/
2222+
Precedence: bulk
2223+
Sender: pgsql-performance-owner@postgresql.org
2224+
Status: OR
2225+
2226+
--TALVG7vV++YnpwZG
2227+
Content-Type: text/plain; charset=us-ascii
2228+
Content-Disposition: inline
2229+
Content-Transfer-Encoding: quoted-printable
2230+
2231+
> > Absolutely! Is this a simple benchmark? Yup. Do I think it
2232+
> > simulates PostgreSQL? Eh, not particularly.
2233+
2234+
I think quite a few of these Q's would have been answered by reading
2235+
the code/Makefile....
2236+
2237+
> This would be on what OS?
2238+
2239+
FreeBSD, but it shouldn't matter. Any reasonably written VM should
2240+
have similar numbers (though BSD is generally regarded as having the
2241+
best VM, which, I think Linux poached not that long ago, iirc
2242+
::grimace::).
2243+
2244+
> What hardware?
2245+
2246+
My ultra-pathetic laptop with some fine - overly-noisy and can hardly
2247+
buildworld - IDE drives.
2248+
2249+
> What size test file?
2250+
2251+
In this case, only 72K. I've just updated the test program to use an
2252+
array of files though.
2253+
2254+
> Do the "iterations" mean so many reads of the entire file, or so
2255+
> many buffer-sized read requests?
2256+
2257+
In some cases, yes. With the file mmap()'ed, sorta. One of the test
2258+
cases (the one that did it in ~8s) mmap()'ed and munmap()'ed the file
2259+
every iteration and was twice as fast as the vanilla read() call.
2260+
2261+
> Did the mmap case actually *read* anything, or just map and unmap
2262+
> the file?
2263+
2264+
Nope, read it and wrote it out to stdout (which was redirected to
2265+
/dev/null).
2266+
2267+
> Also, what did you do to normalize for the effects of the test file
2268+
> being already in kernel disk cache after the first test?
2269+
2270+
That honestly doesn't matter too much since I wasn't testing the rate
2271+
of reading in files from my hard drive, only the OS's ability to
2272+
read/write pages of data around. In any case, I've updated my test
2273+
case to iterate through an array of files instead of just reading in a
2274+
copy of /etc/services. My laptop is generally a poor benchmark for
2275+
disk read performance given it takes 8hrs to buildworld, over 12hrs to
2276+
build mozilla, 18 for KDE, and about 48hrs for Open Office. :)
2277+
Someone with faster disks may want to try this and report back, but it
2278+
doesn't matter much in terms of relevancy for considering the benefits
2279+
of mmap(). The point is that there are calls that can be used that
2280+
substantially speed up read()'s and write()'s by allowing the VM to
2281+
align pages of data and give hints about its usage. For the sake of
2282+
argument re: the previously done tests, I'll reverse the order in
2283+
which I ran them and I bet dime to dollar that the times will be
2284+
identical.
2285+
2286+
% make =
2287+
~/open_source/mmap_test
2288+
cp -f /etc/services ./services
2289+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -DDO_MMAP_ONCE=1 -o mmap-test mmap-test.c
2291+
/usr/bin/time ./mmap-test > /dev/null
2292+
Beginning tests with file: services
2293+
2294+
Page size: 4096
2295+
File read size is the same as the file size
2296+
Number of iterations: 100000
2297+
Start time: 1047064672.276544
2298+
Time: 1.281477
2299+
2300+
Completed tests
2301+
1.29 real 0.10 user 0.92 sys
2302+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -DDO_MMAP=1 -o mmap-test mmap-test.c
2304+
/usr/bin/time ./mmap-test > /dev/null
2305+
Beginning tests with file: services
2306+
2307+
Page size: 4096
2308+
File read size is the same as the file size
2309+
Number of iterations: 100000
2310+
Start time: 1047064674.266191
2311+
Time: 7.486622
2312+
2313+
Completed tests
2314+
7.49 real 0.41 user 6.01 sys
2315+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -DDEFAULT_READSIZE=1 -o mmap-test mmap-test.c
2317+
/usr/bin/time ./mmap-test > /dev/null
2318+
Beginning tests with file: services
2319+
2320+
Page size: 4096
2321+
File read size is default read size: 65536
2322+
Number of iterations: 100000
2323+
Start time: 1047064682.288637
2324+
Time: 19.35214
2325+
2326+
Completed tests
2327+
19.04 real 0.88 user 15.43 sys
2328+
gcc -O3 -finline-functions -fkeep-inline-functions -funroll-loops -o mmap-test mmap-test.c
2330+
/usr/bin/time ./mmap-test > /dev/null
2331+
Beginning tests with file: services
2332+
2333+
Page size: 4096
2334+
File read size is the same as the file size
2335+
Number of iterations: 100000
2336+
Start time: 1047064701.867031
2337+
Time: 82.4294540875
2338+
2339+
Completed tests
2340+
81.57 real 2.10 user 69.55 sys
2341+
2342+
2343+
Here's the updated test that iterates through. Ooh! One better, the
2344+
files I've used are actual data files from ~pgsql. The new benchmark
2345+
iterates through the list of files and calls bench() once for each
2346+
file and restarts at the first file after reaching the end of its
2347+
list (ARGV).
2348+
2349+
Whoa, if these tests are even close to real world, then we at the very
2350+
least should be mmap()'ing the file every time we read it (assuming
2351+
we're reading more than just a handful of bytes):
2352+
2353+
find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
2355+
Page size: 4096
2356+
File read size is the same as the file size
2357+
Number of iterations: 100000
2358+
Start time: 1047071143.463360
2359+
Time: 12.109530
2360+
2361+
Completed tests
2362+
12.11 real 0.36 user 6.80 sys
2363+
2364+
find /usr/local/pgsql/data -type f | /usr/bin/xargs /usr/bin/time ./mmap-test > /dev/null
2366+
Page size: 4096
2367+
File read size is default read size: 65536
2368+
Number of iterations: 100000
2369+
.... [been waiting here for >40min now....]
2370+
2371+
2372+
Ah well, if these tests finish this century, I'll post the results in
2373+
a bit, but it's pretty clearly a win. In terms of the data that I'm
2374+
copying, I'm copying ~700MB of data from my test DB on my laptop. I
2375+
only have 256MB of RAM so I can pretty much promise you that the data
2376+
isn't in my system buffers. If anyone else would like to run the
2377+
tests or look at the results, please check it out:
2378+
2379+
o1 and o2 should be the only targets used if FILES is bigger than the
2380+
RAM on the system. o3's by far and away the fastest, but only in rare
2381+
cases will a DBA have more RAM than data. But, as mentioned earlier,
2382+
the LRU cache could easily be modified to munmap() infrequently
2383+
accessed files to keep the size of mmap()'ed data down to a reasonable
2384+
level.
2385+
2386+
The updated test programs are at:
2387+
2388+
http://people.FreeBSD.org/~seanc/mmap_test/
2389+
2390+
-sc
2391+
2392+
-- 
2393+
Sean Chittenden
2394+
2395+
--TALVG7vV++YnpwZG
2396+
Content-Type: application/pgp-signature
2397+
Content-Disposition: inline
2398+
2399+
-----BEGIN PGP SIGNATURE-----
2400+
Comment: Sean Chittenden <sean@chittenden.org>
2401+
2402+
iD8DBQE+aRM23ZnjH7yEs0ERAoqhAKCFgmhpvNMqe9tucoFvK1H6J50z2QCeIZEI
2403+
mgBHwu/H1pe1sXIX9UG2V+I=
2404+
=cFRQ
2405+
-----END PGP SIGNATURE-----
2406+
2407+
--TALVG7vV++YnpwZG--
2408+
