  <para>
   <application>pg_test_timing</> is a tool to measure the timing overhead
   on your system and confirm that the system time never moves backwards.
   Systems that are slow to collect timing data can give less accurate
   <command>EXPLAIN ANALYZE</command> results.
  </para>
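
  <para>
   The tool is run directly from the shell. As a simple illustration, timing
   data can be collected for five seconds instead of the default three by
   passing the <option>-d</option> option:
  </para>

<screen>
pg_test_timing -d 5
</screen>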
 </refsect1>
  <title>Interpreting results</title>

  <para>
   Good results will show that most (>90%) individual timing calls take less
   than one microsecond. Average per loop overhead will be even lower, below
   100 nanoseconds. This example from an Intel i7-860 system using a TSC
   clock source shows excellent performance:

<screen>
Testing timing overhead for 3 seconds.
...
Histogram of timing durations:
...
      2:    2999652  3.59518%
      1:   80435604 96.40465%
</screen>
  </para>

  <para>
   Note that the per loop time is reported in different units than the
   histogram. The loop can have resolution within a few nanoseconds (nsec),
   while the individual timing calls can only resolve down to one microsecond
   (usec).
  </para>

 </refsect2>
 <refsect2>
  <title>Measuring executor timing overhead</title>

  <para>
   When the query executor is running a statement using
   <command>EXPLAIN ANALYZE</command>, individual operations are timed in
   addition to the summary that is shown. The overhead of your system can be
   checked by counting rows with the psql program:

<screen>
CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
\timing
SELECT COUNT(*) FROM t;
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
</screen>
  </para>

  <para>
   On the i7-860 system measured above, the count query runs in 9.8 ms while
   the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
   processing just over 100,000 rows. That 6.8 ms difference means the timing
   overhead per row is 68 ns, about twice what pg_test_timing estimated it
   would be. Even that relatively small amount of overhead makes the fully
   timed count statement take almost 70% longer. On more substantial queries,
   the timing overhead would be less problematic.
  </para>
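
  <para>
   The per row figure can be reproduced with simple arithmetic; the query
   below is purely illustrative, converting the 6.8 ms difference to
   nanoseconds and dividing by the roughly 100,000 rows processed:
  </para>

<screen>
SELECT (16.6 - 9.8) * 1000000 / 100000 AS overhead_ns_per_row;
</screen>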

 </refsect2>

 <refsect2>
  <title>Changing time sources</title>
  <para>
   On some newer Linux systems, it's possible to change the clock source used
   to collect timing data at any time. A second example shows the slowdown
   possible from switching to the slower acpi_pm time source, on the same
   system used for the fast results above:

<screen>
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# pg_test_timing
...
Histogram of timing durations:
...
      2:    2990371 72.05956%
      1:    1155682 27.84870%
</screen>
  </para>

  <para>
   In this configuration, the sample <command>EXPLAIN ANALYZE</command> above
   takes 115.9 ms. That's 1061 nsec of timing overhead, again a small multiple
   of what's measured directly by this utility. That much timing overhead
   means the actual query itself is only taking a tiny fraction of the
   accounted-for time; most of it is being consumed in overhead instead. In
   this configuration, any <command>EXPLAIN ANALYZE</command> totals involving
   many timed operations would be inflated significantly by timing overhead.
  </para>
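
  <para>
   Once the comparison is complete, the faster source can be restored the
   same way; this assumes tsc still appears in the available clock source
   list shown above:
  </para>

<screen>
# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
</screen>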

  <para>
   FreeBSD also allows changing the time source on the fly, and it logs
   information about the timer selected during boot:

<screen>
dmesg | grep "Timecounter"
sysctl kern.timecounter.hardware=TSC
</screen>
  </para>
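
  <para>
   The full set of timers FreeBSD knows about, along with the kernel's
   quality ranking for each, can also be queried directly; the sysctl name
   below is the one used on current FreeBSD releases:
  </para>

<screen>
sysctl kern.timecounter.choice
</screen>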

  <para>
   Other systems may only allow setting the time source on boot. On older
   Linux systems the "clock" kernel setting is the only way to make this sort
   of change. And even on some more recent ones, the only option you'll see
   for a clock source is "jiffies". Jiffies are the older Linux software clock
   implementation, which can have good resolution when it's backed by fast
   enough timing hardware, as in this example:

<screen>
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies
$ dmesg | grep time.c
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.153 MHz processor.
$ pg_test_timing
Testing timing overhead for 3 seconds.
Per timing duration including loop overhead: 97.75 ns
Histogram of timing durations:
...
      2:    2993204  9.75277%
      1:   27694571 90.23734%
</screen>
  </para>
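
  <para>
   On systems where the source can only be chosen at boot, the selection is
   made on the kernel command line instead. As a sketch, reasonably recent
   Linux kernels accept a clocksource parameter there; hpet is only an
   example value and must be one of the sources the kernel reports as
   available:
  </para>

<screen>
clocksource=hpet
</screen>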

 </refsect2>

 <refsect2>
  <title>Clock hardware and timing accuracy</title>

  <para>
   Collecting accurate timing information is normally done on computers using
   hardware clocks with various levels of accuracy. With some hardware the
   operating system can pass the system clock time almost directly to
   programs. A system clock can also be derived from a chip that simply
   provides timing interrupts, periodic ticks at some known time interval. In
   either case, operating system kernels provide a clock source that hides
   these details. But the accuracy of that clock source and how quickly it can
   return results varies based on the underlying hardware.
  </para>

  <para>
   Inaccurate time keeping can result in system instability. Test any change
   to the clock source very carefully. Operating system defaults are sometimes
   made to favor reliability over best accuracy. And if you are using a virtual
   machine, look into the recommended time sources compatible with it. Virtual
   hardware faces additional difficulties when emulating timers, and there are
   often per-operating-system settings suggested by vendors.
  </para>

  <para>
   The Time Stamp Counter (TSC) clock source is the most accurate one available
   on current generation CPUs. It's the preferred way to track the system time
   when it's supported by the operating system and the TSC clock is
   reliable. There are several ways that TSC can fail to provide an accurate
   timing source, making it unreliable. Older systems can have a TSC clock that
   varies based on the CPU temperature, making it unusable for timing. Trying
   to use TSC on some older multi-core CPUs can give a reported time that's
   inconsistent among multiple cores. This can result in the time going
   backwards, a problem this program checks for. And even the newest systems
   can fail to provide accurate TSC timing with very aggressive power saving
   configurations.
  </para>

  <para>
   Newer operating systems may check for the known TSC problems and switch to a
   slower, more stable clock source when they are seen. If your system
   supports TSC time but doesn't default to that, it may be disabled for a good
   reason. And some operating systems may not detect all the possible problems
   correctly, or will allow using TSC even in situations where it's known to be
   inaccurate.
  </para>
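
  <para>
   On Linux, one way to check whether the kernel has flagged TSC as unstable
   and fallen back to another source is to search the kernel log and inspect
   the active clock source; the exact log messages vary between kernel
   versions:
  </para>

<screen>
dmesg | grep -i tsc
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
</screen>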

  <para>
   The High Precision Event Timer (HPET) is the preferred timer on systems
   where it's available and TSC is not accurate. The timer chip itself is
   programmable to allow up to 100 nanosecond resolution, but you may not see
   that much accuracy in your system clock.
  </para>

  <para>
   Advanced Configuration and Power Interface (ACPI) provides a Power
   Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
   derived from acpi_pm will at best provide 300 nanosecond resolution.
  </para>

  <para>
   Timers used on older PC hardware include the 8254 Programmable Interval
   Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
   Controller (APIC) timer, and the Cyclone timer. These timers aim for
   millisecond resolution.
  </para>
 </refsect2>
 </refsect1>