Commit 43bb028

Add disk rotation idea to WAL todo emails.
1 parent 0684043

1 file changed: doc/TODO.detail/wal
139 additions & 0 deletions
@@ -2698,3 +2698,142 @@ TIP 4: Don't 'kill -9' the postmaster
From pgsql-hackers-owner+M31893@postgresql.org Fri Nov 15 11:25:58 2002
Return-path: <pgsql-hackers-owner+M31893@postgresql.org>
Received: from postgresql.org (postgresql.org [64.49.215.8])
    by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id gAFHPvR10276
    for <pgman@candle.pha.pa.us>; Fri, 15 Nov 2002 12:25:57 -0500 (EST)
Received: from localhost (postgresql.org [64.49.215.8])
    by postgresql.org (Postfix) with ESMTP
    id A2D5A4774A1; Fri, 15 Nov 2002 11:34:54 -0500 (EST)
Received: from postgresql.org (postgresql.org [64.49.215.8])
    by postgresql.org (Postfix) with SMTP
    id 5E898477132; Fri, 15 Nov 2002 11:15:45 -0500 (EST)
Received: from localhost (postgresql.org [64.49.215.8])
    by postgresql.org (Postfix) with ESMTP id 90CF1475B85
    for <pgsql-hackers@postgresql.org>; Mon, 11 Nov 2002 15:33:47 -0500 (EST)
Received: from Curtis-Vaio (unknown [63.164.0.45])
    by postgresql.org (Postfix) with SMTP id C6CB1475A3F
    for <pgsql-hackers@postgresql.org>; Mon, 11 Nov 2002 15:33:46 -0500 (EST)
Received: from [127.0.0.1] by Curtis-Vaio
    (ArGoSoft Mail Server Freeware, Version 1.8 (1.8.1.7)); Mon, 11 Nov 2002 16:33:42 -0400
From: "Curtis Faith" <curtis@galtcapital.com>
To: <pgsql-hackers@postgresql.org>
Subject: [HACKERS] 500 tpsQL + WAL log implementation
Date: Mon, 11 Nov 2002 16:33:41 -0400
Message-ID: <DMEEJMCDOJAKPPFACMPMCEBMCFAA.curtis@galtcapital.com>
MIME-Version: 1.0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
X-Virus-Scanned: by AMaViS new-20020517
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
X-Virus-Scanned: by AMaViS new-20020517
Status: ORr
2739+
2740+
I have been experimenting with empirical tests of file system and device
2741+
level writes to determine the actual constraints in order to speed up the WAL
2742+
logging code.
2743+
2744+
Using a raw file partition and a time-based technique for determining the
2745+
optimal write position, I am able to get 8K writes physically written to disk
2746+
synchronously in the range of 500 to 650 writes per second using FreeBSD raw
2747+
device partitions on IDE disks (with write cache disabled). I will be
2748+
testing it soon under linux with 10,00RPM SCSI which should be even better.
2749+
It is my belief that the mechanism used to achieve these speeds could be
2750+
incorporated into the existing WAL logging code as an abstraction that looks
2751+
to the WAL code just like the file level access currently used. The current
2752+
speeds are limited by the speed of a single disk rotation. For a 7,200 RPM
2753+
disk this is 120/second, for a 10,000 RPM disk this is 166.66/second
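
The rotation limits quoted above follow directly from the spindle speed. A small Python sketch of that arithmetic, with hypothetical function names and the assumption that each synchronous write waits for exactly one rotation:

```python
def max_sync_writes_per_second(rpm):
    """Upper bound when every synchronous write costs one full rotation."""
    return rpm / 60.0


def rotation_time_us(rpm):
    """Duration of one rotation in microseconds."""
    return 60.0 * 1_000_000 / rpm


for rpm in (7200, 10000):
    print(rpm,
          round(max_sync_writes_per_second(rpm), 2),
          round(rotation_time_us(rpm)))
```

Measured rates of 500 to 650 writes per second are therefore roughly four to five times the one-write-per-rotation bound of a 7,200 RPM disk.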

The mechanism works by adjusting the seek offset of the write by using
gettimeofday to determine approximately where the disk head is in its
rotation. The mechanism does not use any AIO calls.

Assuming the following:

1) Disk rotation time is 8.333ms or 8333us (7200 RPM).

2) A write at offset 1,500K completes at system time 103s 000ms 000us.

3) A new write is requested at system time 103s 004ms 166us.

4) A 390K per rotation alignment of the data on the disk.

5) A write must be sent at least 20K ahead of the current head position to
ensure that it is written in less than one rotation.

It can be determined from the above that 4,166us, or half a rotation, has
elapsed since the last write, so the head has advanced about 195K. A write
for an offset of something slightly more than 195K past the last write, or
offset 1,695K, will be ahead of the current location of the head and will
therefore complete in less than a single rotation's time.
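
The arithmetic in this example can be sketched in Python. This is only an illustration under the five assumptions listed: the function names are hypothetical, the platter is assumed to turn at a constant rate, and seek time is ignored. Applying the 20K margin from assumption 5 literally puts the target near 1,715K, while a margin of zero reproduces the 1,695K figure.

```python
def head_offset_kb(last_offset_kb, last_done_us, now_us,
                   rotation_us=8333, kb_per_rotation=390):
    """Estimate where the head is, in K from the start of the partition.

    Assumes the head was at last_offset_kb when the previous write
    completed. The elapsed time is wrapped so the estimate always stays
    within one rotation of the last write.
    """
    elapsed_us = (now_us - last_done_us) % rotation_us
    return last_offset_kb + kb_per_rotation * elapsed_us / rotation_us


def target_offset_kb(last_offset_kb, last_done_us, now_us, lead_kb=20, **disk):
    """Offset for the next write: estimated head position plus a safety lead."""
    return head_offset_kb(last_offset_kb, last_done_us, now_us, **disk) + lead_kb


# Values from the example: last write done at 103.000000s,
# next write requested at 103.004166s.
last_done_us = 103_000_000
now_us = 103_004_166

head = head_offset_kb(1500, last_done_us, now_us)
print(round(head))                                          # 1695
print(round(target_offset_kb(1500, last_done_us, now_us)))  # 1715
```

In a running system `now_us` would come from the system clock (gettimeofday in C, `time.time()` in Python) rather than a constant.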

The disk-specific metrics (rotation speed, bytes per rotation, base write
time, etc.) can be derived empirically through a tester program that would
take a few minutes to run and which could be run at log setup time.
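
How the tester program would measure these metrics is not spelled out here. One plausible approach, shown purely as an assumption, is to time back-to-back synchronous writes to the same block: with the write cache disabled each one should wait a full rotation, so the gap between completions approximates the rotation period. The Python sketch below covers only the analysis step and uses synthetic timings; the function names are hypothetical.

```python
from statistics import median


def estimate_rotation_us(completion_times_us):
    """Estimate the rotation period from completion times of repeated
    synchronous writes to the same block."""
    if len(completion_times_us) < 2:
        raise ValueError("need at least two completion times")
    gaps = [later - earlier
            for earlier, later in zip(completion_times_us,
                                      completion_times_us[1:])]
    # The median ignores the occasional write that missed a rotation.
    return median(gaps)


def estimate_rpm(rotation_us):
    """Convert a rotation period in microseconds to revolutions per minute."""
    return 60_000_000 / rotation_us


# Synthetic timings resembling a 7,200 RPM disk, with one delayed write.
samples = [0, 8333, 16667, 25000, 41667, 50000]
period = estimate_rotation_us(samples)
print(period, round(estimate_rpm(period)))
```

Bytes per rotation and the base write time would need separate measurements that this sketch does not attempt.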

The obvious problem with the above mechanism is that the WAL code needs to be
able to read from the log file in transaction order during recovery. This
could be provided for using an abstraction that prepends the logical order
number to each block written to the disk and makes sure that the log blocks
contain either a valid logical order number or some other marker indicating
that the block is not being used.
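
A minimal Python sketch of this idea, under stated assumptions: each 8K block starts with an 8-byte little-endian sequence number, and the value 0 serves as the "not in use" marker. Both choices, and the function names, are illustrative and not part of the proposal. The sketch has no checksum, so it does not detect torn writes.

```python
import struct

BLOCK_SIZE = 8192
HEADER = struct.Struct("<Q")   # hypothetical header: 8-byte sequence number
UNUSED = 0                     # hypothetical marker for an unused block


def make_block(seq, payload):
    """Build one log block with its logical order number prepended."""
    if seq == UNUSED:
        raise ValueError("sequence number 0 is reserved for unused blocks")
    room = BLOCK_SIZE - HEADER.size
    if len(payload) > room:
        raise ValueError("payload too large for one block")
    return HEADER.pack(seq) + payload.ljust(room, b"\0")


def offsets_in_logical_order(device):
    """Scan every block and return byte offsets sorted by sequence number."""
    found = []
    for offset in range(0, len(device), BLOCK_SIZE):
        (seq,) = HEADER.unpack_from(device, offset)
        if seq != UNUSED:
            found.append((seq, offset))
    return [offset for _, offset in sorted(found)]


# Blocks written out of physical order: seq 3, unused, seq 1, seq 2.
device = (make_block(3, b"c") + bytes(BLOCK_SIZE)
          + make_block(1, b"a") + make_block(2, b"b"))
print(offsets_in_logical_order(device))   # [16384, 24576, 0]
```

Recovery would then replay the blocks at the returned offsets in order.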

A bitmap of blocks that have already been used would be kept in memory for
quickly determining the next set of possible unused blocks, but this bitmap
would not need to be written to disk except during normal shutdown, since in
the event of a failure the bitmaps would be reconstructed by reading all the
blocks from the disk.
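
Rebuilding the bitmap after a failure amounts to one pass over the device. The Python sketch below assumes, for illustration only, that an unused block is marked by an 8-byte zero at its start; the names are hypothetical.

```python
import struct
from typing import List, Optional

BLOCK_SIZE = 8192
UNUSED = 0   # hypothetical marker stored in the first 8 bytes of a free block


def rebuild_bitmap(device: bytes) -> List[bool]:
    """Return one flag per block: True if the block holds log data."""
    used = []
    for offset in range(0, len(device), BLOCK_SIZE):
        (seq,) = struct.unpack_from("<Q", device, offset)
        used.append(seq != UNUSED)
    return used


def next_free_block(used: List[bool], start: int) -> Optional[int]:
    """First unused block at or after start, wrapping around the device."""
    count = len(used)
    for step in range(count):
        index = (start + step) % count
        if not used[index]:
            return index
    return None   # every block is in use


device = (struct.pack("<Q", 5).ljust(BLOCK_SIZE, b"\0")
          + bytes(BLOCK_SIZE)
          + struct.pack("<Q", 6).ljust(BLOCK_SIZE, b"\0"))
bitmap = rebuild_bitmap(device)
print(bitmap, next_free_block(bitmap, 2))   # [True, False, True] 1
```

A rotation-aware writer would start the search at the block just ahead of the estimated head position.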

Checkpointing and something akin to log rotation could be handled using this
mechanism as well.

So, MY REAL QUESTION is whether or not this is the sort of speed improvement
that warrants the work of writing the required abstraction layer and making
this very robust. The WAL code should remain essentially unchanged, with
perhaps new calls for the five or six routines used to access the log files
and to handle the equivalent of log rotation for raw device access. These new
calls would either use the current file-based implementation or the new
logging mechanism depending on the configuration.
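
The shape of such an abstraction layer might look like the following Python sketch. Everything in it is hypothetical: the interface, the class names, and the `wal_storage` setting are invented for illustration, and both backends keep records in memory rather than touching a file or raw device.

```python
from abc import ABC, abstractmethod


class LogStore(ABC):
    """Hypothetical interface the WAL code would call."""

    @abstractmethod
    def append(self, record):
        """Store a record and return its logical sequence number."""

    @abstractmethod
    def read_in_order(self):
        """Return all records in logical order, as recovery needs."""


class FileLogStore(LogStore):
    """Stand-in for the current file-based log: strictly sequential."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records)

    def read_in_order(self):
        return list(self._records)


class RawLogStore(LogStore):
    """Stand-in for the raw-device log: physical placement is arbitrary,
    so each record is stored with its sequence number."""

    def __init__(self, slots=8):
        self._slots = [None] * slots
        self._next_seq = 1

    def append(self, record):
        seq = self._next_seq
        count = len(self._slots)
        # Placeholder policy; a real one would start just ahead of the head.
        start = (seq * 3) % count
        for step in range(count):
            slot = (start + step) % count
            if self._slots[slot] is None:
                self._slots[slot] = (seq, record)
                self._next_seq += 1
                return seq
        raise RuntimeError("log is full")

    def read_in_order(self):
        used = [entry for entry in self._slots if entry is not None]
        return [record for _, record in sorted(used)]


def open_log_store(config):
    """Pick a backend from a hypothetical 'wal_storage' setting."""
    if config.get("wal_storage") == "raw":
        return RawLogStore()
    return FileLogStore()


store = open_log_store({"wal_storage": "raw"})
for rec in (b"a", b"b", b"c"):
    store.append(rec)
print(store.read_in_order())
```

Callers see the same ordering from either backend, which is the property that would let the WAL code stay essentially unchanged.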

I anticipate that the extra work required for a PostgreSQL administrator to
use the proposed logging mechanism would be to:

1) Create a raw device partition of the appropriate size
2) Run the metrics tester for that device partition
3) Set the appropriate configuration parameters to indicate raw WAL logging

I anticipate that the additional space requirements for this system would be
on the order of 10% to 15% beyond the current file-based implementation's
requirements.

So, is this worth doing? Would a robust implementation likely be accepted for
7.4 assuming it can demonstrate speed improvements in the range of 500 tps?

- Curtis

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
