@@ -1053,3 +1053,190 @@ ncm@zembu.com
10531053>
10541054> regards, tom lane
10551055>
1056+ From pgsql-hackers-owner+M11649@postgresql.org Wed Aug 1 15:22:46 2001
1057+ Return-path: <pgsql-hackers-owner+M11649@postgresql.org>
1058+ Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
1059+ by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f71JMjN09768
1060+ for <pgman@candle.pha.pa.us>; Wed, 1 Aug 2001 15:22:45 -0400 (EDT)
1061+ Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
1062+ by postgresql.org (8.11.3/8.11.1) with SMTP id f71JMUf62338;
1063+ Wed, 1 Aug 2001 15:22:30 -0400 (EDT)
1064+ (envelope-from pgsql-hackers-owner+M11649@postgresql.org)
1065+ Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged))
1066+ by postgresql.org (8.11.3/8.11.1) with SMTP id f71J4df57086
1067+ for <pgsql-hackers@postgresql.org>; Wed, 1 Aug 2001 15:04:40 -0400 (EDT)
1068+ (envelope-from vmikheev@SECTORBASE.COM)
1069+ Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
1070+ id <PG1LSSPZ>; Wed, 1 Aug 2001 12:04:31 -0700
1071+ Message-ID: <3705826352029646A3E91C53F7189E32016705@sectorbase2.sectorbase.com>
1072+ From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
1073+ To: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>
1074+ Subject: [HACKERS] Using POSIX mutex-es
1075+ Date: Wed, 1 Aug 2001 12:04:24 -0700
1076+ MIME-Version: 1.0
1077+ X-Mailer: Internet Mail Service (5.5.2653.19)
1078+ Content-Type: text/plain;
1079+ charset="koi8-r"
1080+ Precedence: bulk
1081+ Sender: pgsql-hackers-owner@postgresql.org
1082+ Status: OR
1083+
1084+ 1. Just changed
1085+ TAS(lock) to pthread_mutex_trylock(lock)
1086+ S_LOCK(lock) to pthread_mutex_lock(lock)
1087+ S_UNLOCK(lock) to pthread_mutex_unlock(lock)
1088+ (and S_INIT_LOCK to share mutex-es between processes).
1089+
1090+ 2. pgbench was initialized with scale 10.
1091+ SUN WS 10 (512Mb), Solaris 2.6 (I'm unable to test on E4500 -:()
1092+ -B 16384, wal_files 8, wal_buffers 256,
1093+ checkpoint_segments 64, checkpoint_timeout 3600
1094+ 50 clients x 100 transactions
1095+ (after initialization the DB dir was saved, and before each test it was
1096+ copied back and vacuumed).
1097+
1098+ 3. No difference.
1099+ Mutex version is maybe 0.5-1% faster (e.g. 37.264238 tps vs 37.083339 tps).
1100+
1101+ So - no gain, but no performance loss from using the pthread library
1102+ (I've also run tests with 1 client), at least on Solaris.
1103+
1104+ And so - it looks like we can use POSIX mutexes and condition variables
1105+ (not semaphores; man pthread_cond_wait) and should implement a light lmgr,
1106+ probably with priority locking.
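
For reference, the substitution described in point 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the actual patch: the macro names follow PostgreSQL's s_lock.h conventions, and in a real backend the mutex would have to live in shared memory rather than local storage.

```c
/* Minimal sketch of point 1 above: mapping the spinlock primitives
 * onto a POSIX mutex.  Not the actual patch -- in PostgreSQL the
 * mutex would be placed in shared memory; here it is only local. */
#include <pthread.h>

typedef pthread_mutex_t slock_t;

/* S_INIT_LOCK: create the mutex as process-shared, so it can be
 * used by multiple backends attached to the same shared memory */
static int
s_init_lock(slock_t *lock)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* TAS returns nonzero when the lock was already held (trylock failed) */
#define TAS(lock)       (pthread_mutex_trylock(lock) != 0)
#define S_LOCK(lock)    pthread_mutex_lock(lock)
#define S_UNLOCK(lock)  pthread_mutex_unlock(lock)
```

Compile with -pthread; PTHREAD_PROCESS_SHARED is what makes the mutex usable across processes rather than only across threads of one process.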
1107+
1108+ Vadim
1109+
1110+ ---------------------------(end of broadcast)---------------------------
1111+ TIP 2: you can get off all lists at once with the unregister command
1112+ (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
1113+
1114+ From pgsql-hackers-owner+M11790@postgresql.org Sun Aug 5 14:41:34 2001
1115+ Return-path: <pgsql-hackers-owner+M11790@postgresql.org>
1116+ Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
1117+ by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f75IfXh25356
1118+ for <pgman@candle.pha.pa.us>; Sun, 5 Aug 2001 14:41:33 -0400 (EDT)
1119+ Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
1120+ by postgresql.org (8.11.3/8.11.4) with SMTP id f75IfY644815;
1121+ Sun, 5 Aug 2001 14:41:34 -0400 (EDT)
1122+ (envelope-from pgsql-hackers-owner+M11790@postgresql.org)
1123+ Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46])
1124+ by postgresql.org (8.11.3/8.11.4) with ESMTP id f75IUs641174
1125+ for <pgsql-hackers@postgresql.org>; Sun, 5 Aug 2001 14:30:54 -0400 (EDT)
1126+ (envelope-from pgman@candle.pha.pa.us)
1127+ Received: (from pgman@localhost)
1128+ by candle.pha.pa.us (8.10.1/8.10.1) id f75IUhM25071;
1129+ Sun, 5 Aug 2001 14:30:43 -0400 (EDT)
1130+ From: Bruce Momjian <pgman@candle.pha.pa.us>
1131+ Message-ID: <200108051830.f75IUhM25071@candle.pha.pa.us>
1132+ Subject: Re: [HACKERS] Idea for nested transactions / savepoints
1133+ In-Reply-To: <8173.997022088@sss.pgh.pa.us> "from Tom Lane at Aug 5, 2001 10:34:48
1134+ am"
1135+ To: Tom Lane <tgl@sss.pgh.pa.us>
1136+ Date: Sun, 5 Aug 2001 14:30:43 -0400 (EDT)
1137+ cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
1138+ X-Mailer: ELM [version 2.4ME+ PL90 (25)]
1139+ MIME-Version: 1.0
1140+ Content-Transfer-Encoding: 7bit
1141+ Content-Type: text/plain; charset=US-ASCII
1142+ Precedence: bulk
1143+ Sender: pgsql-hackers-owner@postgresql.org
1144+ Status: OR
1145+
1146+ > Bruce Momjian <pgman@candle.pha.pa.us> writes:
1147+ > > My idea is that we not put UNDO information into WAL but keep a List of
1148+ > > rel ids / tuple ids in the memory of each backend and do the undo inside
1149+ > > the backend.
1150+ >
1151+ > The complaints about WAL size amount to "we don't have the disk space
1152+ > to keep track of this, for long-running transactions". If it doesn't
1153+ > fit on disk, how likely is it that it will fit in memory?
1154+
1155+ Sure, we can put it on disk if that is better. I thought the problem
1156+ with WAL undo is that you have to keep UNDO info around for all
1157+ transactions newer than the earliest open transaction. So, if I
1158+ start a nested transaction and then sit at a prompt for 8 hours, all
1159+ WAL logs are kept for 8 hours.
1160+
1161+ We can create a WAL file for every backend, and record just the nested
1162+ transaction information. In fact, once a nested transaction finishes,
1163+ we don't need the info anymore. Certainly we don't need to flush these
1164+ to disk.
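
A hypothetical sketch of that per-backend list follows. None of these names exist in PostgreSQL; it only illustrates the idea of recording (rel id, tuple id) pairs in backend-local memory and discarding them when the nested transaction finishes.

```c
/* Hypothetical per-backend undo list, as floated above: each backend
 * records which (relation, tuple) it modified inside a nested
 * transaction.  Illustration only -- not PostgreSQL code. */
#include <stdlib.h>

typedef struct UndoEntry
{
    unsigned int     relid;    /* which relation was modified */
    unsigned int     tupleid;  /* which tuple within it */
    struct UndoEntry *next;
} UndoEntry;

static UndoEntry *undo_head = NULL;

/* remember one modified tuple */
static void
undo_remember(unsigned int relid, unsigned int tupleid)
{
    UndoEntry *e = malloc(sizeof(UndoEntry));
    e->relid = relid;
    e->tupleid = tupleid;
    e->next = undo_head;
    undo_head = e;
}

/* once the nested transaction finishes, the info is not needed */
static void
undo_discard(void)
{
    while (undo_head)
    {
        UndoEntry *e = undo_head;
        undo_head = e->next;
        free(e);
    }
}

static int
undo_count(void)
{
    int n = 0;
    for (UndoEntry *e = undo_head; e; e = e->next)
        n++;
    return n;
}
```

Because the list is backend-local and dropped at subtransaction end, nothing here ever needs to be flushed to disk, matching the point above.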
1165+
1166+ --
1167+ Bruce Momjian | http://candle.pha.pa.us
1168+ pgman@candle.pha.pa.us | (610) 853-3000
1169+ + If your life is a hard drive, | 830 Blythe Avenue
1170+ + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
1171+
1172+ ---------------------------(end of broadcast)---------------------------
1173+ TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
1174+
1175+ From pgman Sun Aug 5 21:16:32 2001
1176+ Return-path: <pgman>
1177+ Received: (from pgman@localhost)
1178+ by candle.pha.pa.us (8.10.1/8.10.1) id f761GWH11356;
1179+ Sun, 5 Aug 2001 21:16:32 -0400 (EDT)
1180+ From: Bruce Momjian <pgman>
1181+ Message-ID: <200108060116.f761GWH11356@candle.pha.pa.us>
1182+ Subject: Re: [HACKERS] Idea for nested transactions / savepoints
1183+ In-Reply-To: <200108051938.f75Jchi27522@candle.pha.pa.us> "from Bruce Momjian
1184+ at Aug 5, 2001 03:38:43 pm"
1185+ To: Bruce Momjian <pgman@candle.pha.pa.us>
1186+ Date: Sun, 5 Aug 2001 21:16:32 -0400 (EDT)
1187+ cc: Tom Lane <tgl@sss.pgh.pa.us>,
1188+ PostgreSQL-development <pgsql-hackers@postgresql.org>
1189+ X-Mailer: ELM [version 2.4ME+ PL90 (25)]
1190+ MIME-Version: 1.0
1191+ Content-Transfer-Encoding: 7bit
1192+ Content-Type: text/plain; charset=US-ASCII
1193+ Status: OR
1194+
1195+ > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
1196+ > > >> The complaints about WAL size amount to "we don't have the disk space
1197+ > > >> to keep track of this, for long-running transactions". If it doesn't
1198+ > > >> fit on disk, how likely is it that it will fit in memory?
1199+ > >
1200+ > > > Sure, we can put on the disk if that is better.
1201+ > >
1202+ > > I think you missed my point. Unless something can be done to make the
1203+ > > log info a lot smaller than it is now, keeping it all around until
1204+ > > transaction end is just not pleasant. Waving your hands and saying
1205+ > > that we'll keep it in a different place doesn't affect the fundamental
1206+ > > problem: if the transaction runs a long time, the log is too darn big.
1207+ >
1208+ > When you said long running, I thought you were concerned about long
1209+ > running in duration, not large transaction. Long duration in one-WAL
1210+ > setup would cause all transaction logs to be kept. Large transactions
1211+ > are another issue.
1212+ >
1213+ > One solution may be to store just the relid if many tuples are modified
1214+ > in the same table. If you stored the command counter for start/end of
1215+ > the nested transaction, it would be possible to sequential scan the
1216+ > table and undo all the affected tuples. Does that help? Again, I am
1217+ > just throwing out ideas here, hoping something will catch.
1218+
1219+ Actually, we need to keep nested-transaction UNDO information around
1220+ only until the nested transaction exits back to the main transaction:
1221+
1222+     BEGIN WORK;
1223+         BEGIN WORK;
1224+         COMMIT;
1225+         -- we can throw away the UNDO here
1226+         BEGIN WORK;
1227+             BEGIN WORK;
1228+             ...
1229+             COMMIT;
1230+         COMMIT;
1231+         -- we can throw away the UNDO here
1232+     COMMIT;
1233+
1234+ We are using the outermost transaction for our ACID guarantees, and just
1235+ using UNDO for nested transaction capability.
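
The discard points in the example above reduce to a nesting-depth rule, which can be sketched as follows. This is a hypothetical illustration, not PostgreSQL code: undo becomes droppable whenever a COMMIT returns the backend to the outermost (main) transaction.

```c
/* Hypothetical sketch of the discard rule in the example above:
 * track transaction nesting depth; UNDO collected for nested
 * transactions can be thrown away whenever a COMMIT brings us back
 * to the outermost (main) transaction.  Illustration only. */

static int xact_depth = 0;     /* 0 = no transaction open */

static void
begin_work(void)
{
    xact_depth++;
}

/* returns 1 when the nested UNDO can be discarded at this commit */
static int
commit_work(void)
{
    xact_depth--;
    return xact_depth == 1;    /* back in the main transaction */
}
```

Running the example through this rule marks exactly the two commented COMMITs as discard points; the final COMMIT needs no undo at all, since the whole transaction commits.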
1236+
1237+ --
1238+ Bruce Momjian | http://candle.pha.pa.us
1239+ pgman@candle.pha.pa.us | (610) 853-3000
1240+ + If your life is a hard drive, | 830 Blythe Avenue
1241+ + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
1242+