NotificationsYou must be signed in to change notification settings
Fork5
Star26

Commitd2c2551

committed

Add file.

1 parentdd47964 commitd2c2551Copy full SHA for d2c2551

File tree

1 file changed

+189

-0

lines changed

doc/TODO.detail
- async

1 file changed

+189

-0

lines changed

`‎doc/TODO.detail/async`

Lines changed: 189 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,189 @@`
	`1`	`+From owner-pgsql-hackers@hub.org Sun Jan 23 13:31:03 2000`
	`2`	`+Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])`
	`3`	`+by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA28482`
	`4`	`+for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:31:01 -0500 (EST)`
	`5`	`+Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id NAA08409 for <pgman@candle.pha.pa.us>; Sun, 23 Jan 2000 13:04:34 -0500 (EST)`
	`6`	`+Received: from localhost (majordom@localhost)`
	`7`	`+by hub.org (8.9.3/8.9.3) with SMTP id MAA65651;`
	`8`	`+Sun, 23 Jan 2000 12:57:33 -0500 (EST)`
	`9`	`+(envelope-from owner-pgsql-hackers)`
	`10`	`+Received: by hub.org (bulk_mailer v1.5); Sun, 23 Jan 2000 12:57:20 -0500`
	`11`	`+Received: (from majordom@localhost)`
	`12`	`+by hub.org (8.9.3/8.9.3) id MAA65548`
	`13`	`+for pgsql-hackers-outgoing; Sun, 23 Jan 2000 12:56:20 -0500 (EST)`
	`14`	`+(envelope-from owner-pgsql-hackers@postgreSQL.org)`
	`15`	`+Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])`
	`16`	`+by hub.org (8.9.3/8.9.3) with ESMTP id MAA65492`
	`17`	`+for <pgsql-hackers@postgreSQL.org>; Sun, 23 Jan 2000 12:55:41 -0500 (EST)`
	`18`	`+(envelope-from tgl@sss.pgh.pa.us)`
	`19`	`+Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])`
	`20`	`+by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA06211;`
	`21`	`+Sun, 23 Jan 2000 12:55:36 -0500 (EST)`
	`22`	`+To: Alfred Perlstein <bright@wintelcom.net>`
	`23`	`+cc: pgsql-hackers@postgreSQL.org`
	`24`	`+Subject: Re: pg_dump possible fix, need testers. (was: Re: [HACKERS] pg_dump disaster)`
	`25`	`+In-reply-to: <20000123022341.J26520@fw.wintelcom.net>`
	`26`	`+References: <20000122211427.C26520@fw.wintelcom.net> <200001230525.AAA08020@candle.pha.pa.us> <20000122220256.H26520@fw.wintelcom.net> <5120.948606837@sss.pgh.pa.us> <20000123022341.J26520@fw.wintelcom.net>`
	`27`	`+Comments: In-reply-to Alfred Perlstein <bright@wintelcom.net>`
	`28`	`+message dated "Sun, 23 Jan 2000 02:23:41 -0800"`
	`29`	`+Date: Sun, 23 Jan 2000 12:55:36 -0500`
	`30`	`+Message-ID: <6208.948650136@sss.pgh.pa.us>`
	`31`	`+From: Tom Lane <tgl@sss.pgh.pa.us>`
	`32`	`+Sender: owner-pgsql-hackers@postgreSQL.org`
	`33`	`+Status: ORr`
	`34`	`+`
	`35`	`+>> Um, I didn't have any trouble at all reproducing Patrick's complaint.`
	`36`	`+>> pg_dump any moderately large table (I used tenk1 from the regress`
	`37`	`+>> database) and try to load the script with psql. Kaboom.`
	`38`	`+`
	`39`	`+> This is after or before my latest patch?`
	`40`	`+`
	`41`	`+Before. I haven't updated since yesterday...`
	`42`	`+`
	`43`	`+> I can't seem to reproduce this problem,`
	`44`	`+`
	`45`	`+Odd. Maybe there is something different about the kernel's timing of`
	`46`	`+message sending on your platform. I see it very easily on HPUX 10.20,`
	`47`	`+and Patrick sees it very easily on whatever he's using (netbsd I think).`
	`48`	`+You might try varying the situation a little, say`
	`49`	`+psql mydb <dumpfile`
	`50`	`+psql -f dumpfile mydb`
	`51`	`+psql mydb`
	`52`	`+\i dumpfile`
	`53`	`+and the same with -h localhost (to get a TCP/IP connection instead of`
	`54`	`+Unix domain). At the moment (pre-patch) I see failures with the`
	`55`	`+first two of these, but not with the \i method. -h doesn't seem to`
	`56`	`+matter for me, but it might for you.`
	`57`	`+`
	`58`	`+> Telling me something is wrong without giving suggestions on how`
	`59`	`+> to fix it, nor direct pointers to where it fails doesn't help me`
	`60`	`+> one bit. You're not offering constructive critism, you're not`
	`61`	`+> even offering valid critism, you're just waving your finger at`
	`62`	`+> "problems" that you say exist but don't pin down to anything specific.`
	`63`	`+`
	`64`	`+I have been explaining it as clearly as I could. Let's try it`
	`65`	`+one more time.`
	`66`	`+`
	`67`	`+> I spent hours looking over what I did to pqFlush and pqPutnBytes`
	`68`	`+> because of what you said earlier when all the bug seems to have`
	`69`	`+> come down to is that I missed that the socket is set to non-blocking`
	`70`	`+> in all cases now.`
	`71`	`+`
	`72`	`+Letting the socket mode default to blocking will hide the problems from`
	`73`	`+existing clients that don't care about non-block mode. But people who`
	`74`	`+try to actually use the nonblock mode are going to see the same kinds of`
	`75`	`+problems that psql is exhibiting.`
	`76`	`+`
	`77`	`+> The old sequence of events that happened was as follows:`
	`78`	`+`
	`79`	`+> user sends data almost filling the output buffer...`
	`80`	`+> user sends another line of text overflowing the buffer...`
	`81`	`+> pqFlush is invoked blocking the user until the output pipe clears...`
	`82`	`+> and repeat.`
	`83`	`+`
	`84`	`+Right.`
	`85`	`+`
	`86`	`+> The nonblocking code allows sends to fail so the user can abort`
	`87`	`+> sending stuff to the backend in order to process other work:`
	`88`	`+`
	`89`	`+> user sends data almost filling the output buffer...`
	`90`	`+> user sends another line of text that may overflow the buffer...`
	`91`	`+> pqFlush is invoked,`
	`92`	`+> if the pipe can't be cleared an error is returned allowing the user to`
	`93`	`+> retry the send later.`
	`94`	`+> if the flush succeeds then more data is queued and success is returned`
	`95`	`+`
	`96`	`+But you haven't thought through the mechanics of the "error is returned`
	`97`	`+allowing the user to retry" code path clearly enough. Let's take`
	`98`	`+pqPutBytes for an example. If it returns EOF, is that a hard error or`
	`99`	`+does it just mean that the application needs to wait a while? The`
	`100`	`+application must distinguish these cases, or it will do the wrong`
	`101`	`+thing: for example, if it mistakes a hard error for "wait a while",`
	`102`	`+then it will wait forever without making any progress or producing`
	`103`	`+an error report.`
	`104`	`+`
	`105`	`+You need to provide a different return convention that indicates`
	`106`	`+what happened, say`
	`107`	`+EOF (-1)=> hard error (same as old code)`
	`108`	`+0=> OK`
	`109`	`+1=> no data was queued due to risk of blocking`
	`110`	`+And you need to guarantee that the application knows what the state is`
	`111`	`+when the can't-do-it-yet return is made; note that I specified "no data`
	`112`	`+was queued" above. If pqPutBytes might queue some of the data before`
	`113`	`+returning 1, the application is in trouble again. While you apparently`
	`114`	`+foresaw that in recoding pqPutBytes, your code doesn't actually work.`
	`115`	`+There is the minor code bug that you fail to update "avail" after the`
	`116`	`+first pqFlush call, and the much more fundamental problem that you`
	`117`	`+cannot guarantee to have queued all or none of the data. Think about`
	`118`	`+what happens if the passed nbytes is larger than the output buffer size.`
	`119`	`+You may pass the first pqFlush successfully, then get into the loop and`
	`120`	`+get a won't-block return from pqFlush in the loop. What then?`
	`121`	`+You can't simply refuse to support the case nbytes > bufsize at all,`
	`122`	`+because that will cause application failures as well (too long query`
	`123`	`+sends it into an infinite loop trying to queue data, most likely).`
	`124`	`+`
	`125`	`+A possible answer is to specify that a return of +N means "N bytes`
	`126`	`+remain unqueued due to risk of blocking" (after having queued as much`
	`127`	`+as you could). This would put the onus on the caller to update his`
	`128`	`+pointers/counts properly; propagating that into all the internal uses`
	`129`	`+of pqPutBytes would be no fun. (Of course, so far you haven't updated`
	`130`	`+any of the internal callers to behave reasonably in case of a`
	`131`	`+won't-block return; PQfn is just one example.)`
	`132`	`+`
	`133`	`+Another possible answer is to preserve pqPutBytes' old API, "queue or`
	`134`	`+bust", by the expedient of enlarging the output buffer to hold whatever`
	`135`	`+we can't send immediately. This is probably more attractive, even`
	`136`	`+though a long query might suck up a lot of space that won't get`
	`137`	`+reclaimed as long as the connection lives. If you don't do this then`
	`138`	`+you are going to have to make a lot of ugly changes in the internal`
	`139`	`+callers to deal with won't-block returns. Actually, a bulk COPY IN`
	`140`	`+would probably be the worst case --- the app could easily load data into`
	`141`	`+the buffer far faster than it could be sent. It might be best to extend`
	`142`	`+PQputline to have a three-way return and add code there to limit the`
	`143`	`+growth of the output buffer, while allowing all internal callers to`
	`144`	`+assume that the buffer is expanded when they need it.`
	`145`	`+`
	`146`	`+pqFlush has the same kind of interface design problem: the same EOF code`
	`147`	`+is returned for either a hard error or can't-flush-yet, but it would be`
	`148`	`+disastrous to treat those cases alike. You must provide a 3-way return`
	`149`	`+code.`
	`150`	`+`
	`151`	`+Furthermore, the same sort of 3-way return code convention will have to`
	`152`	`+propagate out through anything that calls pqFlush (with corresponding`
	`153`	`+documentation updates). pqPutBytes can be made to hide a pqFlush won't-`
	`154`	`+block return by trying to enlarge the output buffer, but in most other`
	`155`	`+places you won't have a choice except to punt it back to the caller.`
	`156`	`+`
	`157`	`+PQendcopy has the same interface design problem. It used to be that`
	`158`	`+(unless you passed a null pointer) PQendcopy would guarantee that`
	`159`	`+the connection was no longer in COPY state on return --- by resetting`
	`160`	`+it, if necessary. So the return code was mainly informative; the`
	`161`	`+application didn't have to do anything different if PQendcopy reported`
	`162`	`+failure. But now, a nonblocking application does need to pay attention`
	`163`	`+to whether PQendcopy completed or not --- and you haven't provided a way`
	`164`	`+for it to tell. If 1 is returned, the connection might still be in`
	`165`	`+COPY state, or it might not (PQendcopy might have reset it). If the`
	`166`	`+application doesn't distinguish these cases then it will fail.`
	`167`	`+`
	`168`	`+I also think that you want to take a hard look at the automatic "reset"`
	`169`	`+behavior upon COPY failure, since a PQreset call will block the`
	`170`	`+application until it finishes. Really, what is needed to close down a`
	`171`	`+COPY safely in nonblock mode is a pair of entry points along the line of`
	`172`	`+"PQendcopyStart" and "PQendcopyPoll", with API conventions similar to`
	`173`	`+PQresetStart/PQresetPoll. This gives you the ability to do the reset`
	`174`	`+(if one is necessary) without blocking the application. PQendcopy`
	`175`	`+itself will only be useful to blocking applications.`
	`176`	`+`
	`177`	`+> I'm sorry if they don't work for some situations other than COPY IN,`
	`178`	`+> but it's functionality that I needed and I expect to be expanded on`
	`179`	`+> by myself and others that take interest in nonblocking operation.`
	`180`	`+`
	`181`	`+I don't think that the nonblock code is anywhere near production quality`
	`182`	`+at this point. It may work for you, if you don't stress it too hard and`
	`183`	`+never have a communications failure; but I don't want to see us ship it`
	`184`	`+as part of Postgres unless these issues get addressed.`
	`185`	`+`
	`186`	`+regards, tom lane`
	`187`	`+`
	`188`	`+************`
	`189`	`+`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitd2c2551

File tree

1 file changed

1 file changed

`‎doc/TODO.detail/async`

0 commit comments