This repository was archived by the owner on Sep 8, 2025. It is now read-only.

djc/couchdb-python (Public archive)

Fix the bug where ConnectionPool cannot be used with multiprocessing #314

Open

kevinjqiu wants to merge 5 commits into djc:master from kevinjqiu:issue-313

kevinjqiu commented Mar 6, 2017

As discussed in #313

When couchdb-python is used with `multiprocessing`, you get `TypeError: 'ResponseBody' object is not iterable`.

This happens in the `request` method of `couchdb.http.Session`:

    # Read the full response for empty responses so that the connection is
    # in good state for the next request
    if method == 'HEAD' or resp.getheader('content-length') == '0' or \
            status < 200 or status in (204, 304):
        resp.read()
        self.connection_pool.release(url, conn)

    # Buffer small non-JSON response bodies
    elif int(resp.getheader('content-length', sys.maxsize)) < CHUNK_SIZE:
        data = resp.read()
        self.connection_pool.release(url, conn)

    # For large or chunked response bodies, do not buffer the full body,
    # and instead return a minimal file-like object
    else:
        data = ResponseBody(resp, self.connection_pool, url, conn)
        streamed = True

In this particular case, the `resp` object matches neither of the first two conditions and falls through to the `else` clause. A raw `ResponseBody` object is therefore returned upstream to the client code, and when the client code does `response['row']`, it fails because a `ResponseBody` object does not support item indexing.

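Adding a `print` on the `resp` object reveals that `resp.getheader('content-length')` is `None`. The `elif` condition then compares the `sys.maxsize` default against `CHUNK_SIZE`, which is false, so that branch is skipped.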
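To see why a missing header skips the buffering branch, here is a minimal sketch of the same three-way decision. The `pick_branch` helper and the `CHUNK_SIZE` value are assumptions made for illustration; only the conditions mirror the snippet quoted above.

```python
import sys

# Assumed value for illustration; the real constant lives in couchdb.http.
CHUNK_SIZE = 1024 * 8


def pick_branch(method, status, content_length):
    """Return which branch of the quoted if/elif/else would run.

    content_length is the raw header value: a string, or None when the
    header is absent.
    """
    if method == 'HEAD' or content_length == '0' or \
            status < 200 or status in (204, 304):
        return 'empty'
    # A missing header falls back to sys.maxsize, which is never < CHUNK_SIZE.
    length = sys.maxsize if content_length is None else content_length
    if int(length) < CHUNK_SIZE:
        return 'buffered'
    return 'streamed'


print(pick_branch('GET', 200, '120'))  # small body with a length header
print(pick_branch('GET', 200, None))   # no content-length header at all
```

A response with no `content-length` header always lands in the streaming branch, regardless of how small the body actually is.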

The reason for `content-length` being `None` is in `httplib.HTTPResponse.begin`, lines 470~475:

    if self.version == 9:
        self.length = None
        self.chunked = 0
        self.will_close = 1
        self.msg = HTTPMessage(StringIO())
        return

So `HTTPConnection` thinks it's connecting to an HTTP/0.9 server, even though the couchdb response was `HTTP/1.1`.

Tracing further, in order for `self.version == 9`, the `version` returned by `HTTPResponse._read_status` must map to `9`:

    def _read_status(self):
        # Initialize with Simple-Response defaults
        line = self.fp.readline(_MAXLINE + 1)
        ...

Putting a print statement after the line is read and rerunning the bug script gives:

    HTP/1.1 0 O2re:CuD. Eln OTP17T0KSevr ochB/1.61(rag/)ETag: "1-b188d355f013ee97662615a5b4a85577"
    Traceback (most recent call last):
      File "bug.py", line 36, in <module>
        main()
      File "bug.py", line 31, in main
        docs = pool.map(query_id, ['1', '2', '3'])
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
        raise self._value
    TypeError: 'ResponseBody' object is not iterable

The process got a garbled status line, even though the response from couchdb is fine. With the garbled status line, the `_read_status` method assumes the response must be `HTTP/0.9`, so the version ends up as `9`.
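The fallback described above can be illustrated with a simplified sketch. This is not the actual `httplib` source; `guess_http_version` is a hypothetical helper that only imitates the non-strict behavior as I understand it: any status line whose first token does not start with `HTTP/` is treated as an HTTP/0.9 response.

```python
def guess_http_version(status_line):
    """Simplified stand-in for httplib's non-strict status-line handling."""
    parts = status_line.split(None, 2)
    version = parts[0] if parts else ''
    if not version.startswith('HTTP/'):
        # Not a recognizable status line: treated as an HTTP/0.9
        # "Simple-Response", which carries no headers at all.
        return 9
    if version == 'HTTP/1.0':
        return 10
    if version.startswith('HTTP/1.'):
        return 11
    raise ValueError('unknown HTTP version: %r' % version)


print(guess_http_version('HTTP/1.1 200 OK\r\n'))    # well-formed status line
print(guess_http_version('HTP/1.1 0 O2re:CuD.'))    # garbled line from the trace
```

Because an HTTP/0.9 response has no headers, a garbled status line is enough to make `content-length` appear to be missing.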

So basically what we have here is a race condition where all three processes are talking to the server over the same socket at the same time. This is because couchdb-python's `ConnectionPool` is created in the parent process by the `Session` object, which is in turn created only once per `Database` object. In essence, all three sub-processes share the same session object and the same connection pool object. Because they all talk to the same host/port combination, they all check out the same connection object from the pool, so the same underlying socket is used across all three sub-processes, and hence the bug.

The fix here is to make `ConnectionPool` process-aware: connections are keyed by the current pid in addition to `scheme` and `netloc`. This way, we make sure that sub-processes get their own separate connections.
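The keying idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PidKeyedPool`, its `factory` and `pid_func` parameters, and the `get`/`release` signatures are all hypothetical, and locking is omitted for brevity.

```python
import os


class PidKeyedPool(object):
    """Hypothetical pool: idle connections keyed by (pid, scheme, netloc)."""

    def __init__(self, factory, pid_func=os.getpid):
        self._factory = factory    # callable(scheme, netloc) -> connection
        self._pid_func = pid_func  # injectable so a pid change can be simulated
        self._idle = {}            # key -> list of idle connections

    def _key(self, scheme, netloc):
        return (self._pid_func(), scheme, netloc)

    def get(self, scheme, netloc):
        idle = self._idle.get(self._key(scheme, netloc))
        if idle:
            return idle.pop()
        return self._factory(scheme, netloc)

    def release(self, scheme, netloc, conn):
        self._idle.setdefault(self._key(scheme, netloc), []).append(conn)


# Simulate a pid change instead of forking, with plain objects as connections.
current_pid = [100]
pool = PidKeyedPool(lambda scheme, netloc: object(),
                    pid_func=lambda: current_pid[0])

conn = pool.get('http', 'localhost:5984')
pool.release('http', 'localhost:5984', conn)
same = pool.get('http', 'localhost:5984')   # same pid: idle connection reused
pool.release('http', 'localhost:5984', same)

current_pid[0] = 101                        # pretend we are now a forked child
other = pool.get('http', 'localhost:5984')  # different key: fresh connection
```

In this sketch, a connection inherited from the parent stays in the dictionary but is never handed out in the child, because the child's pid yields a different key.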

TBH, I'm not sure this is a good implementation. Having the `ConnectionPool` know about the process it's running in seems to violate its responsibility. Feel free to suggest a better solution.

kevinjqiu mentioned this pull request

Mar 6, 2017

Bug when used with multiprocessing #313

Open

elistevens reviewed

Mar 6, 2017

couchdb/tests/client.py Outdated



    def _current_pid():
        return multiprocessing.current_process().pid

elistevens Mar 6, 2017

Would `os.getpid()` make more sense here?

Author

kevinjqiu Mar 18, 2017

Done.

elistevens commented Mar 6, 2017

I reported this (or at least a very similar) issue back in 2011: #205

I ended up solving my issue with application level code like:

    @property
    def db(self):
        if self._db_pid != os.getpid():
            self.db = couchdb.Database(self.url)
        return self._db

    @db.setter
    def db(self, value):
        self._db_pid = os.getpid()
        self._db = value

Not pretty, but gets the job done. It would be great if this could get into the library proper.

Author

kevinjqiu commented Mar 14, 2017

@djc Thoughts?

Owner

djc commented Mar 16, 2017

Sorry, I've been very busy recently.

I think it looks okay. Can we do `os.getpid()` instead of all the `multiprocessing` stuff in `couchdb.http`? Also, it would be nice if you could clean up your commits to squash the typo commit, and maybe separate the tests from the fix.