As discussed in #313.

When couchdb-python is used with multiprocessing, you get `TypeError: 'ResponseBody' object is not iterable`.

This happens in the `couchdb.http.Session:request` method:
```python
# Read the full response for empty responses so that the connection is
# in good state for the next request
if method == 'HEAD' or resp.getheader('content-length') == '0' or \
        status < 200 or status in (204, 304):
    resp.read()
    self.connection_pool.release(url, conn)

# Buffer small non-JSON response bodies
elif int(resp.getheader('content-length', sys.maxsize)) < CHUNK_SIZE:
    data = resp.read()
    self.connection_pool.release(url, conn)

# For large or chunked response bodies, do not buffer the full body,
# and instead return a minimal file-like object
else:
    data = ResponseBody(resp, self.connection_pool, url, conn)
    streamed = True
```
In this particular case, the `resp` object matches neither the `if` nor the `elif` condition and falls through to the `else` clause. A raw `ResponseBody` object is therefore returned upstream to the client code, and when the client code does `response['row']`, it fails because a `ResponseBody` object does not support item indexing.
Adding a `print` on the `resp` object reveals that `resp.getheader('content-length')` is `None`, and hence the `elif` branch is skipped.
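To make the fall-through concrete, here is a minimal sketch of just the branch selection. `pick_branch` is a hypothetical helper, and the `CHUNK_SIZE` value is assumed for illustration (the real constant lives in `couchdb.http`). When the `content-length` header is missing, the `sys.maxsize` default applies, so the buffering branch cannot match:

```python
import sys

CHUNK_SIZE = 1024 * 8  # illustrative value, assumed for this sketch


def pick_branch(method, status, content_length):
    """Mirror the three-way branch in Session.request (hypothetical helper).

    content_length is the raw header value: a string, or None when the
    header is absent.
    """
    if method == 'HEAD' or content_length == '0' or \
            status < 200 or status in (204, 304):
        return 'drain'
    # Same default as resp.getheader('content-length', sys.maxsize)
    length = content_length if content_length is not None else sys.maxsize
    if int(length) < CHUNK_SIZE:
        return 'buffer'
    return 'stream'


print(pick_branch('GET', 200, '120'))  # buffer
print(pick_branch('GET', 200, None))   # stream
```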
The reason for `content-length` being `None` is in `httplib.HTTPResponse.begin`, lines 470~475:
```python
if self.version == 9:
    self.length = None
    self.chunked = 0
    self.will_close = 1
    self.msg = HTTPMessage(StringIO())
    return
```
So `httplib` thinks it is talking to an HTTP/0.9 server, even though the CouchDB response was `HTTP/1.1`.
Tracing further, in order for `self.version == 9`, the version returned by `HTTPResponse._read_status` must be `HTTP/0.9`, which `begin` maps to `9`:
```python
def _read_status(self):
    # Initialize with Simple-Response defaults
    line = self.fp.readline(_MAXLINE + 1)
    ...
```
Putting a print statement after the line is read and rerunning the bug script gives:
```
HTP/1.1 0 O2re:CuD. Eln OTP17T0KSevr ochB/1.61(rag/)
ETag: "1-b188d355f013ee97662615a5b4a85577"
Traceback (most recent call last):
  File "bug.py", line 36, in <module>
    main()
  File "bug.py", line 31, in main
    docs = pool.map(query_id, ['1', '2', '3'])
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'ResponseBody' object is not iterable
```
The process got a garbled status line, even though the response from CouchDB is fine. Because the garbled line does not start with `HTTP/`, the `_read_status` method assumes the response must be `HTTP/0.9`, which ends up as `self.version == 9`.
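The fallback can be sketched as follows. This is a simplified model of the non-strict status-line handling in Python 2.7's `httplib`, not the library code itself; `parse_status_line` is a hypothetical name. Anything whose first token does not start with `HTTP/` is treated as an HTTP/0.9 Simple-Response:

```python
def parse_status_line(line):
    """Simplified model of httplib's non-strict status-line parsing."""
    parts = line.split(None, 2)
    version = parts[0] if parts else ''
    if not version.startswith('HTTP/'):
        # Unrecognizable status line: assume an HTTP/0.9 Simple-Response.
        return 'HTTP/0.9', 200, ''
    if len(parts) < 2:
        raise ValueError('malformed status line: %r' % line)
    reason = parts[2].strip() if len(parts) > 2 else ''
    return version, int(parts[1]), reason


print(parse_status_line('HTTP/1.1 200 OK\r\n'))  # ('HTTP/1.1', 200, 'OK')
print(parse_status_line('HTP/1.1 0 O2re:CuD.'))  # ('HTTP/0.9', 200, '')
```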
So basically what we have here is a race condition where all three processes are talking to the server over the same socket at the same time. This is because couchdb-python's `ConnectionPool` is created in the parent process by the `Session` object, which in turn is created only once per `Database` object. In essence, all three subprocesses share the same session object and the same connection pool object. Because they all talk to the same host/port combination, they all check out the same connection object from the pool, so the same underlying socket is used across all three subprocesses; hence the bug.
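The socket sharing itself can be reproduced without CouchDB. The sketch below is POSIX-only and uses a hypothetical `demo_shared_socket` helper: a forked child inherits the parent's socket object, and both write into the same stream. The writes are serialized here with `waitpid` so the result is deterministic; in the bug they happen concurrently, which is what garbles the status line.

```python
import os
import socket


def demo_shared_socket():
    """Show that a forked child writes to the parent's socket (POSIX only)."""
    client, server = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: uses the socket it inherited from the parent.
        client.sendall(b'child;')
        os._exit(0)
    # Parent: wait for the child, then write to the very same socket.
    os.waitpid(pid, 0)
    client.sendall(b'parent;')
    expected = len(b'child;parent;')
    received = b''
    while len(received) < expected:
        received += server.recv(64)
    client.close()
    server.close()
    return received


if __name__ == '__main__':
    print(demo_shared_socket())
```

The peer sees the bytes from both processes as a single stream, with nothing indicating which process sent what.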
The fix here is to make `ConnectionPool` process-aware: connections are keyed by the current pid in addition to `scheme` and `netloc`. This way, each subprocess gets its own separate connections.
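The idea can be sketched as below. This is not the code in this PR: `PidKeyedPool`, its `factory` argument, and the injectable `getpid` are hypothetical names chosen for illustration, and locking is omitted for brevity.

```python
import os


class PidKeyedPool(object):
    """Hypothetical pool keyed by (pid, scheme, netloc)."""

    def __init__(self, factory, getpid=os.getpid):
        self._factory = factory  # callable(scheme, netloc) -> connection
        self._getpid = getpid    # injectable so the sketch can be exercised
        self._idle = {}          # key -> list of idle connections

    def _key(self, scheme, netloc):
        return (self._getpid(), scheme, netloc)

    def get(self, scheme, netloc):
        idle = self._idle.get(self._key(scheme, netloc))
        if idle:
            return idle.pop()
        return self._factory(scheme, netloc)

    def release(self, scheme, netloc, conn):
        self._idle.setdefault(self._key(scheme, netloc), []).append(conn)


# Simulate a fork by changing the pid the pool sees.
current_pid = [100]
pool = PidKeyedPool(lambda scheme, netloc: object(),
                    getpid=lambda: current_pid[0])

parent_conn = pool.get('http', 'localhost:5984')
pool.release('http', 'localhost:5984', parent_conn)

current_pid[0] = 200  # pretend we are now in a child process
child_conn = pool.get('http', 'localhost:5984')
print(child_conn is parent_conn)  # False

current_pid[0] = 100  # back in the parent
reused_conn = pool.get('http', 'localhost:5984')
print(reused_conn is parent_conn)  # True
```

A connection released under one pid is never handed out under another, so a child always opens its own socket.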
TBH, I'm not sure this is a good implementation. Having the `ConnectionPool` know about the process it's running in seems to go beyond its responsibility. Feel free to suggest a better solution.