[Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

Keir Mierle keir at cs.toronto.edu
Tue Aug 23 22:10:21 CEST 2005


Hi, I'm working on Argon (http://www.third-bit.com/trac/argon) with Greg Wilson this summer.

We're having a very strange problem with Python's unicode parsing of source files. Basically, our CGI script was running extremely slowly on our production box (a pokey dual-Xeon 3GHz w/ 4GB RAM and 15K SCSI drives). Slow to the tune of 6-10 seconds per request. I eventually tracked this down to imports of our source tree; the actual request was completing in 300ms, the rest of the time was spent in __import__.

After doing some gprof profiling, I discovered _PyUnicodeUCS2_IsLinebreak was getting called 51 million times. Our code is 1.2 million characters, so I hardly think it makes sense to call IsLinebreak 50 times for each character; and we're not even importing our entire source tree on every invocation.

Our code is a fork of Trac, and originally had these lines at the top:

# -*- coding: iso8859-1 -*-

This made me suspicious, so I removed all of them. The CGI execution time immediately dropped to ~1 second. gprof revealed that _PyUnicodeUCS2_IsLinebreak is not called at all anymore.

Now that our code works fast enough, I don't really care about this, but I thought python-dev might want to know something weird is going on with unicode splitlines.

I documented my investigation of this problem; if anyone wants further details, just email me. (I'm not on python-dev)

http://www.third-bit.com/trac/argon/ticket/525

Thanks in advance,
Keir
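
The experiment described in the message (compile a source file with and without its encoding declaration and compare timings) can be sketched roughly as follows. This is only an illustration, not the script used in the original investigation: the helper names `strip_coding_line` and `time_compile` are hypothetical, the declaration check is a simplified reading of PEP 263, and the slowdown reported in 2005 may well not reproduce on a current interpreter.

```python
import re
import time

# Simplified PEP 263 pattern; a declaration only counts on line 1 or 2.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def strip_coding_line(source: str) -> str:
    """Return source with any encoding declaration in its first two lines removed."""
    lines = source.splitlines(keepends=True)
    kept = [
        line
        for index, line in enumerate(lines)
        if not (index < 2 and CODING_RE.match(line))
    ]
    return "".join(kept)


def time_compile(source_bytes: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) to compile source_bytes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # compile() on bytes honours the encoding declaration, if present.
        compile(source_bytes, "<sample>", "exec")
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Synthetic stand-in for a real module; an assumption for illustration.
    body = "x = 1\n" * 2000
    with_decl = "# -*- coding: iso8859-1 -*-\n" + body
    without_decl = strip_coding_line(with_decl)
    print("with declaration:   ", time_compile(with_decl.encode("latin-1")))
    print("without declaration:", time_compile(without_decl.encode("latin-1")))
```

Taking the best of several repeats reduces noise from other processes; for a real source tree one would read each file as bytes and sum the timings instead of using the synthetic body above.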


