[Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)
Walter Dörwald walter at livinglogic.de
Wed Aug 24 13:47:58 CEST 2005
Martin v. Löwis wrote:
> Walter Dörwald wrote:
>
>>This is caused by the changes to the codecs in 2.4. Basically the codecs
>>no longer rely on C's readline() to do line splitting (which can't work
>>for UTF-16), but do it themselves (via unicode.splitlines()).
>
> That explains why you get any calls to IsLineBreak; it doesn't explain
> why you get so many of them.
>
> I investigated this a bit, and one issue seems to be that
> StreamReader.readline performs splitlines on the entire input, only to
> fetch the first line. It then joins the rest for later processing.
> In addition, it also performs splitlines on a single line, just to
> strip any trailing line breaks.

This is because unicode.splitlines() is the only API available to Python
that knows about unicode line feeds.

> The net effect is that, for a file with N lines, IsLineBreak is invoked
> up to N*N/2 times per character (at least for the last character).
>
> So I think it would be best if Unicode characters exposed a .islinebreak
> method (or, failing that, codecs just knew what the line break
> characters are in Unicode 3.2), and then codecs would split off
> the first line of input itself.

I think a maxsplit argument (just as for unicode.split()) would help too.

> [...]

Bye,
   Walter Dörwald
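A minimal sketch of the cost discussed in this thread, written in Python 3 rather than 2.4 (an assumption made so it runs today). `SplitAllReader` re-splits the whole remaining buffer on every readline(), as described for StreamReader.readline, while `ScanningReader` only scans up to the first line break, using a hard-coded set of line-break characters as Martin suggests codecs could. The class names, `find_line_end`, `LINE_BREAKS`, and the `scanned` counter (a rough proxy for IsLinebreak calls) are hypothetical illustrations, not the codecs module's API. The character set matches Python 3's str.splitlines(), which recognizes a few more characters (such as \x0b and \x0c) than 2.4's unicode.splitlines() did.

```python
# Line-break characters recognized by Python 3's str.splitlines().
# (Hypothetical constant; Python 2.4 recognized a slightly smaller set.)
LINE_BREAKS = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


class SplitAllReader:
    """Gets one line by splitting the entire remaining buffer (quadratic overall)."""

    def __init__(self, text):
        self.buffer = text
        self.scanned = 0  # characters handed to splitlines()

    def readline(self):
        if not self.buffer:
            return ""
        self.scanned += len(self.buffer)  # splitlines() examines every remaining character
        lines = self.buffer.splitlines(True)
        self.buffer = "".join(lines[1:])  # rejoin the rest for later calls
        return lines[0]


def find_line_end(text, start):
    """Return the index just past the first line break at or after start (or len(text))."""
    i, n = start, len(text)
    while i < n:
        ch = text[i]
        if ch in LINE_BREAKS:
            # Treat "\r\n" as a single break, as splitlines() does.
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                return i + 2
            return i + 1
        i += 1
    return n


class ScanningReader:
    """Gets one line by scanning only as far as the first line break (linear overall)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.scanned = 0  # characters examined by find_line_end()

    def readline(self):
        end = find_line_end(self.text, self.pos)
        self.scanned += end - self.pos
        line = self.text[self.pos:end]
        self.pos = end
        return line


def read_all(reader):
    """Call readline() until it returns an empty string."""
    lines = []
    while True:
        line = reader.readline()
        if not line:
            return lines
        lines.append(line)


text = "".join("line %d\n" % i for i in range(1000))
naive, scanning = SplitAllReader(text), ScanningReader(text)
assert read_all(naive) == read_all(scanning) == text.splitlines(True)
print(naive.scanned, scanning.scanned)
```

Both readers return the same lines, but `scanning.scanned` equals `len(text)` while `naive.scanned` grows roughly with the square of the number of lines. A splitlines() with a maxsplit argument, as Walter proposes, would have the same effect as find_line_end(): split off the first line and leave the rest untouched. str.splitlines() has no such argument, so that variant remains hypothetical here.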
More information about the Python-Dev mailing list