[Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

M.-A. Lemburg mal at egenix.com
Wed Aug 24 12:27:45 CEST 2005


Walter Dörwald wrote:
> I wonder if we should switch back to a simple readline() implementation
> for those codecs that don't require the current implementation
> (basically every charmap codec).

That would be my preference as well. The 2.4 .readline() approach
is really only needed for codecs that have to deal with encodings
that:

a) use multi-byte formats, or
b) support more line-end formats than just CR, CRLF, LF, or
c) are stateful.

This can easily be had by using a mix-in class for
codecs which do need the buffered .readline() approach.

> AFAIK source files are opened in
> universal newline mode, so at least we'd get proper treatment of "\n",
> "\r" and "\r\n" line ends, but we'd lose u"\x1c", u"\x1d", u"\x1e",
> u"\x85", u"\u2028" and u"\u2029" (which are line terminators according
> to unicode.splitlines()).

While the Unicode standard defines these characters as line
end code points, I think their definition does not necessarily
apply to data that is converted from a certain encoding to
Unicode, so that's not a big loss.

E.g. in ASCII or Latin-1, FILE, GROUP and RECORD
SEPARATOR and NEXT LINE characters (0x1c, 0x1d, 0x1e, 0x85)
are not interpreted as line end characters.

Furthermore, we had no reports of anyone complaining in
Python 1.6, 2.0 - 2.3 that line endings were not detected
properly. All these Python versions relied on the stream's
.readline() method to get the next line.
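
To make the trade-off in the quoted paragraph concrete, here is a minimal
sketch, assuming Python 3, where str.splitlines() plays the role of the
unicode.splitlines() mentioned above. simple_split() is a hypothetical
helper, not part of any codec; it only mimics a readline() that knows
about CR, LF and CRLF.

```python
# Code points that str.splitlines() treats as line boundaries but that a
# readline() limited to CR, LF and CRLF would leave inside the line.
EXTRA_TERMINATORS = ["\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]


def simple_split(text):
    # Hypothetical helper: split on CRLF, CR and LF only, dropping line ends.
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


for ch in EXTRA_TERMINATORS:
    sample = "first" + ch + "second"
    # splitlines() breaks at ch; simple_split() returns the sample unchanged.
    print("U+%04X" % ord(ch), sample.splitlines(), simple_split(sample))
```

For each of the six characters, splitlines() yields two lines while the
simple splitter yields one, which is exactly the behaviour that would be
given up by going back to the stream's .readline().
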
The only bug reports we had were for UTF-16, which falls into
category a) above and did not support .readline() until Python 2.4.

A note on the performance of _PyUnicode_IsLinebreak():
in Python 2.0 Fredrik changed this to use the two-step
lookup (reducing the size of the lookup tables considerably).

I think it's worthwhile reconsidering this approach for
character type queries that do not involve a huge number
of code points.

In Python 1.6 the function looked like this (and was
inlined by the compiler using its own fast lookup
table):

int _PyUnicode_IsLinebreak(register const Py_UNICODE ch)
{
    switch (ch) {
    case 0x000A: /* LINE FEED */
    case 0x000D: /* CARRIAGE RETURN */
    case 0x001C: /* FILE SEPARATOR */
    case 0x001D: /* GROUP SEPARATOR */
    case 0x001E: /* RECORD SEPARATOR */
    case 0x0085: /* NEXT LINE */
    case 0x2028: /* LINE SEPARATOR */
    case 0x2029: /* PARAGRAPH SEPARATOR */
        return 1;
    default:
        return 0;
    }
}

Another candidate to convert back is:

int _PyUnicode_IsWhitespace(register const Py_UNICODE ch)
{
    switch (ch) {
    case 0x0009: /* HORIZONTAL TABULATION */
    case 0x000A: /* LINE FEED */
    case 0x000B: /* VERTICAL TABULATION */
    case 0x000C: /* FORM FEED */
    case 0x000D: /* CARRIAGE RETURN */
    case 0x001C: /* FILE SEPARATOR */
    case 0x001D: /* GROUP SEPARATOR */
    case 0x001E: /* RECORD SEPARATOR */
    case 0x001F: /* UNIT SEPARATOR */
    case 0x0020: /* SPACE */
    case 0x0085: /* NEXT LINE */
    case 0x00A0: /* NO-BREAK SPACE */
    case 0x1680: /* OGHAM SPACE MARK */
    case 0x2000: /* EN QUAD */
    case 0x2001: /* EM QUAD */
    case 0x2002: /* EN SPACE */
    case 0x2003: /* EM SPACE */
    case 0x2004: /* THREE-PER-EM SPACE */
    case 0x2005: /* FOUR-PER-EM SPACE */
    case 0x2006: /* SIX-PER-EM SPACE */
    case 0x2007: /* FIGURE SPACE */
    case 0x2008: /* PUNCTUATION SPACE */
    case 0x2009: /* THIN SPACE */
    case 0x200A: /* HAIR SPACE */
    case 0x200B: /* ZERO WIDTH SPACE */
    case 0x2028: /* LINE SEPARATOR */
    case 0x2029: /* PARAGRAPH SEPARATOR */
    case 0x202F: /* NARROW NO-BREAK SPACE */
    case 0x3000: /* IDEOGRAPHIC SPACE */
        return 1;
    default:
        return 0;
    }
}

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 23 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::


