




[Python-Dev] Internal representation of strings and Micropython

Tim Delaney timothy.c.delaney at gmail.com
Fri Jun 6 23:35:59 CEST 2014


On 7 June 2014 00:52, Paul Sokolovsky <pmiscml at gmail.com> wrote:

> > At heart, this is exactly what the Python 3 "str" type is. The
> > universal convention is "code points".
>
> Yes. Except for one small detail - Python3 specifies these code points
> to be Unicode code points. And Unicode is a very bloated thing.
>
> But if we drop that "Unicode" stipulation, then it's also exactly what
> MicroPython implements. Its "str" type consists of codepoints, we don't
> have pet names for them yet, like Unicode does, but their numeric
> values are 0-255. Note that it in no way limits encodings, characters,
> or scripts which can be used with MicroPython, because just like
> Unicode, it supports the concept of "surrogate pairs" (but we don't call it
> like that) - specifically, smaller code points may comprise bigger
> groupings. But unlike Unicode, we don't stipulate format, value or
> other constraints on how these "surrogate pairs"-alikes are formed,
> leaving that to users.

I think you've missed my point.

There is absolutely nothing conceptually bloaty about what a Python 3
string is. It's just like a 7-bit ASCII string, except each entry can be
from a larger table. When you index into a Python 3 string, you get back
exactly *one valid entry* from the Unicode code point table. That plus the
length of the string, plus the guarantee of immutability, gives everything
needed to layer the rest of the string functionality on top.

There are no surrogate pairs - each code point is standalone (unlike code
*units*). It is conceptually very simple. The implementation may be
difficult (if you're trying to do better than 4 bytes per code point) but
the concept is dead simple.

If the MicroPython string type requires people *using* it to deal with
surrogates (i.e. indexing could return a value that is not a valid Unicode
code point) then it will have broken the conceptual simplicity of the
Python 3 string type (and most certainly can't be considered in any way
compatible).

Tim Delaney
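A minimal sketch (not part of the original thread) of the code point vs. code unit distinction drawn above, assuming CPython 3.9+ behaviour. The helper names are hypothetical. Indexing a Python 3 str yields one whole code point, while the UTF-16 and UTF-8 encodings of the same text break some characters into several code units (a surrogate pair, or a multi-byte sequence of 0-255 values, roughly the model described in the quoted text).

```python
# Hypothetical helpers contrasting code points with encoded code units.

def code_points(s: str) -> list[int]:
    """Each index into a Python 3 str yields exactly one Unicode code point."""
    return [ord(ch) for ch in s]

def utf16_code_units(s: str) -> list[int]:
    """UTF-16 code units: characters outside the BMP become surrogate pairs."""
    data = s.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def utf8_bytes(s: str) -> list[int]:
    """UTF-8 code units: byte values 0-255, several per non-ASCII character."""
    return list(s.encode("utf-8"))

s = "a\u00e9\U0001F600"  # 'a', 'é', and an emoji outside the BMP

print(len(s), code_points(s))                                    # 3 [97, 233, 128512]
print([hex(u) for u in utf16_code_units(s)])                     # ['0x61', '0xe9', '0xd83d', '0xde00']
print(utf8_bytes(s))                                             # [97, 195, 169, 240, 159, 152, 128]
```

Here `s[2]` returns the whole emoji, whereas element 2 of either encoded list is only a fragment of a character (a high surrogate in UTF-16, a continuation byte of 'é' in UTF-8). That fragment-level view is what the message argues a Python 3-compatible str must not expose through indexing.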


More information about the Python-Dev mailing list

