[Python-Dev] Internal representation of strings and Micropython (Steven D'Aprano's summary)

Jim J. Jewett jimjjewett at gmail.com
Fri Jun 6 05:54:55 CEST 2014


Steven D'Aprano wrote:

> (1) I asked if it would be okay for MicroPython to *optionally* use
> nominally Unicode strings limited to ASCII. Pretty much the only
> response to this has been Guido saying "That would be a pretty lousy
> option", and since nobody has really defended the suggestion, I think we
> can assume that it's off the table.

Lousy is not quite the same as forbidden.

Doing it in good faith would require making the limit prominent
in the documentation, and raising some sort of CharacterNotSupported
exception (or at least a warning) whenever there is an attempt to
create a non-ASCII string, even via the C API.

> (2) I asked if it would be okay ... to use an UTF-8 implementation
> even though it would lead to O(N) indexing operations instead of O(1).
> There's been some opposition to this, including Guido's:

[Non-ASCII character removed.]

It is bad when quirks -- even good quirks -- of one implementation lead
people to write code that will perform badly on a different Python
implementation.  CPython has at least delayed obvious optimizations for
this reason.  Changing idiomatic operations from O(1) to O(N) is big
enough to cause a concern.

That said, the target environment itself apparently limits N to small
enough that the problem should be mostly theoretical.  If you want to
be good citizens, then do put a note in the documentation warning that
particularly long strings are likely to cause performance issues unique
to the MicroPython implementation.

(Frankly, my personal opinion is that if you're really optimizing for
space, then long strings will start getting awkward long before N is
big enough for algorithmic complexity to overcome constant factors.)

> ... those strings will need to be transcoded to UTF-8 before they
> can be written or printed, so keeping them as UTF-8 ...

That all assumes that the external world is using UTF-8 anyhow.
Which is more likely to be true if you document it as a limitation
of MicroPython.

> ... but many strings may never be written out:

    print(prefix + s[1:].strip().lower().center(80) + suffix)

> creates five strings that are never written out and one that is.

But looking at the actual strings -- UTF-8 doesn't really hurt
much.  Only the slice and center() are more complex, and for a
string less than 80 characters long, O(N) is irrelevant.

-jJ

--
If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
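
For readers who want to see why indexing becomes O(N) under point (2),
here is a minimal sketch in plain Python.  It is not MicroPython's
actual code; the function name utf8_index is hypothetical, and the
sketch assumes the buffer holds valid UTF-8 and handles only
non-negative indices.  Because code points occupy one to four bytes,
finding the k-th one means scanning from the start and skipping
continuation bytes (those of the form 10xxxxxx).

```python
def utf8_index(buf: bytes, k: int) -> str:
    """Return the k-th code point of the UTF-8 buffer `buf` by linear scan.

    Hypothetical helper for illustration only; assumes `buf` is valid UTF-8.
    """
    if k < 0:
        raise IndexError("negative indices are not handled in this sketch")
    count = -1      # index of the code point most recently started
    start = None    # byte offset where code point k begins
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:       # not a continuation byte: new code point
            count += 1
            if count == k:
                start = pos
            elif count == k + 1:      # next code point begins, so k is complete
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return buf[start:].decode("utf-8")  # code point k was the last one


data = "héllo".encode("utf-8")   # 5 code points stored in 6 bytes
second = utf8_index(data, 1)     # must walk past 'h' to find the 2-byte 'é'
```

For a string shorter than 80 characters, as in the print() example,
the scan touches at most a few hundred bytes, which is the sense in
which the O(N) cost is mostly theoretical at these sizes.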


