
Created on 2008-09-09 05:37 by loewis, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| ucd51.diff.bz2 | loewis, 2008-09-09 05:37 | |
| Messages (11) | |||
|---|---|---|---|
| msg72821 | Author: Martin v. Löwis (loewis) | Date: 2008-09-09 05:37 | |
This is a patch to update the Unicode database. It's mostly the imported data, but there were two code changes:
- 5.1 changes the "mirrored" property for a character (U+0F3A), and the delta-to-3.2 code did not support that. I added a field into change_record to support that kind of change.
- 5.1 also added a character (U+1d79) whose upper-case version is far off (U+A77D), triggering a complaint that the delta can't be represented in 16 bits. I fixed that by adding a flag into the ctype record indicating that deltas aren't used for that record.

Fredrik, can you please review these changes?
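Both changes described in this message can be observed from Python with the standard `unicodedata` module. The snippet below is only an illustrative sketch: it assumes the case delta is stored as a signed 16-bit value (the message says "16 bits" without giving the signedness), and the variable names are made up for this example.

```python
import unicodedata

# U+0F3A: compare the "mirrored" property in the frozen 3.2 snapshot
# (unicodedata.ucd_3_2_0) with the database the interpreter ships.
ch = "\u0f3a"
old_mirrored = unicodedata.ucd_3_2_0.mirrored(ch)
new_mirrored = unicodedata.mirrored(ch)
print(unicodedata.name(ch), "3.2:", old_mirrored, "current:", new_mirrored)

# U+1D79 -> U+A77D: the distance between the lower-case and upper-case forms.
lower, upper = 0x1D79, 0xA77D
delta = upper - lower

# Assumption: the delta field is a signed 16-bit quantity.
fits_in_int16 = -32768 <= delta <= 32767
print(f"delta = {delta:#x}, fits in signed 16 bits: {fits_in_int16}")

# The mapping itself, as seen through str.upper().
maps_to_upper = chr(lower).upper() == chr(upper)
print("U+1D79 upper-cases to U+A77D:", maps_to_upper)
```

Under the signed-16-bit assumption the delta is too large by a few thousand, which is consistent with the complaint the message describes.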
| msg72941 | Author: Martin v. Löwis (loewis) | Date: 2008-09-10 04:51 | |
Guido, would you like to review?
| msg72946 | Author: Fredrik Lundh (effbot) | Date: 2008-09-10 07:06 | |
The patch looks fine to me (assuming that I didn't miss something critical hidden among the large table diffs).

(I'd probably have named the "NODELTA" flag after what it is rather than what it isn't, but I cannot think of a short replacement right now, so let's leave it as it is.)
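For readers unfamiliar with the flag under discussion, the idea can be sketched roughly as follows. This is not the code from the patch: `NODELTA_MASK`, its value, the record layout, and `apply_case_field` are all hypothetical names chosen for illustration, and the signed 16-bit interpretation of the delta is an assumption.

```python
# Hypothetical flag bit; the real value lives in the generated C tables.
NODELTA_MASK = 0x100


def apply_case_field(code_point: int, field: int, flags: int) -> int:
    """Return the case-mapped code point for one (hypothetical) type record.

    field is a 16-bit value. When the NODELTA bit is set it holds the target
    code point itself; otherwise it holds a signed delta to add.
    """
    if flags & NODELTA_MASK:
        return field
    # Assumption: deltas are stored in two's complement in 16 bits.
    delta = field - 0x10000 if field >= 0x8000 else field
    return code_point + delta


# Ordinary case: 'a' -> 'A' is a delta of -32, stored as 0xFFE0.
small = apply_case_field(0x61, (-32) & 0xFFFF, 0)

# Far-off case: U+1D79 -> U+A77D stored as an absolute value.
far = apply_case_field(0x1D79, 0xA77D, NODELTA_MASK)

print(hex(small), hex(far))
```

The trade-off in this sketch is that most records keep the compact delta form, and only the rare far-off mappings pay for a flag check.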
| msg72950 | Author: Marc-Andre Lemburg (lemburg) | Date: 2008-09-10 09:34 | |
Reviewed the patch: looks fine to me. One nit: the unicodedata module doc-string must be updated to 5.1.0 as well. Ditto for the documentation.
| msg72962 | Author: Martin v. Löwis (loewis) | Date: 2008-09-10 14:11 | |
I have now committed the change as r66362 (including the missing documentation updates), and ported it to 3.0 as r66363 (where I had to change the flag value and regenerate the data, as the flag 0x100 was already taken).
| msg72973 | Author: Guido van Rossum (gvanrossum) | Date: 2008-09-10 16:11 | |
2008/9/10 Martin v. Löwis <report@bugs.python.org>:
> I have now committed the change as r66362 (including the missing
> documentation updates), and ported it to 3.0 as r66363 (where I had to
> change the flag value and regenerate the data, as the flag 0x100 was
> already taken).

That's unfortunate -- perhaps the 2.6 flag and data can be brought in line, to make future merges easier?
| msg72979 | Author: Martin v. Löwis (loewis) | Date: 2008-09-10 18:09 | |
> That's unfortunate -- perhaps the 2.6 flag and data can be brought in
> line, to make future merges easier?

I thought of that, however, merging the databases themselves would still not be possible: the 3.0 database has the flags set in many records, which causes merge conflicts (as the 2.x database has different flag values). So regenerating the database is necessary, anyway.

In future changes, it might be useful to have new flags with the same values, so that such patches can be merged without conflicts in the generator.
| msg72987 | Author: Daniel Diniz (ajaksu2) | Date: 2008-09-10 21:31 | |
#66363 breaks test_unicode and test_format on 3.0.
| msg72997 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) | Date: 2008-09-10 23:54 | |
Code point 0x0370 is now a printable character. r66381 corrected the failures by simply changing it to 0x0378, until the next unicodedata upgrade...

I wonder if there is a value that is guaranteed to stay non-printable.
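The effect described here is visible through `repr()` in Python 3, which escapes non-printable characters and shows printable ones literally. The sketch below assumes that this is the behaviour the failing tests relied on, which the thread does not state explicitly; `describe` is a helper invented for this example.

```python
import unicodedata


def describe(code_point: int) -> tuple[str, bool, str]:
    """Return (general category, str.isprintable() result, repr) for a code point."""
    ch = chr(code_point)
    return unicodedata.category(ch), ch.isprintable(), repr(ch)


for cp in (0x0370, 0x0378, 0x0001):
    category, printable, shown = describe(cp)
    # ascii() keeps the output readable on terminals that cannot show the glyph.
    print(f"U+{cp:04X} category={category} printable={printable} repr={ascii(shown)}")
```

Whether U+0378 is reported as printable depends on the Unicode version bundled with the interpreter running the snippet, which is exactly the fragility the message points out.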
| msg73000 | Author: Guido van Rossum (gvanrossum) | Date: 2008-09-11 01:08 | |
2008/9/10 Amaury Forgeot d'Arc <report@bugs.python.org>:
> Code point 0x0370 is now a printable character.
> r66381 corrected the failures by simply changing it to 0x0378, until the
> next unicodedata upgrade...
> I wonder if there is a value that is guaranteed to stay non-printable.

The control characters?
| msg73005 | Author: Martin v. Löwis (loewis) | Date: 2008-09-11 06:05 | |
> The control characters?

Indeed, also the private-use characters. test_unicode explicitly comments that the test is about unassigned characters, although I don't understand the purpose of that test (it then also tests a surrogate character, which is also guaranteed to remain unprintable).

One of the characters that is guaranteed to remain unassigned is U+FFFE (and its mirrors in other planes, e.g. U+1FFFE, ...). This guarantee is made to support the BOM. Along with U+FFFF, these are non-characters. #765036 once suggested that Python should refuse to represent them at all, but that proposal was rejected.
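The classes of code points named in this message can be checked with `unicodedata.category()` and `str.isprintable()`. The following is a small sketch, not code from the test suite: the `noncharacters` helper and the `samples` table are invented for illustration, and the U+FDD0..U+FDEF range comes from the Unicode Standard rather than from this thread.

```python
import unicodedata


def noncharacters():
    """Yield the code points the Unicode Standard designates as noncharacters."""
    yield from range(0xFDD0, 0xFDF0)      # U+FDD0..U+FDEF
    for plane in range(0x11):             # planes 0 through 16
        yield (plane << 16) | 0xFFFE      # U+FFFE, U+1FFFE, ...
        yield (plane << 16) | 0xFFFF      # U+FFFF, U+1FFFF, ...


# One representative per class mentioned in the discussion.
samples = {
    "control": 0x0001,
    "surrogate": 0xD800,
    "private use": 0xE000,
    "noncharacter": 0xFFFE,
}

for label, cp in samples.items():
    ch = chr(cp)
    print(f"{label:13} U+{cp:04X} category={unicodedata.category(ch)} "
          f"printable={ch.isprintable()}")

print("noncharacters:", len(list(noncharacters())))
```

Any of these classes would make a more stable test input than a currently unassigned code point such as U+0378, since their status is fixed by the standard rather than by what has been allocated so far.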
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:38 | admin | set | github: 48061 |
| 2008-09-11 06:05:22 | loewis | set | messages: +msg73005 |
| 2008-09-11 01:09:44 | gvanrossum | set | files: -unnamed |
| 2008-09-11 01:08:53 | gvanrossum | set | files: +unnamed messages: +msg73000 |
| 2008-09-10 23:54:54 | amaury.forgeotdarc | set | nosy: +amaury.forgeotdarc messages: +msg72997 |
| 2008-09-10 21:31:01 | ajaksu2 | set | nosy: +ajaksu2 messages: +msg72987 versions: + Python 3.0 |
| 2008-09-10 18:09:23 | loewis | set | messages: +msg72979 |
| 2008-09-10 16:18:10 | gvanrossum | set | files: -unnamed |
| 2008-09-10 16:11:42 | gvanrossum | set | files: +unnamed messages: +msg72973 |
| 2008-09-10 14:11:27 | loewis | set | status: open -> closed resolution: accepted messages: +msg72962 |
| 2008-09-10 09:34:27 | lemburg | set | nosy: +lemburg messages: +msg72950 |
| 2008-09-10 07:06:13 | effbot | set | messages: +msg72946 |
| 2008-09-10 04:51:42 | loewis | set | assignee: effbot -> gvanrossum messages: +msg72941 nosy: +gvanrossum |
| 2008-09-09 05:39:59 | loewis | set | keywords: +needs review |
| 2008-09-09 05:39:54 | loewis | set | keywords: +patch, -needs review |
| 2008-09-09 05:37:53 | loewis | create | |