Corrigendum #5: Normalization Idempotency
| Corrigendum | Effective Date | Applicable Versions | Fixed Version | Result Documented In: |
|---|---|---|---|---|
| Corrigendum #5: Normalization Idempotency | 2005-Feb-07 [102-C3, PRI #61, PRI #29] | 3.0.0 to 4.0.1 | 4.1.0 2005-March | UAX #15 |
Background
In The Unicode Standard, Versions 3.0 through 4.0.1, the language of the specification of UAX #15: Unicode Normalization Forms (citing Version 4.0) for forms NFC and NFKC is not logically self-consistent. Programs that depend on such logical consistency could be subject to security problems until fixed, although as yet no realistic scenarios are known that would present such problems. The problematic text occurs in Definition D2, which defines what it means for a character to be blocked. This corrigendum provides a textual fix for this problem.
The change will not have an impact on real data found in practice (with the possible exception of test cases for the algorithm itself), because the affected sequences do not constitute well-formed text in any known language.
For more background information, see Public Review Issue #29, Normalization Issue.
Changes to the Text of UAX #15
Whenever this corrigendum is applied to a version of Unicode from Unicode 3.0.0 to Unicode 4.0.1, the text for definition D2 in UAX #15 is changed by adding the two words "or higher", so that it has the following wording:
D2. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same or higher combining class as C.
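The following is a minimal Python sketch of the blocking test in D2, offered for illustration only. The function names `is_blocked` and `is_blocked_pre_corrigendum` and the sample Hangul sequence are assumptions introduced here, not part of UAX #15. Combining classes are read with the standard library's `unicodedata.combining`. The second function models the earlier wording, which tested only for the same combining class, so the two can be compared on one input.

```python
import unicodedata


def _check(seq: str, c_index: int) -> None:
    # D2 applies to sequences that begin with a starter (combining class 0).
    if not seq or unicodedata.combining(seq[0]) != 0:
        raise ValueError("sequence must begin with a starter")
    if not 0 < c_index < len(seq):
        raise ValueError("c_index must point at a character after the starter")


def is_blocked(seq: str, c_index: int) -> bool:
    """D2 as corrected: some B between S and C is a starter, or has the
    same or higher combining class as C."""
    _check(seq, c_index)
    ccc_c = unicodedata.combining(seq[c_index])
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0 or ccc_b >= ccc_c:
            return True
    return False


def is_blocked_pre_corrigendum(seq: str, c_index: int) -> bool:
    """Earlier wording (hypothetical model): B is a starter, or has the
    same combining class as C."""
    _check(seq, c_index)
    ccc_c = unicodedata.combining(seq[c_index])
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0 or ccc_b == ccc_c:
            return True
    return False


# Illustrative input: Hangul L (U+1100), combining grave (U+0300, class 230),
# Hangul V (U+1161, class 0). C is the final character.
sample = "\u1100\u0300\u1161"
print(is_blocked(sample, 2), is_blocked_pre_corrigendum(sample, 2))
```

On this input the two definitions disagree: under the corrected wording the intervening mark blocks the vowel, because its combining class is higher than the vowel's class of 0, while the earlier wording does not block it.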
Explanatory text on the implications of this corrigendum for implementations can be found in UAX #15: Unicode Normalization Forms in Section 3.3, Guaranteeing Process Stability and Section 20, Corrigendum 5 Sequences.
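As a rough illustration of the idempotency property named in the title of this corrigendum, the sketch below checks that normalizing a string twice gives the same result as normalizing it once. The helper name `is_idempotent` and the sample string are assumptions made for this example. The check relies on Python's `unicodedata.normalize`, whose behavior depends on the Unicode version bundled with the interpreter, so it tests that implementation rather than the specification text.

```python
import unicodedata


def is_idempotent(text: str, form: str = "NFC") -> bool:
    """Return True if applying the normalization form twice equals applying it once."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# Illustrative input: Hangul L, combining grave accent, Hangul V.
sample = "\u1100\u0300\u1161"
for form in ("NFC", "NFKC"):
    print(form, is_idempotent(sample, form))
```

A check of this kind can be run over an implementation's own test data to look for inputs where a second normalization pass changes the result.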