Perl/perl5

memcollxfrm: Handle above-Unicode code points #22989


Merged
khwilliamson merged 5 commits into Perl:blead from khwilliamson:locale_leak on Feb 20, 2025

Conversation

@khwilliamson (Contributor)

As stated in the comments added by this commit, it is undefined behavior to call strxfrm() on above-Unicode code points, and especially on Perl's invented extended UTF-8. This commit changes all such input into a legal value, replacing every above-Unicode code point with the highest permanently unassigned code point, U+10FFFF.

  • This set of changes may require a perldelta entry; please state your opinion.
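The following minimal C sketch shows the replacement rule described above at the code-point level. Any value above U+10FFFF is clamped to U+10FFFF before being encoded as UTF-8. The function name `encode_clamped` is hypothetical, and `uint64_t` is an assumed stand-in for Perl's UV. This is not the code from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest Unicode code point; permanently unassigned. */
#define MAX_UNICODE UINT64_C(0x10FFFF)

/* Hypothetical helper: encodes cp as UTF-8 into out (which must hold 4
 * bytes), first clamping any above-Unicode value to U+10FFFF.  uint64_t is
 * an assumed stand-in for Perl's UV.  Surrogates are not special-cased.
 * Returns the number of bytes written. */
static size_t encode_clamped(uint64_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp > MAX_UNICODE)
        cp = MAX_UNICODE;            /* the replacement rule from the PR */

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

U+10FFFF always encodes as the four bytes F4 8F BF BF. Under this rule, every above-Unicode input therefore reaches strxfrm() as the same legal byte sequence.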

@tonycoz (Contributor)

That looks more reasonable, though I don't see why the i386 CI is failing; I couldn't reproduce it with a -m32 build on Debian.

@khwilliamson (Contributor, Author)

I have started a smoke-me to see what other platforms may have problems.

I suspect it is something in strcollxfrm. Is there a way to turn on -DLv for that platform?

@tonycoz (Contributor)

> Is there a way to turn on -DLv for that platform?

You could add that to the switches passed to the fresh_perl() call, possibly repeating the call with that switch if it fails without it.

This value is not going to be used again. I put in the ++ out of habit.
This creates an internal macro that skips some error checking, for use when we don't care whether the input is completely well-formed.
The next commit will want to use the results later.
@khwilliamson (Contributor, Author)

I looked over the code again and realized that it copied the initial portion of the string (the part before the first bytes that needed translating) as-is, but did not advance the destination pointer past it, so the translation overwrote that portion. In the other string no translation was needed, so its initial segment was intact and was being compared with U+10FFFF. Platforms could differ in how they lexically compare those.
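The bug described in this comment comes down to one missing pointer increment. The following C sketch is hypothetical and is not the PR's actual code. It copies the untranslated prefix, advances the destination past it, and then replaces each above-Unicode sequence with the UTF-8 for U+10FFFF. It assumes well-formed, possibly Perl-extended, UTF-8. It detects an above-Unicode sequence by its lead byte (0xF5 or higher, or 0xF4 followed by 0x90 or higher) and skips whatever continuation bytes follow, so it does not depend on the exact length of Perl's extended forms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+10FFFF, the replacement for above-Unicode input. */
static const unsigned char MAX_UNICODE_UTF8[4] = { 0xF4, 0x8F, 0xBF, 0xBF };

/* Does the sequence starting at s (input ends at e) encode a code point
 * above U+10FFFF?  Lead bytes 0xF5..0xFF only start such sequences
 * (including Perl's extended forms); 0xF4 does when followed by >= 0x90. */
static int is_above_unicode(const unsigned char *s, const unsigned char *e)
{
    if (*s >= 0xF5)
        return 1;
    return *s == 0xF4 && s + 1 < e && s[1] >= 0x90;
}

/* Hypothetical helper: copies src[0..len) to dst, replacing each
 * above-Unicode sequence with U+10FFFF.  Assumes well-formed input, where
 * every above-Unicode sequence is at least 4 bytes long, so dst never needs
 * more than len bytes.  Returns the number of bytes written. */
static size_t replace_above_unicode(const unsigned char *src, size_t len,
                                    unsigned char *dst)
{
    const unsigned char *s = src;
    const unsigned char *e = src + len;
    unsigned char *d = dst;

    /* Find the first byte that needs translating. */
    const unsigned char *first = s;
    while (first < e && !is_above_unicode(first, e))
        first++;

    /* Copy the untouched prefix as-is... */
    memcpy(d, s, (size_t)(first - s));
    /* ...and advance the destination past it.  Without this line the
     * translated output overwrites the prefix. */
    d += first - s;
    s = first;

    while (s < e) {
        if (is_above_unicode(s, e)) {
            memcpy(d, MAX_UNICODE_UTF8, sizeof MAX_UNICODE_UTF8);
            d += sizeof MAX_UNICODE_UTF8;
            s++;                                  /* skip the lead byte */
            while (s < e && (*s & 0xC0) == 0x80)  /* and its continuation bytes */
                s++;
        }
        else {
            *d++ = *s++;
        }
        /* With well-formed input, output never outruns the input consumed. */
        assert(d - dst <= s - src);
    }
    return (size_t)(d - dst);
}
```

If `d += first - s;` is dropped, the first U+10FFFF is written at the start of `dst`, clobbering the copied prefix. Meanwhile, a string that needed no translation keeps its prefix. The two strings then differ in their leading bytes, which is how the platform-dependent comparison arises.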

@tonycoz (Contributor)

> Platforms could differ in how they lexically compare those

Ideally we'd test the intermediate transformation from a Perl string to a form with no NULs and no extended UTF-8, since that doesn't depend on the underlying locale implementation.

To do that we'd need to split that out into a separate function and export it, but that's not something we've generally done in core perl.
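What might such a split-out function look like? Here is a hypothetical, locale-independent C helper covering the "no NULs" half of the transformation. It replaces each embedded NUL with a caller-supplied byte, because strxfrm() takes a NUL-terminated string and would otherwise stop at the first NUL. The helper name `replace_nuls` and its `repl` parameter are assumptions for illustration, not Perl's internal API. The function never calls setlocale() or strxfrm(), so its output can be checked byte for byte on any platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src[0..len) to dst, replacing each embedded
 * NUL with repl so the result can be passed to strxfrm() as a C string.
 * dst must hold len + 1 bytes.  Which byte to use for repl is left to the
 * caller; a real implementation would likely pick one that collates low
 * in the current locale. */
static void replace_nuls(const char *src, size_t len, char repl, char *dst)
{
    assert(repl != '\0');
    for (size_t i = 0; i < len; i++)
        dst[i] = (src[i] == '\0') ? repl : src[i];
    dst[len] = '\0';   /* strxfrm() needs a terminated string */
}
```

Because the helper is a pure byte transformation, a test can assert on its output directly. The locale-dependent strxfrm() step stays out of the assertion.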

@tonycoz (Contributor)

It could change behaviour, so I think it could use a brief perldelta entry.

Reviewers: @tonycoz approved these changes


2 participants: @khwilliamson, @tonycoz
