Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Addis_potential_mixed_script_confusable_char function#13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

@crlf0710
Copy link
Collaborator

@crlf0710crlf0710 commentedMay 4, 2020
edited
Loading

This is a prototype of the data required from mixed_script_confusable lint of rustc. I'm not really sure whether we need to give more detailed data to rustc for diagnostics. (e.g. This code point is potentially confusable with which code point or which script?)

Putting those aside, maybe we can have a early review first.

Implementsis_potential_mixed_script_confusable_char function.

A few other issues raised up duringadding the actual lint -#15#16

@crlf0710crlf0710force-pushed therustc_mixed_script_confusable branch 2 times, most recently from35d49f3 to1fb9c04CompareMay 4, 2020 18:43
@crlf0710
Copy link
CollaboratorAuthor

crlf0710 commentedMay 6, 2020
edited
Loading

There's a debug boolean value in the python script (search "debug = False",https://github.com/unicode-rs/unicode-security/pull/13/files#diff-c87c196441d88317a5ea3bf97e9fde0aR536), if anyone's curious why these codepoints are confusable, you can toggle it and regenerate the table file, which will include comments on why these code points are considered mixed script confusable.

@crlf0710
Copy link
CollaboratorAuthor

The python calculation part is a little complex there. So i'll leave a brief description. The main idea is creating equivalence classes fromconfusables.txt, with the prototype as each equivalence class's representative element. Then compare each element pair within each equivalence class. If they're from different scripts, then mark each of them potentially mixed script confusable. And within that there're some special handling when the prototype has multiple code points.

Copy link
Member

@ManishearthManishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's really hard to review the python stuff, it would be nice if each function had a long comment explaining what its purpose is. currently i have to kinda figure that out from the code, and it's hard to review things when you don't know what they're doing


returnconfusables

defaliases():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

perhaps link to the original source in unicode-script so things can be manually updated if necessary

Copy link
CollaboratorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Added comments below this line.

@crlf0710
Copy link
CollaboratorAuthor

@Manishearth Will add some comments, thanks!

@crlf0710
Copy link
CollaboratorAuthor

Added some comments. Also cc@pyfisch here.

@crlf0710
Copy link
CollaboratorAuthor

Adjusted the APIs and added a very simple test.

@crlf0710crlf0710 changed the titleImplement data tables for rustc_mixed_script_confusable_detection.Addis_potential_mixed_script_confusable_char functionMay 30, 2020
Copy link
Member

@ManishearthManishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Overall looks good, would appreciate docs on each function

defis_script_ignored_in_mixedscript(source):
returnsource=='Zinh'orsource=='Zyyy'orsource=='Zzzz'

defprocess_mixedscript_single_to_multi(item_i,script_i,proto_lst,scripts):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

what is this function trying to do? what does it operate on? where did those rules come from?

Copy link
CollaboratorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I will try to explain in comments.

returnTrue
returnFalse

defload_potential_mixedscript_confusables(f,identifier_allowed,scripts):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

what this function returns, and what each argument is, should be documented.

Copy link
CollaboratorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

ok, i will add more comments.

@crlf0710
Copy link
CollaboratorAuthor

Added more comments to document the details.

@ManishearthManishearth merged commitfd03564 intounicode-rs:masterJun 9, 2020
@crlf0710crlf0710 deleted the rustc_mixed_script_confusable branchJune 10, 2020 01:20
Manishearth added a commit to Manishearth/rust that referenced this pull requestJun 26, 2020
…=ManishearthImplement mixed script confusable lint.This implements the mixed script confusable lint defined in RFC 2457.This is blocked onrust-lang#72069 andunicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved.The lint message warning is sub-optimal for now. We'll need a mechanism to properly output  `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate.r?@Manishearth
Manishearth added a commit to Manishearth/rust that referenced this pull requestJun 26, 2020
…=ManishearthImplement mixed script confusable lint.This implements the mixed script confusable lint defined in RFC 2457.This is blocked onrust-lang#72069 andunicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved.The lint message warning is sub-optimal for now. We'll need a mechanism to properly output  `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate.r?@Manishearth
Manishearth added a commit to Manishearth/rust that referenced this pull requestJun 26, 2020
…=ManishearthImplement mixed script confusable lint.This implements the mixed script confusable lint defined in RFC 2457.This is blocked onrust-lang#72069 andunicode-rs/unicode-security#13, and will need a Cargo.toml version bump after those are resolved.The lint message warning is sub-optimal for now. We'll need a mechanism to properly output  `AugmentScriptSet` to screen, this is to be added in `unicode-security` crate.r?@Manishearth
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@ManishearthManishearthManishearth approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@crlf0710@Manishearth

[8]ページ先頭

©2009-2025 Movatter.jp