Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Unicode sentence boundaries#24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Manishearth merged 4 commits intounicode-rs:masterfromtomcumming:master
May 15, 2019

Conversation

tomcumming
Copy link
Contributor

@tomcummingtomcumming commentedMay 6, 2017
edited
Loading

This is an implementation of thesentence breaks specification, including changes to the python files to grab the sentence break test data.

I welcome any advice for improving.

@tomcummingtomcumming changed the titleCode review pleaseCode review please (Unicode sentence boundary partial implementation)May 6, 2017
@tomcummingtomcumming changed the titleCode review please (Unicode sentence boundary partial implementation)Unicode sentence boundariesMay 16, 2017
@Manishearth
Copy link
Member

Sorry for letting this stagnate! I'm rather busy to review this right now, but will try to get to it soon!

Sentence boundaries is something I've wanted implemented here for a while 😄

@Manishearth
Copy link
Member

(still no time to look at this, apologies. Really hope to get to it soon)

@rthrth mentioned this pull requestMay 3, 2019
@rth
Copy link
Contributor

rth commentedMay 6, 2019

This PR looks quite good and it would be great to have this functionality. Could I help in any way ?

@tomcumming
Copy link
ContributorAuthor

@rth I can split this out into another crate if required?

@rth
Copy link
Contributor

rth commentedMay 8, 2019

@tomcumming I would really like to use this implementation (and compare it with other sentences splitting approaches) inrth/vtext#51 . Having this implementation in theunicode-segmentation crate would be ideal, but if it is unlikely to be reviewed in the near future, maybe putting it in some other crate could be a workaround.

Any chance@Manishearth that you would have some review bandwidth for this, or could suggest someone who could review it?

@Manishearth
Copy link
Member

@rth mind doing a review yourself as well? I can also try and review, but I don't think I'd be able to give this a proper thorough review and would feel more comfortable if more people have gone through it.

@rth
Copy link
Contributor

rth commentedMay 9, 2019

Sure, I'll try to review it in the next few days.

Copy link
Member

@ManishearthManishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Code looks correct! Mostly want more documentation.

@tomcumming
Copy link
ContributorAuthor

@Manishearth@rth I have updated the PR including requested changes

@Manishearth
Copy link
Member

Looks good!@rth want to do a second review?

Copy link
Contributor

@rthrth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks a lot for the review@Manishearth !

I went through the code in more detail, I find it quite readable and I don't really have anything to add. (Though I am fairly new to rust and don't know that much about Unicode segmentation specs).

I can confirm thatsrc/tables.rs andsrc/testdata.rs in this PR can be re-generated in their current state with the included python scripts, but they require setting,
scripts/unicode.py

-        os.system("curl -O http://www.unicode.org/Public/UNIDATA/%s"+        os.system("curl -O http://www.unicode.org/Public/9.0.0/ucd/%s"

as otherwise data for latest Unicode 12.0 is downloaded.

@tomcumming
Copy link
ContributorAuthor

Fixing the URL for test data should probably be another PR

@rth
Copy link
Contributor

rth commentedMay 15, 2019

Fixing the URL for test data should probably be another PR

Yes, I'll do it.

Thanks@tomcumming I don't have any other comments.

@rthrth mentioned this pull requestMay 15, 2019
@ManishearthManishearth merged commitc7a6b6f intounicode-rs:masterMay 15, 2019
@Manishearth
Copy link
Member

Thank you! I'll push a release soonish

rth reacted with hooray emoji

@Manishearth
Copy link
Member

Published 1.3.0. Thanks for the work on this, and sorry for the delay in reviewing!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@rthrthrth left review comments

@ManishearthManishearthManishearth approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@tomcumming@Manishearth@rth

[8]ページ先頭

©2009-2025 Movatter.jp