Treat emoji presentation sequences as fullwidth #35
Closed
Jules-Bertholet wants to merge 13 commits into unicode-rs:master from Jules-Bertholet:emoji-presentation
- 130f3fd Treat emoji presentation sequences as fullwidth (Jules-Bertholet)
- 6bd8215 emoji presentation: store single codepoints instead of ranges (Jules-Bertholet)
- a4d25a9 Use a better datastructure (Jules-Bertholet)
- 51a8417 Document exact width rules (Jules-Bertholet)
- 5d8bc25 Add more CI checks (Jules-Bertholet)
- 6beb76f Add emoji benchmark (Jules-Bertholet)
- ad55481 Address review comments (Jules-Bertholet)
- 4f80b57 Use `match` instead of array for first level of tree (Jules-Bertholet)
- d944bdd Spuriously treat certain always-wide characters as eligible for emoji… (Jules-Bertholet)
- a8b2fab Align `EMOJI_PRESENTATION_LEAVES` to 128 bytes (Jules-Bertholet)
- a5066aa Convert tests into integration tests (Jules-Bertholet)
- 5e8bf9b Update docs to mention `Grapheme_Extend` (Jules-Bertholet)
- 46a6067 Update unicode.py commendt to match new rules (Jules-Bertholet)
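For readers unfamiliar with the term in the PR title: an emoji presentation sequence is a base character followed by U+FE0F (VARIATION SELECTOR-16). The sketch below illustrates the general idea of counting such a pair as two columns. It is not the crate's implementation: `is_emoji_presentation_base`, `toy_char_width`, and `toy_str_width` are hypothetical names, the three-entry base set stands in for the generated lookup table, and every other character is assumed to be one column wide.

```rust
/// U+FE0F VARIATION SELECTOR-16 requests emoji presentation for the preceding character.
const VS16: char = '\u{FE0F}';

/// Hypothetical stand-in for the generated table: only '#', U+2600 and U+2764.
fn is_emoji_presentation_base(c: char) -> bool {
    matches!(c, '\u{0023}' | '\u{2600}' | '\u{2764}')
}

/// Toy per-character width (assumption): a stray selector is 0, anything else is 1.
fn toy_char_width(c: char) -> usize {
    if c == VS16 {
        0
    } else {
        1
    }
}

/// Sums widths, counting `base + VS16` as a single two-column unit.
fn toy_str_width(s: &str) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if is_emoji_presentation_base(c) && chars.peek() == Some(&VS16) {
            chars.next(); // consume the selector together with its base
            total += 2;
        } else {
            total += toy_char_width(c);
        }
    }
    total
}

fn main() {
    println!("{}", toy_str_width("\u{2764}")); // 1: text presentation
    println!("{}", toy_str_width("\u{2764}\u{FE0F}")); // 2: emoji presentation sequence
}
```

Because the pair has to be recognised across two `char`s, a rule like this can only apply to string width, not to the width of a single `char`.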
13 changes: 13 additions & 0 deletions .github/workflows/rust.yml
1 change: 1 addition & 0 deletions .gitignore
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -3,3 +3,4 @@ Cargo.lock | ||
| scripts/tmp | ||
| scripts/*.txt | ||
| scripts/*.rs | ||
| bench_data/* | ||
10 changes: 6 additions & 4 deletions Cargo.toml
113 changes: 113 additions & 0 deletions benches/benches.rs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| // Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT | ||
| // file at the top-level directory of this distribution and at | ||
| // http://rust-lang.org/COPYRIGHT. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or | ||
| // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license | ||
| // <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your | ||
| // option. This file may not be copied, modified, or distributed | ||
| // except according to those terms. | ||
| #![feature(test)] | ||
| extern crate test; | ||
| use std::iter; | ||
| use test::Bencher; | ||
| use unicode_width::{UnicodeWidthChar, UnicodeWidthStr}; | ||
| #[bench] | ||
| fn cargo(b: &mut Bencher) { | ||
| let string = iter::repeat('a').take(4096).collect::<String>(); | ||
| b.iter(|| { | ||
| for c in string.chars() { | ||
| test::black_box(UnicodeWidthChar::width(c)); | ||
| } | ||
| }); | ||
| } | ||
| #[bench] | ||
| fn stdlib(b: &mut Bencher) { | ||
| let string = iter::repeat('a').take(4096).collect::<String>(); | ||
| b.iter(|| { | ||
| for c in string.chars() { | ||
| test::black_box(c.width()); | ||
| } | ||
| }); | ||
| } | ||
| #[bench] | ||
| fn simple_if(b: &mut Bencher) { | ||
| let string = iter::repeat('a').take(4096).collect::<String>(); | ||
| b.iter(|| { | ||
| for c in string.chars() { | ||
| test::black_box(simple_width_if(c)); | ||
| } | ||
| }); | ||
| } | ||
| #[bench] | ||
| fn simple_match(b: &mut Bencher) { | ||
| let string = iter::repeat('a').take(4096).collect::<String>(); | ||
| b.iter(|| { | ||
| for c in string.chars() { | ||
| test::black_box(simple_width_match(c)); | ||
| } | ||
| }); | ||
| } | ||
| #[inline] | ||
| fn simple_width_if(c: char) -> Option<usize> { | ||
| let cu = c as u32; | ||
| if cu < 127 { | ||
| if cu > 31 { | ||
| Some(1) | ||
| } else if cu == 0 { | ||
| Some(0) | ||
| } else { | ||
| None | ||
| } | ||
| } else { | ||
| UnicodeWidthChar::width(c) | ||
| } | ||
| } | ||
| #[inline] | ||
| fn simple_width_match(c: char) -> Option<usize> { | ||
| match c as u32 { | ||
| cu if cu == 0 => Some(0), | ||
| cu if cu < 0x20 => None, | ||
| cu if cu < 0x7f => Some(1), | ||
| _ => UnicodeWidthChar::width(c), | ||
| } | ||
| } | ||
| #[bench] | ||
| fn enwik8(b: &mut Bencher) { | ||
| // To benchmark, download & unzip `enwik8` from https://data.deepai.org/enwik8.zip | ||
| let data_path = "bench_data/enwik8"; | ||
| let string = std::fs::read_to_string(data_path).unwrap_or_default(); | ||
| b.iter(|| test::black_box(UnicodeWidthStr::width(string.as_str()))); | ||
| } | ||
| #[bench] | ||
| fn jawiki(b: &mut Bencher) { | ||
| // To benchmark, download & extract `jawiki-20240201-pages-articles-multistream-index.txt` from | ||
| // https://dumps.wikimedia.org/jawiki/20240201/jawiki-20240201-pages-articles-multistream-index.txt.bz2 | ||
| let data_path = "bench_data/jawiki-20240201-pages-articles-multistream-index.txt"; | ||
| let string = std::fs::read_to_string(data_path).unwrap_or_default(); | ||
| b.iter(|| test::black_box(UnicodeWidthStr::width(string.as_str()))); | ||
| } | ||
| #[bench] | ||
| fn emoji(b: &mut Bencher) { | ||
| // To benchmark, download emoji-style.txt from https://www.unicode.org/emoji/charts/emoji-style.txt | ||
| let data_path = "bench_data/emoji-style.txt"; | ||
| let string = std::fs::read_to_string(data_path).unwrap_or_default(); | ||
| b.iter(|| test::black_box(UnicodeWidthStr::width(string.as_str()))); | ||
| } |