Implement a special-case lookup for ascii grapheme categories.#79

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

Manishearth merged 2 commits intounicode-rs:masterfromcessen:ascii_grapheme_optimization

Feb 14, 2020

Merged

Implement a special-case lookup for ascii grapheme categories.#79

Manishearth merged 2 commits intounicode-rs:masterfromcessen:ascii_grapheme_optimization

Feb 14, 2020

Conversation

Copy link

Contributor

cessen commentedFeb 14, 2020

This speeds up processing even for many non-ascii texts, since
they often still use ascii-range punctuation and whitespace.

Implement a special-case lookup for ascii grapeheme categories.

c9aa6fa

This speeds up processing even for many non-ascii texts, sincethey often still use ascii-range punctuation and whitespace.

cessen changed the title~~Implement a special-case lookup for ascii grapeheme categories.~~Implement a special-case lookup for ascii grapheme categories.

Feb 14, 2020

Copy link

ContributorAuthor

I took a crack at special-casing the ASCII range as discussed in#77, and it definitely improves performance across the board. I tried two different implementations: one based on a 128-element lookup table, and another based on a few if statements. This PR is for the if statement approach.

Here are the bench results, including results for both approaches:

No ascii special-casing:test graphemes_arabic      ... bench:     893,766 ns/iter (+/- 68,494) = 56 MB/stest graphemes_english     ... bench:     989,329 ns/iter (+/- 42,635) = 50 MB/stest graphemes_hindi       ... bench:     933,926 ns/iter (+/- 30,817) = 53 MB/stest graphemes_japanese    ... bench:     727,820 ns/iter (+/- 37,069) = 69 MB/stest graphemes_korean      ... bench:   1,083,001 ns/iter (+/- 33,362) = 46 MB/stest graphemes_mandarin    ... bench:     520,048 ns/iter (+/- 10,724) = 97 MB/stest graphemes_russian     ... bench:     842,000 ns/iter (+/- 32,343) = 60 MB/stest graphemes_source_code ... bench:   1,060,247 ns/iter (+/- 42,456) = 47 MB/sTable:test graphemes_arabic      ... bench:     692,838 ns/iter (+/- 34,703) = 72 MB/stest graphemes_english     ... bench:   1,018,363 ns/iter (+/- 48,135) = 48 MB/stest graphemes_hindi       ... bench:     826,851 ns/iter (+/- 39,035) = 59 MB/stest graphemes_japanese    ... bench:     739,136 ns/iter (+/- 68,602) = 68 MB/stest graphemes_korean      ... bench:   1,000,320 ns/iter (+/- 85,934) = 50 MB/stest graphemes_mandarin    ... bench:     522,754 ns/iter (+/- 24,797) = 96 MB/stest graphemes_russian     ... bench:     717,681 ns/iter (+/- 25,272) = 71 MB/stest graphemes_source_code ... bench:   1,008,405 ns/iter (+/- 31,694) = 49 MB/sIf branching:test graphemes_arabic      ... bench:     663,521 ns/iter (+/- 36,100) = 75 MB/stest graphemes_english     ... bench:     927,851 ns/iter (+/- 35,652) = 53 MB/stest graphemes_hindi       ... bench:     801,972 ns/iter (+/- 35,350) = 61 MB/stest graphemes_japanese    ... bench:     716,374 ns/iter (+/- 40,118) = 70 MB/stest graphemes_korean      ... bench:     964,766 ns/iter (+/- 47,644) = 51 MB/stest graphemes_mandarin    ... bench:     503,575 ns/iter (+/- 15,838) = 100 MB/stest graphemes_russian     ... bench:     665,667 ns/iter (+/- 27,923) = 76 MB/stest graphemes_source_code ... bench:     959,008 ns/iter (+/- 66,405) = 52 MB/s

Turns out that the branching approach beats the table approach, at least on my machine. The table implementation wasn't completely naive either: I made sure to avoid duplicate bounds checks (although they probably would have been optimized out anyway) by using the sliceget() method to roll the bounds check and ascii-range check into one, so the whole thing was just a single branch and an array lookup.

The if-branching approach is also just way easier to maintain and check for correctness, because you don't have to wade through an array of 128 grapheme categories.