jtauber/pyuca

This is a Python implementation of the Unicode Collation Algorithm (UCA). It passes 100% of the UCA conformance tests for Unicode 5.2.0 (Python 2.7), Unicode 6.3.0 (Python 3.3+), Unicode 8.0.0 (Python 3.5+), Unicode 9.0.0 (Python 3.6+), and Unicode 10.0.0 (Python 3.7+) with a variable-weighting setting of Non-ignorable.

What do you use it for?

In short, sorting non-English strings properly.

The core of the algorithm involves multi-level comparison. For example, café comes before caff because at the primary level, the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then applies only to words that are equivalent at the primary level.
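The two-level idea can be illustrated with a toy sketch that uses only the standard library. This is not how pyuca works internally (pyuca looks up weights in a collation element table); the function name toy_sort_key is hypothetical, and the sketch assumes that stripping combining marks is an adequate stand-in for the primary level.

```python
import unicodedata


def toy_sort_key(s):
    """Toy two-level key: base letters first, accents only as a tie-breaker.

    Illustration only; a real UCA implementation uses table-driven weights.
    """
    primary = []    # base characters, accents ignored
    secondary = []  # combining marks attached to each base character
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            # Attach the mark to the preceding base character, if any.
            if secondary:
                secondary[-1] += ch
        else:
            primary.append(ch.lower())
            secondary.append("")
    # Tuples compare element by element, so the secondary level is only
    # consulted when the primary levels are equal.
    return (tuple(primary), tuple(secondary))


words = ["cafe", "caff", "café"]
by_code_point = sorted(words)
by_toy_key = sorted(words, key=toy_sort_key)
print(by_code_point)
print(by_toy_key)
```

With plain code point ordering, café lands after caff; with the toy key it sorts between cafe and caff, as described above.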

The Unicode Collation Algorithm and pyuca also support contraction and expansion. Contraction is where multiple letters are treated as a single unit. In Spanish, ch is treated as a letter coming between c and d so that, for example, words beginning with ch should sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters. In German, ä is sorted as if it were ae, i.e. after ad but before af.
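As a rough illustration of these two ideas, here is a toy sketch in plain Python. It does not use pyuca and does not reflect its internals; the function names are hypothetical, and the rules are hard-coded only for the Spanish and German examples mentioned above.

```python
def toy_contraction_key(word):
    """Treat "ch" as a single unit that sorts after "c" and before "d"."""
    units = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            # ("c", 1) compares greater than ("c", 0) but less than ("d", 0).
            units.append(("c", 1))
            i += 2
        else:
            units.append((word[i], 0))
            i += 1
    return units


def toy_expansion_key(word):
    """Treat "ä" (U+00E4) as though it were the two letters "ae"."""
    return word.replace("\u00e4", "ae")


spanish = sorted(["chico", "cura", "dama", "casa"], key=toy_contraction_key)
german = sorted(["af", "\u00e4", "ad"], key=toy_expansion_key)
print(spanish)
print(german)
```

In the first list, chico sorts after cura because the ch unit follows every plain c; in the second, ä falls between ad and af.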

How to use it

Here is how to use the pyuca module.

pip install pyuca

Usage example:

    from pyuca import Collator
    c = Collator()
    assert sorted(["cafe", "caff", "café"]) == ["cafe", "caff", "café"]
    assert sorted(["cafe", "caff", "café"], key=c.sort_key) == ["cafe", "café", "caff"]

Collator can also take an optional filename for specifying a custom collation element table.

You can also import collators for specific Unicode versions, e.g. from pyuca.collator import Collator_8_0_0. But just from pyuca import Collator will ensure that the collator version matches the version of unicodedata provided by the standard library for your version of Python.
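To see which Unicode version your interpreter's unicodedata module is built against, the standard library exposes it directly. This small check uses only the standard library and makes no assumptions about pyuca itself.

```python
import unicodedata

# Version of the Unicode database bundled with this Python interpreter,
# as a dotted string.
version = unicodedata.unidata_version
print(version)
```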

How to cite it

Tauber, J. K. (2016). pyuca: a Python implementation of the Unicode Collation Algorithm. The Journal of Open Source Software. DOI: 10.21105/joss.00021

License

Python code is made available under an MIT license (see LICENSE). allkeys.txt is made available under the similar license defined in LICENSE-allkeys.

Contacting the Developer

If you have any problems, questions or suggestions, it's best to file an issue on GitHub, although you can also contact me at jtauber@jtauber.com.

For more of my work on linguistics and Ancient Greek, see http://jktauber.com/.
