The PyICU project repository has moved to https://pyicu.org.
Welcome to PyICU, a Python extension wrapping the ICU C++ libraries.
ICU stands for "International Components for Unicode". These are the i18n libraries of the Unicode Consortium. They implement much of the Unicode Standard, many of its companion Unicode Technical Standards, and much of Unicode CLDR.
The PyICU source code is hosted at https://gitlab.pyicu.org/main/pyicu.

The ICU homepage is http://site.icu-project.org/

See also the CLDR homepage at http://cldr.unicode.org/
PyICU is a Python extension implemented in C++ that wraps the C/C++ ICU library. It is known to also work as a PyPy extension. Unless `pkg-config` and the ICU libraries and headers are already installed, building PyICU from the sources on PyPI involves more than just a `pip` call. Many operating systems distribute pre-built binary packages of ICU and PyICU, see below.
Mac OS X

- Ensure ICU is installed and can be found by `pkg-config` (as `icu-config` was deprecated as of ICU 63.1), either by following the ICU build instructions, or by using Homebrew:

      # install libicu (keg-only)
      brew install pkg-config icu4c
      # let setup.py discover keg-only icu4c via pkg-config
      export PATH="/usr/local/opt/icu4c/bin:/usr/local/opt/icu4c/sbin:$PATH"
      export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:/usr/local/opt/icu4c/lib/pkgconfig"

- Install PyICU with the same C++ compiler as your Python distribution (more info):

      # EITHER - when using a gcc-built CPython (e.g. from Homebrew)
      export CC="$(which gcc)" CXX="$(which g++)"
      # OR - when using system CPython or another clang-based CPython, ensure system clang is used (for proper libstdc++ https://gitlab.pyicu.org/main/pyicu/issues/5#issuecomment-291631507):
      unset CC CXX
      # avoid wheels from previous runs or PyPI
      pip install --no-binary=:pyicu: pyicu
Debian

    apt-get update
    # EITHER - from apt directly https://packages.debian.org/source/stable/pyicu
    apt-get install python3-icu
    # OR - from source
    apt-get install pkg-config libicu-dev
    pip install --no-binary=:pyicu: pyicu
Ubuntu: similar to Debian, there is a pyicu package available via `apt`.

Alpine Linux: there is a pyicu package available via `apk`.

NetBSD: there is a pyicu package available via `pkg_add`.

OpenBSD: there is a pyicu package available via `pkg_add`.

Other operating systems: see below.
Before building PyICU the ICU libraries must be built and installed. Refer to each system's instructions for more information.
PyICU is built with setuptools:

- verify that `pkg-config` is available (the `icu-config` program is deprecated as of ICU 63.1)

      pkg-config --cflags --libs icu-i18n

  If this command returns an error or doesn't return the paths expected then ensure that the `INCLUDES`, `LFLAGS`, `CFLAGS` and `LIBRARIES` dictionaries in `setup.py` contain correct values for your platform. Starting with ICU 60, `-std=c++11` must appear in your CFLAGS or be the default for your C++ compiler.

- build and install pyicu

      python setup.py build
      sudo python setup.py install
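The `pkg-config` check from the first step can also be scripted. The following is a minimal sketch using only the Python standard library; it is not part of PyICU, the helper name `icu_build_flags` is made up for illustration, and it assumes that `pkg-config`, when installed, is on `PATH`.

```python
import shutil
import subprocess


def icu_build_flags(package="icu-i18n"):
    """Return the flags pkg-config reports for `package`, or None if unavailable."""
    # pkg-config itself may be missing from PATH.
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", package],
        capture_output=True,
        text=True,
    )
    # A non-zero exit status means pkg-config could not find the package.
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    flags = icu_build_flags()
    if flags is None:
        print("ICU not found via pkg-config; check the dictionaries in setup.py")
    else:
        print("ICU flags:", flags)
```

A `None` result corresponds to the case described above where the dictionaries in `setup.py` need to be checked by hand.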
Mac OS X: Make sure that `DYLD_LIBRARY_PATH` contains paths to the directory(ies) containing the ICU libs.

Linux & Solaris: Make sure that `LD_LIBRARY_PATH` contains paths to the directory(ies) containing the ICU libs or that you added the corresponding `-rpath` argument to `LFLAGS`.

Windows: Make sure that `PATH` contains paths to the directory(ies) containing the ICU DLLs.
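As an illustration of the platform notes above, the sketch below picks the relevant environment variable for the running platform and checks whether a directory is listed in it. The helper names are hypothetical and not part of PyICU, the comparison is a plain string match with no path normalization, and the `-rpath` alternative is not covered.

```python
import os
import sys

# Environment variable consulted for shared libraries on each platform,
# as listed in the notes above.
LIBRARY_PATH_VARIABLES = {
    "darwin": "DYLD_LIBRARY_PATH",
    "win32": "PATH",
}


def library_path_variable(platform=sys.platform):
    """Name of the variable that should list the ICU library directory."""
    # Linux, Solaris and anything else not listed fall back to LD_LIBRARY_PATH.
    return LIBRARY_PATH_VARIABLES.get(platform, "LD_LIBRARY_PATH")


def is_on_library_path(directory, environ=os.environ):
    """True if `directory` is an entry of the running platform's variable."""
    value = environ.get(library_path_variable(), "")
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return directory in entries


if __name__ == "__main__":
    # "/usr/local/lib" is only an example location for the ICU libraries.
    print(library_path_variable(), is_on_library_path("/usr/local/lib"))
```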
See the CHANGES file for an up to date log of changes and additions.

There is no API documentation for PyICU. The API for ICU is documented at https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ and the following patterns can be used to translate from the C++ APIs to the corresponding Python APIs.
The ICU string type, UnicodeString, is a type pointing at a mutable array of UChar Unicode 16-bit wide characters and is described here. The Python 3 str type is described here and here. The Python 2 unicode type is described here.

Because of their differences, ICU's and Python's string objects are not merged into the same type when crossing the C++ boundary but converted.
ICU APIs taking `UnicodeString` arguments have been overloaded to also accept arguments that are Python 3 `str` or Python 2 `unicode` objects. Python 2 `str` objects are auto-decoded into ICU strings using the `utf-8` encoding.
To convert a Python 3 `bytes` or a Python 2 `str` object encoded in an encoding other than `utf-8` to an ICU `UnicodeString` use the `UnicodeString(str, encodingName)` constructor.
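Since APIs taking `UnicodeString` arguments also accept Python `str` objects, another option is to decode the bytes in Python before calling into PyICU. A minimal sketch using only the standard library; the sample text and the `iso-8859-1` encoding are illustrative choices, not something PyICU prescribes.

```python
# Bytes encoded in something other than utf-8 (here: ISO-8859-1).
raw = "café".encode("iso-8859-1")

# Decoding as utf-8 fails because the byte 0xE9 does not form a valid
# utf-8 sequence in this position.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid utf-8")

# Decoding with the right encoding yields a str, which can then be passed
# to an API that accepts a UnicodeString argument.
text = raw.decode("iso-8859-1")
print(text)
```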
ICU's C++ APIs accept and return `UnicodeString` arguments in several ways: by value, by pointer or by reference. When an ICU C++ API is documented to accept a `UnicodeString` reference parameter, it is safe to assume that there are several corresponding PyICU python APIs making it accessible in simpler ways:

For example, the `'UnicodeString &Locale::getDisplayName(UnicodeString &)'` API, documented here, can be invoked from Python in several ways:
The ICU way

    >>> from icu import UnicodeString, Locale
    >>> locale = Locale('pt_BR')
    >>> string = UnicodeString()
    >>> name = locale.getDisplayName(string)
    >>> name
    <UnicodeString: 'Portuguese (Brazil)'>
    >>> name is string
    True                  <-- string arg was returned, modified in place

The Python way

    >>> from icu import Locale
    >>> locale = Locale('pt_BR')
    >>> name = locale.getDisplayName()
    >>> name
    'Portuguese (Brazil)'
A `UnicodeString` object was allocated and converted to a Python `str` object.
A UnicodeString can be converted to a Python unicode string with Python 3's `str()` or Python 2's `unicode()` constructor. The usual `len()`, comparison, `[]` and `[:]` operators are all available, with the additional twists that slicing is not read-only and that `+=` is also available since a UnicodeString is mutable. For example:

    >>> name = locale.getDisplayName()
    >>> name
    'Portuguese (Brazil)'
    >>> name = UnicodeString(name)
    >>> name
    <UnicodeString: 'Portuguese (Brazil)'>
    >>> str(name)
    'Portuguese (Brazil)'
    >>> len(name)
    19
    >>> name[3]
    't'
    >>> name[12:18]
    <UnicodeString: 'Brazil'>
    >>> name[12:18] = 'the country of Brasil'
    >>> name
    <UnicodeString: 'Portuguese (the country of Brasil)'>
    >>> name += ' oh joy'
    >>> name
    <UnicodeString: 'Portuguese (the country of Brasil) oh joy'>
The C++ ICU library does not use C++ exceptions to report errors. ICU C++ APIs return errors via a `UErrorCode` reference argument. All such APIs are wrapped by Python APIs that omit this argument and raise an `ICUError` Python exception instead. The same is true for ICU APIs taking both a `ParseError` and a `UErrorCode`: both are to be omitted.
For example, the `'UnicodeString &DateFormat::format(const Formattable &, UnicodeString &, FieldPosition &, UErrorCode &)'` API, documented here, is invoked from Python with:

    >>> from icu import DateFormat, Formattable
    >>> df = DateFormat.createInstance()
    >>> df
    <SimpleDateFormat: M/d/yy h:mm a>
    >>> f = Formattable(940284258.0, Formattable.kIsDate)
    >>> df.format(f)
    '10/18/99 3:04 PM'
Of course, the simpler `'UnicodeString &DateFormat::format(UDate, UnicodeString &)'` documented here can be used too:

    >>> from icu import DateFormat
    >>> df = DateFormat.createInstance()
    >>> df
    <SimpleDateFormat: M/d/yy h:mm a>
    >>> df.format(940284258.0)
    '10/18/99 3:04 PM'
ICU uses a double floating point type called `UDate` that represents the number of milliseconds elapsed since 1970-jan-01 UTC for dates.

In Python, the value returned by the `time` module's `time()` function is the number of seconds since 1970-jan-01 UTC. Because of this difference, floating point values are multiplied by 1000 when passed to APIs taking `UDate` and divided by 1000 when returned as `UDate`.
Python's `datetime` objects, with or without timezone information, can also be used with APIs taking `UDate` arguments. The `datetime` objects get converted to `UDate` when crossing into the C++ layer.
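The arithmetic behind these conversions can be sketched in plain Python. This only illustrates the seconds/milliseconds relationship described above; it is not PyICU's actual conversion code, and the helper names are hypothetical. Naive `datetime` objects are rejected here because how they should be interpreted is left outside this sketch.

```python
from datetime import datetime, timezone


def seconds_to_udate(seconds):
    """Python-style seconds since the epoch -> ICU-style milliseconds."""
    return seconds * 1000.0


def udate_to_seconds(udate):
    """ICU-style milliseconds since the epoch -> Python-style seconds."""
    return udate / 1000.0


def datetime_to_udate(moment):
    """Timezone-aware datetime -> ICU-style milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("this sketch only handles timezone-aware datetimes")
    return seconds_to_udate(moment.timestamp())


stamp = 940284258.0  # the value used in the DateFormat examples above
print(seconds_to_udate(stamp))  # 940284258000.0
moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(datetime_to_udate(moment) == seconds_to_udate(stamp))  # True
```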
Many ICU APIs take array arguments. From Python, pass a list whose elements are of the array's element type.
An ICU `StringEnumeration` has three `next` methods: `next()` which returns `str` objects, `unext()` which returns `str` objects in Python 3 or `unicode` objects in Python 2 and `snext()` which returns `UnicodeString` objects. Any of these methods can be used as an iterator, using the Python built-in `iter` function.
For example, let `e` be a `StringEnumeration` instance:

    from icu import TimeZone

    e = TimeZone.createEnumeration()
    [s for s in e]                   # a list of 'str' objects
    [s for s in iter(e.unext, '')]   # a list of 'str' or 'unicode' objects
    [s for s in iter(e.snext, '')]   # a list of 'UnicodeString' objects
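The last two lines rely on Python's two-argument `iter(callable, sentinel)` form, which keeps calling the callable until it returns the sentinel or raises `StopIteration`. The sketch below shows that mechanism with a hypothetical stand-in class, `FakeEnumeration`, so that it runs without PyICU; that the stand-in signals exhaustion by returning an empty string is an assumption made for the demonstration.

```python
class FakeEnumeration:
    """Hypothetical stand-in for a StringEnumeration; not a PyICU class."""

    def __init__(self, ids):
        self._ids = list(ids)
        self._index = 0

    def unext(self):
        # Assumption for this sketch: an empty string marks the end,
        # mirroring the iter(e.unext, '') idiom used above.
        if self._index >= len(self._ids):
            return ""
        value = self._ids[self._index]
        self._index += 1
        return value


e = FakeEnumeration(["Pacific/Fiji", "US/Mountain"])
# iter() calls e.unext() repeatedly and stops once it returns "".
ids = [s for s in iter(e.unext, "")]
print(ids)
```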
The ICU `TimeZone` type may be wrapped with an `ICUtzinfo` type for usage with Python's `datetime` type. For example:

    from datetime import datetime
    from icu import ICUtzinfo, TimeZone

    tz = ICUtzinfo(TimeZone.createTimeZone('US/Mountain'))
    datetime.now(tz)
or, even simpler:
    tz = ICUtzinfo.getInstance('Pacific/Fiji')
    datetime.now(tz)
To get the default time zone use:
    defaultTZ = ICUtzinfo.getDefault()
To get the time zone's id, use the `tzid` attribute or coerce the time zone to a string:

    ICUtzinfo.getInstance('Pacific/Fiji').tzid    -> 'Pacific/Fiji'
    str(ICUtzinfo.getInstance('Pacific/Fiji'))    -> 'Pacific/Fiji'