A C++ library for localization using GNU gettext po files, based on boost spirit


cbeck88/spirit-po


spirit-po is a header-only C++11 library that you can use for localization within the GNU gettext system, instead of using libintl.

spirit-po has the advantage that it does not use mo files, so your program can use the simple, portable po files produced directly by translators rather than require a binary format which requires special tools to create.

spirit-po is very easy to use in your C++11 projects. According to sloccount, it is only 916 lines of code in total (at time of writing), and is implemented using boost::spirit. Our po grammar itself is only a few dozen lines. This makes it relatively easy to understand its behavior, and makes the library as a whole easy to modify if needed.

By contrast, the equivalent po-parser facility within the gettext project is the libgettext-po po-manipulation library. sloccount counts the entire libgettext-po directory as ansic: 29382 (98.35%), that is, 30kloc of ANSI C. The parser is only a piece of this, but it is also quite complex and difficult to separate from the rest of the code. (Which is partly why I made spirit-po.)

An in-depth explanation of the rationale for this library as compared to libintl and boost::locale::gettext, and specifically, the advantages of parsing po files rather than mo files at run-time, is provided on the wiki.

Compatibility

spirit_po is intended to be a drop-in replacement for the use of GNU msgfmt and GNU libintl.
It should parse any well-formed .po file that msgfmt would read and the interface should produce the same results.

It's not guaranteed to reject any po file that msgfmt would reject, or to emit warnings similar to msgfmt for common translator errors. Broadly speaking, the parser has been engineered with a fail-fast mentality, and there are several unit tests that check that major structural problems cause a parse error rather than silently being accepted. However, for best results you may wish to validate po files by running them through msgfmt just to see if it emits warnings, before deploying them, even if you use spirit_po in your application.

Similarly, there are certain cases that I am aware of in which msgfmt will drop a message from the catalog if it contains invalid C format specifiers. spirit_po doesn't do this, which is a minor discrepancy.

If you are aware of any .po file which msgfmt parses but spirit_po fails to parse, or for which our emulation of the libintl interface doesn't yield expected results, please post a report on the issue tracker, with the po file included.

Quick Start

To begin, first obtain some .po files. PO files are created by translators and contain a dictionary of translated strings.

For examples from various GNU projects, see our test folder.

Then, load the file and construct a spirit_po::catalog from it.

#include <spirit_po/spirit_po.hpp>

#include <cstdio>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
  // Read the whole po file into a string.
  std::ifstream ifs("test.po");
  std::string po_file{std::istreambuf_iterator<char>{ifs}, std::istreambuf_iterator<char>()};

  // Parse the po content and build the catalog.
  spirit_po::default_catalog cat{spirit_po::default_catalog::from_range(po_file)};

  // Basic lookup.
  std::cout << cat.gettext("Hello world!") << std::endl;

  // Lookup with a context string.
  std::cout << cat.pgettext("Pick a number:", "prompt") << std::endl;

  int number = 6;
  std::cin >> number;
  std::cout << std::endl;

  // Plural-forms lookup; the result is used as a C format string.
  printf(cat.ngettext("Did I fire %d shot or was it only %d? Do you feel lucky, punk?",
                      "Did I fire %d shots or was it only %d? Do you feel lucky, punk?",
                      number),
         number,
         number - 1);
}

The catalog object serves translation requests using the strings it loaded from the PO file.

In this line,

std::cout << cat.gettext("Hello world!") << std::endl;

the translated form of "Hello world!" will be displayed. The result will be a const char * pointing to a string owned by the catalog. (Or, if the translation misses because this string wasn't in the catalog, it will simply return the English text "Hello world!", the same pointer it was passed in.)

In this line,

std::cout << cat.pgettext("Pick a number:","prompt") << std::endl;

a string is translated, and also marked with a context string. Sometimes the same English phrase or sentence appears in your program in multiple places, but should be translated differently depending on context. The context string allows you to provide a hint to the translator and allows the program to disambiguate the two usages. (This particular example is unfortunately a poor one.)

In this line,

printf(cat.ngettext("Did I fire %d shot or was it only %d? Do you feel lucky, punk?",
                    "Did I fire %d shots or was it only %d? Do you feel lucky, punk?",
                    number),
       number,
       number - 1);

the catalog object will look up the C-format string in the catalog, and search for the plural form corresponding to number. This ensures that "shots" will be pluralized correctly no matter what language is used. (In many languages, there are more than two plural forms and language-specific logic is needed to determine the appropriate form to use based on the number. The translator provides this logic in the po-file header.) Then we use printf to substitute the numbers into the string.

These examples are actually all rehashed from the gettext documentation -- the member functions gettext, pgettext, ngettext are all analogous to calls to the C library libintl.

If you aren't already familiar with gettext, have a look at their documentation.

Note: It is standard in gettext documentation examples to use functions like printf with translated strings, since it is a C library. However, if you are working in C++, you might be better off using a type-safe alternative to printf. Otherwise, if the translator makes a mistake, or an adversary modifies the po files, you can get undefined behavior in your program with code like the above. You might prefer to use something like tinyformat, for instance.

Usage

Loading and Construction

When you load translations with spirit_po, the loading process is entirely in your hands and you can make it work however you like. A catalog can be constructed using one of three methods:

  • factory function spirit_po::catalog<>::from_iterators which can take any pair of iterators which spirit can use.
  • factory function spirit_po::catalog<>::from_range which can take any forward range of characters which spirit can use (such as a std::string).
  • factory function spirit_po::catalog<>::from_istream which takes any given istream and builds a po catalog from it. Spirit reads incrementally, so this does not require reading the entire istream into a string first. (However, in the typical case of reading a po-file, reading it into a string first will usually be faster. For best performance, the recommendation is not to use from_istream with a std::ifstream.)

(You should use one of these rather than using the ctor directly.)
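
Since the recommendation above is to read a po file into a string before parsing, here is a minimal sketch of that step. The helper name read_po_file is hypothetical and not part of spirit_po; the string it returns is what you would then pass to from_range.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of spirit_po): read a whole file into memory.
// The returned string can then be handed to spirit_po::catalog<>::from_range.
std::string read_po_file(const std::string & path) {
  std::ifstream ifs(path);
  if (!ifs) {
    // Report a missing or unreadable file before any parsing is attempted.
    throw std::runtime_error("could not open po file: " + path);
  }
  return std::string(std::istreambuf_iterator<char>(ifs),
                     std::istreambuf_iterator<char>());
}
```

Throwing on a missing file is a choice made for this sketch; a project built with -fno-exceptions would report the failure some other way.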

If the po content is malformed, one of two things will happen (configurable):

  • A spirit_po::catalog_exception will be thrown. (This is the default.)
  • If the preprocessor symbol SPIRIT_PO_NO_EXCEPTIONS is defined before including spirit_po.hpp, then the catalog constructor will not throw (and none of the other functions will either). Instead, the catalog will end up with whatever strings it managed to load, an explicit operator bool() const function will be defined which returns false if the constructor would have thrown, and a method std::string error() const is defined which returns the error string in case there was an error. This option is here to support projects that compile with -fno-exceptions.

Looking Up Strings (Messages)

The spirit_po::catalog object has 4 methods which are part of the gettext specification:

  • const char * gettext(const char * msgid)
  • const char * ngettext(const char * msgid, const char * msgid_plural, uint plural)
  • const char * pgettext(const char * msgctxt, const char * msgid)
  • const char * npgettext(const char * msgctxt, const char * msgid, const char * msgid_plural, uint plural)

These implement basic message lookup, plural-forms lookup, contextual lookup, and plural-forms-with-context lookup, respectively. See GNU gettext docs for details.

As in GNU libintl, the string pointers which are returned are non-owning pointers. When a translated form is found in the catalog, the returned pointer points to a string owned by the catalog. When a translated form is not found, the returned pointer is one of the arguments. This is maximally efficient when using gettext with _ macros and such, where the input parameters will be string literals with static storage duration. However, in the general case, if the input pointer becomes invalid, then the output pointer may become invalid also.
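
To make the pointer convention concrete, here is a small sketch using a stand-in type. toy_catalog and translate_copy are hypothetical names, and this is not the spirit_po implementation; it only mimics the hit and miss behavior described above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for a catalog (NOT the spirit_po implementation): maps msgid to
// its translation and mimics the non-owning pointer convention.
struct toy_catalog {
  std::unordered_map<std::string, std::string> entries;

  const char * gettext(const char * msgid) const {
    assert(msgid != nullptr);
    auto it = entries.find(msgid);
    // Hit: pointer into a string owned by the catalog.
    // Miss: the caller's own pointer is handed back unchanged.
    return it != entries.end() ? it->second.c_str() : msgid;
  }
};

// When the result must outlive the input buffer, copy it right away.
std::string translate_copy(const toy_catalog & cat, const std::string & msgid) {
  return std::string(cat.gettext(msgid.c_str()));
}
```

On a miss the result aliases the argument, so a result obtained from a temporary buffer is only safe for as long as that buffer lives. Copying into a std::string, as translate_copy does, sidesteps the question.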

We also give equivalent, alternate versions of these which return std::string and take const std::string & in place of const char * as parameters. In some scenarios (e.g. if you must make a copy of the output anyway, and the input string is already held by a std::string) these versions may actually be more efficient, and the lifetime of the result is unambiguous.

They are otherwise equivalent.

  • std::string gettext_str(const std::string & msgid)
  • std::string ngettext_str(const std::string & msgid, const std::string & msgid_plural, uint plural)
  • std::string pgettext_str(const std::string & msgctxt, const std::string & msgid)
  • std::string npgettext_str(const std::string & msgctxt, const std::string & msgid, const std::string & msgid_plural, uint plural)

Managing multiple catalogs

In the simplest setup, you would have one application, and the set of all strings it needs to translate. If there are m languages you support, you might have one .pot file (po-template which you give to translators) and get back m different .po files. Your application would then load only one of them at a time based on the locale.

In many GNU programs, the situation is more complicated. The application may be broken up into components, and the strings from different components classified into different "textdomains". Then, there would be one .pot file for each textdomain. If there are m textdomains and n languages, you would have n * m different .po files.

In the traditional libintl C-interface, textdomains are, like locale, handled by a global variable. The libintl library manages the loading of catalogs and textdomains, which are accessed by gettext and friends. At any time, a global function may be called to change the current locale or textdomain, which are stored in global variables. So, if for instance your program has a UI module and a UI textdomain, the UI code would set the textdomain when it is entered, and then call gettext on each string it needs. Then when you enter another module, you would bind a different textdomain, and then make gettext calls for those strings.

There are various problems that I've experienced when writing programs that use this interface.

  • The built-in C file functions do not support UTF-8 paths when compiling with mingw for windows. libintl does not provide any way to work around this. (Some other libraries like SDL allow you to pass function pointers to alternative filesystem functions. Then you can have one cross-platform filesystem implementation, and make everything in your program use it.)
  • If you have a multithreaded program and multiple threads that need to talk to libintl, you can create a data race because everything is clobbering the same global variables. This is always a problem with libc, and fortunately it's very unlikely to happen with changes to the locale, because the locale rarely changes in typical programs. However, it's a big problem with textdomains, which are likely to change frequently. It's also quite likely that code in different threads will be using different textdomains.

When using spirit_po, you should understand that spirit_po is not attempting to emulate the entire libintl interface. A spirit_po::catalog is only a single catalog, corresponding to a single po file loaded into memory. It doesn't have a dcgettext method like libintl does, for instance, because a catalog only represents one textdomain.

If you don't need multiple textdomains, I recommend that you avoid them. (AFAIK the main advantage of having multiple textdomains is that if you have a large number of strings, you can assign different textdomains to different translators, to divide up the work. Many translators and translation management platforms already support having multiple textdomains for a project, and giving progress reports for each one. That is, I think this is primarily done for convenience of translators rather than programmers.)

If you do need multiple textdomains, I recommend that you throw together your own mechanism for this. For instance, if you are already using textdomains and libintl and want to switch to spirit_po, you can have a global std::map<std::string, spirit_po::catalog>. You can provide a function like dcgettext which dereferences this map at the correct textdomain, or manage the textdomain in a global variable yourself, or both. Coding this up is straightforward.
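
As a rough sketch of the map-based approach just described, the following uses a stand-in catalog type so that it compiles on its own. toy_catalog, catalogs and domain_gettext are hypothetical names; in a real program the map's value type would be a spirit_po catalog.

```cpp
#include <map>
#include <string>

// Stand-in for spirit_po::catalog so the sketch compiles on its own;
// only the gettext(const char *) shape described in this README is assumed.
struct toy_catalog {
  std::map<std::string, std::string> entries;

  const char * gettext(const char * msgid) const {
    auto it = entries.find(msgid);
    return it != entries.end() ? it->second.c_str() : msgid;
  }
};

// Hypothetical registry: textdomain name -> loaded catalog.
std::map<std::string, toy_catalog> & catalogs() {
  static std::map<std::string, toy_catalog> instance;
  return instance;
}

// Hypothetical dgettext-like helper: look up msgid in the named textdomain,
// falling back to the untranslated msgid if that domain is not loaded.
const char * domain_gettext(const std::string & domain, const char * msgid) {
  auto it = catalogs().find(domain);
  return it != catalogs().end() ? it->second.gettext(msgid) : msgid;
}
```

Passing the textdomain explicitly on every call avoids keeping a "current textdomain" global, which is the part of the libintl design that causes the data races mentioned earlier. The registry itself is still shared state, so it should be fully populated before other threads start reading from it.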

If your program has multiple threads, you can have one such map for each thread, and put it in thread-local storage. Or it may be that your main thread needs multiple textdomains, but the other threads each use at most one textdomain, which would make things simpler. You could then pass each one a catalog object when it starts up, or whatever.

Basically, I don't want to make such architectural decisions for you. spirit_po is focused just on parsing po-files and handling the queries; I don't want to force you to use global variables or broken filesystem functions. So, we don't completely reproduce the libintl interface.

In the interest of being flexible, spirit_po also allows you to merge compatible catalogs together into one master catalog. This is an alternate approach -- if the translation team wants to have multiple textdomains for their convenience, it doesn't mean the programmers have to think about multiple textdomains all the time as well.

  • void merge(spirit_po::catalog && other)
    Check if the metadata of this catalog and the given catalog show they are compatible (number of plural forms are equal). If not, then signal an error (exception or error state). If so, then move all the message entries from the other hashmap to this map. May trigger warnings on the warning channel if there are collisions.

  • void set_warning_channel(const std::function<void(const std::string &)> & w)
    Set the warning channel for this catalog. The warning channel is a function which will be called with a warning message whenever a string (with context) is clobbered. The warning channel object may also be passed to the constructor, if one is concerned about duplicated strings within a single po file. By default warnings are ignored.

However, merging catalogs has its own pitfalls. What happens if two catalogs contain the same string? One of them gets discarded, but which one is essentially arbitrary. You can give the catalog object a "warning channel" where it can send warnings when that happens. But it's hard to actually handle the problem or fix it. Also, merging catalogs requires dropping the metadata of one of them, and some other features like getting the line number at which a string appeared in the po file no longer make sense, because when we merge we don't keep track of which file a string came from. So, merging catalogs might be appropriate for some projects, but I certainly wouldn't recommend it for every project.
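
The collision problem can be illustrated with a small stand-in. This is not how spirit_po implements merge; message_map, warning_channel and merge_messages are hypothetical names, and the policy of keeping the existing entry is simply a choice made for this sketch.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in types, not the spirit_po implementation.
using message_map = std::unordered_map<std::string, std::string>;
using warning_channel = std::function<void(const std::string &)>;

// Move every entry of `other` into `target`. On a collision the entry that
// is already in `target` is kept (a policy chosen for this sketch) and the
// warning channel, if one is set, is notified.
void merge_messages(message_map & target, message_map && other,
                    const warning_channel & warn) {
  for (auto & entry : other) {
    auto result = target.emplace(entry.first, std::move(entry.second));
    if (!result.second && warn) {
      warn("duplicate msgid while merging: " + entry.first);
    }
  }
  other.clear();
}
```

The warning tells you that a collision happened, but by then one of the two translations is already gone, which is why the problem is hard to handle after the fact.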

Other functions

Some less commonly used accessors:

  • const spirit_po::catalog_metadata & get_metadata() const
    Return the metadata structure that was parsed from the po header.
  • std::size_t gettext_line_no(const std::string & msgid) const
    Return the line number at which a given catalog message was read. Returns 0 if it is not found.
  • std::size_t pgettext_line_no(const std::string & msgctxt, const std::string & msgid) const
    Return the line number at which a given catalog message (with context) was read. Returns 0 if it is not found.

Customization points

spirit_po::catalog is a template, and you may customize it in two ways.

  • Specify an alternate hashmap type.
    The default is std::unordered_map, but if you like you can experiment with boost::flat_map or a flat unordered map, or one of the Loki hashmaps, etc.

  • Specify an alternate plural forms compiler.
    GNU Gettext specifies a pseudo-C expression language for plural forms functions. For example, in Polish there are three plural forms. There is a form for the singular, a form used when the number ends in 2, 3 or 4 (but not in 12, 13 or 14), and a form for all other cases. The appropriate logic is typically specified in the po-header like so:

    Plural-Forms: nplurals=3; \
        plural=n==1 ? 0 : \
               n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

    To implement functions like ngettext, spirit_po needs to be able to read these pseudo-C expressions and evaluate them for different values of n. spirit_po contains a built-in facility to do this efficiently -- it converts such expressions to a sequence of op-codes that run on a simple stack machine.

    If for some reason you want to use a different format for this, or a different C++ implementation of the standard format, you can pass a custom plural forms "compiler" type as the second template parameter to spirit_po::catalog.

    The compiler is a function object that should be default constructible, and should take a string (the part that starts after plural=) and return a function object of signature unsigned int(unsigned int), representing the compiled plural forms function. See the default implementation for details.
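
To show what a compiled plural forms function computes, here is a hand-written C++ equivalent of the Polish expression, together with a toy "compiler" in the shape described above. Both polish_plural and polish_only_compiler are hypothetical names. The exact interface that spirit_po requires of a compiler type is an assumption here and should be checked against the default implementation.

```cpp
#include <functional>
#include <string>

// Hand-written equivalent of the Polish Plural-Forms expression:
// plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
unsigned int polish_plural(unsigned int n) {
  if (n == 1) return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
  return 2;
}

// Toy "compiler": default constructible, takes the text after "plural=" and
// returns a callable of signature unsigned int(unsigned int). This version
// ignores its input and always returns the Polish rule. Whether spirit_po
// accepts exactly this shape is an assumption.
struct polish_only_compiler {
  std::function<unsigned int(unsigned int)> operator()(const std::string &) const {
    return polish_plural;
  }
};
```

For instance, 22 selects form 1 because it ends in 2 and its last two digits are not in the range 12 to 14, while 12 falls through to form 2.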

Licensing and Distribution

spirit-po is open-source software available under the Boost software license.

Dependencies

  • spirit-po is only intended to work with UTF-8 encoded po files.
  • spirit-po has been tested against many boost versions, ranging from 1.48 to 1.65.
  • spirit-po does not require C++ exceptions to be enabled.
    The tests run when compiled with -fno-exceptions, provided that
    • SPIRIT_PO_NO_EXCEPTIONS is defined
    • BOOST_NO_EXCEPTIONS is defined
    • Boost (headers-only) version >= 1.55. (Fails below that due to a bug in boost::variant.)

Compiler Support

spirit-po has been tested with

  • gcc versions 4.9, 5.0, 5.4, 6.3
  • clang versions 3.5, 3.7, 3.8, 4.0
  • MSVC 2013, 2015, 2017

See .travis.yml and appveyor.yml for info about our CI.

Tests

The test/ folder contains the unit tests, built with boost.build. (There is also an old Makefile that should still work.)

The test_libintl/ folder contains the validation tests against libintl, built using cmake. To add new validation test cases, just drop new .po files in the folder test_libintl/po/.

Acknowledgements

The author thanks David White, Kristina Simpson, and others for conversations which informed the creation of this library.
