
Python Library Reference

4.8 `codecs` -- Codec registry and base classes

This module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry, which manages the codec lookup process.

It defines the following functions:

register(search_function)

Register a codec search function. Search functions are expected to take one argument, the encoding name in all lower case letters, and return a tuple of functions (encoder, decoder, stream_reader, stream_writer) taking the following arguments:

encoder and decoder: These must be functions or methods which have the same interface as the encode()/decode() methods of Codec instances (see Codec Interface). The functions/methods are expected to work in a stateless mode.

stream_reader and stream_writer: These have to be factory functions providing the following interface:

factory(stream, errors='strict')

The factory functions must return objects providing the interfaces defined by the base classes StreamWriter and StreamReader, respectively. Stream codecs can maintain state.

Possible values for errors are 'strict' (raise an exception in case of an encoding error), 'replace' (replace malformed data with a suitable replacement marker, such as "?") and 'ignore' (ignore malformed data and continue without further notice).
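The three error-handling modes can be compared side by side with a stateless encoder. This is a minimal sketch written for current Python 3 rather than the 2.2.3 release documented here; the sample string is an arbitrary choice.

```python
import codecs

# Stateless ASCII encoder; it returns a (data, length_consumed) tuple.
encode = codecs.getencoder("ascii")

text = "caf\u00e9"  # the last character has no ASCII representation

# 'replace' substitutes a replacement marker for the offending character.
replaced, _ = encode(text, "replace")

# 'ignore' silently drops the offending character.
ignored, _ = encode(text, "ignore")

# 'strict' raises an exception (a ValueError subclass).
try:
    encode(text, "strict")
    strict_failed = False
except ValueError:
    strict_failed = True
```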

In case a search function cannot find a given encoding, it should return None.
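A search function along these lines might look as follows. This is a sketch for current Python 3, not the 2.2.3 release described here: the encoding name `xdemoutf8` and the function `demo_search` are hypothetical, and `codecs.CodecInfo` (a tuple subclass from later Python versions) stands in for the plain 4-tuple.

```python
import codecs

def demo_search(name):
    """Hypothetical search function: resolves one made-up alias for UTF-8."""
    if name != "xdemoutf8":
        return None  # unknown encoding: let other search functions try
    utf8 = codecs.lookup("utf-8")
    # CodecInfo is a tuple subclass holding
    # (encoder, decoder, stream_reader, stream_writer).
    return codecs.CodecInfo(
        encode=utf8.encode,
        decode=utf8.decode,
        streamreader=utf8.streamreader,
        streamwriter=utf8.streamwriter,
        name="xdemoutf8",
    )

codecs.register(demo_search)

# The registered alias now resolves through the normal lookup path.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("xdemoutf8")
encoded, consumed = encoder("h\u00e9llo")
decoded, _ = decoder(encoded)
```

Returning None for every name the function does not recognize keeps it from shadowing codecs handled by other registered search functions.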

lookup(encoding): Looks up a codec tuple in the Python codec registry and returns the function tuple as defined above.
Encodings are first looked up in the registry's cache. If not found, the list of registered search functions is scanned. If no codec tuple is found, a LookupError is raised. Otherwise, the codec tuple is stored in the cache and returned to the caller.
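The lookup behaviour can be illustrated with a short sketch, assuming current Python 3 (where the returned object is a tuple subclass that still unpacks into the four functions). The name `no_such_encoding_xyz` is a deliberately invalid, made-up encoding.

```python
import codecs

# Unpack the codec tuple for a known encoding.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("utf-8")
data, length = encoder("abc")

# A second lookup of the same name is served from the registry's cache,
# so the very same object comes back.
same_object = codecs.lookup("utf-8") is codecs.lookup("utf-8")

# An unknown encoding raises LookupError.
try:
    codecs.lookup("no_such_encoding_xyz")
    found = True
except LookupError:
    found = False
```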

To simplify access to the various codecs, the module provides these additional functions, which use lookup() for the codec lookup:

getencoder(encoding): Look up the codec for the given encoding and return its encoder function.
Raises a LookupError in case the encoding cannot be found.

getdecoder(encoding): Look up the codec for the given encoding and return its decoder function.
Raises a LookupError in case the encoding cannot be found.

getreader(encoding): Look up the codec for the given encoding and return its StreamReader class or factory function.
Raises a LookupError in case the encoding cannot be found.

getwriter(encoding): Look up the codec for the given encoding and return its StreamWriter class or factory function.
Raises a LookupError in case the encoding cannot be found.
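The four helpers can be exercised together in a small round trip. This sketch assumes current Python 3, where the stream classes wrap byte streams; the use of `io.BytesIO` as the underlying stream is an illustrative choice, not something the text above prescribes.

```python
import codecs
import io

# Stateless functions: each returns a (result, length_consumed) tuple.
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")
raw, chars_consumed = encode("na\u00efve")
text, bytes_consumed = decode(raw)

# Stream factories: called as factory(stream, errors='strict').
buffer = io.BytesIO()
writer = codecs.getwriter("utf-8")(buffer)
writer.write("na\u00efve")

reader = codecs.getreader("utf-8")(io.BytesIO(buffer.getvalue()))
roundtrip = reader.read()
```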

To simplify working with encoded files or streams, the module also defines these utility functions:

open(filename, mode[, encoding[, errors[, buffering]]])

Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding.

Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most built-in codecs. Output is also codec-dependent and will usually be Unicode as well.

encoding specifies the encoding which is to be used for the file.

errors may be given to define the error handling. It defaults to 'strict', which causes a ValueError to be raised in case an encoding error occurs.

buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
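A minimal sketch of writing and re-reading an encoded file, assuming current Python 3 rather than 2.2.3. The file name `demo.txt` is hypothetical, and the warning filter is only there because newer Python 3 releases may flag `codecs.open()` as deprecated in favour of the built-in `open()`.

```python
import codecs
import os
import tempfile
import warnings

# Assumption: newer Python 3 releases may flag codecs.open() as deprecated.
warnings.filterwarnings("ignore", category=DeprecationWarning)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # Text written through the wrapper is encoded to UTF-8 on disk.
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write("Gr\u00fc\u00dfe\n")

    # Reading through the wrapper decodes the bytes back to text.
    with codecs.open(path, "r", encoding="utf-8") as f:
        content = f.read()

    # Reading the file in plain binary mode shows the encoded bytes.
    with open(path, "rb") as f:
        raw = f.read()
```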

EncodedFile(file, input[, output[, errors]])

Return a wrapped version of file which provides transparent encoding translation.

Strings written to the wrapped file are interpreted according to the given input encoding and then written to the original file as strings using the output encoding. The intermediate encoding will usually be Unicode but depends on the specified codecs.

If output is not given, it defaults to input.

errors may be given to define the error handling. It defaults to 'strict', which causes ValueError to be raised in case an encoding error occurs.
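The translation can be sketched as follows, assuming current Python 3, where the wrapper accepts and returns bytes rather than the strings of 2.2.3. The choice of Latin-1 as the input encoding and UTF-8 as the output encoding is purely illustrative.

```python
import codecs
import io

# The underlying file stores UTF-8; callers read and write Latin-1 bytes.
underlying = io.BytesIO()
wrapped = codecs.EncodedFile(underlying, "latin-1", "utf-8")

# b"\xe9" is "e with acute accent" in Latin-1; it is stored as two UTF-8 bytes.
wrapped.write(b"\xe9")
stored = underlying.getvalue()

# Reading through a fresh wrapper translates the bytes back to Latin-1.
read_back = codecs.EncodedFile(io.BytesIO(stored), "latin-1", "utf-8").read()
```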

The module also provides the following constants, which are useful for reading and writing platform-dependent files:

BOM
BOM_BE
BOM_LE
BOM32_BE
BOM32_LE
BOM64_BE
BOM64_LE: These constants define the byte order marks (BOM) used in data streams to indicate the byte order used in the stream or file. BOM is either BOM_BE or BOM_LE depending on the platform's native byte order, while the others represent big endian ("_BE" suffix) and little endian ("_LE" suffix) byte order using 32-bit and 64-bit encodings.
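One way the constants might be used is to check which byte order a stream announces. This is a sketch assuming current Python 3 (where the constants are bytes objects); `detect_byte_order` is a hypothetical helper, not part of the module, and it only inspects the BOM_BE and BOM_LE marks.

```python
import codecs
import sys

# BOM matches whichever of BOM_BE / BOM_LE is native to this platform.
native_bom = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE

def detect_byte_order(data):
    """Hypothetical helper: report the byte order announced by a leading BOM.

    Only BOM_BE and BOM_LE are checked; wider marks are not distinguished.
    """
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    return None  # no recognizable mark at the start of the data
```

For example, `detect_byte_order(codecs.BOM_BE + payload)` reports `"big"` for any byte string `payload`, while data without a leading mark yields `None`.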

See Also:

http://sourceforge.net/projects/python-codecs/: A SourceForge project working on additional support for Asian codecs for use with Python. They are in the early stages of development at the time of this writing -- look in their FTP area for downloadable files.

Subsections

4.8.1 Codec Base Classes

Release 2.2.3, documentation updated on 30 May 2003.

See About this document... for information on suggesting changes.