libxml2
encoding.h File Reference

Character encoding conversion functions.

Data Structures

struct  _xmlCharEncodingHandler
 A character encoding conversion handler for non-UTF-8 encodings.

Typedefs

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters to UTF-8.
typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters from UTF-8.
typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)
 Convert between character encodings.
typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)
 Free a conversion context.
typedef struct _xmlCharEncodingHandler xmlCharEncodingHandler
 Character encoding converter.
typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)
 If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.

Enumerations

enum  xmlCharEncError
 Encoding conversion errors.
enum  xmlCharEncoding
 Predefined values for some standard encodings.
enum  xmlCharEncFlags
 Encoding conversion flags.

Functions

void xmlInitCharEncodingHandlers (void)
void xmlCleanupCharEncodingHandlers (void)
 Cleanup the memory allocated for the char encoding support, it unregisters all the encoding handlers and the aliases.
void xmlRegisterCharEncodingHandler (xmlCharEncodingHandler *handler)
 Register the char encoding handler.
xmlParserErrors xmlLookupCharEncodingHandler (xmlCharEncoding enc, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlParserErrors xmlOpenCharEncodingHandler (const char *name, int output, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlParserErrors xmlCreateCharEncodingHandler (const char *name, xmlCharEncFlags flags, xmlCharEncConvImpl impl, void *implCtxt, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlCharEncodingHandler * xmlGetCharEncodingHandler (xmlCharEncoding enc)
xmlCharEncodingHandler * xmlFindCharEncodingHandler (const char *name)
 If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.
xmlCharEncodingHandler * xmlNewCharEncodingHandler (const char *name, xmlCharEncodingInputFunc input, xmlCharEncodingOutputFunc output)
 Create and register an xmlCharEncodingHandler.
xmlParserErrors xmlCharEncNewCustomHandler (const char *name, xmlCharEncConvFunc input, xmlCharEncConvFunc output, xmlCharEncConvCtxtDtor ctxtDtor, void *inputCtxt, void *outputCtxt, xmlCharEncodingHandler **out)
 Create a custom xmlCharEncodingHandler.
int xmlAddEncodingAlias (const char *name, const char *alias)
 Register alias as an alias for the encoding named name.
int xmlDelEncodingAlias (const char *alias)
 Unregisters an encoding alias.
const char * xmlGetEncodingAlias (const char *alias)
 Lookup an encoding name for the given alias.
void xmlCleanupEncodingAliases (void)
 Unregisters all aliases.
xmlCharEncoding xmlParseCharEncoding (const char *name)
 Compare the string to the encoding schemes already known.
const char * xmlGetCharEncodingName (xmlCharEncoding enc)
 The "canonical" name for XML encoding.
xmlCharEncoding xmlDetectCharEncoding (const unsigned char *in, int len)
 Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.
int xmlCharEncOutFunc (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 Generic front-end for output encoding conversion.
int xmlCharEncInFunc (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 Generic front-end for input encoding conversion.
int xmlCharEncFirstLine (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 DEPRECATED: Don't use.
int xmlCharEncCloseFunc (xmlCharEncodingHandler *handler)
 Releases an xmlCharEncodingHandler.
int xmlUTF8ToIsolat1 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of UTF-8 chars from in and try to convert it to a block of ISO Latin 1 chars in out.
int xmlIsolat1ToUTF8 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of ISO Latin 1 chars from in and try to convert it to a block of UTF-8 chars in out.

Detailed Description

Character encoding conversion functions.

Copyright
See Copyright for the status of this software.
Author
Daniel Veillard

Typedef Documentation

◆ xmlCharEncConvCtxtDtor

typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)

Free a conversion context.

Parameters
vctxt: conversion context

◆ xmlCharEncConvFunc

typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)

Convert between character encodings.

After return, the value of inlen is the number of bytes consumed and outlen is the number of bytes produced.

If the converter can consume partial multi-byte sequences, the flush flag can be used to detect truncated sequences at EOF. Otherwise, the flag can be ignored.

Parameters
vctxt: conversion context
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of input bytes
inlen: the length of in
flush: end-of-input flag
Returns
an xmlCharEncError code.

◆ xmlCharEncConvImpl

typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)

If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.

The handler can be obtained from xmlCharEncNewCustomHandler.

flags can contain XML_ENC_INPUT, XML_ENC_OUTPUT or both.

Parameters
vctxt: user data
name: encoding name
flags: bit mask of flags
out: pointer to resulting handler
Returns
an xmlParserErrors code.

◆ xmlCharEncodingInputFunc

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters to UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out: a pointer to an array of bytes to store the UTF-8 result
outlen: the length of out
in: a pointer to an array of chars in the original encoding
inlen: the length of in
Returns
the number of bytes written or an xmlCharEncError code.
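The input-function contract can be met by a plain C function. The following sketch, with the hypothetical name latin1ToUtf8Sketch, converts ISO-8859-1 to UTF-8 using the xmlCharEncodingInputFunc signature and needs no libxml2 headers. When out runs short it stops before the character that does not fit and reports the partial progress through inlen; this is one reasonable reading of the contract, not libxml2's own code.

```c
/*
 * Hypothetical ISO-8859-1 to UTF-8 converter with the
 * xmlCharEncodingInputFunc signature. On return *inlen holds the bytes
 * consumed and *outlen the bytes produced; the return value is the
 * number of bytes written.
 */
static int
latin1ToUtf8Sketch(unsigned char *out, int *outlen,
                   const unsigned char *in, int *inlen)
{
    int i = 0, o = 0;

    while (i < *inlen) {
        unsigned char c = in[i];

        if (c < 0x80) {
            if (o + 1 > *outlen)
                break;                      /* out is full */
            out[o++] = c;
        } else {
            if (o + 2 > *outlen)
                break;                      /* no room for a 2-byte sequence */
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        i++;
    }

    *inlen = i;
    *outlen = o;
    return o;
}
```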

◆ xmlCharEncodingOutputFunc

typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters from UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of UTF-8 chars
inlen: the length of in
Returns
the number of bytes written or an xmlCharEncError code.

Enumeration Type Documentation

◆ xmlCharEncError

enum xmlCharEncError

Encoding conversion errors.

Enumerator
XML_ENC_ERR_SUCCESS 

Success.

XML_ENC_ERR_INTERNAL 

Internal or unclassified error.

XML_ENC_ERR_INPUT 

Invalid or untranslatable input sequence.

XML_ENC_ERR_SPACE 

Not enough space in output buffer.

XML_ENC_ERR_MEMORY 

Out-of-memory error.

◆ xmlCharEncFlags

enum xmlCharEncFlags

Encoding conversion flags.

Enumerator
XML_ENC_INPUT 

Create converter for input (conversion to UTF-8).

XML_ENC_OUTPUT 

Create converter for output (conversion from UTF-8).

XML_ENC_HTML 

Use HTML5 mappings.

◆ xmlCharEncoding

enum xmlCharEncoding

Predefined values for some standard encodings.

Enumerator
XML_CHAR_ENCODING_ERROR 

No char encoding detected.

XML_CHAR_ENCODING_NONE 

No char encoding detected.

XML_CHAR_ENCODING_UTF8 

UTF-8.

XML_CHAR_ENCODING_UTF16LE 

UTF-16 little endian.

XML_CHAR_ENCODING_UTF16BE 

UTF-16 big endian.

XML_CHAR_ENCODING_UCS4LE 

UCS-4 little endian.

XML_CHAR_ENCODING_UCS4BE 

UCS-4 big endian.

XML_CHAR_ENCODING_EBCDIC 

EBCDIC.

XML_CHAR_ENCODING_UCS4_2143 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS4_3412 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS2 

UCS-2.

XML_CHAR_ENCODING_8859_1 

ISO-8859-1 ISO Latin 1.

XML_CHAR_ENCODING_8859_2 

ISO-8859-2 ISO Latin 2.

XML_CHAR_ENCODING_8859_3 

ISO-8859-3.

XML_CHAR_ENCODING_8859_4 

ISO-8859-4.

XML_CHAR_ENCODING_8859_5 

ISO-8859-5.

XML_CHAR_ENCODING_8859_6 

ISO-8859-6.

XML_CHAR_ENCODING_8859_7 

ISO-8859-7.

XML_CHAR_ENCODING_8859_8 

ISO-8859-8.

XML_CHAR_ENCODING_8859_9 

ISO-8859-9.

XML_CHAR_ENCODING_2022_JP 

ISO-2022-JP.

XML_CHAR_ENCODING_SHIFT_JIS 

Shift_JIS.

XML_CHAR_ENCODING_EUC_JP 

EUC-JP.

XML_CHAR_ENCODING_ASCII 

Pure ASCII.

XML_CHAR_ENCODING_UTF16 

UTF-16 native, available since 2.14.

XML_CHAR_ENCODING_HTML 

HTML (output only), available since 2.14.

XML_CHAR_ENCODING_8859_10 

ISO-8859-10, available since 2.14.

XML_CHAR_ENCODING_8859_11 

ISO-8859-11, available since 2.14.

XML_CHAR_ENCODING_8859_13 

ISO-8859-13, available since 2.14.

XML_CHAR_ENCODING_8859_14 

ISO-8859-14, available since 2.14.

XML_CHAR_ENCODING_8859_15 

ISO-8859-15, available since 2.14.

XML_CHAR_ENCODING_8859_16 

ISO-8859-16, available since 2.14.

XML_CHAR_ENCODING_WINDOWS_1252 

windows-1252, available since 2.15.

Function Documentation

◆ xmlAddEncodingAlias()

int xmlAddEncodingAlias(const char *name,
const char *alias )

Register alias as an alias for the encoding named name.

Existing aliases will be overwritten.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
name: the encoding name as parsed, in UTF-8 format (in practice, ASCII)
alias: the alias name as parsed, in UTF-8 format (in practice, ASCII)
Returns
0 in case of success, -1 in case of error.

◆ xmlCharEncCloseFunc()

int xmlCharEncCloseFunc(xmlCharEncodingHandler *handler)

Releases an xmlCharEncodingHandler.

Must be called after a handler is no longer in use.

Parameters
handler: encoding handler
Returns
0.

◆ xmlCharEncFirstLine()

int xmlCharEncFirstLine(xmlCharEncodingHandler *handler,
struct _xmlBuffer *out,
struct _xmlBuffer *in )

DEPRECATED: Don't use.

Parameters
handler: encoding handler
out: an xmlBuffer for the output
in: an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCharEncInFunc()

int xmlCharEncInFunc(xmlCharEncodingHandler *handler,
struct _xmlBuffer *out,
struct _xmlBuffer *in )

Generic front-end for input encoding conversion.

Parameters
handler: encoding handler
out: an xmlBuffer for the output
in: an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCharEncNewCustomHandler()

xmlParserErrors xmlCharEncNewCustomHandler(const char *name,
xmlCharEncConvFunc input,
xmlCharEncConvFunc output,
xmlCharEncConvCtxtDtor ctxtDtor,
void *inputCtxt,
void *outputCtxt,
xmlCharEncodingHandler **out )

Create a custom xmlCharEncodingHandler.

Parameters
name: the encoding name
input: input callback which converts to UTF-8
output: output callback which converts from UTF-8
ctxtDtor: context destructor
inputCtxt: context for input callback
outputCtxt: context for output callback
out: pointer to resulting handler
Returns
an xmlParserErrors code.

◆ xmlCharEncOutFunc()

int xmlCharEncOutFunc(xmlCharEncodingHandler *handler,
struct _xmlBuffer *out,
struct _xmlBuffer *in )

Generic front-end for output encoding conversion.

A first call with in set to NULL has to be made to write a BOM.

When using GNU libiconv, unsupported characters in the output encoding will be automatically replaced with a numeric character reference.

Parameters
handler: encoding handler
out: an xmlBuffer for the output
in: an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCleanupCharEncodingHandlers()

void xmlCleanupCharEncodingHandlers(void)

Cleanup the memory allocated for the char encoding support, it unregisters all the encoding handlers and the aliases.

Deprecated
This function will be made private. Call xmlCleanupParser to free global state, but see the warnings there. xmlCleanupParser should only be called once at program exit. In most cases, you don't have to call cleanup functions at all.

◆ xmlCleanupEncodingAliases()

void xmlCleanupEncodingAliases(void)

Unregisters all aliases.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.

◆ xmlCreateCharEncodingHandler()

xmlParserErrors xmlCreateCharEncodingHandler(const char *name,
xmlCharEncFlags flags,
xmlCharEncConvImpl impl,
void *implCtxt,
xmlCharEncodingHandler **out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • Custom implementation if provided
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

flags can contain XML_ENC_INPUT, XML_ENC_OUTPUT or both.

Since
2.14.0
Parameters
name: a string describing the char encoding.
flags: bit mask of flags
impl: a conversion implementation (optional)
implCtxt: user data for conversion implementation (optional)
out: pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.

◆ xmlDelEncodingAlias()

int xmlDelEncodingAlias(const char *alias)

Unregisters an encoding alias.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
alias: the alias name as parsed, in UTF-8 format (in practice, ASCII)
Returns
0 in case of success, -1 in case of error.

◆ xmlDetectCharEncoding()

xmlCharEncoding xmlDetectCharEncoding(const unsigned char *in,
int len )

Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.

Parameters
in: a pointer to the first bytes of the XML entity; must be at least 2 bytes long (at least 4 if the encoding is a UCS-4 variant).
len: the length of the buffer
Returns
an xmlCharEncoding value.

◆ xmlFindCharEncodingHandler()

xmlCharEncodingHandler * xmlFindCharEncodingHandler(const char *name)

If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.

Deprecated
Use xmlOpenCharEncodingHandler which has better error reporting.
Parameters
name: a string describing the char encoding.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingHandler()

xmlCharEncodingHandler * xmlGetCharEncodingHandler(xmlCharEncoding enc)
Deprecated
Use xmlLookupCharEncodingHandler which has better error reporting.
Parameters
enc: an xmlCharEncoding value.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingName()

const char * xmlGetCharEncodingName(xmlCharEncoding enc)

The "canonical" name for XML encoding.

Cf. http://www.w3.org/TR/REC-xml#charencoding, Section 4.3.3 "Character Encoding in Entities".

Parameters
enc: the encoding
Returns
the canonical name for the given encoding.

◆ xmlGetEncodingAlias()

const char * xmlGetEncodingAlias(const char *alias)

Lookup an encoding name for the given alias.

Deprecated
This function is not thread-safe.
Parameters
alias: the alias name as parsed, in UTF-8 format (in practice, ASCII)
Returns
NULL if not found, otherwise the original name.

◆ xmlInitCharEncodingHandlers()

void xmlInitCharEncodingHandlers(void)
Deprecated
Alias for xmlInitParser.

◆ xmlIsolat1ToUTF8()

int xmlIsolat1ToUTF8(unsigned char *out,
int *outlen,
const unsigned char *in,
int *inlen )

Take a block of ISO Latin 1 chars from in and try to convert it to a block of UTF-8 chars in out.

After return, the value of inlen is the number of bytes consumed and the value of outlen is the number of bytes produced.

Parameters
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of ISO Latin 1 chars
inlen: the length of in
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlLookupCharEncodingHandler()

xmlParserErrors xmlLookupCharEncodingHandler(xmlCharEncoding enc,
xmlCharEncodingHandler **out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

Since
2.13.0
Parameters
enc: an xmlCharEncoding value.
out: pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.

◆ xmlNewCharEncodingHandler()

xmlCharEncodingHandler * xmlNewCharEncodingHandler(const char *name,
xmlCharEncodingInputFunc input,
xmlCharEncodingOutputFunc output )

Create and register an xmlCharEncodingHandler.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
name: the encoding name, in UTF-8 format (in practice, ASCII)
input: the xmlCharEncodingInputFunc to read that encoding
output: the xmlCharEncodingOutputFunc to write that encoding
Returns
the xmlCharEncodingHandler created (or NULL in case of error).

◆ xmlOpenCharEncodingHandler()

xmlParserErrors xmlOpenCharEncodingHandler(const char *name,
int output,
xmlCharEncodingHandler **out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

Since
2.13.0
Parameters
name: a string describing the char encoding.
output: boolean, use handler for output
out: pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.

◆ xmlParseCharEncoding()

xmlCharEncoding xmlParseCharEncoding(const char *name)

Compare the string to the encoding schemes already known.

Note that the comparison is case-insensitive, according to section 4.3.3 "Character Encoding in Entities" of [XML].

Parameters
name: the encoding name as parsed, in UTF-8 format (in practice, ASCII)
Returns
one of the xmlCharEncoding values or XML_CHAR_ENCODING_NONE if not recognized.

◆ xmlRegisterCharEncodingHandler()

void xmlRegisterCharEncodingHandler(xmlCharEncodingHandler *handler)

Register the char encoding handler.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
handler: the xmlCharEncodingHandler handler block

◆ xmlUTF8ToIsolat1()

int xmlUTF8ToIsolat1(unsigned char *out,
int *outlen,
const unsigned char *in,
int *inlen )

Take a block of UTF-8 chars from in and try to convert it to a block of ISO Latin 1 chars in out.

After return, the value of inlen is the number of bytes consumed and the value of outlen is the number of bytes produced.

Parameters
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of UTF-8 chars
inlen: the length of in
Returns
the number of bytes written or an xmlCharEncError code.

Generated by doxygen 1.14.0
