Library Reference

version 2.112.0


std.encoding

Classes and functions for handling and transcoding between various encodings.
For cases where the encoding is known at compile-time, functions are provided for arbitrary encoding and decoding of characters, arbitrary transcoding between strings of different type, as well as validation and sanitization.
Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252.
Category: Functions
Decode: codePoints, decode, decodeReverse, safeDecode
Conversion: codeUnits, sanitize, transcode
Classification: canEncode, isValid, isValidCodePoint, isValidCodeUnit
BOM: BOM, BOMSeq, getBOM, utfBOM
Length & Index: firstSequence, encodedLength, index, lastSequence, validLength
Encoding schemes: encodingName, EncodingScheme, EncodingSchemeASCII, EncodingSchemeLatin1, EncodingSchemeLatin2, EncodingSchemeUtf16Native, EncodingSchemeUtf32Native, EncodingSchemeUtf8, EncodingSchemeWindows1250, EncodingSchemeWindows1251, EncodingSchemeWindows1252
Representation: AsciiChar, AsciiString, Latin1Char, Latin1String, Latin2Char, Latin2String, Windows1250Char, Windows1250String, Windows1251Char, Windows1251String, Windows1252Char, Windows1252String
Exceptions: INVALID_SEQUENCE, EncodingException
For cases where the encoding is not known at compile-time, but is known at run-time, the abstract class EncodingScheme and its subclasses are provided. To construct a run-time encoder/decoder, one does e.g.
auto e = EncodingScheme.create("utf-8");
This library supplies EncodingScheme subclasses for ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251, WINDOWS-1252, UTF-8, and (on little-endian architectures) UTF-16LE and UTF-32LE; or (on big-endian architectures) UTF-16BE and UTF-32BE.
This library provides a mechanism whereby other modules may add EncodingScheme subclasses for any other encoding.
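As a minimal sketch of the run-time path described above, the following picks a scheme by name and decodes a byte array with it. The variable names and the sample bytes are illustrative only.

```d
import std.encoding;

void main()
{
    // The encoding name is an ordinary run-time string here.
    string name = "utf-8";
    EncodingScheme e = EncodingScheme.create(name);
    assert(e.toString() == "UTF-8");

    // decode() removes the decoded bytes from the front of the slice.
    const(ubyte)[] bytes = [0xE2, 0x82, 0xAC, 0x31]; // "€1" in UTF-8
    assert(e.decode(bytes) == '\u20AC');
    assert(e.decode(bytes) == '1');
    assert(bytes.length == 0);
}
```

The same calls work unchanged for any other registered scheme, which is the point of going through the abstract class.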
License:
Boost License 1.0.
Authors:
Janice Caron

Source: std/encoding.d

enum dchar INVALID_SEQUENCE;
Special value returned by safeDecode.
enum AsciiChar: ubyte;

alias AsciiString = immutable(AsciiChar)[];
Defines various character sets.
enum Latin1Char: ubyte;
Defines a Latin1-encoded character.
alias Latin1String = immutable(Latin1Char)[];
Defines a Latin1-encoded string (as an array of immutable(Latin1Char)).
enum Latin2Char: ubyte;
Defines a Latin2-encoded character.
alias Latin2String = immutable(Latin2Char)[];
Defines a Latin2-encoded string (as an array of immutable(Latin2Char)).
enum Windows1250Char: ubyte;
Defines a Windows1250-encoded character.
alias Windows1250String = immutable(Windows1250Char)[];
Defines a Windows1250-encoded string (as an array of immutable(Windows1250Char)).
enum Windows1251Char: ubyte;
Defines a Windows1251-encoded character.
alias Windows1251String = immutable(Windows1251Char)[];
Defines a Windows1251-encoded string (as an array of immutable(Windows1251Char)).
enum Windows1252Char: ubyte;
Defines a Windows1252-encoded character.
alias Windows1252String = immutable(Windows1252Char)[];
Defines a Windows1252-encoded string (as an array of immutable(Windows1252Char)).
pure nothrow @nogc @safe bool isValidCodePoint(dchar c);
Returns true if c is a valid code point.
Note that this includes the non-character code points U+FFFE and U+FFFF, since these are valid code points (even though they are not valid characters).

Supersedes: This function supersedes std.utf.startsValidDchar().

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be tested
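A short sketch of the boundary cases follows. The surrogate and out-of-range checks reflect the current Phobos behaviour as far as I know; only the U+FFFE/U+FFFF remark is stated on this page.

```d
import std.encoding;

void main()
{
    assert(isValidCodePoint('A'));
    assert(isValidCodePoint(cast(dchar) 0xFFFF));    // non-character, still a valid code point
    assert(isValidCodePoint(cast(dchar) 0x10FFFF));  // highest Unicode code point
    assert(!isValidCodePoint(cast(dchar) 0xD800));   // surrogate (assumed rejected)
    assert(!isValidCodePoint(cast(dchar) 0x110000)); // beyond the Unicode range
}
```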
@property string encodingName(T)();
Returns the name of an encoding.
The type of encoding cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Examples:
writeln(encodingName!(char)); // "UTF-8"
writeln(encodingName!(wchar)); // "UTF-16"
writeln(encodingName!(dchar)); // "UTF-32"
writeln(encodingName!(AsciiChar)); // "ASCII"
writeln(encodingName!(Latin1Char)); // "ISO-8859-1"
writeln(encodingName!(Latin2Char)); // "ISO-8859-2"
writeln(encodingName!(Windows1250Char)); // "windows-1250"
writeln(encodingName!(Windows1251Char)); // "windows-1251"
writeln(encodingName!(Windows1252Char)); // "windows-1252"
bool canEncode(E)(dchar c);
Returns true iff it is possible to represent the specified codepoint in the encoding.
The type of encoding cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Examples:
assert(canEncode!(Latin1Char)('A'));
assert(canEncode!(Latin2Char)('A'));
assert(!canEncode!(AsciiChar)('\u00A0'));
assert(canEncode!(Latin1Char)('\u00A0'));
assert(canEncode!(Latin2Char)('\u00A0'));
assert(canEncode!(Windows1250Char)('\u20AC'));
assert(!canEncode!(Windows1250Char)('\u20AD'));
assert(!canEncode!(Windows1250Char)('\uFFFD'));
assert(canEncode!(Windows1251Char)('\u0402'));
assert(!canEncode!(Windows1251Char)('\u20AD'));
assert(!canEncode!(Windows1251Char)('\uFFFD'));
assert(canEncode!(Windows1252Char)('\u20AC'));
assert(!canEncode!(Windows1252Char)('\u20AD'));
assert(!canEncode!(Windows1252Char)('\uFFFD'));
assert(!canEncode!(char)(cast(dchar) 0x110000));
Examples:
How to check an entire string
import std.algorithm.searching : find;
import std.utf : byDchar;

assert("The quick brown fox"
    .byDchar
    .find!(x => !canEncode!AsciiChar(x))
    .empty);
bool isValidCodeUnit(E)(E c);
Returns true if the code unit is legal. For example, the byte 0x80 would not be legal in ASCII, because ASCII code units must always be in the range 0x00 to 0x7F.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
E c: the code unit to be tested
Examples:
assert(!isValidCodeUnit(cast(char) 0xC0));
assert(!isValidCodeUnit(cast(char) 0xFF));
assert(isValidCodeUnit(cast(wchar) 0xD800));
assert(!isValidCodeUnit(cast(dchar) 0xD800));
assert(!isValidCodeUnit(cast(AsciiChar) 0xA0));
assert(isValidCodeUnit(cast(Windows1250Char) 0x80));
assert(!isValidCodeUnit(cast(Windows1250Char) 0x81));
assert(isValidCodeUnit(cast(Windows1251Char) 0x80));
assert(!isValidCodeUnit(cast(Windows1251Char) 0x98));
assert(isValidCodeUnit(cast(Windows1252Char) 0x80));
assert(!isValidCodeUnit(cast(Windows1252Char) 0x81));
bool isValid(E)(const(E)[] s);
Returns true if the string is encoded correctly.

Supersedes: This function supersedes std.utf.validate(). Note, however, that this function returns a bool indicating whether the input was valid, whereas the older function would throw an exception.

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string to be tested
Examples:
assert(isValid("\u20AC100"));
assert(!isValid(cast(char[3])[167, 133, 175]));
size_t validLength(E)(const(E)[] s);
Returns the length of the longest possible substring, starting from the first code unit, which is validly encoded.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string to be tested
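A small sketch of how validLength might be used to cut a string at the first malformed sequence. The sample strings are made up for illustration.

```d
import std.encoding;

void main()
{
    string good = "abc";
    string bad = "abc\xFFdef"; // 0xFF can never appear in UTF-8

    assert(validLength(good) == good.length);
    assert(validLength(bad) == 3);

    // The valid prefix can then be sliced off safely.
    assert(isValid(bad[0 .. validLength(bad)]));
}
```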
immutable(E)[] sanitize(E)(immutable(E)[] s);
Sanitizes a string by replacing malformed code unit sequences with valid code unit sequences. The result is guaranteed to be valid for this encoding.
If the input string is already valid, this function returns the original; otherwise it constructs a new string by replacing all illegal code unit sequences with the encoding's replacement character. Invalid sequences will be replaced with the Unicode replacement character (U+FFFD) if the character repertoire contains it, otherwise invalid sequences will be replaced with '?'.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
immutable(E)[] s: the string to be sanitized
Examples:
writeln(sanitize("hello \xF0\x80world")); // "hello \xEF\xBF\xBDworld"
size_t firstSequence(E)(const(E)[] s);
Returns the length of the first encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string to be sliced
Examples:
writeln(firstSequence("\u20AC1000")); // "\u20AC".length
writeln(firstSequence("hel")); // "h".length
size_t lastSequence(E)(const(E)[] s);
Returns the length of the last encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string to be sliced
Examples:
writeln(lastSequence("1000\u20AC")); // "\u20AC".length
writeln(lastSequence("hellö")); // "ö".length
ptrdiff_t index(E)(const(E)[] s, int n);
Returns the array index at which the (n+1)th code point begins.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.

Supersedes: This function supersedes std.utf.toUTFindex().

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string to be counted
int n: the current code point index
Examples:
writeln(index("\u20AC100", 1)); // 3
writeln(index("hällo", 2)); // 3
dchar decode(S)(ref S s);
Decodes a single code point.
This function removes one or more code units from the start of a string, and returns the decoded code point which those code units represent.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.

Supersedes: This function supersedes std.utf.decode(). Note, however, that the function codePoints() supersedes it more conveniently.

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
S s: the string whose first code point is to be decoded
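The page gives no example for decode, so here is a minimal sketch of the "remove from the start and return" behaviour described above. Only std.encoding is imported, to avoid a name clash with std.utf.decode.

```d
import std.encoding;

void main()
{
    string s = "\u20AC100"; // in UTF-8 the euro sign takes three code units
    dchar c = decode(s);    // s is passed by ref and is advanced

    assert(c == '\u20AC');
    assert(s == "100");     // the decoded sequence is gone from the front
}
```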
dchar decodeReverse(E)(ref const(E)[] s);
Decodes a single code point from the end of a string.
This function removes one or more code units from the end of a string, and returns the decoded code point which those code units represent.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
const(E)[] s: the string whose last code point is to be decoded
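A minimal sketch of decodeReverse. Because the parameter is declared as ref const(E)[], the example stores the text in a const(char)[] variable rather than a string; that choice is mine, made so the ref binding type-checks.

```d
import std.encoding;

void main()
{
    const(char)[] s = "100\u20AC";
    dchar c = decodeReverse(s); // removes code units from the end of s

    assert(c == '\u20AC');
    assert(s == "100");
}
```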
dchar safeDecode(S)(ref S s);
Decodes a single code point. The input does not have to be valid.
This function removes one or more code units from the start of a string, and returns the decoded code point which those code units represent.
This function will accept an invalidly encoded string as input. If an invalid sequence is found at the start of the string, this function will remove it, and return the value INVALID_SEQUENCE.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
S s: the string whose first code point is to be decoded
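The following sketch shows safeDecode skipping over a bad byte and then continuing normally. The helper countInvalid is hypothetical and not part of std.encoding; it only illustrates draining a string that may contain malformed input.

```d
import std.encoding;

// Hypothetical helper (not part of std.encoding): counts invalid
// sequences by draining the string with safeDecode.
size_t countInvalid(string s)
{
    size_t bad = 0;
    while (s.length != 0)
    {
        if (safeDecode(s) == INVALID_SEQUENCE)
            ++bad;
    }
    return bad;
}

void main()
{
    string s = "\xFFab"; // 0xFF is never legal in UTF-8
    assert(safeDecode(s) == INVALID_SEQUENCE);
    assert(s == "ab");   // the offending code unit was removed
    assert(safeDecode(s) == 'a');
    assert(countInvalid("a\xFFb\xFFc") == 2);
}
```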
size_t encodedLength(E)(dchar c);
Returns the number of code units required to encode a single code point.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be encoded
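As a quick illustration, the same code point needs a different number of code units depending on the target encoding. The code points below are arbitrary samples.

```d
import std.encoding;

void main()
{
    assert(encodedLength!char('A') == 1);
    assert(encodedLength!char('\u20AC') == 3);              // 3 UTF-8 code units
    assert(encodedLength!wchar('\u20AC') == 1);             // 1 UTF-16 code unit
    assert(encodedLength!wchar(cast(dchar) 0x1F600) == 2);  // surrogate pair
    assert(encodedLength!dchar(cast(dchar) 0x1F600) == 1);  // UTF-32 is always 1
}
```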
E[] encode(E)(dchar c);
Encodes a single code point.
This function encodes a single code point into one or more code units. It returns a string containing those code units.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.

Supersedes: This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be encoded
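A minimal sketch of the array-returning overload, with the target encoding given explicitly as the template parameter. The sample code points are arbitrary.

```d
import std.encoding;

void main()
{
    char[] utf8 = encode!char('\u20AC');
    assert(utf8.length == 3);
    assert(utf8 == "\u20AC"); // same code units as the UTF-8 literal

    wchar[] utf16 = encode!wchar(cast(dchar) 0x1F600);
    assert(utf16.length == 2); // encoded as a surrogate pair
    assert(utf16[0] == 0xD83D && utf16[1] == 0xDE00);
}
```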
size_t encode(E)(dchar c, E[] array);
Encodes a single code point into an array.
This function encodes a single code point into one or more code units. The code units are stored in a user-supplied fixed-size array, which must be passed by reference.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.

Supersedes: This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be encoded
E[] array: the destination array
Returns:
the number of code units written to the array
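A sketch of encoding into a caller-owned buffer follows. Here the element type is deduced from the slice that is passed in, which is an assumption about how overload resolution behaves for this call; the buffer size of four is chosen because no UTF-8 sequence is longer than that.

```d
import std.encoding;

void main()
{
    char[4] buf;             // large enough for any UTF-8 sequence
    dchar euro = '\u20AC';

    size_t n = encode(euro, buf[]); // E is deduced as char from buf[]
    assert(n == 3);
    assert(buf[0 .. n] == "\u20AC");
}
```

Only the first n elements of the buffer are meaningful after the call.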
void encode(E)(dchar c, void delegate(E) dg);
Encodes a single code point to a delegate.
This function encodes a single code point into one or more code units. The code units are passed one at a time to the supplied delegate.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.

Supersedes: This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be encoded
void delegate(E) dg: the delegate to invoke for each code unit
size_t encode(Tgt, Src, R)(in Src[] s, R range);
Encodes the contents of s in units of type Tgt, writing the result to an output range.
Returns:
The number of Tgt elements written.
Parameters:
Tgt: Element type of range.
Src[] s: Input array.
R range: Output range.
CodePoints!E codePoints(E)(immutable(E)[] s);
Returns a foreachable struct which can bidirectionally iterate over all code points in a string.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
You can foreach either with or without an index. If an index is specified, it will be initialized at each iteration with the offset into the string at which the code point begins.

Supersedes: This function supersedes std.utf.decode().

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
immutable(E)[] s: the string to be decoded

Example

string s = "hello world";
foreach (c; codePoints(s))
{
    // do something with c (which will always be a dchar)
}
Note that, currently, foreach (c; codePoints(s)) is superior to foreach (c; s) in that the latter will fall over on encountering U+FFFF.

Examples:
string s = "hello";
string t;
foreach (c; codePoints(s))
{
    t ~= cast(char) c;
}
writeln(s); // t
CodeUnits!E codeUnits(E)(dchar c);
Returns a foreachable struct which can bidirectionally iterate over all code units in a code point.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type in the template parameter.

Supersedes: This function supersedes std.utf.encode().

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
dchar c: the code point to be encoded
Examples:
char[] a;
foreach (c; codeUnits!(char)(cast(dchar) '\u20AC'))
{
    a ~= c;
}
writeln(a.length); // 3
writeln(a[0]); // 0xE2
writeln(a[1]); // 0x82
writeln(a[2]); // 0xAC
void transcode(Src, Dst)(Src[] s, out Dst[] r);
Convert a string from one encoding to another.

Supersedes: This function supersedes std.utf.toUTF8(), std.utf.toUTF16() and std.utf.toUTF32() (but note that to!() supersedes it more conveniently).

Standards:
Unicode 5.0, ASCII, ISO-8859-1, ISO-8859-2, WINDOWS-1250, WINDOWS-1251, WINDOWS-1252
Parameters:
Src[] s: Source string. Must be validly encoded. This is enforced by the function's in-contract.
Dst[] r: Destination string
See Also:
Examples:
wstring ws;
// transcode from UTF-8 to UTF-16
transcode("hello world", ws);
writeln(ws); // "hello world"w

Latin1String ls;
// transcode from UTF-16 to ISO-8859-1
transcode(ws, ls);
writeln(ls); // "hello world"
class EncodingException : object.Exception;
The base class for exceptions thrown by this module.
abstract class EncodingScheme;
Abstract base class of all encoding schemes.
void register(Klass : EncodingScheme)();
Registers a subclass of EncodingScheme.
This function allows user-defined subclasses of EncodingScheme to be declared in other modules.
Parameters:
Klass: The subclass of EncodingScheme to register.

Example

class Amiga1251 : EncodingScheme
{
    shared static this()
    {
        EncodingScheme.register!Amiga1251;
    }
}

static EncodingScheme create(string encodingName);
Obtains a subclass of EncodingScheme which is capable of encoding and decoding the named encoding scheme.
This function is only aware of EncodingSchemes which have been registered with the register() function.

Example

auto scheme = EncodingScheme.create("Amiga-1251");

abstract string toString() const;
Returns the standard name of the encoding scheme.
abstract string[] names() const;
Returns an array of all known names for this encoding scheme.
abstract bool canEncode(dchar c) const;
Returns true if the character c can be represented in this encoding scheme.
abstract size_t encodedLength(dchar c) const;
Returns the number of ubytes required to encode this code point.
The input to this function MUST be a valid code point.
Parameters:
dchar c: the code point to be encoded
Returns:
the number of ubytes required.
abstract size_t encode(dchar c, ubyte[] buffer) const;
Encodes a single code point into a user-supplied, fixed-size buffer.
This function encodes a single code point into one or more ubytes. The supplied buffer must be code unit aligned. (For example, UTF-16LE or UTF-16BE must be wchar-aligned, UTF-32LE or UTF-32BE must be dchar-aligned, etc.)
The input to this function MUST be a valid code point.
Parameters:
dchar c: the code point to be encoded
ubyte[] buffer: the destination array
Returns:
the number of ubytes written.
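A rough sketch of the encoding side of an EncodingScheme, using the UTF-8 scheme obtained through create(). The buffer size and variable names are illustrative.

```d
import std.encoding;

void main()
{
    auto e = EncodingScheme.create("utf-8");
    dchar euro = '\u20AC';

    assert(e.canEncode(euro));
    assert(e.encodedLength(euro) == 3);

    ubyte[4] buffer; // caller-owned, fixed-size destination
    size_t n = e.encode(euro, buffer[]);
    assert(n == 3);

    immutable ubyte[] expected = [0xE2, 0x82, 0xAC];
    assert(buffer[0 .. n] == expected);
}
```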
abstract dchar decode(ref const(ubyte)[] s) const;
Decodes a single code point.
This function removes one or more ubytes from the start of an array, and returns the decoded code point which those ubytes represent.
The input to this function MUST be validly encoded.
Parameters:
const(ubyte)[] s: the array whose first code point is to be decoded
abstract dchar safeDecode(ref const(ubyte)[] s) const;
Decodes a single code point. The input does not have to be valid.
This function removes one or more ubytes from the start of an array, and returns the decoded code point which those ubytes represent.
This function will accept an invalidly encoded array as input. If an invalid sequence is found at the start of the string, this function will remove it, and return the value INVALID_SEQUENCE.
Parameters:
const(ubyte)[] s: the array whose first code point is to be decoded
abstract @property immutable(ubyte)[] replacementSequence() const;
Returns the sequence of ubytes to be used to represent any character which cannot be represented in the encoding scheme.
Normally this will be a representation of some substitution character, such as U+FFFD or '?'.
bool isValid(const(ubyte)[] s);
Returns true if the array is encoded correctly.
Parameters:
const(ubyte)[] s: the array to be tested
size_t validLength()(const(ubyte)[] s);
Returns the length of the longest possible substring, starting from the first element, which is validly encoded.
Parameters:
const(ubyte)[] s: the array to be tested
immutable(ubyte)[] sanitize()(immutable(ubyte)[] s);
Sanitizes an array by replacing malformed ubyte sequences with valid ubyte sequences. The result is guaranteed to be valid for this encoding scheme.
If the input array is already valid, this function returns the original, otherwise it constructs a new array by replacing all illegal sequences with the encoding scheme's replacement sequence.
Parameters:
immutable(ubyte)[] s: the string to be sanitized
size_t firstSequence()(const(ubyte)[] s);
Returns the length of the first encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s: the array to be sliced
size_t count()(const(ubyte)[] s);
Returns the total number of code points encoded in a ubyte array.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s: the string to be counted
ptrdiff_t index()(const(ubyte)[] s, size_t n);
Returns the array index at which the (n+1)th code point begins.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s: the string to be counted
size_t n: the current code point index
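The non-abstract EncodingScheme helpers can be exercised together as in the sketch below. The byte arrays are invented samples, and the expected sanitize result is built from replacementSequence rather than hard-coded.

```d
import std.encoding;

void main()
{
    auto e = EncodingScheme.create("utf-8");

    immutable(ubyte)[] bytes = [0xE2, 0x82, 0xAC, 0x31, 0x30]; // "€10"
    assert(e.isValid(bytes));
    assert(e.firstSequence(bytes) == 3); // the euro sign is three bytes
    assert(e.count(bytes) == 3);         // three code points in total
    assert(e.index(bytes, 1) == 3);      // second code point starts at byte 3
    assert(e.index(bytes, 2) == 4);

    immutable(ubyte)[] broken = [0x61, 0xFF, 0x62]; // 'a', bad byte, 'b'
    assert(!e.isValid(broken));
    assert(e.validLength(broken) == 1);

    // The bad byte is swapped for the scheme's replacement sequence.
    auto fixed = e.sanitize(broken);
    assert(fixed == broken[0 .. 1] ~ e.replacementSequence ~ broken[2 .. $]);
    assert(e.isValid(fixed));
}
```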
class EncodingSchemeASCII : std.encoding.EncodingScheme;
EncodingScheme to handle ASCII
This scheme recognises the following names: "ANSI_X3.4-1968", "ANSI_X3.4-1986", "ASCII", "IBM367", "ISO646-US", "ISO_646.irv:1991", "US-ASCII", "cp367", "csASCII", "iso-ir-6", "us"
class EncodingSchemeLatin1 : std.encoding.EncodingScheme;
EncodingScheme to handle Latin-1
This scheme recognises the following names: "CP819", "IBM819", "ISO-8859-1", "ISO_8859-1", "ISO_8859-1:1987", "csISOLatin1", "iso-ir-100", "l1", "latin1"
class EncodingSchemeLatin2 : std.encoding.EncodingScheme;
EncodingScheme to handle Latin-2
This scheme recognises the following names: "Latin 2", "ISO-8859-2", "ISO_8859-2", "ISO_8859-2:1999", "Windows-28592"
class EncodingSchemeWindows1250 : std.encoding.EncodingScheme;
EncodingScheme to handle Windows-1250
This scheme recognises the following names: "windows-1250"
class EncodingSchemeWindows1251 : std.encoding.EncodingScheme;
EncodingScheme to handle Windows-1251
This scheme recognises the following names: "windows-1251"
class EncodingSchemeWindows1252 : std.encoding.EncodingScheme;
EncodingScheme to handle Windows-1252
This scheme recognises the following names: "windows-1252"
class EncodingSchemeUtf8 : std.encoding.EncodingScheme;
EncodingScheme to handle UTF-8
This scheme recognises the following names: "UTF-8"
class EncodingSchemeUtf16Native : std.encoding.EncodingScheme;
EncodingScheme to handle UTF-16 in native byte order
This scheme recognises the following names: "UTF-16LE" (little-endian architecture only), "UTF-16BE" (big-endian architecture only)
class EncodingSchemeUtf32Native : std.encoding.EncodingScheme;
EncodingScheme to handle UTF-32 in native byte order
This scheme recognises the following names: "UTF-32LE" (little-endian architecture only), "UTF-32BE" (big-endian architecture only)
enum BOM: int;
Definitions of common Byte Order Marks. The elements of the enum can be used as indices into bomTable to get the matching BOMSeq.
none
no BOM was found
utf32be
[0x00, 0x00, 0xFE, 0xFF]
utf32le
[0xFF, 0xFE, 0x00, 0x00]
utf7
[0x2B, 0x2F, 0x76, 0x38], [0x2B, 0x2F, 0x76, 0x39], [0x2B, 0x2F, 0x76, 0x2B], [0x2B, 0x2F, 0x76, 0x2F], [0x2B, 0x2F, 0x76, 0x38, 0x2D]
utf1
[0xF7, 0x64, 0x4C]
utfebcdic
[0xDD, 0x73, 0x66, 0x73]
scsu
[0x0E, 0xFE, 0xFF]
bocu1
[0xFB, 0xEE, 0x28]
gb18030
[0x84, 0x31, 0x95, 0x33]
utf8
[0xEF, 0xBB, 0xBF]
utf16be
[0xFE, 0xFF]
utf16le
[0xFF, 0xFE]
alias BOMSeq = std.typecons.Tuple!(BOM, "schema", ubyte[], "sequence").Tuple;
The type stored inside bomTable.
immutable Tuple!(BOM, "schema", ubyte[], "sequence")[] bomTable;
Mapping of a byte sequence to Byte Order Mark (BOM)
immutable(BOMSeq) getBOM(Range)(Range input)
if (isForwardRange!Range && is(immutable(ElementType!Range) == immutable(ubyte)));
Returns a BOMSeq for a given input. If no BOM is present, the BOMSeq for BOM.none is returned. The BOM sequence at the beginning of the range will not be consumed from the passed range. If you pass a reference type range, make sure that save creates a deep copy.
Parameters:
Range input: The sequence to check for the BOM
Returns:
the found BOMSeq corresponding to the passed input.
Examples:
import std.format : format;

auto ts = dchar(0x0000FEFF) ~ "Hello World"d;
auto entry = getBOM(cast(ubyte[]) ts);
version (BigEndian)
{
    writeln(entry.schema); // BOM.utf32be
}
else
{
    writeln(entry.schema); // BOM.utf32le
}
enum dchar utfBOM;
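Since getBOM does not consume the mark, callers usually have to skip it themselves. The sketch below shows one way to do that; stripBOM is a hypothetical helper, not part of std.encoding, and the byte arrays are sample data.

```d
import std.encoding;

// Hypothetical helper (not part of std.encoding): returns the input
// with any recognised leading BOM removed.
const(ubyte)[] stripBOM(const(ubyte)[] data)
{
    auto entry = getBOM(data);
    if (entry.schema == BOM.none)
        return data; // nothing to strip
    return data[entry.sequence.length .. $];
}

void main()
{
    const(ubyte)[] withBom = [0xEF, 0xBB, 0xBF, 0x68, 0x69]; // UTF-8 BOM + "hi"
    immutable ubyte[] hi = [0x68, 0x69];

    assert(getBOM(withBom).schema == BOM.utf8);
    assert(stripBOM(withBom) == hi);

    // Input without a BOM is returned unchanged.
    assert(getBOM(hi[]).schema == BOM.none);
    assert(stripBOM(hi) == hi);
}
```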
Constant defining a fully decoded BOM
Copyright © 1999-2026 by theD Language Foundation | Page generated byDdoc on Fri Feb 20 17:58:40 2026
