In Files

  • encoding.c
  • transcode.c

Parent

Object

Encoding

An Encoding instance represents a character encoding usable in Ruby. It is defined as a constant under the Encoding namespace. It has a name and, optionally, aliases:

    Encoding::ISO_8859_1.name
    #=> "ISO-8859-1"

    Encoding::ISO_8859_1.names
    #=> ["ISO-8859-1", "ISO8859-1"]

Ruby methods dealing with encodings return or accept Encoding instances as arguments (when a method accepts an Encoding instance as an argument, it can be passed an Encoding name or alias instead).

    "some string".encoding
    #=> #<Encoding:UTF-8>

    string = "some string".encode(Encoding::ISO_8859_1)
    #=> "some string"
    string.encoding
    #=> #<Encoding:ISO-8859-1>

    "some string".encode "ISO-8859-1"
    #=> "some string"

Encoding::ASCII_8BIT is a special encoding that is usually used for a byte string, not a character string. But as the name suggests, its characters in the ASCII range are considered ASCII characters. This is useful when you use ASCII-8BIT characters with other ASCII-compatible characters.
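
The ASCII compatibility described above can be seen in a minimal sketch. The variable names are illustrative only, and the example assumes the default UTF-8 script encoding:

```ruby
# Illustrative names only; `binary` and `text` are not from the reference text.
binary = "abc\xFF".b   # String#b returns a copy tagged ASCII-8BIT
text   = "def"         # ASCII-only literal in the script encoding

binary_is_ascii_8bit = (binary.encoding == Encoding::ASCII_8BIT)
binary_is_ascii_only = binary.ascii_only?   # false: \xFF is outside ASCII

# `text` contains only ASCII characters, so it can be appended to the
# byte string; the result keeps the ASCII-8BIT encoding.
combined = binary + text

p binary_is_ascii_8bit  # prints true
p binary_is_ascii_only  # prints false
p combined.bytes        # prints [97, 98, 99, 255, 100, 101, 102]
```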

Changing an encoding

The associated Encoding of a String can be changed in two different ways.

First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.

    string
    #=> "R\xC3\xA9sum\xC3\xA9"
    string.encoding
    #=> #<Encoding:ISO-8859-1>
    string.force_encoding(Encoding::UTF_8)
    #=> "R\u00E9sum\u00E9"

Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.

    string
    #=> "R\u00E9sum\u00E9"
    string.encoding
    #=> #<Encoding:UTF-8>
    string = string.encode!(Encoding::ISO_8859_1)
    #=> "R\xE9sum\xE9"
    string.encoding
    #=> #<Encoding:ISO-8859-1>
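
The difference between the two approaches shows up in the bytes. The following is a small sketch with illustrative variable names, not part of the reference text:

```ruby
utf8 = "R\u00E9sum\u00E9"   # UTF-8: 6 characters stored in 8 bytes

# String#force_encoding only relabels: the bytes stay the same.
# (dup is used so the original string is left untouched.)
relabeled = utf8.dup.force_encoding(Encoding::ISO_8859_1)

# String#encode transcodes: the bytes are rewritten for the new encoding.
transcoded = utf8.encode(Encoding::ISO_8859_1)

p relabeled.bytes == utf8.bytes  # prints true
p relabeled.length               # prints 8
p transcoded.bytes               # prints [82, 233, 115, 117, 109, 233]
p transcoded.length              # prints 6
```

Relabeling is appropriate when the bytes are already correct and only the label is wrong; transcoding is appropriate when the label is right and a different byte representation is needed.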

Script encoding

All Ruby script code has an associated Encoding, which any String literal created in the source code will be associated with.

The default script encoding has been Encoding::UTF_8 since Ruby 2.0, but it can be changed by a magic comment on the first line of the source code file (or the second line, if there is a shebang line on the first). The comment must contain the word coding or encoding, followed by a colon, a space, and the Encoding name or alias:

    # encoding: UTF-8

    "some string".encoding
    #=> #<Encoding:UTF-8>

The __ENCODING__ keyword returns the script encoding of the file in which the keyword is written:

    # encoding: ISO-8859-1

    __ENCODING__
    #=> #<Encoding:ISO-8859-1>
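
As a quick check of the relationship between literals and the script encoding, a plain literal (one without `\u` escapes) can be compared against `__ENCODING__`. This is a sketch with illustrative variable names:

```ruby
# A plain literal takes the script encoding of the file it appears in,
# which is what __ENCODING__ reports.
literal = "some string"
script_encoding = __ENCODING__

p literal.encoding == script_encoding  # prints true
```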

ruby -K will change the default locale encoding, but this is not recommended. Ruby source files should declare their script encoding with a magic comment even when they only depend on US-ASCII strings or regular expressions.

Locale encoding

The default encoding of the environment. Usually derived from locale.

See ::locale_charmap and ::find('locale').

Filesystem encoding

The default encoding of strings from the filesystem of the environment. This is used for strings of file names or paths.

See ::find('filesystem').

External encoding

Each IO object has an external encoding which indicates the encoding that Ruby will use to read its data. By default Ruby sets the external encoding of an IO object to the default external encoding. The default external encoding is set by the locale encoding or the interpreter -E option. ::default_external returns the current value of the external encoding.

    ENV["LANG"]
    #=> "UTF-8"
    Encoding.default_external
    #=> #<Encoding:UTF-8>

    $ ruby -E ISO-8859-1 -e "p Encoding.default_external"
    #<Encoding:ISO-8859-1>

    $ LANG=C ruby -e 'p Encoding.default_external'
    #<Encoding:US-ASCII>

The default external encoding may also be set through ::default_external=, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E to invoke ruby with the correct external encoding.

When you know that the actual encoding of the data of an IO object is not the default external encoding, you can reset its external encoding with IO#set_encoding or set it at IO object creation (see IO.new options).
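
A minimal sketch of resetting the external encoding on an open file follows. The file name and variable names are hypothetical, and the example assumes that no default internal encoding is set (the usual case), so no transcoding takes place on read:

```ruby
require "tmpdir"

latin1_text = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")      # hypothetical file name
  File.binwrite(path, "R\xE9sum\xE9".b)    # raw ISO-8859-1 bytes

  File.open(path, "r") do |io|
    # Override whatever default external encoding the process started with.
    io.set_encoding(Encoding::ISO_8859_1)
    latin1_text = io.read
  end
end

p latin1_text.encoding == Encoding::ISO_8859_1  # prints true
p latin1_text.valid_encoding?                   # prints true
p latin1_text.bytes                             # prints [82, 233, 115, 117, 109, 233]
```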

Internal encoding

To process the data of an IO object which has an encoding different from its external encoding, you can set its internal encoding. Ruby will use this internal encoding to transcode the data when it is read from the IO object.

Conversely, when data is written to the IO object it is transcoded from the internal encoding to the external encoding of the IO object.

The internal encoding of an IO object can be set with IO#set_encoding or at IO object creation (see IO.new options).
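
One way to set both encodings at creation time is through the `external_encoding:` and `internal_encoding:` options. The sketch below uses a hypothetical file name and illustrative variable names:

```ruby
require "tmpdir"

utf8_text = nil
encodings = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")      # hypothetical file name
  File.binwrite(path, "R\xE9sum\xE9".b)    # raw ISO-8859-1 bytes

  # Data is stored as ISO-8859-1 (external) and handed to Ruby as UTF-8 (internal).
  File.open(path, "r", external_encoding: "ISO-8859-1", internal_encoding: "UTF-8") do |io|
    encodings = [io.external_encoding, io.internal_encoding]
    utf8_text = io.read
  end
end

p encodings == [Encoding::ISO_8859_1, Encoding::UTF_8]  # prints true
p utf8_text == "R\u00E9sum\u00E9"                       # prints true
```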

The internal encoding is optional and when not set, the Ruby default internal encoding is used. If not explicitly set, this default internal encoding is nil, meaning that by default no transcoding occurs.

The default internal encoding can be set with the interpreter option -E. ::default_internal returns the current internal encoding.

    $ ruby -e 'p Encoding.default_internal'
    nil

    $ ruby -E ISO-8859-1:UTF-8 -e "p [Encoding.default_external, \
      Encoding.default_internal]"
    [#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>]

The default internal encoding may also be set through ::default_internal=, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E to invoke ruby with the correct internal encoding.

IO encoding example

In the following example a UTF-8 encoded string "R\u00E9sum\u00E9" is transcoded for output to ISO-8859-1 encoding, then read back in and transcoded to UTF-8:

    string = "R\u00E9sum\u00E9"

    open("transcoded.txt", "w:ISO-8859-1") do |io|
      io.write(string)
    end

    puts "raw text:"
    p File.binread("transcoded.txt")
    puts

    open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io|
      puts "transcoded text:"
      p io.read
    end

While writing the file, the internal encoding is not specified as it is only necessary for reading. While reading the file both the internal and external encoding must be specified to obtain the correct result.

    $ ruby t.rb
    raw text:
    "R\xE9sum\xE9"

    transcoded text:
    "R\u00E9sum\u00E9"

Public Class Methods

aliases → {"alias1" => "orig1", "alias2" => "orig2", ...}

Returns a hash mapping each available encoding alias to its original encoding name.

    Encoding.aliases
    #=> {"BINARY"=>"ASCII-8BIT", "ASCII"=>"US-ASCII", "ANSI_X3.4-1968"=>"US-ASCII",
          "SJIS"=>"Windows-31J", "eucJP"=>"EUC-JP", "CP932"=>"Windows-31J"}

    static VALUE
    rb_enc_aliases(VALUE klass)
    {
        VALUE aliases[2];
        aliases[0] = rb_hash_new();
        aliases[1] = rb_ary_new();

        GLOBAL_ENC_TABLE_EVAL(enc_table,
                              st_foreach(enc_table->names, rb_enc_aliases_enc_i, (st_data_t)aliases));

        return aliases[0];
    }

compatible?(obj1, obj2) → enc or nil

Checks the compatibility of two objects.

If the objects are both strings they are compatible when they are concatenatable. The encoding of the concatenated string will be returned if they are compatible, nil if they are not.

    Encoding.compatible?("\xa1".force_encoding("iso-8859-1"), "b")
    #=> #<Encoding:ISO-8859-1>

    Encoding.compatible?(
      "\xa1".force_encoding("iso-8859-1"),
      "\xa1\xa1".force_encoding("euc-jp"))
    #=> nil

If the objects are non-strings their encodings are compatible when theyhave an encoding and:

  • Either encoding is US-ASCII compatible

  • One of the encodings is a 7-bit encoding

    static VALUE
    enc_compatible_p(VALUE klass, VALUE str1, VALUE str2)
    {
        rb_encoding *enc;

        if (!enc_capable(str1)) return Qnil;
        if (!enc_capable(str2)) return Qnil;
        enc = rb_enc_compatible(str1, str2);
        if (!enc) return Qnil;
        return rb_enc_from_encoding(enc);
    }

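
The link between compatibility and concatenation for strings can be sketched as follows. Variable names are illustrative, and `dup` is used so that string literals are not modified in place:

```ruby
latin1 = "\xa1".dup.force_encoding("iso-8859-1")
euc_jp = "\xa1\xa1".dup.force_encoding("euc-jp")

# A non-ASCII ISO-8859-1 string and an ASCII-only string are compatible.
compatible = Encoding.compatible?(latin1, "b")
joined     = latin1 + "b"

# Two non-ASCII strings in different encodings are not.
incompatible = Encoding.compatible?(latin1, euc_jp)
error_class = begin
  latin1 + euc_jp
  nil
rescue Encoding::CompatibilityError => e
  e.class
end

p compatible == Encoding::ISO_8859_1  # prints true
p joined.encoding == compatible       # prints true
p incompatible                        # prints nil
p error_class                         # prints Encoding::CompatibilityError
```
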
default_external → enc

Returns default external encoding.

The default external encoding is used by default for strings created from the following locations:

While strings created from these locations will have this encoding, the encoding may not be valid. Be sure to check String#valid_encoding?.
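
A short sketch of such a check is shown below. It simulates data whose bytes are really ISO-8859-1; the variable names are illustrative and not part of the reference text:

```ruby
raw = "R\xE9sum\xE9".b   # bytes that are really ISO-8859-1

# The same bytes under two different labels.
as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

p as_utf8.valid_encoding?    # prints false
p as_latin1.valid_encoding?  # prints true

# Once the label is correct, the string can be transcoded safely.
p as_latin1.encode(Encoding::UTF_8) == "R\u00E9sum\u00E9"  # prints true
```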

File data written to disk will be transcoded to the default external encoding when written, if ::default_internal is not nil.

The default external encoding is initialized by the -E option. If -E isn't set, it is initialized to UTF-8 on Windows and the locale on other operating systems.

    static VALUE
    get_default_external(VALUE klass)
    {
        return rb_enc_default_external();
    }

default_external = enc

Sets default external encoding. You should not set ::default_external in ruby code as strings created before changing the value may have a different encoding from strings created after the value was changed. Instead, you should use ruby -E to invoke ruby with the correct default_external.

See ::default_external for information on how the default external encoding is used.

    static VALUE
    set_default_external(VALUE klass, VALUE encoding)
    {
        rb_warning("setting Encoding.default_external");
        rb_enc_set_default_external(encoding);
        return encoding;
    }

default_internal → enc

Returns default internal encoding. Strings will be transcoded to the default internal encoding in the following places if the default internal encoding is not nil:

Additionally String#encode and String#encode! use the default internal encoding if no encoding is given.

The script encoding (__ENCODING__), not ::default_internal, is used as the encoding of created strings.

::default_internal is initialized with the -E option, or nil otherwise.

    static VALUE
    get_default_internal(VALUE klass)
    {
        return rb_enc_default_internal();
    }

default_internal = enc or nil

Sets default internal encoding or removes default internal encoding when passed nil. You should not set ::default_internal in ruby code as strings created before changing the value may have a different encoding from strings created after the change. Instead you should use ruby -E to invoke ruby with the correct default_internal.

See ::default_internal for information on how the default internal encoding is used.

    static VALUE
    set_default_internal(VALUE klass, VALUE encoding)
    {
        rb_warning("setting Encoding.default_internal");
        rb_enc_set_default_internal(encoding);
        return encoding;
    }

find(string) → enc

Searches for the encoding with the specified name. name should be a string.

    Encoding.find("US-ASCII")
    #=> #<Encoding:US-ASCII>

Names which this method accepts are encoding names and aliases, including the following special aliases:

“external”

default external encoding

“internal”

default internal encoding

“locale”

locale encoding

“filesystem”

filesystem encoding

An ArgumentError is raised when there is no encoding with the given name. Only Encoding.find("internal"), however, returns nil when there is no encoding named "internal", in other words, when Ruby has no default internal encoding.

    static VALUE
    enc_find(VALUE klass, VALUE enc)
    {
        int idx;
        if (is_obj_encoding(enc))
            return enc;
        idx = str_to_encindex(enc);
        if (idx == UNSPECIFIED_ENCODING) return Qnil;
        return rb_enc_from_encoding_index(idx);
    }

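
The lookup rules can be exercised with a short sketch. The name "no-such-encoding" is a deliberately nonexistent, hypothetical name, and the variable names are illustrative:

```ruby
by_name  = Encoding.find("utf-8")      # name lookup is case-insensitive
external = Encoding.find("external")   # special alias
internal = Encoding.find("internal")   # nil unless a default internal encoding is set

missing = begin
  Encoding.find("no-such-encoding")    # hypothetical, nonexistent name
rescue ArgumentError => e
  e.class
end

p by_name == Encoding::UTF_8             # prints true
p external == Encoding.default_external  # prints true
p internal == Encoding.default_internal  # prints true
p missing                                # prints ArgumentError
```
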
list → [enc1, enc2, ...]

Returns the list of loaded encodings.

    Encoding.list
    #=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
          #<Encoding:ISO-2022-JP (dummy)>]

    Encoding.find("US-ASCII")
    #=> #<Encoding:US-ASCII>

    Encoding.list
    #=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
          #<Encoding:US-ASCII>, #<Encoding:ISO-2022-JP (dummy)>]

    static VALUE
    enc_list(VALUE klass)
    {
        VALUE ary = rb_ary_new2(0);

        RB_VM_LOCK_ENTER();
        {
            rb_ary_replace(ary, rb_default_encoding_list);
            rb_ary_concat(ary, rb_additional_encoding_list);
        }
        RB_VM_LOCK_LEAVE();

        return ary;
    }

locale_charmap → string

Returns the locale charmap name. It returns nil if no appropriate information is available.

    Debian GNU/Linux
      LANG=C
        Encoding.locale_charmap  #=> "ANSI_X3.4-1968"
      LANG=ja_JP.EUC-JP
        Encoding.locale_charmap  #=> "EUC-JP"

    SunOS 5
      LANG=C
        Encoding.locale_charmap  #=> "646"
      LANG=ja
        Encoding.locale_charmap  #=> "eucJP"

The result is highly platform dependent, so passing it to ::find may cause an error. If you need some encoding object even for an unknown locale, ::find("locale") can be used.

    VALUE
    rb_locale_charmap(VALUE klass)
    {
    #if NO_LOCALE_CHARMAP
        return rb_usascii_str_new_cstr("US-ASCII");
    #else
        return locale_charmap(rb_usascii_str_new_cstr);
    #endif
    }

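
A defensive lookup along the lines suggested above might look like the following sketch. The variable names are illustrative, and the printed values depend on the platform and locale, so none are shown:

```ruby
charmap = Encoding.locale_charmap   # platform dependent; may be nil

# Looking the charmap up by name can fail on unusual platforms,
# so the lookup is wrapped in a rescue.
by_charmap = begin
  Encoding.find(charmap)
rescue ArgumentError, TypeError
  nil
end

# The special "locale" alias is the lookup recommended for unknown locales.
locale_encoding = Encoding.find("locale")

p charmap
p by_charmap
p locale_encoding
```
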
name_list → ["enc1", "enc2", ...]

Returns the list of available encoding names.

    Encoding.name_list
    #=> ["US-ASCII", "ASCII-8BIT", "UTF-8",
          "ISO-8859-1", "Shift_JIS", "EUC-JP",
          "Windows-31J",
          "BINARY", "CP932", "eucJP"]

    static VALUE
    rb_enc_name_list(VALUE klass)
    {
        VALUE ary;

        GLOBAL_ENC_TABLE_ENTER(enc_table);
        {
            ary = rb_ary_new2(enc_table->names->num_entries);
            st_foreach(enc_table->names, rb_enc_name_list_i, (st_data_t)ary);
        }
        GLOBAL_ENC_TABLE_LEAVE();

        return ary;
    }

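
The relationship between ::aliases, ::name_list, ::find and the instance method names can be sketched with the "ASCII" alias from the examples above. Variable names are illustrative:

```ruby
aliases = Encoding.aliases
names   = Encoding.name_list

p aliases["ASCII"]                              # prints "US-ASCII"
p names.include?("ASCII")                       # prints true (aliases are listed too)
p names.include?("US-ASCII")                    # prints true
p Encoding.find("ASCII") == Encoding::US_ASCII  # prints true
p Encoding::US_ASCII.names.include?("ASCII")    # prints true
```
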

Public Instance Methods

ascii_compatible? → true or false

Returns whether ASCII-compatible or not.

    Encoding::UTF_8.ascii_compatible?
    #=> true
    Encoding::UTF_16BE.ascii_compatible?
    #=> false

    static VALUE
    enc_ascii_compatible_p(VALUE enc)
    {
        return rb_enc_asciicompat(must_encoding(enc)) ? Qtrue : Qfalse;
    }

dummy? → true or false

Returns true for dummy encodings. A dummy encoding is an encoding for which character handling is not properly implemented. It is used for stateful encodings.

    Encoding::ISO_2022_JP.dummy?
    #=> true
    Encoding::UTF_8.dummy?
    #=> false

    static VALUE
    enc_dummy_p(VALUE enc)
    {
        return ENC_DUMMY_P(must_encoding(enc)) ? Qtrue : Qfalse;
    }

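
The two predicates can be compared side by side for the encodings used in the examples above. This is a sketch with illustrative variable names:

```ruby
encodings = [Encoding::UTF_8, Encoding::UTF_16BE, Encoding::ISO_2022_JP]

# Collect [name, ascii_compatible?, dummy?] for each encoding.
summary = encodings.map do |enc|
  [enc.name, enc.ascii_compatible?, enc.dummy?]
end

summary.each { |row| p row }
# prints ["UTF-8", true, false]
# prints ["UTF-16BE", false, false]
# prints ["ISO-2022-JP", false, true]
```
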
inspect → string

Returns a string which represents the encoding for programmers.

    Encoding::UTF_8.inspect
    #=> "#<Encoding:UTF-8>"
    Encoding::ISO_2022_JP.inspect
    #=> "#<Encoding:ISO-2022-JP (dummy)>"

    static VALUE
    enc_inspect(VALUE self)
    {
        rb_encoding *enc;

        if (!is_data_encoding(self)) {
            not_encoding(self);
        }
        if (!(enc = DATA_PTR(self)) || rb_enc_from_index(rb_enc_to_index(enc)) != enc) {
            rb_raise(rb_eTypeError, "broken Encoding");
        }
        return rb_enc_sprintf(rb_usascii_encoding(),
                              "#<%"PRIsVALUE":%s%s%s>", rb_obj_class(self),
                              rb_enc_name(enc),
                              (ENC_DUMMY_P(enc) ? " (dummy)" : ""),
                              enc_autoload_p(enc) ? " (autoload)" : "");
    }

name → string

Returns the name of the encoding.

    Encoding::UTF_8.name
    #=> "UTF-8"

    static VALUE
    enc_name(VALUE self)
    {
        return rb_fstring_cstr(rb_enc_name((rb_encoding*)DATA_PTR(self)));
    }

names → array

Returns the list of name and aliases of the encoding.

    Encoding::WINDOWS_31J.names
    #=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]

    static VALUE
    enc_names(VALUE self)
    {
        VALUE args[2];

        args[0] = (VALUE)rb_to_encoding_index(self);
        args[1] = rb_ary_new2(0);

        GLOBAL_ENC_TABLE_EVAL(enc_table,
                              st_foreach(enc_table->names, enc_names_i, (st_data_t)args));

        return args[1];
    }

replicate(name) → encoding

Returns a replicated encoding of enc whose name is name. The new encoding should have the same byte structure as enc. If name is used by another encoding, raises ArgumentError.

    static VALUE
    enc_replicate_m(VALUE encoding, VALUE name)
    {
        int idx = rb_enc_replicate(name_for_encoding(&name), rb_to_encoding(encoding));
        RB_GC_GUARD(name);
        return rb_enc_from_encoding_index(idx);
    }

to_s → string

Returns the name of the encoding.

    Encoding::UTF_8.name
    #=> "UTF-8"

    static VALUE
    enc_name(VALUE self)
    {
        return rb_fstring_cstr(rb_enc_name((rb_encoding*)DATA_PTR(self)));
    }
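
Because to_s shares its implementation with name, an Encoding can be interpolated directly into a string. A brief sketch, with illustrative variable names:

```ruby
enc   = Encoding::UTF_8
label = "encoding=#{enc}"   # interpolation calls to_s

p enc.to_s == enc.name  # prints true
p label                 # prints "encoding=UTF-8"
```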

This page was generated for Ruby 3.0.0

Generated with Ruby-doc Rdoc Generator 0.42.0.

