An Encoding instance represents a character encoding usable in Ruby. It is defined as a constant under the Encoding namespace. It has a name and, optionally, aliases:
Encoding::ISO_8859_1.name
#=> "ISO-8859-1"
Encoding::ISO_8859_1.names
#=> ["ISO-8859-1", "ISO8859-1"]
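As a small illustration of the name and alias relationship, the sketch below looks up one encoding by constant, by name and by alias. It relies on ::find, which is documented further down; the variable names are illustrative only.

```ruby
# Look up one encoding three ways: by constant, by name, and by alias.
by_constant = Encoding::ISO_8859_1
by_name     = Encoding.find("ISO-8859-1")
by_alias    = Encoding.find("ISO8859-1")

# Each lookup is expected to hand back the very same Encoding object.
same_object = by_constant.equal?(by_name) && by_constant.equal?(by_alias)

# The primary name and the alias are both part of #names.
primary_listed = by_constant.names.include?(by_constant.name)
alias_listed   = by_constant.names.include?("ISO8859-1")
```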
Ruby methods dealing with encodings return or accept Encoding instances as arguments (when a method accepts an Encoding instance as an argument, it can be passed an Encoding name or alias instead).
"some string".encoding
#=> #<Encoding:UTF-8>
string = "some string".encode(Encoding::ISO_8859_1)
#=> "some string"
string.encoding
#=> #<Encoding:ISO-8859-1>
"some string".encode "ISO-8859-1"
#=> "some string"
Encoding::ASCII_8BIT is a special encoding that is usually used for a byte string, not a character string. As the name suggests, however, its bytes in the ASCII range are treated as ASCII characters. This is useful when you combine ASCII-8BIT strings with strings in other ASCII-compatible encodings.
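A minimal sketch of that behavior, using String#b to obtain ASCII-8BIT copies: a binary string holding only ASCII bytes can be concatenated with a UTF-8 string, while one holding a non-ASCII byte cannot. The variable names are illustrative.

```ruby
binary = "abc".b        # ASCII-8BIT copy containing only ASCII bytes
utf8   = "d\u00E9f"     # UTF-8 string containing a non-ASCII character

# ASCII-only binary data mixes freely with an ASCII-compatible string.
combined = binary + utf8

# A binary string with a non-ASCII byte cannot be combined with
# non-ASCII UTF-8 text; Ruby raises Encoding::CompatibilityError.
mixed_ok =
  begin
    "\xFF".b + utf8
    true
  rescue Encoding::CompatibilityError
    false
  end
```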
The associated Encoding of a String can be changed in two different ways.
First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.
string
#=> "R\xC3\xA9sum\xC3\xA9"
string.encoding
#=> #<Encoding:ISO-8859-1>
string.force_encoding(Encoding::UTF_8)
#=> "R\u00E9sum\u00E9"
Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.
string
#=> "R\u00E9sum\u00E9"
string.encoding
#=> #<Encoding:UTF-8>
string = string.encode!(Encoding::ISO_8859_1)
#=> "R\xE9sum\xE9"
string.encoding
#=> #<Encoding:ISO-8859-1>
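To make the difference between the two approaches concrete, the following sketch compares the bytes of one string after relabeling it and after transcoding it. The variable names are illustrative.

```ruby
utf8       = "R\u00E9sum\u00E9"
utf8_bytes = utf8.bytes

# force_encoding only changes the label; the bytes stay as they were.
relabeled = utf8.dup.force_encoding(Encoding::ISO_8859_1)

# encode rewrites the bytes so the same characters are kept.
transcoded = utf8.encode(Encoding::ISO_8859_1)

relabeled.bytes == utf8_bytes  # same 8 bytes, now read as 8 characters
transcoded.bytes               # 6 bytes, one per character
```

Relabeling is the right tool when the bytes are already correct but tagged with the wrong encoding; transcoding is the right tool when the characters are correct and a different byte representation is needed.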
All Ruby script code has an associated Encoding, with which any String literal created in the source code will be associated.
The default script encoding is Encoding::UTF_8 after v2.0, but it can be changed by a magic comment on the first line of the source code file (or second line, if there is a shebang line on the first). The comment must contain the word coding or encoding, followed by a colon, space and the Encoding name or alias:
# encoding: UTF-8
"some string".encoding
#=> #<Encoding:UTF-8>
The __ENCODING__ keyword returns the script encoding of the file in which the keyword is written:
# encoding: ISO-8859-1
__ENCODING__
#=> #<Encoding:ISO-8859-1>
ruby -K will change the default locale encoding, but this is not recommended. Ruby source files should declare their script encoding with a magic comment even when they only depend on US-ASCII strings or regular expressions.
The default encoding of the environment. Usually derived from locale.
See ::locale_charmap, ::find('locale').
The default encoding of strings from the filesystem of the environment. This is used for strings of file names or paths.
See ::find('filesystem').
Each IO object has an external encoding which indicates the encoding that Ruby will use to read its data. By default Ruby sets the external encoding of an IO object to the default external encoding. The default external encoding is set by the locale encoding or the interpreter -E option. ::default_external returns the current value of the external encoding.
ENV["LANG"]
#=> "UTF-8"
Encoding.default_external
#=> #<Encoding:UTF-8>
$ ruby -E ISO-8859-1 -e "p Encoding.default_external"
#<Encoding:ISO-8859-1>
$ LANG=C ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>
The default external encoding may also be set through ::default_external=, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E to invoke ruby with the correct external encoding.
When you know that the actual encoding of the data of an IO object is not the default external encoding, you can reset its external encoding with IO#set_encoding or set it at IO object creation (see IO.new options).
To process the data of an IO object which has an encoding different from its external encoding, you can set its internal encoding. Ruby will use this internal encoding to transcode the data when it is read from the IO object.
Conversely, when data is written to the IO object it is transcoded from the internal encoding to the external encoding of the IO object.
The internal encoding of an IO object can be set with IO#set_encoding or at IO object creation (see IO.new options).
The internal encoding is optional and when not set, the Ruby default internal encoding is used. If not explicitly set, this default internal encoding is nil, meaning that by default, no transcoding occurs.
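The sketch below reads the same ISO-8859-1 file twice: once with only an external encoding, and once after IO#set_encoding adds an internal encoding. It assumes the default internal encoding is nil (the default); the temporary file name is hypothetical.

```ruby
require "tmpdir"

raw = converted = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")     # hypothetical file name
  File.binwrite(path, "R\xE9sum\xE9".b)   # ISO-8859-1 bytes on disk

  # External encoding only: the bytes are labeled, not transcoded.
  raw = File.open(path, "r:ISO-8859-1") { |io| io.read }

  # External plus internal encoding: data is transcoded while reading.
  converted = File.open(path, "r") do |io|
    io.set_encoding("ISO-8859-1", "UTF-8")
    io.read
  end
end
```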
The default internal encoding can be set with the interpreter option -E. ::default_internal returns the current internal encoding.
$ ruby -e 'p Encoding.default_internal'
nil
$ ruby -E ISO-8859-1:UTF-8 -e "p [Encoding.default_external, \
  Encoding.default_internal]"
[#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>]
The default internal encoding may also be set through ::default_internal=, but you should not do this as strings created before and after the change will have inconsistent encodings. Instead use ruby -E to invoke ruby with the correct internal encoding.
In the following example a UTF-8 encoded string "R\u00E9sum\u00E9" is transcoded for output to ISO-8859-1 encoding, then read back in and transcoded to UTF-8:
string = "R\u00E9sum\u00E9"
open("transcoded.txt", "w:ISO-8859-1") do |io|
  io.write(string)
end
puts "raw text:"
p File.binread("transcoded.txt")
puts
open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io|
  puts "transcoded text:"
  p io.read
end
While writing the file, the internal encoding is not specified as it is only necessary for reading. While reading the file, both the internal and external encoding must be specified to obtain the correct result.
$ ruby t.rb
raw text:
"R\xE9sum\xE9"

transcoded text:
"R\u00E9sum\u00E9"
Returns a hash that maps each available encoding alias to its original encoding name.
Encoding.aliases
#=> {"BINARY"=>"ASCII-8BIT", "ASCII"=>"US-ASCII", "ANSI_X3.4-1968"=>"US-ASCII",
#    "SJIS"=>"Windows-31J", "eucJP"=>"EUC-JP", "CP932"=>"Windows-31J"}
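One possible use of this hash, sketched here with a hypothetical helper named canonical_encoding_name, is to normalize an alias to its original name. Note that the hash keys are matched exactly, whereas ::find ignores case.

```ruby
# Hypothetical helper: map an alias to its original encoding name.
# Names that are not aliases are returned unchanged.
def canonical_encoding_name(name)
  Encoding.aliases.fetch(name, name)
end

from_alias = canonical_encoding_name("ASCII")   # an alias of US-ASCII
from_name  = canonical_encoding_name("UTF-8")   # already a primary name
```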
static VALUE
rb_enc_aliases(VALUE klass)
{
    VALUE aliases[2];
    aliases[0] = rb_hash_new();
    aliases[1] = rb_ary_new();

    GLOBAL_ENC_TABLE_EVAL(enc_table,
                          st_foreach(enc_table->names, rb_enc_aliases_enc_i, (st_data_t)aliases));

    return aliases[0];
}
Checks the compatibility of two objects.
If the objects are both strings they are compatible when they are concatenatable. The encoding of the concatenated string will be returned if they are compatible, nil if they are not.
Encoding.compatible?("\xa1".force_encoding("iso-8859-1"), "b")
#=> #<Encoding:ISO-8859-1>
Encoding.compatible?(
  "\xa1".force_encoding("iso-8859-1"),
  "\xa1\xa1".force_encoding("euc-jp"))
#=> nil
If the objects are non-strings their encodings are compatible when they have an encoding and:
Either encoding is US-ASCII compatible
One of the encodings is a 7-bit encoding
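For the string case, a compatibility check can guard a concatenation that would otherwise raise. The helper safe_concat below is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper: concatenate only when the encodings allow it,
# returning nil instead of raising Encoding::CompatibilityError.
def safe_concat(left, right)
  Encoding.compatible?(left, right) ? left + right : nil
end

ascii_only = "b"
latin1     = "\xA1".b.force_encoding("ISO-8859-1")
euc_jp     = "\xA1\xA1".b.force_encoding("EUC-JP")

joined   = safe_concat(latin1, ascii_only)  # compatible: ASCII-only right side
rejected = safe_concat(latin1, euc_jp)      # incompatible: nil
```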
static VALUE
enc_compatible_p(VALUE klass, VALUE str1, VALUE str2)
{
    rb_encoding *enc;

    if (!enc_capable(str1)) return Qnil;
    if (!enc_capable(str2)) return Qnil;
    enc = rb_enc_compatible(str1, str2);
    if (!enc) return Qnil;
    return rb_enc_from_encoding(enc);
}
Returns default external encoding.
The default external encoding is used by default for strings created from the following locations:
CSV
File data read from disk
SDBM
StringIO
Zlib::GzipReader
Zlib::GzipWriter
While strings created from these locations will have this encoding, the encoding may not be valid. Be sure to check String#valid_encoding?.
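A minimal sketch of such a check, simulating ISO-8859-1 bytes that arrived labeled as UTF-8. It shows two possible recoveries, relabeling when the real encoding is known and String#scrub when it is not; the variable names are illustrative.

```ruby
bytes = "R\xE9sum\xE9".b   # ISO-8859-1 bytes, as they might come off disk

# Labeled as UTF-8, the data is not a valid byte sequence.
mislabeled = bytes.dup.force_encoding(Encoding::UTF_8)
valid      = mislabeled.valid_encoding?

# If the real encoding is known, relabel and then transcode.
repaired = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# If it is unknown, invalid bytes can at least be replaced.
scrubbed = mislabeled.scrub("?")
```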
File data written to disk will be transcoded to the default external encoding when written, if ::default_internal is not nil.
The default external encoding is initialized by the -E option. If -E isn't set, it is initialized to UTF-8 on Windows and the locale on other operating systems.
static VALUE
get_default_external(VALUE klass)
{
    return rb_enc_default_external();
}
Sets default external encoding. You should not set ::default_external in Ruby code, as strings created before changing the value may have a different encoding from strings created after the value was changed. Instead, you should use ruby -E to invoke ruby with the correct default_external.
See ::default_external for information on how the default external encoding is used.
static VALUE
set_default_external(VALUE klass, VALUE encoding)
{
    rb_warning("setting Encoding.default_external");
    rb_enc_set_default_external(encoding);
    return encoding;
}
Returns default internal encoding. Strings will be transcoded to the default internal encoding in the following places if the default internal encoding is not nil:
CSV
Etc.sysconfdir and Etc.systmpdir
File data read from disk
Strings returned from Readline
Strings returned from SDBM
Values from ENV
Values in ARGV including $PROGRAM_NAME
Additionally String#encode and String#encode! use the default internal encoding if no encoding is given.
The script encoding (__ENCODING__), not ::default_internal, is used as the encoding of created strings.
::default_internal is initialized with the -E option or nil otherwise.
static VALUE
get_default_internal(VALUE klass)
{
    return rb_enc_default_internal();
}
Sets default internal encoding or removes default internal encoding when passed nil. You should not set ::default_internal in Ruby code, as strings created before changing the value may have a different encoding from strings created after the change. Instead, you should use ruby -E to invoke ruby with the correct default_internal.
See ::default_internal for information on how the default internal encoding is used.
static VALUE
set_default_internal(VALUE klass, VALUE encoding)
{
    rb_warning("setting Encoding.default_internal");
    rb_enc_set_default_internal(encoding);
    return encoding;
}
Searches for the encoding with the specified name. name should be a string.
Encoding.find("US-ASCII")
#=> #<Encoding:US-ASCII>
Names which this method accepts are encoding names and aliases, including the following special aliases:
"external": default external encoding
"internal": default internal encoding
"locale": locale encoding
"filesystem": filesystem encoding
An ArgumentError is raised when there is no encoding with name. Only Encoding.find("internal") returns nil instead when there is no encoding named "internal", in other words, when Ruby has no default internal encoding.
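The lookup rules above can be sketched as follows. The helper find_encoding_or_nil is a hypothetical name; it simply converts the ArgumentError for an unknown name into nil.

```ruby
# Hypothetical helper: return nil for unknown names instead of raising.
def find_encoding_or_nil(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

known   = find_encoding_or_nil("utf-8")             # lookup ignores case
unknown = find_encoding_or_nil("no-such-encoding")  # nil

# The special aliases track the current defaults. "internal" yields nil
# when no default internal encoding is set.
external = Encoding.find("external")
internal = Encoding.find("internal")
```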
static VALUE
enc_find(VALUE klass, VALUE enc)
{
    int idx;
    if (is_obj_encoding(enc))
        return enc;
    idx = str_to_encindex(enc);
    if (idx == UNSPECIFIED_ENCODING) return Qnil;
    return rb_enc_from_encoding_index(idx);
}
Returns the list of loaded encodings.
Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
#    #<Encoding:ISO-2022-JP (dummy)>]
Encoding.find("US-ASCII")
#=> #<Encoding:US-ASCII>
Encoding.list
#=> [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>,
#    #<Encoding:US-ASCII>, #<Encoding:ISO-2022-JP (dummy)>]
static VALUE
enc_list(VALUE klass)
{
    VALUE ary = rb_ary_new2(0);

    RB_VM_LOCK_ENTER();
    {
        rb_ary_replace(ary, rb_default_encoding_list);
        rb_ary_concat(ary, rb_additional_encoding_list);
    }
    RB_VM_LOCK_LEAVE();

    return ary;
}
Returns the locale charmap name. It returns nil if no appropriate information is available.
Debian GNU/Linux
  LANG=C
    Encoding.locale_charmap  #=> "ANSI_X3.4-1968"
  LANG=ja_JP.EUC-JP
    Encoding.locale_charmap  #=> "EUC-JP"

SunOS 5
  LANG=C
    Encoding.locale_charmap  #=> "646"
  LANG=ja
    Encoding.locale_charmap  #=> "eucJP"
The result is highly platform dependent, so passing it to ::find may raise an error. If you need some encoding object even for an unknown locale, ::find("locale") can be used.
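A defensive lookup along those lines might look like the sketch below. The variable names are illustrative, and the charmap value itself depends on the platform and locale.

```ruby
charmap = Encoding.locale_charmap   # platform-dependent name, or nil

# Looking the charmap name up directly can fail on unusual platforms,
# so treat both nil and an unknown name as "no result".
from_charmap =
  begin
    charmap && Encoding.find(charmap)
  rescue ArgumentError
    nil
  end

# The "locale" alias always yields some Encoding object.
locale_encoding = Encoding.find("locale")
```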
VALUE
rb_locale_charmap(VALUE klass)
{
#if NO_LOCALE_CHARMAP
    return rb_usascii_str_new_cstr("US-ASCII");
#else
    return locale_charmap(rb_usascii_str_new_cstr);
#endif
}
Returns the list of available encoding names.
Encoding.name_list
#=> ["US-ASCII", "ASCII-8BIT", "UTF-8",
#    "ISO-8859-1", "Shift_JIS", "EUC-JP",
#    "Windows-31J", "BINARY", "CP932", "eucJP"]
static VALUE
rb_enc_name_list(VALUE klass)
{
    VALUE ary;

    GLOBAL_ENC_TABLE_ENTER(enc_table);
    {
        ary = rb_ary_new2(enc_table->names->num_entries);
        st_foreach(enc_table->names, rb_enc_name_list_i, (st_data_t)ary);
    }
    GLOBAL_ENC_TABLE_LEAVE();

    return ary;
}
Returns whether the encoding is ASCII-compatible or not.
Encoding::UTF_8.ascii_compatible?
#=> true
Encoding::UTF_16BE.ascii_compatible?
#=> false
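One way to see what the flag reflects is to compare the bytes of the same ASCII text in both encodings, as in this short sketch (variable names are illustrative):

```ruby
ascii = "abc"
utf8  = ascii.encode(Encoding::UTF_8)
utf16 = ascii.encode(Encoding::UTF_16BE)

utf8.bytes    # ASCII characters keep their single ASCII byte values
utf16.bytes   # each character takes two bytes, so ASCII bytes are not preserved

utf8_flag  = utf8.encoding.ascii_compatible?
utf16_flag = utf16.encoding.ascii_compatible?
```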
static VALUE
enc_ascii_compatible_p(VALUE enc)
{
    return rb_enc_asciicompat(must_encoding(enc)) ? Qtrue : Qfalse;
}
Returns true for dummy encodings. A dummy encoding is an encoding for which character handling is not properly implemented. It is used for stateful encodings.
Encoding::ISO_2022_JP.dummy?
#=> true
Encoding::UTF_8.dummy?
#=> false
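As a brief sketch, the predicate can be combined with ::list to separate dummy encodings from fully supported ones. The variable names are illustrative.

```ruby
# Split all known encodings into dummy and fully implemented ones.
dummy, regular = Encoding.list.partition(&:dummy?)

dummy_names = dummy.map(&:name)
```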
static VALUE
enc_dummy_p(VALUE enc)
{
    return ENC_DUMMY_P(must_encoding(enc)) ? Qtrue : Qfalse;
}
Returns a string which represents the encoding for programmers.
Encoding::UTF_8.inspect
#=> "#<Encoding:UTF-8>"
Encoding::ISO_2022_JP.inspect
#=> "#<Encoding:ISO-2022-JP (dummy)>"
static VALUE
enc_inspect(VALUE self)
{
    rb_encoding *enc;

    if (!is_data_encoding(self)) {
        not_encoding(self);
    }
    if (!(enc = DATA_PTR(self)) || rb_enc_from_index(rb_enc_to_index(enc)) != enc) {
        rb_raise(rb_eTypeError, "broken Encoding");
    }
    return rb_enc_sprintf(rb_usascii_encoding(),
                          "#<%"PRIsVALUE":%s%s%s>",
                          rb_obj_class(self),
                          rb_enc_name(enc),
                          (ENC_DUMMY_P(enc) ? " (dummy)" : ""),
                          enc_autoload_p(enc) ? " (autoload)" : "");
}
Returns the name of the encoding.
Encoding::UTF_8.name
#=> "UTF-8"
static VALUE
enc_name(VALUE self)
{
    return rb_fstring_cstr(rb_enc_name((rb_encoding*)DATA_PTR(self)));
}
Returns the list of the name and aliases of the encoding.
Encoding::WINDOWS_31J.names
#=> ["Windows-31J", "CP932", "csWindows31J", "SJIS", "PCK"]
static VALUE
enc_names(VALUE self)
{
    VALUE args[2];

    args[0] = (VALUE)rb_to_encoding_index(self);
    args[1] = rb_ary_new2(0);

    GLOBAL_ENC_TABLE_EVAL(enc_table,
                          st_foreach(enc_table->names, enc_names_i, (st_data_t)args));

    return args[1];
}
Returns a replicated encoding of enc whose name is name. The new encoding should have the same byte structure as enc. If name is used by another encoding, raises ArgumentError.
static VALUE
enc_replicate_m(VALUE encoding, VALUE name)
{
    int idx = rb_enc_replicate(name_for_encoding(&name), rb_to_encoding(encoding));
    RB_GC_GUARD(name);
    return rb_enc_from_encoding_index(idx);
}
This page was generated for Ruby 3.0.0