Base64

From Wikipedia, the free encyclopedia
Encoding for a sequence of byte values using 64 printable characters

Base64 is a binary-to-text encoding that uses 64 printable characters to represent each 6-bit segment of a sequence of byte[1] values. As with all binary-to-text encodings, Base64 encoding enables transmitting binary data over a communication channel that only supports text.

Compared with the original data, Base64-encoded data is about 33% larger (four output characters for every three input bytes), plus about 4% more if line breaks are inserted at a typical line length.

The earliest uses of this encoding were for dial-up communication between systems running the same operating system – for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[2][3][4][5]

Applications

Example of an SVG file containing embedded JPEG images encoded in Base64[6]

Notable applications of Base64:

Web pages
Encoding as Base64 is prevalent on the World Wide Web[7] where it is often used to embed binary data such as a digital image in text such as HTML and CSS.[8]
E-mail attachment
Base64 is widely used for sending e-mail attachments, because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. Encoding an attachment as Base64 before sending, and then decoding when received, assures older SMTP servers correctly transmit messages with attached binary information.
Embed binary data in a text file
For example, to include the data of an image in a script to avoid depending on external files.
Embed binary data in XML
To embed binary data in an XML file, using a syntax similar to <data encoding="base64">...</data>, e.g. favicons in Firefox's exported bookmarks.html.
Embed PDF file
To embed a PDF file in an HTML page.[9]
Embedded elements
Although not part of the official specification for the SVG format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.[10]
Prevent delimiter collision
To transmit and store text that might otherwise cause delimiter collision.
LDAP Data Interchange Format
To encode character strings in LDAP Data Interchange Format files.
Data URI scheme
The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
Leverage clipboard
To store/transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users' wallet software.
Support human verification
Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
QR code encoding
A QR code, which contains binary data, is sometimes stored as Base64 since it is more likely that a QR code reader accurately decodes text than binary data. Also, some devices more readily save text from a QR code than potentially malicious binary data.
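
As a rough illustration of the web-page and data URI uses listed above, the following Python sketch wraps a few bytes in a data: URI. The helper name to_data_uri and the sample payload are assumptions made for this example only; they are not part of any standard API.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Build a data: URI carrying `payload` as Base64 text (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The six bytes that begin a GIF file, used here only as sample data.
uri = to_data_uri(b"GIF89a", "image/gif")
print(uri)  # data:image/gif;base64,R0lGODlh
```

The resulting string is plain text, so it can be pasted into an HTML attribute or a CSS url() value wherever a data: URI is accepted.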

Alphabet


The set of characters used to represent the values for each base-64 digit (value from 0 to 63) differs slightly between the variations of Base64. The general strategy is to use printable characters that are common to most character encodings. This tends to result in data remaining unchanged as it moves through information systems, such as email, that were traditionally not 8-bit clean.[5] Typically, an encoding uses A–Z, a–z, and 0–9 for the first 62 values. Many variants use + and / for the last two.

Per RFC 4648 §4, the following table lists the character used for each numeric value. The = character is used to indicate padding.

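Base64 alphabet
value | char | value | char | value | char | value | char
0 | A | 16 | Q | 32 | g | 48 | w
1 | B | 17 | R | 33 | h | 49 | x
2 | C | 18 | S | 34 | i | 50 | y
3 | D | 19 | T | 35 | j | 51 | z
4 | E | 20 | U | 36 | k | 52 | 0
5 | F | 21 | V | 37 | l | 53 | 1
6 | G | 22 | W | 38 | m | 54 | 2
7 | H | 23 | X | 39 | n | 55 | 3
8 | I | 24 | Y | 40 | o | 56 | 4
9 | J | 25 | Z | 41 | p | 57 | 5
10 | K | 26 | a | 42 | q | 58 | 6
11 | L | 27 | b | 43 | r | 59 | 7
12 | M | 28 | c | 44 | s | 60 | 8
13 | N | 29 | d | 45 | t | 61 | 9
14 | O | 30 | e | 46 | u | 62 | +
15 | P | 31 | f | 47 | v | 63 | /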

Note that Base64URL encoding replaces '+' with '-' and '/' with '_' so that the encoded string can be used in URLs and filenames without the need for escaping.
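
The alphabet can be written down compactly as a 64-character string in which the position of each character is its value. The sketch below builds the standard and URL-safe alphabets this way; the constant names are chosen for this illustration.

```python
import string

# Values 0-61 are shared: A-Z, then a-z, then 0-9.
COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits

STANDARD_ALPHABET = COMMON + "+/"  # RFC 4648 section 4
URLSAFE_ALPHABET = COMMON + "-_"   # RFC 4648 section 5

# Print a few value-to-character mappings from both alphabets.
for value in (0, 25, 26, 51, 52, 61, 62, 63):
    print(value, STANDARD_ALPHABET[value], URLSAFE_ALPHABET[value])
```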

Examples


To simplify explanation, the example below uses ASCII text for input even though this is not a typical use. More commonly, input is binary data, such as an image, and the result then represents binary data in a printable text format.

For the input data:

Many hands make light work.

The typical Base64 representation is:

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
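
The example can be reproduced with the base64 module from Python's standard library, which implements the RFC 4648 alphabet. This is only a quick check of the text above, not a statement about how any particular system encodes its data.

```python
import base64

text = "Many hands make light work."

# b64encode works on bytes, so the ASCII text is converted first.
encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
print(encoded)  # TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu

# Decoding recovers the original input.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # Many hands make light work.
```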

Encoding when no padding needed


Each input sequence of 6 bits (which can encode 2^6 = 64 values) is mapped to a Base64 alphabet letter. Therefore, Base64 encoding results in four characters for each three input bytes. Assuming the input is ASCII or similar, the byte values of the first three characters 'M', 'a', 'n' are 77, 97, and 110, which in 8-bit binary representation are 01001101, 01100001, and 01101110. Joining these representations and splitting them into 6-bit groups gives:

010011 010110 000101 101110

These groups have the values 19, 22, 5, and 46, which encode as the string TWFu.

The following table shows how the input is encoded. For example, the letter 'M' has the value 77 (per ASCII and similar). The first 6 bits of that value are 010011, or 19 decimal, which maps to the Base64 letter 'T', which in turn has the byte value 84 (per ASCII and similar).

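Encoding 'M', 'a', 'n' as Base64
Input (ASCII) | letter (ASCII) | M | a | n
Input (ASCII) | 8-bit decimal value | 77 | 97 | 110
Bits | | 010011 010110 000101 101110
Encoded (Base64) | 6-bit decimal value | 19 | 22 | 5 | 46
Encoded (Base64) | letter (Base64 alphabet) | T | W | F | u
Encoded (Base64) | byte | 84 | 87 | 70 | 117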
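
The same steps can be traced in a few lines of Python. This is a minimal sketch that handles exactly three input bytes; the helper name encode_triple is made up for this illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triple(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the three 8-bit values into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit values, most significant group first.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each 6-bit value up in the alphabet.
    return "".join(ALPHABET[s] for s in sextets)


print(format(int.from_bytes(b"Man", "big"), "024b"))  # 010011010110000101101110
print(encode_triple(b"Man"))                          # TWFu
```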

Encoding with one padding character


If the input consists of a number of bytes that is 2 more than a multiple of 3 (e.g. 'M', 'a'), then the last 2 bytes (16 bits) are encoded in 3 Base64 digits (18 bits). The two least significant bits of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing = padding character).

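Input (ASCII) | letter (ASCII) | M | a
Input (ASCII) | 8-bit decimal value | 77 | 97
Bits | | 010011 010110 000100
Encoded (Base64) | 6-bit decimal value | 19 | 22 | 4 | Padding
Encoded (Base64) | letter (Base64 alphabet) | T | W | E | =
Encoded (Base64) | byte | 84 | 87 | 69 | 61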

Encoding with two padding characters


If the input consists of a number of bytes that is 1 more than a multiple of 3 (e.g. 'M'), then the last 8 bits are represented in 2 Base64 digits (12 bits). The four least significant bits of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing two = padding characters):

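Input (ASCII) | letter (ASCII) | M
Input (ASCII) | 8-bit decimal value | 77
Bits | | 010011 010000
Encoded (Base64) | 6-bit decimal value | 19 | 16 | Padding | Padding
Encoded (Base64) | letter (Base64 alphabet) | T | Q | = | =
Encoded (Base64) | byte | 84 | 81 | 61 | 61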
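
The three cases (no padding, one = and two =) follow directly from the input length modulo 3. The sketch below expresses that arithmetic and compares it with Python's standard base64 module; the function names are illustrative.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters appended for n_bytes of input (0, 1, or 2)."""
    return (3 - n_bytes % 3) % 3


def encoded_length(n_bytes: int) -> int:
    """Length of padded output: four characters per started group of 3 bytes."""
    return 4 * ((n_bytes + 2) // 3)


for sample in (b"Man", b"Ma", b"M"):
    encoded = base64.b64encode(sample).decode("ascii")
    # Each row shows: input, encoded text, predicted padding, predicted length.
    print(sample, encoded, padding_count(len(sample)), encoded_length(len(sample)))
```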

Decoding with padding


When decoding, each sequence of four encoded characters is converted to three output bytes, but with a single padding character the final 4 characters decode to only two bytes, or with two padding characters, the final 4 characters decode to a single byte. For example:

Encoded | Padding | Length | Decoded
bGlnaHQgdw== | == | 1 | light w
bGlnaHQgd28= | = | 2 | light wo
bGlnaHQgd29y | None | 3 | light wor

Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when bGlnaHQgdw== is decoded, we convert each character (except the trailing occurrences of =) into its corresponding 6-bit representation, and then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12, but since we remove 2 bits for each = (for a total of 4 bits), the dw== ends up producing 8 bits (1 byte) when decoded.
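
The "discard 2 bits per =" reading can be written out literally. The following Python sketch works on bit strings for clarity rather than speed and assumes well-formed input; decode_group and decode are names invented for this example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_group(group: str) -> bytes:
    """Decode one 4-character group, dropping 2 trailing bits per '='."""
    pad = group.count("=")
    chars = group.rstrip("=")
    # Each remaining character contributes its 6-bit value.
    bits = "".join(format(ALPHABET.index(c), "06b") for c in chars)
    if pad:
        bits = bits[: -2 * pad]
    # What is left is a whole number of bytes.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode(encoded: str) -> bytes:
    """Decode padded Base64 text four characters at a time."""
    return b"".join(decode_group(encoded[i:i + 4]) for i in range(0, len(encoded), 4))


print(decode_group("dw=="))    # b'w'
print(decode("bGlnaHQgdw=="))  # b'light w'
```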

Decoding without padding


Use of the padding character in encoded text is not essential for decoding, because the number of missing bytes can be inferred from the length of the encoded text. In some variants the padding character is mandatory, while in others it is not used. Notably, padding characters are required when concatenating Base64-encoded strings.

Without padding, after decoding each sequence of 4 encoded characters, there may be 2 or 3 encoded characters left over. A single remaining encoded character is not possible because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte. The first character contributes 6 bits, and the second character contributes its first 2 bits. The following table demonstrates decoding encoded strings that have 2, 3 or no left-over characters.

Encoded | Length of last group | Decoded | Decoded length of last group
bGlnaHQgdw | 2 | light w | 1
bGlnaHQgd28 | 3 | light wo | 2
bGlnaHQgd29y | 4 | light wor | 3
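
One common way to handle unpadded input is to restore the padding from the text length before handing it to an ordinary decoder. The sketch below does this with Python's standard base64 module; the function name decode_unpadded is an assumption of this example, and other decoders may behave differently.

```python
import base64


def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(encoded) % 4 == 1:
        # One left-over character carries only 6 bits, less than a byte.
        raise ValueError("a single left-over character cannot encode a byte")
    # 2 left-over characters need "==", 3 need "=", 0 need nothing.
    padding = "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded + padding)


for sample in ("bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"):
    print(sample, decode_unpadded(sample))
```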

Decoding without padding is not performed consistently among decoders. In addition, a decoder that accepts input without padding allows more than one string to decode to the same bytes, which can be a security risk.[11]

Variants


Variations of Base64 differ in the alphabet used and in structural aspects such as maximum line length. The most commonly used alphabet is that described by RFC 4648, and most variations differ only in the last two letters used. The following table describes the more commonly used encodings that are specified by an RFC.

Encoding[12] | Specification | 62nd char | 63rd char | Pad char | Line separators | Line length | Checksum
Base 64 Encoding | RFC 4648 §4 | + | / | = | No | | No
Base 64 Encoding with URL and Filename Safe Alphabet | RFC 4648 §5 | - | _ | = optional | No | | No
for MIME | RFC 2045 | + | / | = | Yes | 76 | No
for Privacy-Enhanced Mail (deprecated) | RFC 1421 | + | / | = | Yes | 64 | Yes, in PEM CRC
for UTF-7 | RFC 2152 | + | / | | No | | No
for IMAP mailbox names | RFC 3501 | + | , | | No | | No
Textual Encodings of PKIX, PKCS, and CMS Structures | RFC 7468 | + | / | = | Yes | 64 | No
ASCII armor for OpenPGP | RFC 9580 | + | / | = | Yes | 76 | Yes, (CRC24)

RFC 4648


RFC 4648 describes various encodings, including Base64, and discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, the use of different encoding alphabets, and canonical encodings. The variant that it calls Base 64 Encoding and base64 is intended for general use.

The RFC also specifies a second Base64 encoding, which it calls Base 64 Encoding with URL and Filename Safe Alphabet, that is intended for representing relatively long identifying information. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) as a string for use as an HTTP parameter in an HTTP form or an HTTP GET URL. Also, many applications need to encode binary data in a way that is convenient for inclusion in a URL, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in a URL requires encoding the +, / and = characters as special percent-encoded hexadecimal sequences (+ becomes %2B, / becomes %2F and = becomes %3D), which makes the string longer and harder to read. Using a different alphabet allows for encoding as Base64 without requiring this extra markup. Typically, + and / are replaced by - and _, respectively, so that URL encoders/decoders are no longer necessary and the length of the encoded value is unaffected, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular site that makes use of this is YouTube.[13] Some variants allow or require omitting the padding = signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries[which?] encode = as ., potentially exposing applications to relative path attacks when a folder name is encoded from user data.[citation needed]
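
The difference is easy to see with two input bytes chosen so that the standard encoding contains +, / and =. This Python sketch uses only the standard library; the sample bytes are arbitrary.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF])  # sample bytes chosen so that '+', '/' and '=' all appear

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)                  # +/8=
print(quote(standard, safe=""))  # %2B%2F8%3D
print(url_safe)                  # -_8=
print(url_safe.rstrip("="))      # -_8  (padding omitted, as some variants allow)
```

Percent-encoding grows the four-character value to ten characters, while the URL-safe alphabet keeps the length unchanged.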

RFC 3548


RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings. RFC 4648 obsoletes RFC 3548.

Unless an encoder is written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids an encoder from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that a decoder must reject data that contain characters other than the encoding alphabet.[4]
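
As an illustration of lenient versus strict decoding, Python's standard base64.b64decode discards non-alphabet characters by default and rejects them when validate=True is passed. The sketch below shows that library behaviour only; it is not presented as a complete RFC 3548 conformance check, and strict_decode is a name invented for this example.

```python
import base64
import binascii


def strict_decode(encoded: str) -> bytes:
    """Decode, raising binascii.Error on characters outside the alphabet."""
    return base64.b64decode(encoded, validate=True)


print(base64.b64decode("TW Fu"))  # lenient default: the space is skipped

for candidate in ("TWFu", "TW Fu", "TWE"):
    try:
        print(candidate, strict_decode(candidate))
    except binascii.Error as error:
        # "TW Fu" contains a space; "TWE" lacks its padding character.
        print(candidate, "rejected:", error)
```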

MIME


The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[3] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way, as described at RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76), though for very short messages the overhead can be much higher because of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) − 814) / 1.37
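
The 137% figure can be checked by wrapping Base64 output at 76 characters and ending each line with CR/LF. The Python sketch below covers the encoded body only and ignores headers; mime_wrap and the 5700-byte sample size are choices made for this illustration.

```python
import base64


def mime_wrap(data: bytes, line_length: int = 76) -> bytes:
    """Base64-encode `data` and end every line with CRLF (body only, no headers)."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return b"".join(line + b"\r\n" for line in lines)


payload = bytes(5700)  # 5700 zero bytes, an arbitrary sample size
wrapped = mime_wrap(payload)
ratio = len(wrapped) / len(payload)

print(len(wrapped))     # 7800
print(round(ratio, 4))  # 1.3684
```

Each 76-character line stands for 57 input bytes and occupies 78 bytes once the line ending is counted, which is where the factor 4/3 × 78/76 comes from.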

Privacy-enhanced mail


The first known standardized use of the encoding now called MIME Base64 was in the Privacy-Enhanced Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of bytes to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[14]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.[2] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.

The process is repeated on the remaining data until fewer than four bytes remain. If three bytes remain, they are processed normally. If fewer than three bytes (24 bits) remain to be encoded, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two bytes of the 24-bit buffer are padded zeros, two = characters are appended to the output; if one byte of the 24-bit buffer is filled with padded zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
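
The buffer procedure described above can be sketched in Python as follows. This is an illustrative reading of the prose rather than a reference implementation: the function name is invented, and "\n" stands in for the platform-specific line delimiter.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def pem_printable_encode(data: bytes, line_length: int = 64) -> str:
    """Encode `data` with a 24-bit buffer and wrap the output in 64-character lines."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 absent bytes in the final group
        # First byte goes in the most significant 8 bits; absent bytes stay zero.
        buffer = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Use the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that would hold only padding zeros become '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    text = "".join(out)
    return "\n".join(text[i:i + line_length] for i in range(0, len(text), line_length))


sample = bytes(range(256))
wrapped = pem_printable_encode(sample)
# Apart from line breaks, the result matches the standard library encoder.
print(wrapped.replace("\n", "") == base64.b64encode(sample).decode("ascii"))  # True
```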

UTF-7


UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[15][16]

The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to three unused bits in the last Base64 digit.
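
The core of modified Base64 can be sketched as "encode the text as UTF-16, apply Base64, drop the padding". The Python example below assumes big-endian UTF-16 and covers only that step; it does not produce complete UTF-7, which also needs shift characters around each encoded run. The function name is illustrative.

```python
import base64


def modified_base64(text: str) -> str:
    """Encode text as big-endian UTF-16, then Base64 without '=' padding."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")


print(modified_base64("\u263a"))        # Jjo
print(modified_base64("\u2262\u0391"))  # ImIDkQ
```

In the first case the 16 input bits fill three Base64 digits (18 bits), so the final digit carries two unused bits and no padding character is written.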

OpenPGP

Further information: Pretty Good Privacy § OpenPGP

OpenPGP, described in RFC 9580, specifies "ASCII armor", which is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as the separator, appended to the encoded output data.[17]
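
A minimal Python sketch of the checksum step follows. It uses the CRC-24 parameters published for OpenPGP (initial value 0xB704CE, generator 0x1864CFB); the function names are invented for this example, and the rest of the armor format (header lines, line wrapping) is omitted.

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB


def crc24(data: bytes) -> int:
    """Compute the 24-bit CRC of `data`, processing one bit at a time."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def armor_checksum_line(data: bytes) -> str:
    """Return '=' followed by the Base64 form of the 3-byte checksum."""
    checksum = crc24(data).to_bytes(3, "big")
    return "=" + base64.b64encode(checksum).decode("ascii")


print(hex(crc24(b"123456789")))  # 0x21cf02
print(armor_checksum_line(b""))  # =twTO
```

Because the checksum is exactly three bytes, its Base64 form is always four characters with no padding of its own.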

JavaScript (DOM Web API)


The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[18][19] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.
Example: Encoding of the beginning of a GIF file: btoa("GIF89a") returns "R0lGODlh".

With atypical alphabet order


Several variants use alphabets similar to the common variants, but in a different order.

Unix password
Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data.
GEDCOM
The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".[20]
bcrypt
bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".[21]
Xxencoding
Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
6PACK
Used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[22]
Bash
Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".[23]
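
The sorting property mentioned for the crypt alphabet comes from its characters being in ascending ASCII order. The Python sketch below illustrates only that property by re-lettering ordinary unpadded Base64 with the "./0-9A-Za-z" alphabet; it is not an implementation of crypt(3), whose bit packing may differ, and encode_sortable is a name invented here.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
ASCII_ORDERED = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
TO_ASCII_ORDERED = str.maketrans(STANDARD, ASCII_ORDERED)


def encode_sortable(data: bytes) -> str:
    """Unpadded Base64 written with the ASCII-ordered alphabet (illustration only)."""
    standard = base64.b64encode(data).decode("ascii").rstrip("=")
    return standard.translate(TO_ASCII_ORDERED)


words = [b"pear", b"apple", b"fig", b"apples", b"Zebra"]

# Sorting the encoded strings gives the same order as sorting the raw bytes.
by_plain = sorted(words)
by_encoded = sorted(words, key=encode_sortable)
print(by_plain == by_encoded)  # True
```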

With atypical alphabet


Some variants use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (like RFC 4648).

Uuencoding
The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.[citation needed]
BinHex
BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
UTF-8
A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization.
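
The "add 32" shortcut described for Uuencoding can be shown in a couple of lines of Python. The sketch covers only the value-to-character mapping, not the full uuencode file format, and the names are illustrative.

```python
def uu_char(value: int) -> str:
    """Map a 6-bit value to its character by adding 32, with no lookup table."""
    if not 0 <= value < 64:
        raise ValueError("value must be in the range 0-63")
    return chr(32 + value)


# Building the whole alphabet reproduces ASCII codes 32 through 95 in order.
UU_ALPHABET = "".join(uu_char(value) for value in range(64))
print(repr(UU_ALPHABET))
```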

See also

  • 8BITMIME – 8-bit data transmission for SMTP
  • Ascii85 – Encoding for a sequence of byte values using 85 printable characters
  • Base16 – Encoding for a sequence of byte values using hexadecimal
  • Base32 – Encoding for a sequence of byte values using 32 printable characters
  • Base36 – Encoding for a sequence of byte values using 36 printable characters
  • Base62 – Encoding for a sequence of byte values using 62 printable characters
  • Binary number – Number expressed in the base-2 numeral system

References

  1. Technically, octet.
  2. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
  3. Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
  4. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
  5. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
  6. <image xlink:href=" contents encoded in Base64" ... />
  7. "Base64 encoding and decoding – Web APIs". MDN Web Docs. Archived from the original on 2014-11-11.
  8. "When to base64 encode images (and when not to)". 28 August 2011. Archived from the original on 2023-08-29.
  9. "Encode PDF (Portable Document Format) File (.pdf) to Base64 Data". base64.online. Retrieved 2024-03-21.
  10. "Edit fiddle". jsfiddle.net.
  11. Chalkias, Konstantinos; Chatzigiannis, Panagiotis (30 May 2022). Base64 Malleability in Practice (PDF). ASIA CCS '22: 2022 ACM on Asia Conference on Computer and Communications Security. pp. 1219–1221. doi:10.1145/3488932.3527284.
  12. Some specifications describe a Base64 encoding without naming it. This column identifies Base64 encodings in a descriptive way if no particular name is specified.
  13. "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
  14. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
  15. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
  16. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
  17. OpenPGP Message Format. IETF. July 2024. doi:10.17487/RFC9580. RFC 9580. Retrieved February 13, 2025.
  18. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814 Archived 2014-02-22 at the Wayback Machine, 2021-02-01.
  19. "Window: btoa() method". 24 June 2025. Retrieved 2025-07-31.
  20. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
  21. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
  22. "6PACK a "real time" PC to TNC protocol". Archived from the original on 2012-02-24. Retrieved 2013-05-19.
  23. "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.