email.header:國際化標頭

原始碼:Lib/email/header.py


This module is part of the legacy (Compat32) email API. In the current APIencoding and decoding of headers is handled transparently by thedictionary-like API of theEmailMessage class. Inaddition to uses in legacy code, this module can be useful in applications thatneed to completely control the character sets used when encoding headers.

The remaining text in this section is the original documentation of the module.

RFC 2822 is the base standard that describes the format of email messages.It derives from the olderRFC 822 standard which came into widespread use ata time when most email was composed of ASCII characters only.RFC 2822 is aspecification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has becomeinternationalized, such that language specific character sets can now be used inemail messages. The base standard still requires email messages to betransferred using only 7-bit ASCII characters, so a slew of RFCs have beenwritten describing how to encode email containing non-ASCII characters intoRFC 2822-compliant format. These RFCs includeRFC 2045,RFC 2046,RFC 2047, andRFC 2231. Theemail package supports these standardsin itsemail.header andemail.charset modules.

If you want to include non-ASCII characters in your email headers, say in theSubject orTo fields, you should use theHeader class and assign the field in theMessageobject to an instance ofHeader instead of using a string for the headervalue. Import theHeader class from theemail.header module.For example:

>>>fromemail.messageimportMessage>>>fromemail.headerimportHeader>>>msg=Message()>>>h=Header('p\xf6stal','iso-8859-1')>>>msg['Subject']=h>>>msg.as_string()'Subject: =?iso-8859-1?q?p=F6stal?=\n\n'

Notice here how we wanted theSubject field to contain a non-ASCIIcharacter? We did this by creating aHeader instance and passing inthe character set that the byte string was encoded in. When the subsequentMessage instance was flattened, theSubjectfield was properlyRFC 2047 encoded. MIME-aware mail readers would show thisheader using the embedded ISO-8859-1 character.

Here is theHeader class description:

classemail.header.Header(s=None,charset=None,maxlinelen=None,header_name=None,continuation_ws='',errors='strict')

Create a MIME-compliant header that can contain strings in different charactersets.

Optionals is the initial header value. IfNone (the default), theinitial header value is not set. You can later append to the header withappend() method calls.s may be an instance ofbytes orstr, but see theappend() documentation for semantics.

Optionalcharset serves two purposes: it has the same meaning as thecharsetargument to theappend() method. It also sets the default character setfor all subsequentappend() calls that omit thecharset argument. Ifcharset is not provided in the constructor (the default), theus-asciicharacter set is used both ass's initial charset and as the default forsubsequentappend() calls.

The maximum line length can be specified explicitly viamaxlinelen. Forsplitting the first line to a shorter value (to account for the field headerwhich isn't included ins, e.g.Subject) pass in the name of thefield inheader_name. The defaultmaxlinelen is 78, and the default valueforheader_name isNone, meaning it is not taken into account for thefirst line of a long, split header.

Optionalcontinuation_ws must beRFC 2822-compliant foldingwhitespace, and is usually either a space or a hard tab character. Thischaracter will be prepended to continuation lines.continuation_wsdefaults to a single space character.

Optionalerrors is passed straight through to theappend() method.

append(s,charset=None,errors='strict')

Append the strings to the MIME header.

Optionalcharset, if given, should be aCharsetinstance (seeemail.charset) or the name of a character set, whichwill be converted to aCharset instance. A valueofNone (the default) means that thecharset given in the constructoris used.

s may be an instance ofbytes orstr. If it is aninstance ofbytes, thencharset is the encoding of that bytestring, and aUnicodeError will be raised if the string cannot bedecoded with that character set.

Ifs is an instance ofstr, thencharset is a hint specifyingthe character set of the characters in the string.

In either case, when producing anRFC 2822-compliant header usingRFC 2047 rules, the string will be encoded using the output codec ofthe charset. If the string cannot be encoded using the output codec, aUnicodeError will be raised.

Optionalerrors is passed as the errors argument to the decode callifs is a byte string.

encode(splitchars=';,\t',maxlinelen=None,linesep='\n')

Encode a message header into an RFC-compliant format, possibly wrappinglong lines and encapsulating non-ASCII parts in base64 or quoted-printableencodings.

Optionalsplitchars is a string containing characters which should begiven extra weight by the splitting algorithm during normal headerwrapping. This is in very rough support ofRFC 2822's 'higher levelsyntactic breaks': split points preceded by a splitchar are preferredduring line splitting, with the characters preferred in the order inwhich they appear in the string. Space and tab may be included in thestring to indicate whether preference should be given to one over theother as a split point when other split chars do not appear in the linebeing split. Splitchars does not affectRFC 2047 encoded lines.

maxlinelen, if given, overrides the instance's value for the maximumline length.

linesep specifies the characters used to separate the lines of thefolded header. It defaults to the most useful value for Pythonapplication code (\n), but\r\n can be specified in orderto produce headers with RFC-compliant line separators.

在 3.2 版的變更:新增引數linesep

TheHeader class also provides a number of methods to supportstandard operators and built-in functions.

__str__()

Returns an approximation of theHeader as a string, using anunlimited line length. All pieces are converted to unicode using thespecified encoding and joined together appropriately. Any pieces with acharset of'unknown-8bit' are decoded as ASCII using the'replace'error handler.

在 3.2 版的變更:Added handling for the'unknown-8bit' charset.

__eq__(other)

This method allows you to compare twoHeader instances forequality.

__ne__(other)

This method allows you to compare twoHeader instances forinequality.

Theemail.header module also provides the following convenient functions.

email.header.decode_header(header)

Decode a message header value without converting the character set. The headervalue is inheader.

This function returns a list of(decoded_string,charset) pairs containingeach of the decoded parts of the header.charset isNone for non-encodedparts of the header, otherwise a lower case string containing the name of thecharacter set specified in the encoded string.

以下是個範例:

>>>fromemail.headerimportdecode_header>>>decode_header('=?iso-8859-1?q?p=F6stal?=')[(b'p\xf6stal', 'iso-8859-1')]
email.header.make_header(decoded_seq,maxlinelen=None,header_name=None,continuation_ws='')

Create aHeader instance from a sequence of pairs as returned bydecode_header().

decode_header() takes a header value string and returns a sequence ofpairs of the format(decoded_string,charset) wherecharset is the name ofthe character set.

This function takes one of those sequence of pairs and returns aHeader instance. Optionalmaxlinelen,header_name, andcontinuation_ws are as in theHeader constructor.