email.parser: Parsing email messages
Message object structures can be created in one of two ways: they can be created from whole cloth by creating an EmailMessage object, adding headers using the dictionary interface, and adding payload(s) using set_content() and related methods, or they can be created by parsing a serialized representation of the email message.
The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a bytes object, a string, or a file object, and the parser will return to you the root EmailMessage instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the payload manipulation methods, such as get_body(), iter_parts(), and walk().
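As a minimal sketch of the paragraph above, the following parses a small two-part MIME message and inspects the resulting tree. The message text, addresses, and boundary string are invented for illustration, and policy.default is passed explicitly on the assumption that an EmailMessage root (rather than the legacy Message) is wanted.

```python
from email import message_from_bytes, policy

# Hypothetical serialized message: a text body plus a text attachment.
raw = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: Report\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    b"\n"
    b"--BOUNDARY\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"\n"
    b"See the attached notes.\n"
    b"--BOUNDARY\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b'Content-Disposition: attachment; filename="notes.txt"\n'
    b"\n"
    b"Attachment text.\n"
    b"--BOUNDARY--\n"
)

# policy.default makes the parser build EmailMessage objects.
msg = message_from_bytes(raw, policy=policy.default)

# The root of a MIME multipart message is a container.
is_multipart = msg.is_multipart()

# get_body() picks the best candidate for the "body", skipping attachments.
body = msg.get_body(preferencelist=("plain",))
body_text = body.get_content() if body is not None else None

# iter_parts() yields only the immediate subparts; walk() also yields the root.
part_types = [part.get_content_type() for part in msg.iter_parts()]
walked_types = [part.get_content_type() for part in msg.walk()]
```

Here get_body() is asked only for a plain-text candidate; a different preference list could be used when HTML bodies are acceptable.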
There are actually two parser interfaces available for use, the Parser API and the incremental FeedParser API. The Parser API is most useful if you have the entire text of the message in memory, or if the entire message lives in a file on the file system. FeedParser is more appropriate when you are reading the message from a stream which might block waiting for more input (such as reading an email message from a socket). The FeedParser can consume and parse the message incrementally, and only returns the root object when you close the parser.
Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. All of the logic that connects the email package's bundled parser and the EmailMessage class is embodied in the Policy class, so a custom parser can create message object trees any way it finds necessary by implementing custom versions of the appropriate Policy methods.
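One of the smaller extension points is probably the policy's message_factory attribute, which the bundled parser calls whenever it needs a new message object. The sketch below is only an illustration of that hook, not a custom parser: TaggedMessage and its subject_tag() method are hypothetical names that do not exist in the email package.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


class TaggedMessage(EmailMessage):
    """Hypothetical EmailMessage subclass, used only for illustration."""

    def subject_tag(self):
        # Return the text inside a leading "[...]" in the Subject, or None.
        subject = str(self.get("Subject", ""))
        if subject.startswith("[") and "]" in subject:
            return subject[1:subject.index("]")]
        return None


# clone() returns a copy of the policy with the given attributes replaced.
tagged_policy = policy.default.clone(message_factory=TaggedMessage)

msg = message_from_bytes(
    b"Subject: [build] nightly failed\n\nSee the logs.\n",
    policy=tagged_policy,
)
tag = msg.subject_tag()
```

Because the factory lives on the policy, the same subclass is used for every message object the parser creates, including subparts.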
FeedParser API
The BytesFeedParser, imported from the email.feedparser module, provides an API that is conducive to incremental parsing of email messages, such as would be necessary when reading the text of an email message from a source that can block (such as a socket). The BytesFeedParser can of course be used to parse an email message fully contained in a bytes-like object, string, or file, but the BytesParser API may be more convenient for such use cases. The semantics and results of the two parser APIs are identical.
The BytesFeedParser's API is simple; you create an instance, feed it a bunch of bytes until there's no more to feed it, then close the parser to retrieve the root message object. The BytesFeedParser is extremely accurate when parsing standards-compliant messages, and it does a very good job of parsing non-compliant messages, providing information about how a message was deemed broken. It will populate a message object's defects attribute with a list of any problems it found in a message. See the email.errors module for the list of defects that it can find.
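The create, feed, close cycle described above might look roughly like this. The chunks list stands in for data arriving from a blocking source such as a socket; its contents are invented, and the chunk boundaries deliberately fall in the middle of lines and mix CRLF with LF endings.

```python
from email import policy
from email.parser import BytesFeedParser

# Hypothetical network reads: lines are split at arbitrary points and the
# line endings are a mix of CRLF and LF.
chunks = [
    b"From: alice@exam",
    b"ple.com\r\nSubject: Status ",
    b"update\nContent-Type: text/plain\r\n",
    b"\r\nAll systems ",
    b"nominal.\n",
]

parser = BytesFeedParser(policy=policy.default)
for chunk in chunks:
    # Partial lines are buffered until the rest of the line arrives.
    parser.feed(chunk)

# close() signals the end of input and returns the root message object.
msg = parser.close()

# Any problems found while parsing are recorded here, not raised.
problems = list(msg.defects)
```

In real code the loop would typically read from the socket until it returns an empty bytes object, then call close().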
Here is the API for the BytesFeedParser:
- class email.parser.BytesFeedParser(_factory=None, *, policy=policy.compat32)
Create a BytesFeedParser instance. Optional _factory is a no-argument callable; if not specified use the message_factory from the policy. Call _factory whenever a new message object is needed.
If policy is specified use the rules it specifies to update the representation of the message. If policy is not set, use the compat32 policy, which maintains backward compatibility with the Python 3.2 version of the email package and provides Message as the default factory. All other policies provide EmailMessage as the default _factory. For more information on what else policy controls, see the policy documentation.
Note: The policy keyword should always be specified; the default will change to email.policy.default in a future version of Python.
Added in version 3.2.
Changed in version 3.3: Added the policy keyword.
Changed in version 3.6: _factory defaults to the policy message_factory.
- feed(data)
Feed the parser some more data. data should be a bytes-like object containing one or more lines. The lines can be partial and the parser will stitch such partial lines together properly. The lines can have any of the three common line endings: carriage return, newline, or carriage return and newline (they can even be mixed).
- class email.parser.FeedParser(_factory=None, *, policy=policy.compat32)
Works like BytesFeedParser except that the input to the feed() method must be a string. This is of limited utility, since the only way for such a message to be valid is for it to contain only ASCII text or, if utf8 is True, no binary attachments.
Changed in version 3.3: Added the policy keyword.
Parser API
The BytesParser class, imported from the email.parser module, provides an API that can be used to parse a message when the complete contents of the message are available in a bytes-like object or file. The email.parser module also provides Parser for parsing strings, and header-only parsers, BytesHeaderParser and HeaderParser, which can be used if you're only interested in the headers of the message. BytesHeaderParser and HeaderParser can be much faster in these situations, since they do not attempt to parse the message body, instead setting the payload to the raw body.
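To illustrate the difference, the sketch below parses the same bytes once with BytesParser and once with BytesHeaderParser. The message content and boundary are made up; the point is only that the header-only result keeps the body as one unparsed string.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Hypothetical multipart message with a single text part.
raw = (
    b"Subject: Photos\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b"Content-Type: text/plain\n"
    b"\n"
    b"First part.\n"
    b"--XYZ--\n"
)

# Full parse: the body is split into sub-message objects.
full = BytesParser(policy=policy.default).parsebytes(raw)

# Header-only parse: headers are available, the body is left as raw text.
headers_only = BytesHeaderParser(policy=policy.default).parsebytes(raw)
raw_body = headers_only.get_payload()
```

A header-only parse is usually enough for tasks such as listing subjects or routing on a header value.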
- class email.parser.BytesParser(_class=None, *, policy=policy.compat32)
Create a BytesParser instance. The _class and policy arguments have the same meaning and semantics as the _factory and policy arguments of BytesFeedParser.
Note: The policy keyword should always be specified; the default will change to email.policy.default in a future version of Python.
Changed in version 3.3: Removed the strict argument that was deprecated in 2.4. Added the policy keyword.
Changed in version 3.6: _class defaults to the policy message_factory.
- parse(fp, headersonly=False)
Read all the data from the binary file-like object fp, parse the resulting bytes, and return the message object. fp must support both the readline() and the read() methods.
The bytes contained in fp must be formatted as a block of RFC 5322 (or, if utf8 is True, RFC 6532) style headers and header continuation lines, optionally preceded by an envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts, including subparts with a Content-Transfer-Encoding of 8bit).
Optional headersonly is a flag specifying whether to stop parsing after reading the headers or not. The default is False, meaning it parses the entire contents of the file.
- parsebytes(bytes, headersonly=False)
Similar to the parse() method, except it takes a bytes-like object instead of a file-like object. Calling this method on a bytes-like object is equivalent to wrapping bytes in a BytesIO instance first and calling parse().
Optional headersonly is as with the parse() method.
Added in version 3.2.
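The equivalence between parsebytes() and parse() can be checked with a short sketch like the following. The message is invented and uses plain newline endings; io.BytesIO stands in for a file opened in binary mode.

```python
import io

from email import policy
from email.parser import BytesParser

# Hypothetical single-part message.
raw = (
    b"From: alice@example.com\n"
    b"Subject: Invoice\n"
    b"\n"
    b"Please find the invoice below.\n"
)

parser = BytesParser(policy=policy.default)

# Parse directly from a bytes-like object.
via_bytes = parser.parsebytes(raw)

# Parse from a binary file-like object wrapping the same bytes.
via_file = parser.parse(io.BytesIO(raw))

same_subject = via_bytes["Subject"] == via_file["Subject"]
same_payload = via_bytes.get_payload() == via_file.get_payload()
```

With a real file, the argument to parse() would be the object returned by open(path, "rb").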
- class email.parser.BytesHeaderParser(_class=None, *, policy=policy.compat32)
Exactly like BytesParser, except that headersonly defaults to True.
Added in version 3.3.
- class email.parser.Parser(_class=None, *, policy=policy.compat32)
This class is parallel to BytesParser, but handles string input.
Changed in version 3.3: Removed the strict argument. Added the policy keyword.
Changed in version 3.6: _class defaults to the policy message_factory.
- parse(fp, headersonly=False)
Read all the data from the text-mode file-like object fp, parse the resulting text, and return the root message object. fp must support both the readline() and the read() methods on file-like objects.
Other than the text mode requirement, this method operates like BytesParser.parse().
- class email.parser.HeaderParser(_class=None, *, policy=policy.compat32)
Exactly like Parser, except that headersonly defaults to True.
Since creating a message object structure from a string or a file object is such a common task, four functions are provided as a convenience. They are available in the top-level email package namespace.
- email.message_from_bytes(s, _class=None, *, policy=policy.compat32)
Return a message object structure from a bytes-like object. This is equivalent to BytesParser().parsebytes(s). Optional _class and policy are interpreted as with the BytesParser class constructor.
Added in version 3.2.
Changed in version 3.3: Removed the strict argument. Added the policy keyword.
- email.message_from_binary_file(fp, _class=None, *, policy=policy.compat32)
Return a message object structure tree from an open binary file object. This is equivalent to BytesParser().parse(fp). _class and policy are interpreted as with the BytesParser class constructor.
Added in version 3.2.
Changed in version 3.3: Removed the strict argument. Added the policy keyword.
- email.message_from_string(s, _class=None, *, policy=policy.compat32)
Return a message object structure from a string. This is equivalent to Parser().parsestr(s). _class and policy are interpreted as with the Parser class constructor.
Changed in version 3.3: Removed the strict argument. Added the policy keyword.
- email.message_from_file(fp, _class=None, *, policy=policy.compat32)
Return a message object structure tree from an open file object. This is equivalent to Parser().parse(fp). _class and policy are interpreted as with the Parser class constructor.
Changed in version 3.3: Removed the strict argument. Added the policy keyword.
Changed in version 3.6: _class defaults to the policy message_factory.
Here's an example of how you might use message_from_bytes() at an interactive Python prompt:
>>> import email
>>> # myBytes is assumed to already hold the raw bytes of a serialized message
>>> msg = email.message_from_bytes(myBytes)
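For comparison, here is a sketch that runs the same invented message through all four convenience functions. io.BytesIO and io.StringIO stand in for open files, and policy.default is passed explicitly; the last call omits the policy to show that the compat32 default produces a legacy Message object instead.

```python
import io

from email import (
    message_from_binary_file,
    message_from_bytes,
    message_from_file,
    message_from_string,
    policy,
)
from email.message import EmailMessage, Message

# Hypothetical message, as bytes and as text.
raw_bytes = b"From: alice@example.com\nSubject: Ping\n\nAre you there?\n"
raw_text = raw_bytes.decode("ascii")

from_bytes = message_from_bytes(raw_bytes, policy=policy.default)
from_binary_file = message_from_binary_file(
    io.BytesIO(raw_bytes), policy=policy.default
)
from_string = message_from_string(raw_text, policy=policy.default)
from_text_file = message_from_file(io.StringIO(raw_text), policy=policy.default)

subjects = [
    str(m["Subject"])
    for m in (from_bytes, from_binary_file, from_string, from_text_file)
]

# Without a policy argument the compat32 default applies.
legacy = message_from_bytes(raw_bytes)
```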
Additional notes
Here are some notes on the parsing semantics:
- Most non-multipart type messages are parsed as a single message object with a string payload. These objects will return False for is_multipart(), and iter_parts() will yield an empty list.
- All multipart type messages will be parsed as a container message object with a list of sub-message objects for their payload. The outer container message will return True for is_multipart(), and iter_parts() will yield a list of subparts.
- Most messages with a content type of message/* (such as message/delivery-status and message/rfc822) will also be parsed as a container object containing a list payload of length 1. Their is_multipart() method will return True. The single element yielded by iter_parts() will be a sub-message object.
- Some non-standards-compliant messages may not be internally consistent about their multipart-edness. Such messages may have a Content-Type header of type multipart, but their is_multipart() method may return False. If such messages were parsed with the FeedParser, they will have an instance of the MultipartInvariantViolationDefect class in their defects attribute list. See email.errors for details.
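The notes above can be observed with a few hand-written messages. This is a sketch with invented content: one plain message, one message/rfc822 wrapper, and one deliberately broken message that claims to be multipart but supplies no boundary. get_payload() is used to reach the wrapped message so that the list payload is visible directly.

```python
from email import errors, message_from_bytes, policy

# A simple non-multipart message: string payload.
plain = message_from_bytes(
    b"Subject: plain\n\nJust text.\n", policy=policy.default
)

# A message/rfc822 container: list payload holding one sub-message.
wrapped = message_from_bytes(
    b"Content-Type: message/rfc822\n"
    b"\n"
    b"Subject: inner\n"
    b"\n"
    b"Inner body.\n",
    policy=policy.default,
)
inner = wrapped.get_payload(0)

# Claims to be multipart but has no boundary parameter and no parts.
broken = message_from_bytes(
    b"Content-Type: multipart/mixed\n"
    b"\n"
    b"No boundary parameter, so no parts can be found.\n",
    policy=policy.default,
)
has_invariant_defect = any(
    isinstance(defect, errors.MultipartInvariantViolationDefect)
    for defect in broken.defects
)
```

Checking the defects list after parsing is a cheap way to decide whether a message should be handled with extra care.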