19.1.2. email.parser: Parsing email messages

Message object structures can be created in one of two ways: they can be created from whole cloth by instantiating Message objects and stringing them together via attach() and set_payload() calls, or they can be created by parsing a flat text representation of the email message.

The email package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a string or a file object, and the parser will return to you the root Message instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return True from its is_multipart() method, and the subparts can be accessed via the get_payload() and walk() methods.
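As a minimal sketch of the behaviour described above, the following parses one simple message and one MIME multipart message and inspects the resulting root objects. The sample messages, addresses, and the boundary string are hypothetical and exist only for illustration.

```python
import email

# Hypothetical sample messages, made up for this illustration.
simple_text = (
    "From: alice@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a plain body.\n"
)

multipart_text = (
    "From: alice@example.com\n"
    "Subject: Two parts\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>second part</p>\n"
    "--XYZ--\n"
)

simple = email.message_from_string(simple_text)
multi = email.message_from_string(multipart_text)

# A non-MIME message has a string payload and is not multipart.
simple_payload = simple.get_payload()

# A multipart message has a list payload; walk() visits the root first,
# then each subpart in order.
subparts = multi.get_payload()
part_types = [part.get_content_type() for part in multi.walk()]
```

Here get_payload() gives the list of direct subparts, while walk() is the more convenient choice when the nesting depth is not known in advance.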

There are actually two parser interfaces available for use, the classic Parser API and the incremental FeedParser API. The classic Parser API is fine if you have the entire text of the message in memory as a string, or if the entire message lives in a file on the file system. FeedParser is more appropriate for when you’re reading the message from a stream which might block waiting for more input (e.g. reading an email message from a socket). The FeedParser can consume and parse the message incrementally, and only returns the root object when you close the parser [1].

Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. There is no magical connection between the email package’s bundled parser and the Message class, so your custom parser can create message object trees any way it finds necessary.

19.1.2.1. FeedParser API

The FeedParser, imported from the email.feedparser module, provides an API that is conducive to incremental parsing of email messages, such as would be necessary when reading the text of an email message from a source that can block (e.g. a socket). The FeedParser can of course be used to parse an email message fully contained in a string or a file, but the classic Parser API may be more convenient for such use cases. The semantics and results of the two parser APIs are identical.

The FeedParser’s API is simple; you create an instance, feed it a bunch of text until there’s no more to feed it, then close the parser to retrieve the root message object. The FeedParser is extremely accurate when parsing standards-compliant messages, and it does a very good job of parsing non-compliant messages, providing information about how a message was deemed broken. It will populate a message object’s defects attribute with a list of any problems it found in a message. See the email.errors module for the list of defects that it can find.

Here is the API for the FeedParser:

class email.parser.FeedParser(_factory=email.message.Message, *, policy=policy.default)

Create a FeedParser instance. Optional _factory is a no-argument callable that will be called whenever a new message object is needed. It defaults to the email.message.Message class.

The policy keyword specifies a policy object that controls a number of aspects of the parser’s operation. The default policy maintains backward compatibility.

Changed in version 3.3: Added the policy keyword.

feed(data)

Feed the FeedParser some more data. data should be a string containing one or more lines. The lines can be partial and the FeedParser will stitch such partial lines together properly. The lines in the string can have any of the common three line endings, carriage return, newline, or carriage return and newline (they can even be mixed).

close()

Closing a FeedParser completes the parsing of all previously fed data, and returns the root message object. It is undefined what happens if you feed more data to a closed FeedParser.
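The feed-then-close cycle can be sketched as follows. The chunk list is a hypothetical stand-in for data arriving from a blocking source such as a socket; the chunks deliberately split lines in the middle and mix line endings to show that the parser stitches them back together.

```python
from email.parser import FeedParser

# Hypothetical chunks, as they might arrive from a network stream.
# Lines are split mid-way and the line endings are mixed on purpose.
chunks = [
    "From: alice@exam",
    "ple.com\r\nSubject: Chun",
    "ked\n",
    "\nline one\n",
    "line two\n",
]

parser = FeedParser()
for chunk in chunks:
    parser.feed(chunk)  # partial lines are buffered until completed

# close() finishes parsing and returns the root message object.
msg = parser.close()

sender = msg["From"]
subject = msg["Subject"]
body = msg.get_payload()
```

A fresh FeedParser instance is created per message here, since feeding a closed parser is undefined.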

class email.parser.BytesFeedParser(_factory=email.message.Message)

Works exactly like FeedParser except that the input to the feed() method must be bytes and not string.

New in version 3.2.
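A minimal sketch of the bytes variant, assuming the data arrives as raw bytes read in fixed-size pieces. The io.BytesIO object and the 16-byte read size are illustrative assumptions standing in for a real binary stream.

```python
from io import BytesIO
from email.parser import BytesFeedParser

# BytesIO stands in for a binary stream such as a socket file (assumption).
stream = BytesIO(b"Subject: bytes in\n\nbody from a byte stream\n")

parser = BytesFeedParser()
while True:
    chunk = stream.read(16)  # arbitrary small read size for illustration
    if not chunk:
        break
    parser.feed(chunk)       # input must be bytes, not str

msg = parser.close()
subject = msg["Subject"]
body = msg.get_payload()
```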

19.1.2.2. Parser class API

The Parser class, imported from the email.parser module, provides an API that can be used to parse a message when the complete contents of the message are available in a string or file. The email.parser module also provides header-only parsers, called HeaderParser and BytesHeaderParser, which can be used if you’re only interested in the headers of the message. HeaderParser and BytesHeaderParser can be much faster in these situations, since they do not attempt to parse the message body, instead setting the payload to the raw body as a string. They have the same API as the Parser and BytesParser classes.

New in version 3.3: The BytesHeaderParser class.
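To illustrate the difference between the header-only and full parsers, the sketch below parses the same multipart text with both. The message text and its boundary string are hypothetical.

```python
from email.parser import HeaderParser, Parser

# Hypothetical multipart message used only for this comparison.
text = (
    "Subject: Report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "the report body\n"
    "--XYZ--\n"
)

headers_only = HeaderParser().parsestr(text)
full = Parser().parsestr(text)

# The header-only parser leaves the body unparsed as one raw string,
# so the boundary markers are still visible in the payload.
raw_body = headers_only.get_payload()

# The full parser splits the body into sub-message objects.
parsed_parts = full.get_payload()
```

Both results expose the same headers; only the payload differs, which is why the header-only parsers suit tasks such as listing subjects across a large mailbox.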

class email.parser.Parser(_class=email.message.Message, *, policy=policy.default)

The constructor for the Parser class takes an optional argument _class. This must be a callable factory (such as a function or a class), and it is used whenever a sub-message object needs to be created. It defaults to Message (see email.message). The factory will be called without arguments.

The policy keyword specifies a policy object that controls a number of aspects of the parser’s operation. The default policy maintains backward compatibility.

Changed in version 3.3: Removed the strict argument that was deprecated in 2.4. Added the policy keyword.

The other public Parser methods are:

parse(fp, headersonly=False)

Read all the data from the file-like object fp, parse the resulting text, and return the root message object. fp must support both the readline() and the read() methods on file-like objects.

The text contained in fp must be formatted as a block of RFC 2822 style headers and header continuation lines, optionally preceded by an envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts).

Optional headersonly is a flag specifying whether to stop parsing after reading the headers or not. The default is False, meaning it parses the entire contents of the file.

parsestr(text, headersonly=False)

Similar to the parse() method, except it takes a string object instead of a file-like object. Calling this method on a string is exactly equivalent to wrapping text in a StringIO instance first and calling parse().

Optional headersonly is as with the parse() method.
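The relationship between parse() and parsestr() described above can be sketched like this. The message text is a hypothetical example, and io.StringIO plays the role of the file-like object.

```python
from io import StringIO
from email.parser import Parser

# Hypothetical message text for illustration.
text = "Subject: demo\nTo: bob@example.com\n\nbody text\n"

parser = Parser()

# Parsing from a string and from a file-like object gives the same result.
from_string = parser.parsestr(text)
from_file = parser.parse(StringIO(text))

# With headersonly=True the body is not parsed further; for a simple
# message like this one the payload is the same raw string either way.
headers_only = parser.parsestr(text, headersonly=True)
recipient = headers_only["To"]
```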

class email.parser.BytesParser(_class=email.message.Message, *, policy=policy.default)

This class is exactly parallel to Parser, but handles bytes input. The _class argument is interpreted in the same way as for the Parser constructor.

The policy keyword specifies a policy object that controls a number of aspects of the parser’s operation. The default policy maintains backward compatibility.

Changed in version 3.3: Removed the strict argument. Added the policy keyword.

parse(fp, headersonly=False)

Read all the data from the binary file-like object fp, parse the resulting bytes, and return the message object. fp must support both the readline() and the read() methods on file-like objects.

The bytes contained in fp must be formatted as a block of RFC 2822 style headers and header continuation lines, optionally preceded by an envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts, including subparts with a Content-Transfer-Encoding of 8bit).

Optional headersonly is a flag specifying whether to stop parsing after reading the headers or not. The default is False, meaning it parses the entire contents of the file.

parsebytes(bytes, headersonly=False)

Similar to the parse() method, except it takes a byte string object instead of a file-like object. Calling this method on a byte string is exactly equivalent to wrapping bytes in a BytesIO instance first and calling parse().

Optional headersonly is as with the parse() method.

New in version 3.2.
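A short sketch of BytesParser handling a body with a Content-Transfer-Encoding of 8bit. The message bytes are a hypothetical example whose body contains UTF-8 encoded non-ASCII text, and io.BytesIO stands in for a binary file.

```python
from io import BytesIO
from email.parser import BytesParser

# Hypothetical 8bit message; the body holds the UTF-8 bytes for "café".
raw = (
    b"Subject: bytes demo\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)

parser = BytesParser()
from_bytes = parser.parsebytes(raw)
from_file = parser.parse(BytesIO(raw))

# get_payload(decode=True) returns the body as bytes, which can then be
# decoded with the charset declared in the Content-Type header.
body_bytes = from_bytes.get_payload(decode=True)
body_text = body_bytes.decode("utf-8")
```

Reading the body through get_payload(decode=True) avoids depending on how the non-ASCII bytes are represented in the undecoded string payload.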

Since creating a message object structure from a string or a file object is such a common task, four functions are provided as a convenience. They are available in the top-level email package namespace.

email.message_from_string(s, _class=email.message.Message, *, policy=policy.default)

Return a message object structure from a string. This is exactly equivalent to Parser().parsestr(s). _class and policy are interpreted as with the Parser class constructor.

Changed in version 3.3: Removed the strict argument. Added the policy keyword.

email.message_from_bytes(s, _class=email.message.Message, *, policy=policy.default)

Return a message object structure from a byte string. This is exactly equivalent to BytesParser().parsebytes(s). Optional _class and policy are interpreted as with the Parser class constructor.

New in version 3.2.

Changed in version 3.3: Removed the strict argument. Added the policy keyword.

email.message_from_file(fp, _class=email.message.Message, *, policy=policy.default)

Return a message object structure tree from an open file object. This is exactly equivalent to Parser().parse(fp). _class and policy are interpreted as with the Parser class constructor.

Changed in version 3.3: Removed the strict argument. Added the policy keyword.

email.message_from_binary_file(fp, _class=email.message.Message, *, policy=policy.default)

Return a message object structure tree from an open binary file object. This is exactly equivalent to BytesParser().parse(fp). _class and policy are interpreted as with the Parser class constructor.

New in version 3.2.

Changed in version 3.3: Removed the strict argument. Added the policy keyword.

Here’s an example of how you might use this at an interactive Python prompt:

>>> import email
>>> msg = email.message_from_string(myString)
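For a self-contained variant, the sketch below runs the same hypothetical message through all four convenience functions. The message text is made up, and io.StringIO and io.BytesIO stand in for files opened in text and binary mode.

```python
import email
from io import BytesIO, StringIO

# Hypothetical message, as text and as the equivalent ASCII bytes.
text = "Subject: convenience\n\nsame message, four entry points\n"
data = text.encode("ascii")

messages = [
    email.message_from_string(text),
    email.message_from_bytes(data),
    email.message_from_file(StringIO(text)),        # text-mode file object
    email.message_from_binary_file(BytesIO(data)),  # binary-mode file object
]

subjects = [m["Subject"] for m in messages]
bodies = [m.get_payload() for m in messages]
```

With real files, open() in text mode pairs with message_from_file() and open() in "rb" mode pairs with message_from_binary_file().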

19.1.2.3. Additional notes

Here are some notes on the parsing semantics:

  • Most non-multipart type messages are parsed as a single message object with a string payload. These objects will return False for is_multipart(). Their get_payload() method will return a string object.
  • All multipart type messages will be parsed as a container message object with a list of sub-message objects for their payload. The outer container message will return True for is_multipart() and its get_payload() method will return the list of Message subparts.
  • Most messages with a content type of message/* (e.g. message/delivery-status and message/rfc822) will also be parsed as a container object containing a list payload of length 1. Their is_multipart() method will return True. The single element in the list payload will be a sub-message object.
  • Some non-standards compliant messages may not be internally consistent about their multipart-edness. Such messages may have a Content-Type header of type multipart, but their is_multipart() method may return False. If such messages were parsed with the FeedParser, they will have an instance of the MultipartInvariantViolationDefect class in their defects attribute list. See email.errors for details.
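Two of the notes above can be checked with a short sketch: a message/rfc822 wrapper, and a message that claims to be multipart but declares no boundary. Both message texts are hypothetical examples built for this illustration.

```python
import email
from email.errors import MultipartInvariantViolationDefect

# Hypothetical message/rfc822 wrapper around an inner message.
wrapper_text = (
    "Subject: outer\n"
    "MIME-Version: 1.0\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
)
wrapper = email.message_from_string(wrapper_text)
wrapped = wrapper.get_payload()  # list payload of length 1
inner = wrapped[0]

# Hypothetical broken message: multipart type, but no boundary parameter.
broken_text = (
    "Subject: broken\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "no boundary was declared, so this is just text\n"
)
broken = email.message_from_string(broken_text)

# The parser records what it found wrong in the defects list.
has_invariant_defect = any(
    isinstance(d, MultipartInvariantViolationDefect) for d in broken.defects
)
```

Checking the defects list after parsing is a cheap way to flag suspicious input before relying on the structure of the message.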

Footnotes

[1] As of email package version 3.0, introduced in Python 2.4, the classic Parser was re-implemented in terms of the FeedParser, so the semantics and results are identical between the two parsers.

© Copyright 1990-2017, Python Software Foundation.
Last updated on Sep 19, 2017.
