
Email

This notebook shows how to load email (.eml) or Microsoft Outlook (.msg) files.

Please see this guide for more instructions on setting up Unstructured locally, including setting up required system dependencies.

Using Unstructured

%pip install --upgrade --quiet unstructured
from langchain_community.document_loaders import UnstructuredEmailLoader

loader = UnstructuredEmailLoader("./example_data/fake-email.eml")

data = loader.load()

data
[Document(page_content='This is a test email to use for unit tests.\n\nImportant points:\n\nRoses are red\n\nViolets are blue', metadata={'source': './example_data/fake-email.eml'})]

Retain Elements

Under the hood, Unstructured creates a separate "element" for each chunk of text. By default the loader combines those elements into a single document, but you can keep them separate by specifying mode="elements".

loader = UnstructuredEmailLoader("example_data/fake-email.eml", mode="elements")

data = loader.load()

data[0]
Document(page_content='This is a test email to use for unit tests.', metadata={'source': 'example_data/fake-email.eml', 'file_directory': 'example_data', 'filename': 'fake-email.eml', 'last_modified': '2022-12-16T17:04:16-05:00', 'sent_from': ['Matthew Robinson <mrobinson@unstructured.io>'], 'sent_to': ['Matthew Robinson <mrobinson@unstructured.io>'], 'subject': 'Test Email', 'languages': ['eng'], 'filetype': 'message/rfc822', 'category': 'NarrativeText'})
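Once the elements are kept separate, the category metadata shown above can be used to sort them. The following is a minimal sketch rather than part of the loader API: Doc is a hypothetical stand-in that exposes the same page_content and metadata attributes as a loaded Document, group_by_category is a helper name invented here, and the "Title" and "ListItem" categories in the sample data are assumed for illustration (only "NarrativeText" appears in the output above).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded Document (same two attributes)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_category(docs):
    """Map each element category to the list of texts that carry it."""
    groups = defaultdict(list)
    for doc in docs:
        # mode="elements" stores a per-element "category"; fall back if it is absent.
        category = doc.metadata.get("category", "Uncategorized")
        groups[category].append(doc.page_content)
    return dict(groups)


# Sample data shaped like the output above; "Title" and "ListItem" are assumed labels.
sample = [
    Doc("This is a test email to use for unit tests.", {"category": "NarrativeText"}),
    Doc("Important points:", {"category": "Title"}),
    Doc("Roses are red", {"category": "ListItem"}),
    Doc("Violets are blue", {"category": "ListItem"}),
]

grouped = group_by_category(sample)
print(grouped["ListItem"])
```

With a real loader, the same helper should accept the list returned by loader.load() directly, since it only reads page_content and metadata.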

Processing Attachments

You can process attachments with UnstructuredEmailLoader by setting process_attachments=True in the constructor. By default, attachments will be partitioned using the partition function from unstructured. You can use a different partitioning function by passing the function to the attachment_partitioner kwarg.

loader = UnstructuredEmailLoader(
    "example_data/fake-email.eml",
    mode="elements",
    process_attachments=True,
)

data = loader.load()

data[0]
Document(page_content='This is a test email to use for unit tests.', metadata={'source': 'example_data/fake-email.eml', 'file_directory': 'example_data', 'filename': 'fake-email.eml', 'last_modified': '2022-12-16T17:04:16-05:00', 'sent_from': ['Matthew Robinson <mrobinson@unstructured.io>'], 'sent_to': ['Matthew Robinson <mrobinson@unstructured.io>'], 'subject': 'Test Email', 'languages': ['eng'], 'filetype': 'message/rfc822', 'category': 'NarrativeText'})
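To try attachment processing you need a message that actually carries an attachment. The sketch below builds one with Python's standard email package and then shows where the attachment_partitioner kwarg described above would be passed. The names build_sample_eml and load_with_attachments, the addresses, and the file names are all invented for illustration. load_with_attachments is defined but not called here, because it needs langchain_community and unstructured installed, and how the kwarg is handled may vary between unstructured versions.

```python
import tempfile
from email import message_from_bytes
from email.message import EmailMessage
from pathlib import Path


def build_sample_eml(path):
    """Write a small .eml file with one plain-text attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Attachment demo"
    msg.set_content("Body text of the message.")
    msg.add_attachment("Text inside the attachment.\n", filename="notes.txt")
    path = Path(path)
    path.write_bytes(msg.as_bytes())
    return path


def load_with_attachments(path):
    """Load the message and its attachments as separate elements.

    Imports are local so the rest of this sketch runs without the
    optional dependencies installed.
    """
    from langchain_community.document_loaders import UnstructuredEmailLoader
    from unstructured.partition.auto import partition

    loader = UnstructuredEmailLoader(
        str(path),
        mode="elements",
        process_attachments=True,
        attachment_partitioner=partition,  # swap in a different partitioning function here
    )
    return loader.load()


eml_path = build_sample_eml(Path(tempfile.mkdtemp()) / "with-attachment.eml")

# Standard-library sanity check: list the attachment file names in the message.
parsed = message_from_bytes(eml_path.read_bytes())
attachment_names = [part.get_filename() for part in parsed.walk() if part.get_filename()]
print(attachment_names)
```

Passing partition explicitly mirrors the default described above, so it is a convenient starting point before substituting a format-specific function.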

Using OutlookMessageLoader

%pip install --upgrade --quiet extract_msg
from langchain_community.document_loaders import OutlookMessageLoader

loader = OutlookMessageLoader("example_data/fake-email.msg")

data = loader.load()

data[0]
Document(page_content='This is a test email to experiment with the MS Outlook MSG Extractor\r\n\r\n\r\n-- \r\n\r\n\r\nKind regards\r\n\r\n\r\n\r\n\r\nBrian Zhou\r\n\r\n', metadata={'source': 'example_data/fake-email.msg', 'subject': 'Test for TIF files', 'sender': 'Brian Zhou <brizhou@gmail.com>', 'date': datetime.datetime(2013, 11, 18, 0, 26, 24, tzinfo=zoneinfo.ZoneInfo(key='America/Los_Angeles'))})
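When a folder mixes both file types covered in this notebook, the loader can be chosen from the file extension. This is one possible sketch, not a LangChain feature: LOADER_BY_SUFFIX, pick_loader_name, and load_email are names made up here, and load_email is left uncalled because it needs langchain_community plus unstructured or extract_msg installed.

```python
from pathlib import Path

# Loader class names used in this notebook, keyed by file extension.
LOADER_BY_SUFFIX = {
    ".eml": "UnstructuredEmailLoader",
    ".msg": "OutlookMessageLoader",
}


def pick_loader_name(path):
    """Return the loader class name for an email file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported email file type: {suffix!r}") from None


def load_email(path):
    """Instantiate the matching loader with default settings and load the file."""
    # Local import keeps pick_loader_name usable without the optional packages.
    import langchain_community.document_loaders as loaders

    loader_cls = getattr(loaders, pick_loader_name(path))
    return loader_cls(str(path)).load()


print(pick_loader_name("example_data/fake-email.eml"))
print(pick_loader_name("example_data/fake-email.msg"))
```

Keeping the mapping in a plain dictionary makes it easy to route additional extensions later without touching the loading code.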
