
UnstructuredMarkdownLoader

This notebook provides a quick overview for getting started with the UnstructuredMarkdown document loader. For detailed documentation of all UnstructuredMarkdownLoader features and configurations, head to the API reference.

Overview

Integration details

| Class | Package | Local | Serializable | JS support |
| :--- | :--- | :---: | :---: | :---: |
| UnstructuredMarkdownLoader | langchain_community | | | |

Loader features

| Source | Document Lazy Loading | Native Async Support |
| :--- | :---: | :---: |
| UnstructuredMarkdownLoader | | |

Setup

To access the UnstructuredMarkdownLoader document loader, you'll need to install the langchain-community integration package and the unstructured Python package.

Credentials

No credentials are needed to use this loader.

To enable automated tracing of your model calls, set your LangSmith API key by uncommenting the lines below:

import getpass
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_community and unstructured

%pip install -qU langchain_community unstructured

Initialization

Now we can instantiate our loader object and load documents.

You can run the loader in one of two modes: "single" and "elements". In "single" mode, the whole file is returned as a single Document object. In "elements" mode, the unstructured library splits the document into elements such as Title and NarrativeText. You can pass additional unstructured keyword arguments after mode to apply different unstructured settings; the examples below pass strategy="fast".

from langchain_community.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader(
    "./example_data/example.md",
    mode="single",
    strategy="fast",
)

Load

docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/example.md'}, page_content='Sample Markdown Document\n\nIntroduction\n\nWelcome to this sample Markdown document. Markdown is a lightweight markup language used for formatting text. It\'s widely used for documentation, readme files, and more.\n\nFeatures\n\nHeaders\n\nMarkdown supports multiple levels of headers:\n\nHeader 1: # Header 1\n\nHeader 2: ## Header 2\n\nHeader 3: ### Header 3\n\nLists\n\nUnordered List\n\nItem 1\n\nItem 2\n\nSubitem 2.1\n\nSubitem 2.2\n\nOrdered List\n\nFirst item\n\nSecond item\n\nThird item\n\nLinks\n\nOpenAI is an AI research organization.\n\nImages\n\nHere\'s an example image:\n\nCode\n\nInline Code\n\nUse code for inline code snippets.\n\nCode Block\n\n```python def greet(name): return f"Hello, {name}!"\n\nprint(greet("World")) ```')
print(docs[0].metadata)
{'source': './example_data/example.md'}
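The blank lines (`\n\n`) separating "Sample Markdown Document", "Introduction" and the other pieces in the output above suggest that "single" mode simply joins the text of every element into one page_content, while "elements" mode keeps one document per element. The sketch below illustrates that relationship in plain Python. The `Element` dataclass and the `to_single`/`to_elements` helpers are hypothetical stand-ins for unstructured's element objects and the loader's internals, not the library's API, and the blank-line separator is an assumption read off the sample output.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for an element produced by `unstructured`."""
    text: str
    category: str


def to_single(elements, source):
    """Mimic "single" mode: one document holding the text of all elements."""
    # Blank-line separator assumed from the sample output above.
    page_content = "\n\n".join(el.text for el in elements)
    return [{"page_content": page_content, "metadata": {"source": source}}]


def to_elements(elements, source):
    """Mimic "elements" mode: one document per element, tagged with its category."""
    return [
        {"page_content": el.text, "metadata": {"source": source, "category": el.category}}
        for el in elements
    ]


elements = [
    Element("Sample Markdown Document", "Title"),
    Element("Introduction", "Title"),
    Element("Welcome to this sample Markdown document.", "NarrativeText"),
]

single = to_single(elements, "./example_data/example.md")
split = to_elements(elements, "./example_data/example.md")
print(len(single), len(split))  # 1 3
print(single[0]["page_content"].split("\n\n")[0])  # Sample Markdown Document
```

This is also why the single-mode metadata above carries only `source`: per-element fields such as `category` have no single value once everything is merged.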

Lazy Load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
page[0]
Document(metadata={'source': './example_data/example.md', 'link_texts': ['OpenAI'], 'link_urls': ['https://www.openai.com'], 'last_modified': '2024-08-14T15:04:18', 'languages': ['eng'], 'parent_id': 'de1f74bf226224377ab4d8b54f215bb9', 'filetype': 'text/markdown', 'file_directory': './example_data', 'filename': 'example.md', 'category': 'NarrativeText', 'element_id': '898a542a261f7dc65e0072d1e847d535'}, page_content='OpenAI is an AI research organization.')
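In the loop above, any documents left over after the last full page of 10 stay in `page` and never reach the paged operation. A minimal sketch of a reusable batching helper that also emits the final partial page is shown below. It works on any iterator, so a generator of strings stands in for `loader.lazy_load()` here; `batched_docs`, `fake_lazy_load` and `process_page` are hypothetical names, and `process_page` takes the place of a real call such as `index.upsert(page)`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_docs(docs: Iterable[T], size: int = 10) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        page = list(islice(it, size))  # pull at most `size` items on demand
        if not page:
            return
        yield page


def fake_lazy_load(n: int) -> Iterator[str]:
    """Stand-in for loader.lazy_load(); yields one item at a time."""
    for i in range(n):
        yield f"doc-{i}"


pages = []


def process_page(page):
    """Hypothetical paged operation, e.g. index.upsert(page); records page sizes."""
    pages.append(len(page))


for page in batched_docs(fake_lazy_load(25), size=10):
    process_page(page)

print(pages)  # [10, 10, 5]
```

With a real loader you would pass `loader.lazy_load()` instead of `fake_lazy_load(25)`. Because `islice` pulls items lazily, only one page is held in memory at a time.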

Load Elements

In this example we load the file in "elements" mode, which returns a list of the different elements in the Markdown document:

from langchain_community.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader(
    "./example_data/example.md",
    mode="elements",
    strategy="fast",
)

docs = loader.load()
len(docs)
29

As you can see, 29 elements were pulled from the example.md file. The first element is the title of the document, as expected:

docs[0].page_content
'Sample Markdown Document'
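Because each element carries a `category` in its metadata (for example `NarrativeText`, as in the lazy-load output above), elements-mode output can be regrouped after loading. The sketch below tallies categories and gathers the text that follows each `Title` element into a section. It assumes each document exposes `page_content` and a `metadata` dict with a `category` key; a small `Doc` dataclass stands in for LangChain's `Document` so the example runs on its own, `group_by_title` is a hypothetical helper, and the sample data is illustrative rather than actual loader output. Every `Title` is treated as a section boundary, so header levels are not distinguished.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for langchain_core's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def group_by_title(docs: List[Doc]) -> Dict[str, List[str]]:
    """Collect the text of non-Title elements under the most recent Title."""
    sections: Dict[str, List[str]] = {}
    current = "(untitled)"  # bucket for text appearing before the first Title
    for doc in docs:
        if doc.metadata.get("category") == "Title":
            current = doc.page_content
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(doc.page_content)
    return sections


# Illustrative elements loosely modeled on the sample file.
docs = [
    Doc("Sample Markdown Document", {"category": "Title"}),
    Doc("Introduction", {"category": "Title"}),
    Doc("Welcome to this sample Markdown document.", {"category": "NarrativeText"}),
    Doc("Links", {"category": "Title"}),
    Doc("OpenAI is an AI research organization.", {"category": "NarrativeText"}),
]

category_counts = Counter(d.metadata.get("category") for d in docs)
print(category_counts)
sections = group_by_title(docs)
print(sections["Links"])  # ['OpenAI is an AI research organization.']
```

With real output you would pass the `docs` list returned by `loader.load()` in "elements" mode; note that repeated title text would merge into one section in this simple version.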

API reference

For detailed documentation of all UnstructuredMarkdownLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html
