
Azure AI Document Intelligence

Azure AI Document Intelligence (formerly known as Azure Form Recognizer) is a machine-learning-based service that extracts text (including handwriting), tables, document structures (e.g., titles, section headings) and key-value pairs from digital or scanned PDFs, images, Office and HTML files.

Document Intelligence supports PDF, JPEG/JPG, PNG, BMP, TIFF, HEIF, DOCX, XLSX, PPTX and HTML.

The current implementation of the loader using Document Intelligence can incorporate content page-wise and turn it into LangChain documents. The default output format is markdown, which can be easily chained with MarkdownHeaderTextSplitter for semantic document chunking. You can also use mode="single" or mode="page" to return pure text in a single page, or a document split by page.

Prerequisite

An Azure AI Document Intelligence resource in one of the 3 preview regions: East US, West US2, West Europe. Follow this document to create one if you don't have one. You will be passing <endpoint> and <key> as parameters to the loader.

%pip install --upgrade --quiet langchain langchain-community azure-ai-documentintelligence

Example 1

The first example uses a local file which will be sent to Azure AI Document Intelligence.

With the endpoint and key in hand, we can create an instance of AzureAIDocumentIntelligenceLoader:

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

file_path = "<filepath>"
endpoint = "<endpoint>"
key = "<key>"
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, file_path=file_path, api_model="prebuilt-layout"
)

documents = loader.load()

The default output contains one LangChain document with markdown-formatted content:

documents

Example 2

The input file can also be a public URL path, e.g., https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png.

url_path = "<url>"
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, url_path=url_path, api_model="prebuilt-layout"
)

documents = loader.load()
documents

Example 3

You can also specify mode="page" to load the document by pages.

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

file_path = "<filepath>"
endpoint = "<endpoint>"
key = "<key>"
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint,
    api_key=key,
    file_path=file_path,
    api_model="prebuilt-layout",
    mode="page",
)

documents = loader.load()

In the output, each page is stored as a separate document in the list:

for document in documents:
    print(f"Page Content: {document.page_content}")
    print(f"Metadata: {document.metadata}")

Example 4

You can also specify analysis_features=["ocrHighResolution"] to enable add-on capabilities. For more information, see: https://aka.ms/azsdk/python/documentintelligence/analysisfeature.

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

file_path = "<filepath>"
endpoint = "<endpoint>"
key = "<key>"
analysis_features = ["ocrHighResolution"]
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint,
    api_key=key,
    file_path=file_path,
    api_model="prebuilt-layout",
    analysis_features=analysis_features,
)

documents = loader.load()

The output contains the LangChain document recognized with the high-resolution add-on capability:

documents

documents