Commit 1892a67

docs: adding context for Textract linearization-config param (#32064)
Before jumping into the technical implementation, I added context for the linearization-config param and explained what linearization means in this context.

I also linked an AWS blog for more advanced use cases, as this single example doesn't cover all use cases.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>

1 parent 2ab2cab, commit 1892a67

1 file changed: +7 -1 lines changed

docs/docs/integrations/document_loaders/amazon_textract.ipynb

Lines changed: 7 additions & 1 deletion
@@ -216,7 +216,13 @@
 "source": [
 "## Example 4: Customizing the output format\n",
 "\n",
-"You have the option to pass an additional parameter called `linearization_config` to the AmazonTextractPDFLoader which will determine how the text output will be linearized by the parser after Textract runs."
+"When Amazon Textract processes a PDF, it extracts all text, including elements like headers, footers, and page numbers. This extra information can be \"noisy\" and reduce the effectiveness of the output.\n",
+"\n",
+"The process of converting a document's 2D layout into a clean, one-dimensional string of text is called linearization.\n",
+"\n",
+"The AmazonTextractPDFLoader gives you precise control over this process with the `linearization_config` parameter. You can use it to specify which elements to exclude from the final output.\n",
+"\n",
+"The following example shows how to hide headers, footers, and figures, resulting in a much cleaner text block, for more advanced use cases see this [AWS blog post](https://aws.amazon.com/blogs/machine-learning/amazon-textracts-new-layout-feature-introduces-efficiencies-in-general-purpose-and-generative-ai-document-processing-tasks/)."
 ]
 },
 {
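
The idea of linearization described in the added notebook text can be illustrated with a small, self-contained sketch. This is a toy model only, not Textract's or LangChain's implementation: the `LayoutBlock` class, the `linearize` function, the `hidden_kinds` argument, and the sample page are all hypothetical names invented for illustration. It assumes each detected element carries a type and a position, orders elements top to bottom and left to right, and drops the types the caller asks to hide.

```python
from dataclasses import dataclass


@dataclass
class LayoutBlock:
    """Hypothetical stand-in for one detected layout element on a page."""
    kind: str    # e.g. "header", "text", "figure", "footer"
    top: float   # vertical position: 0.0 is the top of the page, 1.0 the bottom
    left: float  # horizontal position: 0.0 is the left edge, 1.0 the right
    text: str


def linearize(blocks, hidden_kinds=frozenset()):
    """Flatten 2D layout blocks into one string, skipping hidden element types."""
    visible = [b for b in blocks if b.kind not in hidden_kinds]
    # Simple reading order: top to bottom, then left to right.
    ordered = sorted(visible, key=lambda b: (b.top, b.left))
    return "\n".join(b.text for b in ordered)


# A made-up page, deliberately listed out of reading order.
page = [
    LayoutBlock("footer", 0.95, 0.45, "Page 3"),
    LayoutBlock("text", 0.20, 0.10, "Layout analysis finds regions."),
    LayoutBlock("header", 0.02, 0.10, "LayoutParser paper"),
    LayoutBlock("figure", 0.50, 0.10, "[Figure 1]"),
    LayoutBlock("text", 0.70, 0.10, "Each region is then read in order."),
]

noisy = linearize(page)  # keeps header, figure, and footer
clean = linearize(page, hidden_kinds={"header", "footer", "figure"})
print(clean)
```

In the notebook itself, the equivalent choice of which elements to exclude is made through the object passed as `linearization_config` to `AmazonTextractPDFLoader`; the sketch above only mimics the effect of hiding headers, footers, and figures.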
