Installation

Full installation

To install the Unstructured open source library on a local development machine, run one or more of the following commands.
These commands assume that you are using the Python package and project manager uv, running within an activated venv virtual environment that was created with uv. However, uv and venv are not required.
To work with all supported file types, run:
uv add "unstructured[all-docs]"
To conserve disk space and reduce code dependencies, you can run the following command instead to work with a default set of supported file types:
uv add unstructured
The preceding command supports plain text files (.txt), HTML files (.html), XML files (.xml), and emails (.eml, .msg, and .p7s) by default.
To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing <extra> with the appropriate extra for the target file type:
uv add "unstructured[<extra>]"
The following file type extras are available:
  • all-docs (for all supported file types in this list)
  • csv (for .csv files only)
  • docx (for .doc and .docx files only)
  • epub (for .epub files only)
  • image (for all supported image file types: .bmp, .heic, .jpeg, .png, and .tiff)
  • md (for .md files only)
  • odt (for .odt files only)
  • org (for .org files only)
  • pdf (for .pdf files only)
  • pptx (for .ppt and .pptx files only)
  • rst (for .rst files only)
  • rtf (for .rtf files only)
  • tsv (for .tsv files only)
  • xlsx (for .xls and .xlsx files only)
Note that you can install multiple extras at the same time by separating them with commas, for example:
uv add "unstructured[pdf,docx]"
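As a rough illustration of how the extras map to file types, the following Python sketch builds the install command for a set of files you plan to process. The function name uv_add_command and both lookup tables are hypothetical helpers written for this example; they are transcribed from the lists on this page and are not part of the Unstructured library.

```python
from pathlib import Path

# Extension-to-extra mapping, transcribed from the list of extras on this page.
EXTRA_FOR_EXTENSION = {
    ".csv": "csv",
    ".doc": "docx", ".docx": "docx",
    ".epub": "epub",
    ".bmp": "image", ".heic": "image", ".jpeg": "image",
    ".png": "image", ".tiff": "image",
    ".md": "md",
    ".odt": "odt",
    ".org": "org",
    ".pdf": "pdf",
    ".ppt": "pptx", ".pptx": "pptx",
    ".rst": "rst",
    ".rtf": "rtf",
    ".tsv": "tsv",
    ".xls": "xlsx", ".xlsx": "xlsx",
}

# File types that the default install handles without any extra.
DEFAULT_EXTENSIONS = {".txt", ".html", ".xml", ".eml", ".msg", ".p7s"}


def uv_add_command(filenames):
    """Return the `uv add` command covering every file in `filenames`."""
    extras = set()
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in DEFAULT_EXTENSIONS:
            continue  # already covered by the default install
        if ext not in EXTRA_FOR_EXTENSION:
            raise ValueError(f"no known extra for {name!r}")
        extras.add(EXTRA_FOR_EXTENSION[ext])
    if not extras:
        return "uv add unstructured"
    joined = ",".join(sorted(extras))  # extras are comma-separated
    return f'uv add "unstructured[{joined}]"'


if __name__ == "__main__":
    print(uv_add_command(["report.pdf", "notes.docx", "readme.txt"]))
```

The helper sorts the extras alphabetically only to keep its output stable; the order in which you list the extras is not expected to matter.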
For maximum compatibility, you should also install the following system dependencies:
  • libmagic-dev (for filetype detection)
  • poppler-utils and tesseract-ocr (for images and PDFs), and tesseract-lang (for additional language support)
  • libreoffice (for Microsoft Office documents)
  • pandoc (for .epub, .odt, and .rtf files. For .rtf files, you must have version 2.14.2 or newer. Running this script will install the correct version for you.)
Installation instructions for these system dependencies vary by operating system type. For details, follow the preceding links or see your operating system’s documentation.
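Once the system dependencies are installed, a quick check along the following lines can confirm that the command-line tools are reachable and that pandoc meets the 2.14.2 minimum for .rtf files. This is only a sketch: the executable names (pdftotext, tesseract, soffice) are assumptions about what these packages typically place on the PATH, and all function names are hypothetical. libmagic and tesseract-lang are not checked because they do not provide a dedicated executable.

```python
import re
import shutil
import subprocess

# Package name -> executable to look for. The executable names are
# assumptions about what each package usually installs on the PATH.
TOOLS = {
    "poppler-utils": "pdftotext",
    "tesseract-ocr": "tesseract",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}

# Minimum pandoc version required for .rtf files, per the list above.
MIN_PANDOC = (2, 14, 2)


def parse_version(text):
    """Extract the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def missing_tools(which=shutil.which):
    """Return the packages whose executable cannot be found on the PATH."""
    return [pkg for pkg, exe in TOOLS.items() if which(exe) is None]


def pandoc_supports_rtf(version_output):
    """Check `pandoc --version` output against the 2.14.2 minimum."""
    version = parse_version(version_output)
    return version is not None and version >= MIN_PANDOC


if __name__ == "__main__":
    print("missing:", missing_tools())
    if shutil.which("pandoc"):
        result = subprocess.run(
            ["pandoc", "--version"], capture_output=True, text=True
        )
        print("pandoc OK for .rtf:", pandoc_supports_rtf(result.stdout))
```

On some systems these tools are installed without being added to the PATH, so a package reported as missing here may still be present.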
