A JavaScript/Typescript client for the Unstructured Platform API
Unstructured-IO/unstructured-js-client
This is an HTTP client for the Unstructured Platform API. You can sign up here and process 1000 free pages per day for 14 days.
Please refer to our documentation for a full guide on integrating the Partition Endpoint into your JavaScript/TypeScript code. Support for the Workflow Endpoint is coming soon.
```bash
# npm
npm install unstructured-client --include=dev

# yarn
yarn add unstructured-client --dev
```
This SDK is also an installable MCP server where the various SDK methods are exposed as tools that can be invoked by AI applications.
Node.js v20 or greater is required to run the MCP server.
Claude installation steps
Add the following server definition to your `claude_desktop_config.json` file:
```json
{
  "mcpServers": {
    "Unstructured": {
      "command": "npx",
      "args": [
        "-y", "--package", "unstructured-client",
        "--",
        "mcp", "start"
      ]
    }
  }
}
```

Cursor installation steps
Go to Cursor Settings > Features > MCP Servers > Add new MCP server and use the following settings:
- Name: Unstructured
- Type: `command`
- Command: `npx -y --package unstructured-client -- mcp start`
For a full list of server arguments, run:
```bash
npx -y --package unstructured-client -- mcp start --help
```
```typescript
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";
import * as fs from "fs";

const unstructuredClient = new UnstructuredClient({
  security: {
    apiKeyAuth: "YOUR_API_KEY",
  },
});

const filename = "./sample-file";
const data = fs.readFileSync(filename);

unstructuredClient.general
  .partition({
    partitionParameters: {
      files: {
        content: data,
        fileName: filename,
      },
      strategy: Strategy.Auto,
    },
  })
  .then((res: PartitionResponse) => {
    if (res.statusCode == 200) {
      console.log(res.elements);
    }
  })
  .catch((e) => {
    console.log(e.statusCode);
    console.log(e.body);
  });
```
Refer to the API parameters page for all available parameters.
If you are self-hosting the API, or developing locally, you can change the server URL when setting up the client.
```typescript
const client = new UnstructuredClient({
  serverURL: "http://localhost:8000",
  security: {
    apiKeyAuth: key,
  },
});

// OR

const client = new UnstructuredClient({
  serverURL: "https://my-server-url",
  security: {
    apiKeyAuth: key,
  },
});
```
The TypeScript SDK makes API calls using an `HTTPClient` that wraps the native Fetch API. This client is a thin wrapper around `fetch` and provides the ability to attach hooks around the request lifecycle that can be used to modify the request or handle errors and responses.
The `HTTPClient` constructor takes an optional `fetcher` argument that can be used to integrate a third-party HTTP client or, when writing tests, to mock out the HTTP client and feed in fixtures.
The following example shows how to use the `"beforeRequest"` hook to add a custom header and a timeout to requests, and how to use the `"requestError"` hook to log errors:
```typescript
import { UnstructuredClient } from "unstructured-client";
import { HTTPClient } from "unstructured-client/lib/http";

const httpClient = new HTTPClient({
  // fetcher takes a function that has the same signature as native `fetch`.
  fetcher: (request) => {
    return fetch(request);
  },
});

httpClient.addHook("beforeRequest", (request) => {
  const nextRequest = new Request(request, {
    signal: request.signal || AbortSignal.timeout(5000),
  });

  nextRequest.headers.set("x-custom-header", "custom value");

  return nextRequest;
});

httpClient.addHook("requestError", (error, request) => {
  console.group("Request Error");
  console.log("Reason:", `${error}`);
  console.log("Endpoint:", `${request.method} ${request.url}`);
  console.groupEnd();
});

const sdk = new UnstructuredClient({ httpClient: httpClient });
```
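For the testing use case mentioned above, a mock fetcher might look like the following. This is a minimal sketch rather than an official test helper: the fixture payload, the `mockFetcher` and `seenUrls` names, and the example URL are hypothetical, and the only assumption taken from this README is that `fetcher` accepts a function with the same signature as native `fetch`.

```typescript
// Hypothetical fixture: a tiny payload standing in for a real API response.
const fixtureElements = [
  { type: "Title", text: "Hello" },
  { type: "NarrativeText", text: "World" },
];

// Records every URL the mock was asked to fetch, so tests can assert on it.
const seenUrls: string[] = [];

// Same signature as native `fetch`, but it never touches the network.
const mockFetcher: typeof fetch = async (input) => {
  const url = input instanceof Request ? input.url : String(input);
  seenUrls.push(url);

  return new Response(JSON.stringify(fixtureElements), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
};

// Assumed wiring, based on the HTTPClient example above:
//   const httpClient = new HTTPClient({ fetcher: mockFetcher });
//   const sdk = new UnstructuredClient({ httpClient });
```

Because the mock returns a standard `Response`, the code under test should not need to know whether it is talking to a fixture or to the real API.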
Some of the endpoints in this SDK support retries. If you use the SDK without any configuration, it will fall back to the default retry strategy provided by the API. However, the default retry strategy can be overridden on a per-operation basis, or across the entire SDK.
To change the default retry strategy for a single API call, pass a retry configuration object as the `retries` option of the call:
```typescript
import { openAsBlob } from "node:fs";
import { UnstructuredClient } from "unstructured-client";
import {
  Strategy,
  VLMModelProvider,
} from "unstructured-client/sdk/models/shared";

const unstructuredClient = new UnstructuredClient();

async function run() {
  const result = await unstructuredClient.general.partition(
    {
      partitionParameters: {
        chunkingStrategy: "by_title",
        files: await openAsBlob("example.file"),
        splitPdfPageRange: [1, 10],
        strategy: Strategy.Auto,
        vlmModel: "gpt-4o",
        vlmModelProvider: VLMModelProvider.Openai,
      },
    },
    {
      retries: {
        strategy: "backoff",
        backoff: {
          initialInterval: 1,
          maxInterval: 50,
          exponent: 1.1,
          maxElapsedTime: 100,
        },
        retryConnectionErrors: false,
      },
    },
  );

  console.log(result);
}

run();
```
If you'd like to override the default retry strategy for all operations that support retries, you can provide a retryConfig at SDK initialization:
```typescript
import { openAsBlob } from "node:fs";
import { UnstructuredClient } from "unstructured-client";
import {
  Strategy,
  VLMModelProvider,
} from "unstructured-client/sdk/models/shared";

const unstructuredClient = new UnstructuredClient({
  retryConfig: {
    strategy: "backoff",
    backoff: {
      initialInterval: 1,
      maxInterval: 50,
      exponent: 1.1,
      maxElapsedTime: 100,
    },
    retryConnectionErrors: false,
  },
});

async function run() {
  const result = await unstructuredClient.general.partition({
    partitionParameters: {
      chunkingStrategy: "by_title",
      files: await openAsBlob("example.file"),
      splitPdfPageRange: [1, 10],
      strategy: Strategy.Auto,
      vlmModel: "gpt-4o",
      vlmModelProvider: VLMModelProvider.Openai,
    },
  });

  console.log(result);
}

run();
```
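To build intuition for how the four `backoff` fields interact, the sketch below computes a plain exponential schedule from them. It is an illustration only, not the SDK's retry implementation: the `backoffDelays` helper is hypothetical, it assumes all values share one time unit, and it omits the random jitter that real retry loops commonly add.

```typescript
interface BackoffConfig {
  initialInterval: number; // first wait
  maxInterval: number;     // cap on any single wait
  exponent: number;        // growth factor between attempts
  maxElapsedTime: number;  // total waiting budget
}

// Hypothetical helper: lists the waits a simple exponential backoff would use.
function backoffDelays(cfg: BackoffConfig): number[] {
  const delays: number[] = [];
  let elapsed = 0;

  for (let attempt = 0; ; attempt++) {
    const delay = Math.min(
      cfg.initialInterval * Math.pow(cfg.exponent, attempt),
      cfg.maxInterval,
    );
    // Stop on a non-positive wait (avoids looping forever) or when the
    // next wait would exceed the total budget.
    if (delay <= 0 || elapsed + delay > cfg.maxElapsedTime) break;
    delays.push(delay);
    elapsed += delay;
  }

  return delays;
}

const schedule = backoffDelays({
  initialInterval: 10,
  maxInterval: 50,
  exponent: 2,
  maxElapsedTime: 200,
});
console.log(schedule); // [10, 20, 40, 50, 50]
```

In this sketch the waits double until they hit `maxInterval`, then stay flat until the `maxElapsedTime` budget would be exceeded.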
See page splitting for more details.
In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `splitPdfPage` can be set to `false` to disable this.
The number of parallel requests is controlled by the `splitPdfConcurrencyLevel` parameter. It defaults to 5 and cannot exceed 15, to avoid excessive resource usage and costs. The size of each batch is determined internally and can vary between 2 and 20 pages per split.
```typescript
client.general.partition({
  partitionParameters: {
    files: {
      content: data,
      fileName: filename,
    },
    // Set splitPdfPage parameter to false in order to disable splitting PDF
    splitPdfPage: true,
    // Modify splitPdfConcurrencyLevel to change the limit of parallel requests
    splitPdfConcurrencyLevel: 10,
  },
});
```
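The limits described above (default concurrency of 5, maximum of 15, and 2 to 20 pages per split) can be pictured with a small planning function. This is a hedged sketch of one plausible way to derive page ranges from those numbers; the client determines batch sizes internally, and `planSplits` and its constants are hypothetical names, not part of the SDK.

```typescript
const MIN_PAGES_PER_SPLIT = 2;
const MAX_PAGES_PER_SPLIT = 20;
const DEFAULT_CONCURRENCY = 5;
const MAX_CONCURRENCY = 15;

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical planner: returns 1-based, inclusive [firstPage, lastPage] ranges.
function planSplits(
  totalPages: number,
  concurrency: number = DEFAULT_CONCURRENCY,
): Array<[number, number]> {
  const level = clamp(Math.floor(concurrency), 1, MAX_CONCURRENCY);
  // Aim for one batch per parallel request, within the per-split page limits.
  const size = clamp(
    Math.ceil(totalPages / level),
    MIN_PAGES_PER_SPLIT,
    MAX_PAGES_PER_SPLIT,
  );

  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += size) {
    // The final range may be shorter than `size` when pages do not divide evenly.
    ranges.push([start, Math.min(start + size - 1, totalPages)]);
  }
  return ranges;
}

console.log(planSplits(100)); // five ranges of 20 pages each
```

Under these assumptions, a 100-page PDF at the default concurrency yields five 20-page batches, while a 1000-page PDF is capped at 20 pages per batch and is processed in several rounds of parallel requests.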
- SDK Installation
- SDK Example Usage
- Change the base URL
- Custom HTTP Client
- Retries
- Requirements
- Standalone functions
- File uploads
- Debugging
For supported JavaScript runtimes, please consult RUNTIMES.md.
All the methods listed above are available as standalone functions. These functions are ideal for use in applications running in the browser, serverless runtimes or other environments where application bundle size is a primary concern. When using a bundler to build your application, all unused functionality will be either excluded from the final bundle or tree-shaken away.
To read more about standalone functions, check FUNCTIONS.md.
Available standalone functions
- `generalPartition` - Summary
Certain SDK methods accept files as part of a multi-part request. It is possible and typically recommended to upload files as a stream rather than reading the entire contents into memory. This avoids excessive memory consumption and potentially crashing with out-of-memory errors when working with very large files. The following example demonstrates how to attach a file stream to a request.
Tip
Depending on your JavaScript runtime, there are convenient utilities that return a handle to a file without reading the entire contents into memory:
- Node.js v20+: Since v20, Node.js comes with a native `openAsBlob` function in `node:fs`.
- Bun: The native `Bun.file` function produces a file handle that can be used for streaming file uploads.
- Browsers: All supported browsers return an instance of `File` when reading the value from an `<input type="file">` element.
- Node.js v18: A file stream can be created using the `fileFrom` helper from `fetch-blob/from.js`.
```typescript
import { openAsBlob } from "node:fs";
import { UnstructuredClient } from "unstructured-client";
import {
  Strategy,
  VLMModelProvider,
} from "unstructured-client/sdk/models/shared";

const unstructuredClient = new UnstructuredClient();

async function run() {
  const result = await unstructuredClient.general.partition({
    partitionParameters: {
      chunkingStrategy: "by_title",
      files: await openAsBlob("example.file"),
      splitPdfPageRange: [1, 10],
      strategy: Strategy.Auto,
      vlmModel: "gpt-4o",
      vlmModelProvider: VLMModelProvider.Openai,
    },
  });

  console.log(result);
}

run();
```
You can set up your SDK to emit debug logs for SDK requests and responses.
You can pass a logger that matches `console`'s interface as an SDK option.
Warning
Beware that debug logging will reveal secrets, like API tokens in headers, in log messages printed to a console or files. It's recommended to use this feature only during local development and not in production.
```typescript
import { UnstructuredClient } from "unstructured-client";

const sdk = new UnstructuredClient({ debugLogger: console });
```
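Given the warning above about secrets in debug output, one option is to wrap `console` in a logger that masks known secret values before printing. The following is a sketch under assumptions: `createRedactingLogger` and `ConsoleLikeLogger` are hypothetical names, and the set of methods the SDK actually calls on its logger (`log`, `group`, `groupEnd` here) is assumed rather than confirmed by this README.

```typescript
// Assumed subset of the console interface used for debug logging.
interface ConsoleLikeLogger {
  group(label?: string): void;
  groupEnd(): void;
  log(message: unknown, ...args: unknown[]): void;
}

// Hypothetical wrapper: replaces each known secret in string arguments.
// Note: it only scrubs strings, not values nested inside objects.
function createRedactingLogger(
  secrets: string[],
  target: ConsoleLikeLogger = console,
): ConsoleLikeLogger {
  const scrub = (value: unknown): unknown => {
    if (typeof value !== "string") return value;
    return secrets.reduce(
      (text, secret) => (secret ? text.split(secret).join("[REDACTED]") : text),
      value,
    );
  };

  return {
    group: (label?: string) => {
      if (label === undefined) target.group();
      else target.group(String(scrub(label)));
    },
    groupEnd: () => target.groupEnd(),
    log: (message: unknown, ...args: unknown[]) =>
      target.log(scrub(message), ...args.map(scrub)),
  };
}

// Assumed usage, based on the debugLogger option shown above:
//   new UnstructuredClient({ debugLogger: createRedactingLogger([apiKey]) });
```

This reduces the risk of leaking a key into logs but does not remove it, so the recommendation to keep debug logging out of production still applies.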
This SDK is in beta, and there may be breaking changes between versions without a major version update. Therefore, we recommend pinning usage to a specific package version. This way, you can install the same version each time without breaking changes unless you are intentionally looking for the latest version.
While we value open-source contributions to this SDK, this library is generated programmatically. Feel free to open a PR or a GitHub issue as a proof of concept and we'll do our best to include it in a future release!
SDK Created by Speakeasy
