BoxLoader and BoxBlobLoader

The langchain-box package provides two methods to index your files from Box: BoxLoader and BoxBlobLoader. BoxLoader allows you to ingest text representations of files that have a text representation in Box. The BoxBlobLoader allows you to download the blob for any document or image file for processing with the blob parser of your choice.

This notebook details getting started with both of these. For detailed documentation of all BoxLoader features and configurations, head to the API Reference pages for BoxLoader and BoxBlobLoader.

Overview

The BoxLoader class helps you get your unstructured content from Box in LangChain's Document format. You can do this with either a List[str] containing Box file IDs, or with a str containing a Box folder ID.

The BoxBlobLoader class helps you get your unstructured content from Box in LangChain's Blob format. You can do this with a List[str] containing Box file IDs, a str containing a Box folder ID, a search query, or a BoxMetadataQuery.

If you are getting files from a folder by folder ID, you can also set a bool to tell the loader to include all sub-folders in that folder.

Info

A Box instance can contain petabytes of files, and a single folder can contain millions of files. Be intentional about which folders you index, and never load all files from folder 0 recursively: folder ID 0 is your root folder.
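One way to make the warning above hard to ignore is a small guard that runs before a loader is constructed. This is a minimal sketch, not part of langchain-box: the helper name check_folder_scope is hypothetical, and it only relies on the fact stated above that folder ID 0 is the root folder.

```python
def check_folder_scope(box_folder_id: str, recursive: bool = False) -> None:
    """Refuse to index the entire Box instance by accident.

    Hypothetical helper: raises if the caller asks for a recursive
    load starting at the root folder (folder ID "0").
    """
    if str(box_folder_id).strip() == "0" and recursive:
        raise ValueError(
            "Refusing to load folder 0 recursively: this would walk every "
            "file in the Box instance. Choose a narrower folder."
        )


# A specific folder may be walked recursively.
check_folder_scope("260932470532", recursive=True)

# The root folder is allowed only without recursion.
check_folder_scope("0", recursive=False)
```

Calling check_folder_scope(box_folder_id, recursive) immediately before building a loader keeps the check next to the values it protects.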

The BoxLoader will skip files without a text representation, while the BoxBlobLoader will return blobs for all document and image files.

Integration details

Class | Package | Local | Serializable | JS support
BoxLoader | langchain_box | | |
BoxBlobLoader | langchain_box | | |

Loader features

Source | Document Lazy Loading | Async Support
BoxLoader | |
BoxBlobLoader | |

Setup

In order to use the Box package, you will need a few things:

  • A Box account: If you are not a current Box customer or want to test outside of your production Box instance, you can use a free developer account.
  • A Box app: This is configured in the developer console, and for Box AI, must have the Manage AI scope enabled. Here you will also select your authentication method.
  • The app must be enabled by the administrator. For free developer accounts, this is whoever signed up for the account.

Credentials

For these examples, we will use token authentication. This can be used with any authentication method: obtain the token however your chosen method provides it. If you want to learn more about how to use other authentication types with langchain-box, visit the Box provider document.

import getpass
import os

box_developer_token = getpass.getpass("Enter your Box Developer Token: ")
Enter your Box Developer Token:  ········
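For non-interactive runs, the prompt above can be wrapped so it is only used when no token is already available. This is a minimal sketch: the helper name get_box_token is hypothetical, and the environment variable name BOX_DEVELOPER_TOKEN is an assumption made for illustration, so adjust it to whatever your deployment uses.

```python
import getpass
import os
from typing import Callable


def get_box_token(
    env_var: str = "BOX_DEVELOPER_TOKEN",  # assumed variable name
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return a token from the environment, prompting only as a fallback."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    token = prompt("Enter your Box Developer Token: ").strip()
    if not token:
        raise ValueError("A Box developer token is required.")
    return token


# Usage (prompts only when the environment variable is unset):
# box_developer_token = get_box_token()
```

Passing the prompt function as a parameter keeps the helper testable without a terminal.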

To enable automated tracing of your model calls, set your LangSmith API key:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_box.

%pip install -qU langchain_box

Initialization

Load files

If you wish to load files, you must provide the List of file IDs at instantiation time.

This requires 1 piece of information:

  • box_file_ids (List[str]) - A list of Box file IDs.

BoxLoader

from langchain_box.document_loaders import BoxLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxLoader(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
    character_limit=10000,  # Optional. Defaults to no limit
)

BoxBlobLoader

from langchain_box.blob_loaders import BoxBlobLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
)
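Because box_file_ids is typed as List[str], IDs that arrive as integers or with duplicates (for example from a spreadsheet export) may be worth cleaning first. The helper below is a hypothetical sketch, not part of langchain-box, and it assumes Box file IDs are purely numeric strings, as in the examples above.

```python
from typing import Iterable, List, Union


def normalize_file_ids(raw_ids: Iterable[Union[str, int]]) -> List[str]:
    """Return unique Box file IDs as strings, preserving first-seen order."""
    seen = set()
    cleaned: List[str] = []
    for raw in raw_ids:
        file_id = str(raw).strip()
        # Assumption: a valid Box file ID consists only of digits.
        if not file_id.isdigit():
            raise ValueError(f"Not a numeric Box file ID: {raw!r}")
        if file_id not in seen:
            seen.add(file_id)
            cleaned.append(file_id)
    return cleaned


box_file_ids = normalize_file_ids(
    [1514555423624, "1514553902288", " 1514555423624 "]
)
print(box_file_ids)
```

The resulting list can be passed as box_file_ids to either loader.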

Load from folder

If you wish to load files from a folder, you must provide a str with the Box folder ID at instantiation time.

This requires 1 piece of information:

  • box_folder_id (str) - A string containing a Box folder ID.

BoxLoader

from langchain_box.document_loaders import BoxLoader

box_folder_id = "260932470532"

loader = BoxLoader(
    box_developer_token=box_developer_token,
    box_folder_id=box_folder_id,
    recursive=False,  # Optional. Return the entire tree; defaults to False
    character_limit=10000,  # Optional. Defaults to no limit
)

BoxBlobLoader

from langchain_box.blob_loaders import BoxBlobLoader

box_folder_id = "260932470532"

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_folder_id=box_folder_id,
    recursive=False,  # Optional. Return the entire tree; defaults to False
)

Search for files with BoxBlobLoader

If you need to search for files, the BoxBlobLoader offers two methods. First, you can perform a full-text search, with optional search options to narrow it down.

This requires 1 piece of information:

  • query (str) - A string containing the search query to perform.

You can also provide a BoxSearchOptions object to narrow down that search:

  • box_search_options (BoxSearchOptions)

BoxBlobLoader search

from langchain_box.blob_loaders import BoxBlobLoader
from langchain_box.utilities import BoxSearchOptions, DocumentFiles, SearchTypeFilter

box_folder_id = "260932470532"

box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],  # Only search inside this folder
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    file_extensions=[DocumentFiles.DOCX, DocumentFiles.PDF],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    query="Victor",
    box_search_options=box_search_options,
)
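The created_date_range values in the example above are hand-written timestamp strings with a UTC offset, which are easy to mistype. A minimal sketch of building the same pair from datetime objects follows; the helper name iso_date_range is hypothetical, and the fixed -07:00 offset simply mirrors the example.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def iso_date_range(start: datetime, end: datetime) -> List[str]:
    """Format a [start, end] pair as ISO 8601 strings with UTC offsets."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("Use timezone-aware datetimes so the offset is explicit.")
    if start > end:
        raise ValueError("start must not be later than end.")
    return [
        start.isoformat(timespec="seconds"),
        end.isoformat(timespec="seconds"),
    ]


offset = timezone(timedelta(hours=-7))  # same offset as the example above
created_date_range = iso_date_range(
    datetime(2023, 1, 1, tzinfo=offset),
    datetime(2024, 8, 1, tzinfo=offset),
)
print(created_date_range)
# ['2023-01-01T00:00:00-07:00', '2024-08-01T00:00:00-07:00']
```

The returned list has the same shape as the created_date_range argument shown in the search options.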

You can also search for content based on Box Metadata. If your Box instance uses Metadata, you can search for any documents that have a specific Metadata Template attached and that meet certain criteria, such as returning any invoices with a total greater than or equal to $500 that were created last quarter.

This requires 1 piece of information:

  • box_metadata_query (BoxMetadataQuery) - A BoxMetadataQuery object containing the template key, the query, its query parameters, and the ancestor folder ID to search.

BoxBlobLoader Metadata query

from langchain_box.blob_loaders import BoxBlobLoader
from langchain_box.utilities import BoxMetadataQuery

query = BoxMetadataQuery(
    template_key="enterprise_1234.myTemplate",
    query="total >= :value",  # :value is filled in from query_params
    query_params={"value": 100},
    ancestor_folder_id="260932470532",
)

loader = BoxBlobLoader(
    box_developer_token=box_developer_token,
    box_metadata_query=query,
)
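In the metadata query above, the :value placeholder in query is filled from the matching key in query_params. A quick pre-flight check can catch a placeholder with no matching parameter before the request is sent. This is a hypothetical sketch using a simple regular expression; it does not parse the full query syntax and would also match a colon-prefixed word inside a quoted literal.

```python
import re
from typing import Any, Dict, Set

# Matches placeholders such as ":value" or ":vendor_name".
PLACEHOLDER = re.compile(r":(\w+)")


def missing_params(query: str, query_params: Dict[str, Any]) -> Set[str]:
    """Return placeholder names used in `query` but absent from `query_params`."""
    return set(PLACEHOLDER.findall(query)) - set(query_params)


print(missing_params("total >= :value", {"value": 100}))
# set()
print(missing_params("total >= :value AND vendor = :vendor", {"value": 100}))
# {'vendor'}
```

An empty result means every placeholder has a value.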

Load

BoxLoader

docs = loader.load()
docs[0]
Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n    - Gravitational Wave Detector Kit: $800\n    - Exoplanet Terrarium: $120\nTotal: $920')
print(docs[0].metadata)
{'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}

BoxBlobLoader

for blob in loader.yield_blobs():
    print(f"Blob({blob})")
Blob(id='1514555423624' metadata={'source': 'https://app.box.com/0/260935730128/260931903795/Invoice-A5555.txt', 'name': 'Invoice-A5555.txt', 'file_size': 150} data="b'Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n    - Gravitational Wave Detector Kit: $800\\n    - Exoplanet Terrarium: $120\\nTotal: $920'" mimetype='text/plain' path='https://app.box.com/0/260935730128/260931903795/Invoice-A5555.txt')
Blob(id='1514553902288' metadata={'source': 'https://app.box.com/0/260935730128/260931903795/Invoice-B1234.txt', 'name': 'Invoice-B1234.txt', 'file_size': 168} data="b'Vendor: Galactic Gizmos Inc.\\nInvoice Number: B1234\\nPurchase Order Number: 001\\nLine Items:\\n - Quantum Flux Capacitor: $500\\n - Anti-Gravity Pen Set: $75\\nTotal: $575'" mimetype='text/plain' path='https://app.box.com/0/260935730128/260931903795/Invoice-B1234.txt')

Lazy Load

BoxLoader only

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
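Note that the loop above never processes a final page holding fewer than 10 documents. A generic batching generator handles that remainder. This is a minimal sketch: the name paged is hypothetical, and range(25) stands in for loader.lazy_load() so the example runs without Box access.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], page_size: int = 10) -> Iterator[List[T]]:
    """Yield lists of up to `page_size` items, including a final partial page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1.")
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    if page:  # flush whatever is left after the loop ends
        yield page


# Stand-in for: for page in paged(loader.lazy_load(), page_size=10): ...
pages = list(paged(range(25), page_size=10))
print([len(p) for p in pages])
# [10, 10, 5]
```

Because paged consumes its input lazily, only one page of documents is held in memory at a time.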

Extra fields

All Box connectors can select additional fields from the Box FileFull object to return as custom LangChain metadata. Each object accepts an optional List[str] called extra_fields containing the JSON keys from the return object, like extra_fields=["shared_link"].

The connector adds these fields to the list of fields the integration needs to function, then adds the results to the metadata returned in the Document or Blob, like "metadata": {"source": "source", "shared_link": "shared_link"}. If a field is unavailable for that file, it is returned as an empty string, like "shared_link": "".
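The behavior described above can be pictured with a small stand-alone function. This is an illustration only, not the connector's implementation: build_metadata and the shape of the file_info dictionary are hypothetical.

```python
from typing import Any, Dict, Iterable


def build_metadata(
    file_info: Dict[str, Any], extra_fields: Iterable[str] = ()
) -> Dict[str, Any]:
    """Merge requested extra fields into metadata, using "" when unavailable."""
    metadata: Dict[str, Any] = {"source": file_info.get("source", "")}
    for field in extra_fields:
        value = file_info.get(field)
        metadata[field] = "" if value is None else value
    return metadata


# Hypothetical file record with no shared link set.
file_info = {"source": "https://app.box.com/0/1/2/Invoice-A5555.txt"}
print(build_metadata(file_info, extra_fields=["shared_link"]))
# {'source': 'https://app.box.com/0/1/2/Invoice-A5555.txt', 'shared_link': ''}
```

Downstream code can therefore read metadata["shared_link"] without checking whether the key exists, but should still treat an empty string as "not available".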

API reference

For detailed documentation of all BoxLoader features and configurations, head to the API reference.

Help

If you have questions, you can check out our developer documentation or reach out to us in our developer community.
