JSONLoader

This notebook provides a quick overview for getting started with the JSON document loader. For detailed documentation of all JSONLoader features and configurations, head to the API reference.

Overview

Integration details

Class	Package	Local	Serializable	JS support
JSONLoader	langchain_community	✅	❌	✅

Loader features

Source	Document Lazy Loading	Native Async Support
JSONLoader	✅	❌

Setup

To access the JSON document loader, you'll need to install the langchain-community integration package as well as the jq Python package.

Credentials

No credentials are required to use the JSONLoader class.

To enable automated tracing of your model calls, set your LangSmith API key:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_community and jq:

%pip install -qU langchain_community jq

Initialization

Now we can instantiate our loader object and load documents:

from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[].content",
    text_content=False,
)

API Reference: JSONLoader

Load

docs = loader.load()
docs[0]

Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')

print(docs[0].metadata)

{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}
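To make the jq_schema used above more concrete, the sketch below reproduces in plain Python what the expression .messages[].content selects. It does not call JSONLoader or jq. The sample data is a hypothetical stand-in shaped like the chat export, not the contents of the real file.

```python
import json

# Hypothetical sample shaped like the chat export (not the real file).
raw = """
{
  "messages": [
    {"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
    {"sender_name": "User 1", "timestamp_ms": 1000000000000, "content": "See you later"}
  ]
}
"""

data = json.loads(raw)

# Plain-Python equivalent of the jq expression ".messages[].content":
# iterate over the "messages" array and take the "content" of each element.
contents = [message["content"] for message in data["messages"]]

print(contents)
```

Each selected value corresponds to the page_content of one Document, which is consistent with the first document above containing 'Bye!'.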

Lazy Load

pages = []
for doc in loader.lazy_load():
    pages.append(doc)
    if len(pages) >= 10:
        # do some paged operation, e.g.
        # index.upsert(pages)

        pages = []
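Note that the loop above leaves any final group of fewer than 10 documents unprocessed. One possible way to handle that is a small batching helper, sketched below. The helper name iter_batches is hypothetical and not part of LangChain, and a plain generator stands in for loader.lazy_load() so the example runs on its own.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial batch."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for loader.lazy_load(): any iterable of documents works here.
fake_docs = (f"doc-{i}" for i in range(25))

batch_sizes = []
for batch in iter_batches(fake_docs, 10):
    # do some paged operation, e.g. index.upsert(batch)
    batch_sizes.append(len(batch))

print(batch_sizes)
```

Because the helper only pulls items as each batch is requested, it keeps the memory benefit of lazy loading.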

Read from JSON Lines file

If you want to load documents from a JSON Lines file, pass json_lines=True and specify jq_schema to extract page_content from a single JSON object.

loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".content",
    text_content=False,
    json_lines=True,
)

docs = loader.load()
print(docs[0])

page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}
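As a rough illustration of what json_lines=True implies, the sketch below parses each line as its own JSON object and applies the equivalent of the jq expression .content to it. This is a plain-Python approximation under stated assumptions, not the loader's actual implementation, and the sample lines are hypothetical.

```python
import json

# Hypothetical JSON Lines content: one complete JSON object per line.
jsonl_text = (
    '{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}\n'
    '{"sender_name": "User 1", "timestamp_ms": 1000000000000, "content": "See you later"}\n'
)

# Parse every non-empty line independently, as a JSON Lines reader would.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Plain-Python equivalent of applying the jq expression ".content" to each object.
contents = [record["content"] for record in records]

print(contents)
```

The key difference from a regular JSON file is that the schema is evaluated against one object at a time, so it starts at the object itself rather than at an enclosing array.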

Read specific content keys

Another option is to set jq_schema='.' and provide a content_key in order to only load specific content:

loader = JSONLoader(
    file_path="./example_data/facebook_chat_messages.jsonl",
    jq_schema=".",
    content_key="sender_name",
    json_lines=True,
)

docs = loader.load()
print(docs[0])

page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}

JSON file with jq schema `content_key`

To load documents from a JSON file using the content_key within the jq schema, set is_content_key_jq_parsable=True. Ensure that content_key is a valid jq expression that can be evaluated against each record selected by jq_schema.

loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key=".content",
    is_content_key_jq_parsable=True,
)

docs = loader.load()
print(docs[0])

page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}

Extracting metadata

Generally, we want to include metadata available in the JSON file in the documents that we create from the content.

The following demonstrates how metadata can be extracted using the JSONLoader.

There are some key changes to note. In the previous examples, where we didn't collect metadata, we could specify directly in the schema where the value for page_content should be extracted from.

In this example, we have to tell the loader to iterate over the records in the messages field, so the jq_schema has to be .messages[].

This allows us to pass each record (a dict) into the metadata_func that has to be implemented. The metadata_func is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final Document object.

Additionally, we now have to specify explicitly in the loader, via the content_key argument, the key in the record from which the value for page_content should be extracted.

# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")

    return metadata


loader = JSONLoader(
    file_path="./example_data/facebook_chat.json",
    jq_schema=".messages[]",
    content_key="content",
    metadata_func=metadata_func,
)

docs = loader.load()
print(docs[0].metadata)

{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}
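The flow described in this section can be approximated without the library, as in the sketch below. It is only an illustration of how a record, the default metadata, and metadata_func fit together, not JSONLoader's actual implementation. The helper build_documents, the sample records, and the source string are all hypothetical; the default metadata keys (source and seq_num, starting at 1) mirror the output shown above.

```python
from typing import Callable, Dict, List, Tuple


def metadata_func(record: dict, metadata: dict) -> dict:
    # Copy selected fields from the record into the document metadata.
    metadata["sender_name"] = record.get("sender_name")
    metadata["timestamp_ms"] = record.get("timestamp_ms")
    return metadata


def build_documents(
    records: List[dict],
    source: str,
    content_key: str,
    metadata_func: Callable[[dict, dict], dict],
) -> List[Tuple[str, Dict]]:
    """Hypothetical helper: pair each record's content with its metadata."""
    documents = []
    for seq_num, record in enumerate(records, start=1):
        # Default metadata, extended by the user-supplied function.
        metadata = {"source": source, "seq_num": seq_num}
        metadata = metadata_func(record, metadata)
        documents.append((record[content_key], metadata))
    return documents


# Hypothetical records, shaped like what ".messages[]" would yield.
records = [
    {"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
    {"sender_name": "User 1", "timestamp_ms": 1000000000000, "content": "See you later"},
]

docs = build_documents(records, "example.json", "content", metadata_func)
print(docs[0])
```

The sketch shows why content_key becomes necessary here: once the schema yields whole records rather than strings, something has to say which field holds the text.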

API reference

For detailed documentation of all JSONLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html

Related

Document loader conceptual guide
Document loader how-to guides
