AgentQLLoader
AgentQL's document loader provides structured data extraction from any web page using an AgentQL query. AgentQL can be used across multiple languages and web pages without breaking as pages change over time.
Overview
AgentQLLoader requires the following two parameters:
- url: The URL of the web page you want to extract data from.
- query: The AgentQL query to execute. Learn more about how to write an AgentQL query in the docs or test one out in the AgentQL Playground.
The following parameters are optional:
- api_key: Your AgentQL API key from dev.agentql.com. Optional.
- timeout: The number of seconds to wait for a request before timing out. Defaults to 900.
- is_stealth_mode_enabled: Whether to enable experimental anti-bot evasion strategies. This feature may not work for all websites at all times. Data extraction may take longer to complete with this mode enabled. Defaults to False.
- wait_for: The number of seconds to wait for the page to load before extracting data. Defaults to 0.
- is_scroll_to_bottom_enabled: Whether to scroll to the bottom of the page before extracting data. Defaults to False.
- mode: "standard" uses deep data analysis, while "fast" trades some depth of analysis for speed and is adequate for most use cases. Learn more about the modes in this guide. Defaults to "fast".
- is_screenshot_enabled: Whether to take a screenshot before extracting data. Returned in 'metadata' as a Base64 string. Defaults to False.
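To keep the optional settings in one place, the documented defaults can be collected in a small helper before the loader is created. The sketch below is illustrative only: `build_loader_kwargs` and `make_loader` are hypothetical names, not part of `langchain-agentql`, and the sketch assumes the constructor accepts each parameter listed above as a keyword argument and that passing a default explicitly behaves the same as omitting it.

```python
from typing import Any, Dict

# Defaults copied from the parameter descriptions above.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "timeout": 900,
    "is_stealth_mode_enabled": False,
    "wait_for": 0,
    "is_scroll_to_bottom_enabled": False,
    "mode": "fast",
    "is_screenshot_enabled": False,
}
VALID_MODES = {"standard", "fast"}


def build_loader_kwargs(url: str, query: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS) - {"api_key"}
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    kwargs = {"url": url, "query": query, **DOCUMENTED_DEFAULTS, **overrides}
    if kwargs["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return kwargs


def make_loader(**kwargs: Any):
    """Hypothetical wrapper; needs langchain-agentql installed and an API key."""
    # Imported lazily so build_loader_kwargs works without the package.
    from langchain_agentql.document_loaders import AgentQLLoader

    return AgentQLLoader(**kwargs)
```

A call such as `make_loader(**build_loader_kwargs(url, query, mode="standard", wait_for=2))` would then switch to the deeper analysis mode and wait two seconds for the page to load, while a misspelled parameter name or an unsupported mode fails before any request is sent.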
AgentQLLoader is implemented with AgentQL's REST API.
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
AgentQLLoader | langchain-agentql | ✅ | ❌ | ❌ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
AgentQLLoader | ✅ | ❌ |
Setup
To use the AgentQL Document Loader, you will need to configure the AGENTQL_API_KEY environment variable, or use the api_key parameter. You can acquire an API key from our Dev Portal.
Installation
Install langchain-agentql.
%pip install -qU langchain_agentql
Set Credentials
import os

os.environ["AGENTQL_API_KEY"] = "YOUR_AGENTQL_API_KEY"
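Hard-coding the key is fine for a quick test, but in shared notebooks it is often safer to prompt for it. The following is a minimal sketch of that pattern using only the standard library; `ensure_agentql_api_key` is a hypothetical helper name, and the only thing taken from the text above is the `AGENTQL_API_KEY` variable name.

```python
import getpass
import os


def ensure_agentql_api_key(prompt=getpass.getpass) -> str:
    """Return the key from AGENTQL_API_KEY, prompting once if it is unset."""
    key = os.environ.get("AGENTQL_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value so the key does not end up in notebook output.
        key = prompt("Enter your AgentQL API key: ").strip()
        if not key:
            raise ValueError("An AgentQL API key is required.")
        os.environ["AGENTQL_API_KEY"] = key
    return key
```

Calling `ensure_agentql_api_key()` before creating the loader leaves an existing environment variable untouched and only prompts when nothing is set.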
Initialization
Next, instantiate your loader object:
from langchain_agentql.document_loaders import AgentQLLoader

loader = AgentQLLoader(
    url="https://www.agentql.com/blog",
    query="""
    {
        posts[] {
            title
            url
            date
            author
        }
    }
    """,
    is_scroll_to_bottom_enabled=True,
)
Load
docs = loader.load()
docs[0]
Document(metadata={'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}, page_content="{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}")
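In the output above, `page_content` is a string that looks like a Python dict literal (single-quoted keys and values) rather than JSON, so `json.loads` would reject it. One way to get back to structured data, assuming the format matches this sample, is `ast.literal_eval`. The sketch below uses a shortened copy of the sample string; `parse_page_content` is a hypothetical helper, and other versions of the loader may format `page_content` differently.

```python
import ast

# Shortened copy of the page_content string shown in the sample output.
page_content = (
    "{'posts': [{'title': 'Introducing Scheduled Scraping Workflows', "
    "'url': 'https://www.agentql.com/blog/scheduling', "
    "'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, "
    "{'title': 'Updates to Our Pricing Model', "
    "'url': 'https://www.agentql.com/blog/2024-pricing-update', "
    "'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}]}"
)


def parse_page_content(text: str) -> dict:
    """Parse a Python-literal dict string; assumes the format seen in the sample."""
    # literal_eval only evaluates literals, so it does not execute arbitrary code.
    data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise TypeError(f"Expected a dict, got {type(data).__name__}")
    return data


data = parse_page_content(page_content)
titles = [post["title"] for post in data["posts"]]
print(titles)
# ['Introducing Scheduled Scraping Workflows', 'Updates to Our Pricing Model']
```

With a real result, the same call would be `parse_page_content(docs[0].page_content)`.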
print(docs[0].metadata)
{'request_id': 'bdb9dbe7-8a7f-427f-bc16-839ccc02cae6', 'generated_query': None, 'screenshot': None}
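When `is_screenshot_enabled` is set, the `screenshot` entry in this metadata holds a Base64 string instead of `None`. A minimal sketch for writing it to disk follows. `save_screenshot` is a hypothetical helper, and it assumes the value is plain Base64 with no `data:` URI prefix; the image format is not stated above, so the file extension is left to the caller.

```python
import base64
from pathlib import Path
from typing import Optional


def save_screenshot(metadata: dict, path: str) -> Optional[Path]:
    """Decode metadata['screenshot'] (Base64) to a file; return None if absent."""
    encoded = metadata.get("screenshot")
    if not encoded:
        # Screenshots are disabled by default, so None is the usual case.
        return None
    out = Path(path)
    out.write_bytes(base64.b64decode(encoded))
    return out
```

For example, `save_screenshot(docs[0].metadata, "page_screenshot.bin")` returns the written path, or `None` for the metadata shown above.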
Lazy Load
AgentQLLoader currently only loads one Document at a time. Therefore, load() and lazy_load() behave the same:
pages = [doc for doc in loader.lazy_load()]
pages
[Document(metadata={'request_id': '06273abd-b2ef-4e15-b0ec-901cba7b4825', 'generated_query': None, 'screenshot': None}, page_content="{'posts': [{'title': 'Launch Week Recap—make the web AI-ready', 'url': 'https://www.agentql.com/blog/2024-launch-week-recap', 'date': 'Nov 18, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Accurate data extraction from PDFs and images with AgentQL', 'url': 'https://www.agentql.com/blog/accurate-data-extraction-pdfs-images', 'date': 'Feb 1, 2025', 'author': 'Rachel-Lee Nabors'}, {'title': 'Introducing Scheduled Scraping Workflows', 'url': 'https://www.agentql.com/blog/scheduling', 'date': 'Dec 2, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Updates to Our Pricing Model', 'url': 'https://www.agentql.com/blog/2024-pricing-update', 'date': 'Nov 19, 2024', 'author': 'Rachel-Lee Nabors'}, {'title': 'Get data from any page: AgentQL’s REST API Endpoint—Launch week day 5', 'url': 'https://www.agentql.com/blog/data-rest-api', 'date': 'Nov 15, 2024', 'author': 'Rachel-Lee Nabors'}]}")]
API reference
For more information on how to use this integration, please refer to the git repo or the LangChain integration documentation.
Related
- Document loader conceptual guide
- Document loader how-to guides