ScrapFly

ScrapFly is a web scraping API with headless browser capabilities, proxies, and anti-bot bypass. It can extract web page data as LLM-ready markdown or plain text.

Installation

Install the ScrapFly Python SDK and the required LangChain packages using pip:

pip install scrapfly-sdk langchain langchain-community
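To confirm the installation without making any API calls, a small standard-library check can look up each package's import name. This is a sketch, not part of either package. It assumes the import names scrapfly (for scrapfly-sdk), langchain, and langchain_community (for langchain-community).

```python
import importlib.util

# Assumed mapping of pip package name -> top-level import name.
REQUIRED_MODULES = {
    "scrapfly-sdk": "scrapfly",
    "langchain": "langchain",
    "langchain-community": "langchain_community",
}


def check_installed(modules=REQUIRED_MODULES):
    """Return {pip_name: True/False} by locating modules without importing them."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in modules.items()
    }


if __name__ == "__main__":
    for pip_name, ok in check_installed().items():
        print(f"{pip_name}: {'installed' if ok else 'MISSING'}")
```

find_spec only locates the module, so the check is cheap and has no side effects from importing the SDK.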

Usage

from langchain_community.document_loaders import ScrapflyLoader

scrapfly_loader = ScrapflyLoader(
    ["https://web-scraping.dev/products"],
    api_key="Your ScrapFly API key",  # Get your API key from https://www.scrapfly.io/
    continue_on_failure=True,  # Ignore unprocessable web pages and log their exceptions
)

# Load documents from URLs as markdown
documents = scrapfly_loader.load()
print(documents)
API Reference: ScrapflyLoader
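With continue_on_failure=True, a page that fails to scrape is logged and skipped instead of aborting the whole load. The sketch below is a simplified, standard-library model of that behavior, not the loader's actual source. The fetch callable is a hypothetical stand-in for the ScrapFly request, and each result is a plain dict shaped loosely like a LangChain Document (the record shape is illustrative).

```python
import logging
from typing import Callable, Iterator, List

logger = logging.getLogger(__name__)


def load_pages(
    urls: List[str],
    fetch: Callable[[str], str],
    continue_on_failure: bool = True,
) -> Iterator[dict]:
    """Yield one record per URL (hypothetical model of continue_on_failure)."""
    for url in urls:
        try:
            content = fetch(url)  # stand-in for the real scrape request
        except Exception as exc:
            if not continue_on_failure:
                raise  # abort the whole load on the first failure
            # Log the error and move on to the next URL.
            logger.error("Error fetching %s: %s", url, exc)
            continue
        yield {"page_content": content, "metadata": {"url": url}}
```

With continue_on_failure=False the same loop re-raises the first exception, so one blocked page stops the load.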

ScrapflyLoader also accepts a scrape_config dictionary of ScrapeConfig parameters for customizing the scrape request. For the full list of features and their API parameters, see the ScrapFly documentation: https://scrapfly.io/docs/scrape-api/getting-started
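Because scrape_config is a plain dictionary, it can help to assemble it in one place with basic sanity checks. The helper below is hypothetical, not part of ScrapflyLoader or the ScrapFly SDK. It assumes that auto_scroll and js only take effect when render_js is enabled, and that the proxy pool names are public_datacenter_pool and public_residential_pool; verify both against the ScrapFly documentation.

```python
from typing import Optional

# Assumed proxy pool names (datacenter or residential); check the ScrapFly docs.
PROXY_POOLS = {"public_datacenter_pool", "public_residential_pool"}


def make_scrape_config(
    asp: bool = False,
    render_js: bool = False,
    proxy_pool: Optional[str] = None,
    country: Optional[str] = None,
    auto_scroll: bool = False,
    js: str = "",
) -> dict:
    """Build a scrape_config dict for ScrapflyLoader (hypothetical helper)."""
    # Assumption: browser-only options require a headless browser (render_js).
    if (auto_scroll or js) and not render_js:
        raise ValueError("auto_scroll and js require render_js=True")
    if proxy_pool is not None and proxy_pool not in PROXY_POOLS:
        raise ValueError(f"unknown proxy_pool: {proxy_pool!r}")
    config = {"asp": asp, "render_js": render_js}
    if proxy_pool is not None:
        config["proxy_pool"] = proxy_pool
    if country is not None:
        config["country"] = country  # proxy location, e.g. "us"
    if render_js:
        config["auto_scroll"] = auto_scroll
        if js:
            config["js"] = js  # custom JavaScript for the headless browser
    return config
```

The returned dict can then be passed as scrape_config= in the loader example that follows.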

from langchain_community.document_loaders import ScrapflyLoader

scrapfly_scrape_config = {
    "asp": True,  # Bypass anti-scraping protection, such as Cloudflare
    "render_js": True,  # Enable JavaScript rendering with a cloud headless browser
    "proxy_pool": "public_residential_pool",  # Select a proxy pool (datacenter or residential)
    "country": "us",  # Select a proxy location
    "auto_scroll": True,  # Automatically scroll the page
    "js": "",  # Custom JavaScript code for the headless browser to execute
}

scrapfly_loader = ScrapflyLoader(
    ["https://web-scraping.dev/products"],
    api_key="Your ScrapFly API key",  # Get your API key from https://www.scrapfly.io/
    continue_on_failure=True,  # Ignore unprocessable web pages and log their exceptions
    scrape_config=scrapfly_scrape_config,  # Pass the scrape configuration dictionary
    scrape_format="markdown",  # The scrape result format, either `markdown` (default) or `text`
)

# Load documents from URLs as markdown
documents = scrapfly_loader.load()
print(documents)
API Reference: ScrapflyLoader
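Both examples hard-code a placeholder string for api_key. A common alternative is to read the key from an environment variable so it never lands in source control. The sketch below uses only the standard library; the variable name SCRAPFLY_API_KEY and the helper get_scrapfly_api_key are assumptions chosen for illustration, not names the loader requires.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "SCRAPFLY_API_KEY"


def get_scrapfly_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the ScrapFly API key from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} (get a key at https://www.scrapfly.io/)"
        )
    return key
```

The result would then be passed as api_key=get_scrapfly_api_key() when constructing ScrapflyLoader.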
