HyperbrowserLoader

Hyperbrowser is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for common web scraping needs, such as scraping a single page or crawling an entire site.

Key Features:

Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
Powerful APIs - Easy-to-use APIs for scraping/crawling any site, and much more
Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies

This notebook provides a quick overview for getting started with the Hyperbrowser document loader.

For more information about Hyperbrowser, visit the Hyperbrowser website; to browse the documentation, see the Hyperbrowser docs.

Overview

Integration details

Class	Package	Local	Serializable	JS support
HyperbrowserLoader	langchain-hyperbrowser	❌	❌	❌

Loader features

Source	Document Lazy Loading	Native Async Support
HyperbrowserLoader	✅	✅

Setup

To access the Hyperbrowser document loader, you'll need to install the langchain-hyperbrowser integration package, create a Hyperbrowser account, and get an API key.

Credentials

Head to Hyperbrowser to sign up and generate an API key. Once you've done this, set the HYPERBROWSER_API_KEY environment variable.
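A minimal sketch of setting the variable from inside the notebook, assuming the loader falls back to HYPERBROWSER_API_KEY when no api_key argument is passed. The "YOUR_API_KEY" string is a placeholder; exporting the variable in your shell before starting Jupyter works just as well.

```python
import os

# Assumption: HyperbrowserLoader reads HYPERBROWSER_API_KEY when api_key is omitted.
# setdefault leaves an already-exported key untouched and only fills in the placeholder
# when nothing is set; replace "YOUR_API_KEY" with your real key.
os.environ.setdefault("HYPERBROWSER_API_KEY", "YOUR_API_KEY")

print("HYPERBROWSER_API_KEY" in os.environ)
```

Keeping the key in the environment rather than in the notebook source makes it less likely to end up in version control.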

Installation

Install langchain-hyperbrowser.

%pip install -qU langchain-hyperbrowser
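To confirm the install from Python (for example, after restarting the kernel), the standard library's importlib.metadata can report the installed distribution version. installed_version is a hypothetical helper name used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string such as "0.x.y" if the package is installed, otherwise None.
print(installed_version("langchain-hyperbrowser"))
```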

Initialization

Now we can instantiate our loader object and load documents:

from langchain_hyperbrowser import HyperbrowserLoader

loader = HyperbrowserLoader(
    urls="https://example.com",
    api_key="YOUR_API_KEY",
)

Load

docs = loader.load()
docs[0]

Document(metadata={'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, page_content='Example Domain\n\n# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this\ndomain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)')

print(docs[0].metadata)

Lazy Load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
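The lazy-loading loop above never processes the last partial page: if the loader yields 23 documents, the final 3 are dropped. A minimal sketch of a generic batching helper that also emits the trailing batch, assuming only that lazy_load() returns an iterator of documents. batched_docs and fake_lazy_load are hypothetical names; the fake generator stands in for loader.lazy_load().

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_docs(docs: Iterable[T], size: int = 10) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial batch."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fake_lazy_load() -> Iterator[str]:
    """Stand-in for loader.lazy_load(): yields 23 fake documents."""
    for i in range(23):
        yield f"doc-{i}"


page_sizes = [len(page) for page in batched_docs(fake_lazy_load(), size=10)]
print(page_sizes)  # [10, 10, 3]
```

With the real loader, the loop becomes for page in batched_docs(loader.lazy_load()): followed by your paged operation (index.upsert(page) in the comment above is itself only an example). On Python 3.12+, itertools.batched does the same job but yields tuples instead of lists.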

Advanced Usage

You can specify the operation to be performed by the loader. The default operation is "scrape". For "scrape", you can provide a single URL or a list of URLs. For "crawl", you can provide only a single URL; the crawl operation visits the provided page and its subpages and returns one document per page.

loader = HyperbrowserLoader(
    urls="https://hyperbrowser.ai", api_key="YOUR_API_KEY", operation="crawl"
)
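Because crawl returns one document per page, it can help to key the results by URL, for example to drop duplicates or look up a particular subpage. A minimal sketch, assuming each document carries a sourceURL entry in its metadata, as in the Document printed in the Load section. index_by_source is a hypothetical helper, and the SimpleNamespace objects with example.com URLs are placeholders for real crawled documents.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def index_by_source(docs: Iterable[Any]) -> Dict[str, str]:
    """Map each document's metadata["sourceURL"] to its page_content.

    Documents without a sourceURL are skipped; if a URL appears more than once,
    the first document wins.
    """
    by_url: Dict[str, str] = {}
    for doc in docs:
        url = doc.metadata.get("sourceURL")
        if url and url not in by_url:
            by_url[url] = doc.page_content
    return by_url


# Placeholder objects shaped like the Document objects returned by load().
crawled = [
    SimpleNamespace(metadata={"sourceURL": "https://example.com"}, page_content="home"),
    SimpleNamespace(metadata={"sourceURL": "https://example.com/about"}, page_content="about"),
    SimpleNamespace(metadata={"sourceURL": "https://example.com"}, page_content="duplicate"),
    SimpleNamespace(metadata={}, page_content="no url"),
]
pages = index_by_source(crawled)
print(sorted(pages))
```

With the real loader, pass loader.load() (or loader.lazy_load()) in place of crawled.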

Optional parameters for the loader can also be provided in the params argument. For more information on the supported parameters, visit https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait or https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait.

loader = HyperbrowserLoader(
    urls="https://example.com",
    api_key="YOUR_API_KEY",
    operation="scrape",
    params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}},
)
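To build intuition for what include_tags selects, here is a rough local approximation using Python's standard html.parser: it keeps only text nested inside the listed tags. This is an illustration under that assumption, not Hyperbrowser's server-side implementation, which may differ (for example in how it treats nested tags or converts output to Markdown). TagFilter and filter_html are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class TagFilter(HTMLParser):
    """Collect text that appears inside any of the allowed tags (local illustration only)."""

    def __init__(self, include_tags: Iterable[str]) -> None:
        super().__init__()
        self.include_tags = set(include_tags)
        self.depth = 0  # how many allowed tags we are currently nested inside
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.include_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.include_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while inside at least one allowed tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def filter_html(html: str, include_tags: Iterable[str]) -> List[str]:
    parser = TagFilter(include_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


html = (
    "<html><head><title>T</title></head><body>"
    "<h1>Example Domain</h1><div>nav</div><p>Body text.</p>"
    "</body></html>"
)
print(filter_html(html, ["h1", "h2", "p"]))  # ['Example Domain', 'Body text.']
```

The title and the div text are dropped because neither sits inside an h1, h2, or p element.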

API reference

Related

Document loader conceptual guide
Document loader how-to guides
