
Python SDK

Official Python SDK for ScrapeGraphAI


Installation

Install the package using pip:
```shell
pip install scrapegraph-py
```

Features

  • AI-Powered Extraction: Advanced web scraping using artificial intelligence
  • Flexible Clients: Both synchronous and asynchronous support
  • Type Safety: Structured output with Pydantic schemas
  • Production Ready: Detailed logging and automatic retries
  • Developer Friendly: Comprehensive error handling

Quick Start

Initialize the client with your API key:
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```
You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters:
```python
client = Client()
```
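The key-resolution order above (explicit argument first, then the environment variable) can be sketched with a small standalone helper. This is a hypothetical illustration of the lookup behavior, not the SDK's actual code:

```python
import os

def resolve_api_key(explicit_key=None):
    """Hypothetical helper mirroring the lookup order described above:
    an explicit api_key argument wins, otherwise SGAI_API_KEY is used."""
    key = explicit_key or os.environ.get("SGAI_API_KEY")
    if key is None:
        raise ValueError("Provide api_key= or set the SGAI_API_KEY environment variable")
    return key

os.environ["SGAI_API_KEY"] = "sk-from-env"
print(resolve_api_key())               # falls back to the environment variable
print(resolve_api_key("sk-explicit"))  # explicit argument takes precedence
```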

Services

SmartScraper

Extract specific information from any webpage using AI:
```python
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `website_url` | string | Yes | The URL of the webpage that needs to be scraped. |
| `user_prompt` | string | Yes | A textual description of what you want to achieve. |
| `output_schema` | object | No | The Pydantic object that describes the structure and format of the response. |
| `render_heavy_js` | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: `False` |
Define a simple schema for basic data extraction:
```python
from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="The article title")
    author: str = Field(description="The author's name")
    publish_date: str = Field(description="Article publication date")
    content: str = Field(description="Main article content")
    category: str = Field(description="Article category")

response = client.smartscraper(
    website_url="https://example.com/blog/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

print(f"Title: {response.title}")
print(f"Author: {response.author}")
print(f"Published: {response.publish_date}")
```
Define a complex schema for nested data structures:
```python
from typing import List
from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str = Field(description="Employee's full name")
    position: str = Field(description="Job title")
    department: str = Field(description="Department name")
    email: str = Field(description="Email address")

class Office(BaseModel):
    location: str = Field(description="Office location/city")
    address: str = Field(description="Full address")
    phone: str = Field(description="Contact number")

class CompanyData(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Company description")
    industry: str = Field(description="Industry sector")
    founded_year: int = Field(description="Year company was founded")
    employees: List[Employee] = Field(description="List of key employees")
    offices: List[Office] = Field(description="Company office locations")
    website: str = Field(description="Company website URL")

# Extract comprehensive company information
response = client.smartscraper(
    website_url="https://example.com/about",
    user_prompt="Extract detailed company information including employees and offices",
    output_schema=CompanyData
)

# Access nested data
print(f"Company: {response.name}")
print("\nKey Employees:")
for employee in response.employees:
    print(f"- {employee.name} ({employee.position})")
print("\nOffice Locations:")
for office in response.offices:
    print(f"- {office.location}: {office.address}")
```
For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:
```python
from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")
```
When to use `render_heavy_js`:
  • React, Vue, or Angular applications
  • Single Page Applications (SPAs)
  • Sites with heavy client-side rendering
  • Dynamic content loaded via JavaScript
  • Interactive elements that depend on JavaScript execution

SearchScraper

Search and extract information from multiple web sources using AI:
```python
response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?"
)
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `user_prompt` | string | Yes | A textual description of what you want to achieve. |
| `num_results` | number | No | Number of websites to search (3-20). Default: 3. |
| `extraction_mode` | boolean | No | `True` = AI extraction mode (10 credits/page), `False` = markdown mode (2 credits/page). Default: `True` |
| `output_schema` | object | No | The Pydantic object that describes the structure and format of the response (AI extraction mode only). |
Define a simple schema for structured search results:
```python
from pydantic import BaseModel, Field
from typing import List

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    description: str = Field(description="Product description")
    price: str = Field(description="Product price")
    features: List[str] = Field(description="List of key features")
    availability: str = Field(description="Availability information")

response = client.searchscraper(
    user_prompt="Find information about iPhone 15 Pro",
    output_schema=ProductInfo
)

print(f"Product: {response.name}")
print(f"Price: {response.price}")
print("\nFeatures:")
for feature in response.features:
    print(f"- {feature}")
```
Define a complex schema for comprehensive market research:
```python
from typing import List
from pydantic import BaseModel, Field

class MarketPlayer(BaseModel):
    name: str = Field(description="Company name")
    market_share: str = Field(description="Market share percentage")
    key_products: List[str] = Field(description="Main products in market")
    strengths: List[str] = Field(description="Company's market strengths")

class MarketTrend(BaseModel):
    name: str = Field(description="Trend name")
    description: str = Field(description="Trend description")
    impact: str = Field(description="Expected market impact")
    timeframe: str = Field(description="Trend timeframe")

class MarketAnalysis(BaseModel):
    market_size: str = Field(description="Total market size")
    growth_rate: str = Field(description="Annual growth rate")
    key_players: List[MarketPlayer] = Field(description="Major market players")
    trends: List[MarketTrend] = Field(description="Market trends")
    challenges: List[str] = Field(description="Industry challenges")
    opportunities: List[str] = Field(description="Market opportunities")

# Perform comprehensive market research
response = client.searchscraper(
    user_prompt="Analyze the current AI chip market landscape",
    output_schema=MarketAnalysis
)

# Access structured market data
print(f"Market Size: {response.market_size}")
print(f"Growth Rate: {response.growth_rate}")
print("\nKey Players:")
for player in response.key_players:
    print(f"\n{player.name}")
    print(f"Market Share: {player.market_share}")
    print("Key Products:")
    for product in player.key_products:
        print(f"- {product}")
print("\nMarket Trends:")
for trend in response.trends:
    print(f"\n{trend.name}")
    print(f"Impact: {trend.impact}")
    print(f"Timeframe: {trend.timeframe}")
```
Use markdown mode for cost-effective content gathering:
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Enable markdown mode for cost-effective content gathering
response = client.searchscraper(
    user_prompt="Latest developments in artificial intelligence",
    num_results=3,
    extraction_mode=False  # Enable markdown mode (2 credits per page vs 10 credits)
)

# Access the raw markdown content
markdown_content = response['markdown_content']
reference_urls = response['reference_urls']

print(f"Markdown content length: {len(markdown_content)} characters")
print(f"Reference URLs: {len(reference_urls)}")

# Process the markdown content
print("Content preview:", markdown_content[:500] + "...")

# Save to file for analysis
with open('ai_research_content.md', 'w', encoding='utf-8') as f:
    f.write(markdown_content)

print("Content saved to ai_research_content.md")
```
Markdown Mode Benefits:
  • Cost-effective: Only 2 credits per page (vs 10 credits for AI extraction)
  • Full content: Get complete page content in markdown format
  • Faster: No AI processing overhead
  • Perfect for: Content analysis, bulk data collection, building datasets
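To make the cost trade-off concrete, here is a small sketch (plain Python, not part of the SDK) that estimates the credit cost of a `searchscraper` call from the per-page figures quoted in the parameter table above:

```python
# Per-page credit costs from the searchscraper parameter table above.
AI_EXTRACTION_CREDITS = 10  # extraction_mode=True
MARKDOWN_CREDITS = 2        # extraction_mode=False

def estimated_cost(num_results, extraction_mode=True):
    """Estimated credits for one searchscraper call across num_results pages."""
    per_page = AI_EXTRACTION_CREDITS if extraction_mode else MARKDOWN_CREDITS
    return num_results * per_page

print(estimated_cost(3))                          # AI extraction over 3 pages: 30 credits
print(estimated_cost(3, extraction_mode=False))   # markdown mode over 3 pages: 6 credits
```

At the default of 3 results, markdown mode is five times cheaper, which is why it suits bulk collection.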

Markdownify

Convert any webpage into clean, formatted markdown:
```python
response = client.markdownify(
    website_url="https://example.com"
)
```

Async Support

All endpoints support asynchronous operations:
```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```

Feedback

Help us improve by submitting feedback programmatically:
```python
client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
```

Support

This project is licensed under the MIT License. See the LICENSE file for details.
