ScrapeGraphAI/scrapegraph-mcp
A production-ready Model Context Protocol (MCP) server that provides seamless integration with the ScrapeGraph AI API. This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.
- Key Features
- Quick Start
- Available Tools
- Setup Instructions
- Local Usage
- Google ADK Integration
- Example Use Cases
- Error Handling
- Common Issues
- Development
- Contributing
- Documentation
- Technology Stack
- License
- 8 Powerful Tools: From simple markdown conversion to complex multi-page crawling and agentic workflows
- AI-Powered Extraction: Intelligently extract structured data using natural language prompts
- Multi-Page Crawling: SmartCrawler supports asynchronous crawling with configurable depth and page limits
- Infinite Scroll Support: Handle dynamic content loading with configurable scroll counts
- JavaScript Rendering: Full support for JavaScript-heavy websites
- Flexible Output Formats: Get results as markdown, structured JSON, or custom schemas
- Easy Integration: Works seamlessly with Claude Desktop, Cursor, and any MCP-compatible client
- Enterprise-Ready: Robust error handling, timeout management, and production-tested reliability
- Simple Deployment: One-command installation via Smithery or manual setup
- Comprehensive Documentation: Detailed developer docs in the `.agent/` folder
Sign up and get your API key from the ScrapeGraph Dashboard
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
Ask Claude or Cursor:
- "Convert https://scrapegraphai.com to markdown"
- "Extract all product prices from this e-commerce page"
- "Research the latest AI developments and summarize findings"
That's it! The server is now available to your AI assistant.
The server provides 8 enterprise-ready tools for AI-powered web scraping:
Transform any webpage into clean, structured markdown format.
```python
markdownify(website_url: str)
```
- Credits: 2 per request
- Use case: Quick webpage content extraction in markdown
Leverage AI to extract structured data from any webpage with support for infinite scrolling.
```python
smartscraper(user_prompt: str, website_url: str, number_of_scrolls: int = None, markdown_only: bool = None)
```
- Credits: 10+ (base) + variable based on scrolling
- Use case: AI-powered data extraction with custom prompts
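To make the tool's inputs concrete, here is a sketch of how a `smartscraper`-style request to the documented API base URL could be assembled. The endpoint path (`/smartscraper`) and header name (`SGAI-APIKEY`) are assumptions on my part; consult the official ScrapeGraph API documentation before relying on them.

```python
# Illustrative sketch only: builds a smartscraper-style request payload.
# The endpoint path and API-key header name are assumptions, not confirmed
# by this README; check the official ScrapeGraph API docs.
from typing import Any, Dict, Optional

BASE_URL = "https://api.scrapegraphai.com/v1"  # base URL documented in this README

def build_smartscraper_request(
    user_prompt: str,
    website_url: str,
    api_key: str,
    number_of_scrolls: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble URL, headers, and JSON body for a smartscraper-style call."""
    body: Dict[str, Any] = {"user_prompt": user_prompt, "website_url": website_url}
    if number_of_scrolls is not None:
        body["number_of_scrolls"] = number_of_scrolls
    return {
        "url": f"{BASE_URL}/smartscraper",
        "headers": {"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        "json": body,
    }

req = build_smartscraper_request("Extract all product prices", "https://example.com", "YOUR-KEY", 3)
# To send: httpx.post(req["url"], headers=req["headers"], json=req["json"])
```

The optional parameters are only included in the body when set, mirroring the `= None` defaults in the tool signature.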
Execute AI-powered web searches with structured, actionable results.
```python
searchscraper(user_prompt: str, num_results: int = None, number_of_scrolls: int = None)
```
- Credits: Variable (3-20 websites × 10 credits)
- Use case: Multi-source research and data aggregation
Basic scraping endpoint to fetch page content with optional heavy JavaScript rendering.
```python
scrape(website_url: str, render_heavy_js: bool = None)
```
- Use case: Simple page content fetching with JS rendering support
Extract sitemap URLs and structure for any website.
```python
sitemap(website_url: str)
```
- Use case: Website structure analysis and URL discovery
Initiate intelligent multi-page web crawling (asynchronous operation).
```python
smartcrawler_initiate(url: str, prompt: str = None, extraction_mode: str = "ai", depth: int = None, max_pages: int = None, same_domain_only: bool = None)
```
- AI Extraction Mode: 10 credits per page - extracts structured data
- Markdown Mode: 2 credits per page - converts to markdown
- Returns: `request_id` for polling
- Use case: Large-scale website crawling and data extraction
Retrieve results from asynchronous crawling operations.
```python
smartcrawler_fetch_results(request_id: str)
```
- Returns: Status and results when crawling is complete
- Use case: Poll for crawl completion and retrieve results
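The initiate/fetch pair above implies a polling loop on the client side. Here is a minimal sketch of that pattern; `fetch_results` stands in for however your client invokes `smartcrawler_fetch_results`, and any result fields beyond `status` are assumptions.

```python
# Illustrative polling loop for the asynchronous crawl workflow.
# `fetch_results` is a stand-in for a call to smartcrawler_fetch_results.
import time
from typing import Any, Callable, Dict

def poll_until_complete(
    fetch_results: Callable[[str], Dict[str, Any]],
    request_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Call fetch_results until the status is 'completed', or give up."""
    for _ in range(max_attempts):
        result = fetch_results(request_id)
        if result.get("status") == "completed":
            return result
        time.sleep(interval_s)  # crawls can take a while; don't hammer the API
    raise TimeoutError(f"Crawl {request_id} did not complete in time")
```

A fixed interval is shown for simplicity; exponential backoff would reduce API chatter on long crawls.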
Run advanced agentic scraping workflows with customizable steps and structured output schemas.
```python
agentic_scrapper(url: str, user_prompt: str = None, output_schema: dict = None, steps: list = None, ai_extraction: bool = None, persistent_session: bool = None, timeout_seconds: float = None)
```
- Use case: Complex multi-step workflows with custom schemas and persistent sessions
To utilize this server, you'll need a ScrapeGraph API key. Follow these steps to obtain one:
- Navigate to the ScrapeGraph Dashboard
- Create an account and generate your API key
For automated installation of the ScrapeGraph API Integration Server using Smithery:
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
Update your Claude Desktop configuration file with the following settings (remember to add your API key inside the config):
```json
{
  "mcpServers": {
    "@ScrapeGraphAI-scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "@ScrapeGraphAI/scrapegraph-mcp",
        "--config",
        "\"{\\\"scrapegraphApiKey\\\":\\\"YOUR-SGAI-API-KEY\\\"}\""
      ]
    }
  }
}
```

The configuration file is located at:
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
Add the ScrapeGraphAI MCP server in the settings:
To run the MCP server locally for development or testing, follow these steps:
- Python 3.13 or higher
- pip or uv package manager
- ScrapeGraph API key
- Clone the repository (if you haven't already):
```bash
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp
```

- Install the package:
```bash
# Using pip
pip install -e .

# Or using uv (faster)
uv pip install -e .
```
- Set your API key:
```bash
# macOS/Linux
export SGAI_API_KEY=your-api-key-here

# Windows (PowerShell)
$env:SGAI_API_KEY="your-api-key-here"

# Windows (CMD)
set SGAI_API_KEY=your-api-key-here
```
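If you want to confirm the key is visible the same way the server will read it, a tiny Python check works; the helper name here is ours, not part of the codebase.

```python
# Illustrative helper (not part of scrapegraph-mcp): verify that the
# SGAI_API_KEY environment variable is set and non-empty.
import os
from typing import Mapping

def check_api_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if SGAI_API_KEY is present and non-empty."""
    return bool(env.get("SGAI_API_KEY"))

if __name__ == "__main__":
    print("SGAI_API_KEY set:", check_api_key())
```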
You can run the server directly:
```bash
# Using the installed command
scrapegraph-mcp

# Or using the Python module
python -m scrapegraph_mcp.server
```
The server will start and communicate via stdio (standard input/output), which is the standard MCP transport method.
Test your local server using the MCP Inspector tool:
npx @modelcontextprotocol/inspector python -m scrapegraph_mcp.server
This provides a web interface to test all available tools interactively.
To use your locally running server with Claude Desktop, update your configuration file:
macOS/Linux (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": ["-m", "scrapegraph_mcp.server"],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

Windows (`%APPDATA%\Claude\claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": ["-m", "scrapegraph_mcp.server"],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

Note: Make sure Python is in your PATH. You can verify by running `python --version` in your terminal.
In Cursor's MCP settings, add a new server with:
- Command: `python`
- Args: `["-m", "scrapegraph_mcp.server"]`
- Environment Variables: `{"SGAI_API_KEY": "your-api-key-here"}`
Server not starting:
- Verify Python is installed: `python --version`
- Check that the package is installed: `pip list | grep scrapegraph-mcp`
- Ensure the API key is set: `echo $SGAI_API_KEY` (macOS/Linux) or `echo %SGAI_API_KEY%` (Windows)
Tools not appearing:
- Check Claude Desktop logs:
  - macOS: `~/Library/Logs/Claude/`
  - Windows: `%APPDATA%\Claude\Logs\`
- Verify the server starts without errors when run directly
- Check that the configuration JSON is valid
Import errors:
- Reinstall the package: `pip install -e . --force-reinstall`
- Verify dependencies: `pip install -r requirements.txt` (if available)
The ScrapeGraph MCP server can be integrated with Google ADK (Agent Development Kit) to create AI agents with web scraping capabilities.
- Python 3.13 or higher
- Google ADK installed
- ScrapeGraph API key
- Install Google ADK (if not already installed):
pip install google-adk
- Set your API key:
```bash
export SGAI_API_KEY=your-api-key-here
```

Create an agent file (e.g., `agent.py`) with the following configuration:
```python
import os

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# Path to the scrapegraph-mcp server directory
SCRAPEGRAPH_MCP_PATH = "/path/to/scrapegraph-mcp"

# Path to the server.py file
SERVER_SCRIPT_PATH = os.path.join(SCRAPEGRAPH_MCP_PATH, "src", "scrapegraph_mcp", "server.py")

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='scrapegraph_assistant_agent',
    instruction='Help the user with web scraping and data extraction using ScrapeGraph AI. '
                'You can convert webpages to markdown, extract structured data using AI, '
                'perform web searches, crawl multiple pages, and automate complex scraping workflows.',
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command='python3',
                    args=[SERVER_SCRIPT_PATH],
                    env={
                        'SGAI_API_KEY': os.getenv('SGAI_API_KEY'),
                    },
                ),
                timeout=300.0,
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=['markdownify', 'smartscraper', 'searchscraper']
        )
    ],
)
```
Timeout Settings:
- The default timeout is 5 seconds, which may be too short for web scraping operations
- Recommended: set `timeout=300.0`
- Adjust based on your use case (crawling operations may need even longer timeouts)
Tool Filtering:
- By default, all 8 tools are exposed to the agent
- Use `tool_filter` to limit which tools are available: `tool_filter=['markdownify', 'smartscraper', 'searchscraper']`
API Key Configuration:
- Set via environment variable: `export SGAI_API_KEY=your-key`
- Or pass directly in the `env` dict: `'SGAI_API_KEY': 'your-key-here'`
- The environment variable approach is recommended for security
Once configured, your agent can use natural language to interact with web scraping tools:
```python
# The agent can now handle queries like:
# - "Convert https://example.com to markdown"
# - "Extract all product prices from this e-commerce page"
# - "Search for recent AI research papers and summarize them"
# - "Crawl this documentation site and extract all API endpoints"
```
For more information about Google ADK, visit the official documentation.
The server enables sophisticated queries across various scraping scenarios:
- Markdownify: "Convert the ScrapeGraph documentation page to markdown"
- SmartScraper: "Extract all product names, prices, and ratings from this e-commerce page"
- SmartScraper with scrolling: "Scrape this infinite scroll page with 5 scrolls and extract all items"
- Basic Scrape: "Fetch the HTML content of this JavaScript-heavy page with full rendering"
- SearchScraper: "Research and summarize recent developments in AI-powered web scraping"
- SearchScraper: "Search for the top 5 articles about machine learning frameworks and extract key insights"
- SearchScraper: "Find recent news about GPT-4 and provide a structured summary"
- Sitemap: "Extract the complete sitemap structure from the ScrapeGraph website"
- Sitemap: "Discover all URLs on this blog site"
- SmartCrawler (AI mode): "Crawl the entire documentation site and extract all API endpoints with descriptions"
- SmartCrawler (Markdown mode): "Convert all pages in the blog to markdown up to 2 levels deep"
- SmartCrawler: "Extract all product information from an e-commerce site, maximum 100 pages, same domain only"
- Agentic Scraper: "Navigate through a multi-step authentication form and extract user dashboard data"
- Agentic Scraper with schema: "Follow pagination links and compile a dataset with schema: {title, author, date, content}"
- Agentic Scraper: "Execute a complex workflow: login, navigate to reports, download data, and extract summary statistics"
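To make the schema-driven example above concrete, here is one plausible shape for the `output_schema` and `steps` arguments of `agentic_scrapper`. The JSON-Schema-like dialect shown is an assumption on my part; check the ScrapeGraph API documentation for the authoritative format.

```python
# Illustrative parameter shapes for agentic_scrapper. The schema dialect
# (JSON-Schema-like dict) is assumed, not confirmed by this README.
output_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "date": {"type": "string"},
        "content": {"type": "string"},
    },
    "required": ["title", "content"],
}

# Natural-language steps for the agent to execute in order
steps = [
    "Open the login page and authenticate",
    "Navigate to the reports section",
    "Follow pagination links and collect every article",
]
```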
The server implements robust error handling with detailed, actionable error messages for:
- API authentication issues
- Malformed URL structures
- Network connectivity failures
- Rate limiting and quota management
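As a sketch of how such errors can be surfaced as actionable messages on the client side, the helper below maps the HTTP status codes this README discusses (401 invalid key, 402 insufficient credits) to hints. The wording and the 429 entry are ours, not the server's exact output.

```python
# Illustrative error mapping, not the server's actual messages.
from typing import Dict

ERROR_HINTS: Dict[int, str] = {
    401: "Unauthorized: verify your API key at the ScrapeGraph Dashboard",
    402: "Payment Required: add credits to your ScrapeGraph account",
    429: "Rate limited: slow down requests or review your plan's quota",
}

def describe_error(status_code: int, body: str = "") -> str:
    """Turn an HTTP error status into a readable, actionable message."""
    hint = ERROR_HINTS.get(status_code, "Unexpected error")
    return f"Error {status_code}: {hint}" + (f" ({body})" if body else "")
```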
When running on Windows systems, you may need to use the following command to connect to the MCP server:
```bash
C:\Windows\System32\cmd.exe /c npx -y @smithery/cli@latest run @ScrapeGraphAI/scrapegraph-mcp --config "{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}"
```
This ensures proper execution in the Windows environment.
"ScrapeGraph client not initialized"
- Cause: Missing API key
- Solution: Set the `SGAI_API_KEY` environment variable or provide it via `--config`
"Error 401: Unauthorized"
- Cause: Invalid API key
- Solution: Verify your API key at the ScrapeGraph Dashboard
"Error 402: Payment Required"
- Cause: Insufficient credits
- Solution: Add credits to your ScrapeGraph account
SmartCrawler not returning results
- Cause: Still processing (asynchronous operation)
- Solution: Keep polling `smartcrawler_fetch_results()` until the status is "completed"
Tools not appearing in Claude Desktop
- Cause: Server not starting or configuration error
- Solution: Check Claude logs at `~/Library/Logs/Claude/` (macOS) or `%APPDATA%\Claude\Logs\` (Windows)
For detailed troubleshooting, see the `.agent/` documentation.
- Python 3.13 or higher
- pip or uv package manager
- ScrapeGraph API key
```bash
# Clone the repository
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp

# Install dependencies
pip install -e ".[dev]"

# Set your API key
export SGAI_API_KEY=your-api-key

# Run the server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
```
Test your server locally using the MCP Inspector tool:
npx @modelcontextprotocol/inspector scrapegraph-mcp
This provides a web interface to test all available tools.
Linting:
ruff check src/
Type Checking:
mypy src/
Format Checking:
ruff format --check src/
```
scrapegraph-mcp/
├── src/
│   └── scrapegraph_mcp/
│       ├── __init__.py      # Package initialization
│       └── server.py        # Main MCP server (all code in one file)
├── .agent/                  # Developer documentation
│   ├── README.md            # Documentation index
│   └── system/              # System architecture docs
├── assets/                  # Images and badges
├── pyproject.toml           # Project metadata & dependencies
├── smithery.yaml            # Smithery deployment config
└── README.md                # This file
```

We welcome contributions! Here's how you can help:
- Add a method to the `ScapeGraphClient` class in `server.py`:
```python
def new_tool(self, param: str) -> Dict[str, Any]:
    """Tool description."""
    url = f"{self.BASE_URL}/new-endpoint"
    data = {"param": param}
    response = self.client.post(url, headers=self.headers, json=data)
    if response.status_code != 200:
        raise Exception(f"Error {response.status_code}: {response.text}")
    return response.json()
```
- Add MCP tool decorator:
```python
@mcp.tool()
def new_tool(param: str) -> Dict[str, Any]:
    """
    Tool description for AI assistants.

    Args:
        param: Parameter description

    Returns:
        Dictionary containing results
    """
    if scrapegraph_client is None:
        return {"error": "ScrapeGraph client not initialized. Please provide an API key."}
    try:
        return scrapegraph_client.new_tool(param)
    except Exception as e:
        return {"error": str(e)}
```
- Test with MCP Inspector:
npx @modelcontextprotocol/inspector scrapegraph-mcp
Update documentation:
- Add the tool to this README
- Update the `.agent/` documentation
Submit a pull request
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run linting and type checking
- Test with MCP Inspector and Claude Desktop
- Update documentation
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Line length: 100 characters
- Type hints: Required for all functions
- Docstrings: Google-style docstrings
- Error handling: Return error dicts, don't raise exceptions in tools
- Python version: Target 3.13+
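The "return error dicts, don't raise" convention above can be enforced with a small decorator so every tool handles failures uniformly. This helper is illustrative only and is not part of the codebase.

```python
# Illustrative helper (not in scrapegraph-mcp): enforce the "return error
# dicts, don't raise exceptions" convention for MCP tool functions.
import functools
from typing import Any, Callable, Dict

def errors_as_dict(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Convert any exception raised by a tool into an {"error": ...} result."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except Exception as e:
            return {"error": str(e)}
    return wrapper

@errors_as_dict
def flaky_tool(url: str) -> Dict[str, Any]:
    # Hypothetical tool used only to demonstrate the wrapper
    if not url.startswith("http"):
        raise ValueError(f"Malformed URL: {url}")
    return {"result": "ok"}
```

With the decorator applied, MCP clients always receive a well-formed dict rather than a protocol-level error.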
For detailed development guidelines, see the `.agent/` documentation.
For comprehensive developer documentation, see:
- .agent/README.md - Complete developer documentation index
- .agent/system/project_architecture.md - System architecture and design
- .agent/system/mcp_protocol.md - MCP protocol integration details
- Python 3.13+ - Modern Python with type hints
- FastMCP - Lightweight MCP server framework
- httpx 0.24.0+ - Modern async HTTP client
- Ruff - Fast Python linter and formatter
- mypy - Static type checker
- Hatchling - Modern build backend
- Smithery - Automated MCP server deployment
- Docker - Container support with Alpine Linux
- stdio transport - Standard MCP communication
- ScrapeGraph AI API - Enterprise web scraping service
- Base URL: `https://api.scrapegraphai.com/v1`
- Authentication: API key-based
This project is distributed under the MIT License. For detailed terms and conditions, please refer to the LICENSE file.
Special thanks to tomekkorbak for his implementation of oura-mcp-server, which served as a starting point for this repo.
- ScrapeGraph AI Homepage
- ScrapeGraph Dashboard - Get your API key
- ScrapeGraph API Documentation
- GitHub Repository
- Model Context Protocol - Official MCP specification
- FastMCP Framework - Framework used by this server
- MCP Inspector - Testing tool
- Smithery - MCP server distribution
- mcp-name: io.github.ScrapeGraphAI/scrapegraph-mcp
- Claude Desktop - Desktop app with MCP support
- Cursor - AI-powered code editor
- GitHub Issues - Report bugs or request features
- Developer Documentation - Comprehensive dev docs
Made with ❤️ by the ScrapeGraphAI Team