Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.


autoscrape-labs/pydoll


Pydoll: The Evasion-First Web Automation Framework

A 100% typed, async-native automation library built for modern bot evasion and high-performance scraping.

Tests • Ruff CI • MyPy CI • Python >= 3.10 • Ask DeepWiki

📖 Full Documentation • 🚀 Getting Started • ⚡ Advanced Features • 🧠 Deep Dives • 💖 Support This Project

Pydoll is built on a simple philosophy: powerful automation shouldn't require you to fight the browser.

Forget broken webdrivers, compatibility issues, or being blocked by navigator.webdriver=true. Pydoll connects directly to the Chrome DevTools Protocol (CDP), providing a natively asynchronous, robust, and fully typed architecture.

It's designed for modern scraping, combining an intuitive high-level API (for productivity) with deep-level control over the network and browser behavior (for evasion), allowing you to bypass complex anti-bot defenses.

The Pydoll Philosophy

  • Stealth-by-Design: Pydoll is built for evasion. Our human-like interactions simulate real user clicks, typing, and scrolling to pass behavioral analysis, while granular Browser Preferences control lets you patch your browser fingerprint.
  • Async & Typed Architecture: Built from the ground up on asyncio and 100% type-checked with mypy. This means top-tier I/O performance for concurrent tasks and a fantastic Developer Experience (DX) with autocompletion and error-checking in your IDE.
  • Total Network Control: Go beyond basic HTTP proxies. Pydoll gives you tools to intercept (to block ads/trackers) and monitor traffic, plus deep documentation on why SOCKS5 is essential to prevent DNS leaks.
  • Hybrid Automation (The Game-Changer): Use the UI automation to log in, then use tab.request to make blazing-fast API calls that inherit the entire browser session.
  • Ergonomics Meets Power: Easy for the simple, powerful for the complex. Use tab.find() for 90% of cases and tab.query() for complex CSS/XPath selectors.

📦 Installation

pip install pydoll-python

That's it. No webdrivers. No external dependencies.

🚀 Getting Started in 60 Seconds

Thanks to its async architecture and context managers, Pydoll is clean and efficient.

    import asyncio

    from pydoll.browser import Chrome
    from pydoll.constants import Key


    async def google_search(query: str):
        # Context manager handles browser start() and stop()
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.go_to('https://www.google.com')

            # Intuitive finding API: find by HTML attributes
            search_box = await tab.find(tag_name='textarea', name='q')

            # "Human-like" interactions simulate typing
            await search_box.insert_text(query)
            await search_box.press_keyboard_key(Key.ENTER)

            # Find by text and click (simulates mouse movement)
            first_result = await tab.find(
                tag_name='h3',
                text='autoscrape-labs/pydoll',  # Supports partial text matching
                timeout=10,
            )
            await first_result.click()

            # Wait for an element to confirm navigation
            await tab.find(id='repository-container-header', timeout=10)
            print(f"Page loaded: {await tab.title}")


    asyncio.run(google_search('pydoll python'))

⚡ The Pydoll Feature Ecosystem

Pydoll is a complete toolkit for professional automation.

1. Hybrid Automation (UI + API): The Game-Changer

Tired of manually extracting and managing cookies to use requests or httpx? Pydoll solves this.

Use the UI automation to pass a complex login (with CAPTCHAs, JS challenges, etc.) and then use tab.request to make authenticated API calls that automatically inherit all cookies, headers, and session state from the browser. It's the best of both worlds: the robustness of UI automation for auth, and the speed of direct API calls for data extraction.

    # 1. Log in via the UI (handles CAPTCHAs, JS, etc.)
    await tab.go_to('https://my-site.com/login')
    await (await tab.find(id='username')).type_text('user')
    await (await tab.find(id='password')).type_text('pass123')
    await (await tab.find(id='login-btn')).click()

    # 2. Now, use the browser's session to hit the API!
    # This request automatically INHERITS the login cookies
    response = await tab.request.get('https://my-site.com/api/user/profile')
    user_data = response.json()
    print(f"Welcome, {user_data['name']}!")
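For contrast, here is a rough, standard-library-only sketch of the manual work that session inheritance replaces: filtering cookies exported from a browser and attaching them to a separate HTTP request. The cookie dictionaries, the `build_cookie_header` helper, and the URL are illustrative assumptions; none of this is Pydoll's API or its implementation.

```python
from urllib.request import Request


def build_cookie_header(cookies: list[dict], host: str) -> str:
    """Join the cookies that apply to `host` into a Cookie header value."""
    pairs = []
    for cookie in cookies:
        domain = cookie["domain"].lstrip(".")
        # Keep cookies set for the host itself or for a parent domain.
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Hypothetical cookies exported from a logged-in browser session.
browser_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".my-site.com"},
    {"name": "csrftoken", "value": "xyz789", "domain": "my-site.com"},
    {"name": "tracker", "value": "1", "domain": "ads.example.net"},
]

cookie_header = build_cookie_header(browser_cookies, "my-site.com")

# The request is only constructed here, never sent.
request = Request(
    "https://my-site.com/api/user/profile",
    headers={"Cookie": cookie_header},
)
print(request.get_header("Cookie"))
```

Even this simplified version ignores cookie paths, expiry, and the Secure flag, and it has to be repeated whenever the session changes. That bookkeeping is what you avoid when the request is issued from the browser's own session.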

📖 Read more about Hybrid Automation

2. Total Network Control: Monitor & Intercept Traffic

Take full control of the network stack. Pydoll allows you to not only monitor traffic for reverse-engineering APIs but also to intercept requests in real-time.

Use this to block ads, trackers, images, or CSS to dramatically speed up your scraping and save bandwidth, or even to modify headers and mock API responses for testing.

    import asyncio

    from pydoll.browser.chromium import Chrome
    from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
    from pydoll.protocol.network.types import ErrorReason


    async def block_images():
        async with Chrome() as browser:
            tab = await browser.start()

            async def block_resource(event: RequestPausedEvent):
                request_id = event['params']['requestId']
                resource_type = event['params']['resourceType']
                url = event['params']['request']['url']

                # Block images and stylesheets
                if resource_type in ['Image', 'Stylesheet']:
                    await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
                else:
                    # Continue other requests
                    await tab.continue_request(request_id)

            await tab.enable_fetch_events()
            await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

            await tab.go_to('https://example.com')
            await asyncio.sleep(3)
            await tab.disable_fetch_events()


    asyncio.run(block_images())

📖 Network Monitoring Docs | 📖 Request Interception Docs
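As blocking rules grow, it can help to pull the decision out of the callback into a pure function that can be unit-tested without launching a browser. The sketch below assumes the same event shape the example above reads (`event['params']['resourceType']` and `event['params']['request']['url']`); `BLOCKED_TYPES`, `BLOCKED_URL_PARTS`, and `should_block` are hypothetical names, not part of Pydoll.

```python
# Resource types and URL fragments to drop (illustrative values).
BLOCKED_TYPES = frozenset({"Image", "Stylesheet", "Font", "Media"})
BLOCKED_URL_PARTS = ("doubleclick.net", "/analytics/")


def should_block(event: dict) -> bool:
    """Return True if a paused request should be failed rather than continued."""
    params = event["params"]
    if params["resourceType"] in BLOCKED_TYPES:
        return True
    url = params["request"]["url"]
    return any(part in url for part in BLOCKED_URL_PARTS)


def fake_event(resource_type: str, url: str) -> dict:
    """Build a minimal stand-in for a request-paused event, for testing."""
    return {
        "params": {
            "requestId": "1",
            "resourceType": resource_type,
            "request": {"url": url},
        }
    }


print(should_block(fake_event("Image", "https://example.com/logo.png")))
print(should_block(fake_event("Script", "https://example.com/app.js")))
```

Inside a callback like `block_resource`, the `if` condition would then reduce to a single `should_block(event)` call.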

3. Deep Browser Control: The Fingerprint Evasion Manual

A User-Agent isn't enough. Pydoll gives you granular control over Browser Preferences, allowing you to modify hundreds of internal Chrome settings to build a robust and consistent fingerprint.

Our documentation doesn't just give you the tool; it explains in detail how canvas, WebGL, font, and timezone fingerprinting work, and how to use these preferences to defend your automation.

    from pydoll.browser.options import ChromiumOptions

    options = ChromiumOptions()

    # Create a realistic and clean browser profile
    options.browser_preferences = {
        'profile': {
            'default_content_setting_values': {
                'notifications': 2,  # Block notification popups
                'geolocation': 2,  # Block location requests
            },
            'password_manager_enabled': False,  # Disable "save password" prompt
        },
        'intl': {
            'accept_languages': 'en-US,en',  # Make consistent with your proxy IP
        },
        'browser': {
            'check_default_browser': False,  # Don't ask to be default browser
        },
    }
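If you run sessions through proxies in different regions, one way to keep the fingerprint consistent is to hold a single base profile and layer small per-session overrides on top. The following is a minimal sketch using plain dictionaries; `merge_preferences` and the German override are hypothetical and not part of Pydoll. The merged result is an ordinary dict of the same shape as the preferences above.

```python
from copy import deepcopy


def merge_preferences(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base` without mutating either."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_preferences(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


BASE_PREFERENCES = {
    "profile": {
        "default_content_setting_values": {"notifications": 2, "geolocation": 2},
        "password_manager_enabled": False,
    },
    "intl": {"accept_languages": "en-US,en"},
}

# Hypothetical override for a session routed through a German proxy.
german_prefs = merge_preferences(
    BASE_PREFERENCES,
    {"intl": {"accept_languages": "de-DE,de,en"}},
)
print(german_prefs["intl"]["accept_languages"])
```

A deep merge matters here because a shallow `dict.update` would replace a whole nested section such as `profile` instead of changing only the keys you override.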

📖 Full Guide to Browser Preferences

4. Built for Scale: Concurrency, Contexts & Remote Connections

Pydoll is built for scale. Its async architecture allows you to manage multiple tabs and browser contexts (isolated sessions) in parallel.

Furthermore, Pydoll excels in production architectures. You can run your browser in a Docker container and connect to it remotely from your Python script, decoupling the controller from the worker. Our documentation includes guides on how to set up your own remote server.

    # Example: Scrape 2 sites in parallel
    import asyncio

    from pydoll.browser.chromium import Chrome


    async def scrape_page(url, tab):
        await tab.go_to(url)
        return await tab.title


    async def concurrent_scraping():
        async with Chrome() as browser:
            tab_google = await browser.start()
            tab_ddg = await browser.new_tab()  # Create a new tab

            # Execute both scraping tasks concurrently
            tasks = [
                scrape_page('https://google.com/', tab_google),
                scrape_page('https://duckduckgo.com/', tab_ddg),
            ]
            results = await asyncio.gather(*tasks)
            print(results)


    asyncio.run(concurrent_scraping())

📖 Multi-Tab Management Docs | 📖 Remote Connection Docs
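With more than a handful of URLs you will usually want to cap how many pages load at once. Here is a minimal asyncio sketch of that pattern. `fetch_title` is a stand-in coroutine that only sleeps, taking the place of real tab navigation, and `scrape_all` and `max_concurrency` are hypothetical names rather than Pydoll features.

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Stand-in for navigating a tab to `url` and reading its title."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"title of {url}"


async def scrape_all(urls: list[str], max_concurrency: int = 2) -> list[str]:
    # The semaphore lets at most `max_concurrency` fetches run at a time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch_title(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


urls = ["https://google.com/", "https://duckduckgo.com/", "https://example.com/"]
results = asyncio.run(scrape_all(urls))
print(results)
```

Bounding concurrency keeps memory and CPU use predictable, since every open tab is a live page inside the browser.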

5. Robust Engineering: `@retry` Decorator & 100% Typed

Reliable Engineering: Pydoll is fully typed, providing a fantastic Developer Experience (DX) with full autocompletion in your IDE and error-checking before you even run your code. Read about our Type System.

Robust-by-Design: The @retry decorator turns fragile scripts into production-ready automations. It doesn't just "try again"; it lets you execute custom recovery logic (on_retry), like refreshing the page or rotating a proxy, before the next attempt.

    from pydoll.decorators import retry
    from pydoll.exceptions import ElementNotFound, NetworkError


    @retry(
        max_retries=3,
        exceptions=[ElementNotFound, NetworkError],  # Only retry on these specific errors
        on_retry=my_recovery_function,  # Run your custom recovery logic (defined elsewhere)
        exponential_backoff=True,  # Wait 2s, 4s, 8s...
    )
    async def scrape_product(self, url: str):
        # ... your scraping logic ...
        ...

📖 @retry Decorator Docs
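To make the idea concrete, here is a small standard-library sketch of how a retry decorator with a recovery hook and exponential backoff can work. It is not Pydoll's implementation: `retry_sketch`, `base_delay`, and the choice to count `max_retries` as total attempts are assumptions made for illustration.

```python
import asyncio
import functools


def retry_sketch(max_retries=3, exceptions=(Exception,), on_retry=None, base_delay=0.01):
    """Retry an async function, running `on_retry` before each new attempt."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # out of attempts: surface the last error
                    if on_retry is not None:
                        await on_retry()  # custom recovery, e.g. refresh the page
                    # Exponential backoff: base_delay, then 2x, 4x, ...
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


calls = {"attempts": 0, "recoveries": 0}


async def recover():
    calls["recoveries"] += 1


@retry_sketch(max_retries=3, exceptions=(ConnectionError,), on_retry=recover)
async def flaky():
    """Fail twice with a retryable error, then succeed."""
    calls["attempts"] += 1
    if calls["attempts"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(flaky())
print(result, calls)  # prints: ok {'attempts': 3, 'recoveries': 2}
```

Exceptions outside the `exceptions` tuple propagate immediately, which is why listing only the errors you expect to be transient keeps real bugs from being hidden behind retries.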


🧠 More Than an API: A Knowledge Base

Pydoll is not a black box. We believe that to defeat anti-bot systems, you must understand them. Our documentation is one of the most comprehensive public resources on the subject, teaching you not just the "how," but the "why."

1. The Battle Against Fingerprinting (Strategic Guide)

Understand how bots are detected and how Pydoll is designed to win.

2. The Advanced Networking Manual (The Foundation)

Proxies are more than just --proxy-server.

3. Transparent Architecture (Software Engineering)

Software engineering you can trust.


🤝 Contributing

We would love your help to make Pydoll even better! Check out our contribution guidelines to get started.

💖 Support This Project

If you find Pydoll useful, consider sponsoring my work on GitHub. Every contribution helps keep the project alive and drives new features!

📄 License

Pydoll is licensed under the MIT License.

 Pydoll — Web automation, taken seriously.
