Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
autoscrape-labs/pydoll
A 100% typed, async-native automation library built for modern bot evasion and high-performance scraping.
📖 Full Documentation • 🚀 Getting Started • ⚡ Advanced Features • 🧠 Deep Dives • 💖 Support This Project
Pydoll is built on a simple philosophy: powerful automation shouldn't require you to fight the browser.
Forget broken webdrivers, compatibility issues, or being blocked by `navigator.webdriver=true`. Pydoll connects directly to the Chrome DevTools Protocol (CDP), providing a natively asynchronous, robust, and fully typed architecture.
It's designed for modern scraping, combining an intuitive high-level API (for productivity) with deep-level control over the network and browser behavior (for evasion), allowing you to bypass complex anti-bot defenses.
- Stealth-by-Design: Pydoll is built for evasion. Our human-like interactions simulate real user clicks, typing, and scrolling to pass behavioral analysis, while granular Browser Preferences control lets you patch your browser fingerprint.
- Async & Typed Architecture: Built from the ground up on `asyncio` and 100% type-checked with `mypy`. This means top-tier I/O performance for concurrent tasks and a fantastic Developer Experience (DX) with autocompletion and error-checking in your IDE.
- Total Network Control: Go beyond basic HTTP proxies. Pydoll gives you tools to intercept (to block ads/trackers) and monitor traffic, plus deep documentation on why SOCKS5 is essential to prevent DNS leaks.
- Hybrid Automation (The Game-Changer): Use the UI automation to log in, then use `tab.request` to make blazing-fast API calls that inherit the entire browser session.
- Ergonomics Meets Power: Easy for the simple, powerful for the complex. Use `tab.find()` for 90% of cases and `tab.query()` for complex CSS/XPath selectors.
pip install pydoll-python
That's it. No webdrivers. No external dependencies.
Thanks to its async architecture and context managers, Pydoll is clean and efficient.

    import asyncio

    from pydoll.browser import Chrome
    from pydoll.constants import Key


    async def google_search(query: str):
        # Context manager handles browser start() and stop()
        async with Chrome() as browser:
            tab = await browser.start()
            await tab.go_to('https://www.google.com')

            # Intuitive finding API: find by HTML attributes
            search_box = await tab.find(tag_name='textarea', name='q')

            # "Human-like" interactions simulate typing
            await search_box.insert_text(query)
            await search_box.press_keyboard_key(Key.ENTER)

            # Find by text and click (simulates mouse movement)
            first_result = await tab.find(
                tag_name='h3',
                text='autoscrape-labs/pydoll',  # Supports partial text matching
                timeout=10,
            )
            await first_result.click()

            # Wait for an element to confirm navigation
            await tab.find(id='repository-container-header', timeout=10)
            print(f"Page loaded: {await tab.title}")


    asyncio.run(google_search('pydoll python'))

Pydoll is a complete toolkit for professional automation.
1. Hybrid Automation (UI + API): The Game-Changer
Tired of manually extracting and managing cookies to use `requests` or `httpx`? Pydoll solves this.
Use the UI automation to pass a complex login (with CAPTCHAs, JS challenges, etc.) and then use `tab.request` to make authenticated API calls that automatically inherit all cookies, headers, and session state from the browser. It's the best of both worlds: the robustness of UI automation for auth, and the speed of direct API calls for data extraction.

    # 1. Log in via the UI (handles CAPTCHAs, JS, etc.)
    await tab.go_to('https://my-site.com/login')
    await (await tab.find(id='username')).type_text('user')
    await (await tab.find(id='password')).type_text('pass123')
    await (await tab.find(id='login-btn')).click()

    # 2. Now, use the browser's session to hit the API!
    # This request automatically INHERITS the login cookies
    response = await tab.request.get('https://my-site.com/api/user/profile')
    user_data = response.json()
    print(f"Welcome, {user_data['name']}!")

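For contrast, the manual route that `tab.request` is meant to replace looks roughly like the sketch below: export the browser's cookies, then rebuild a `Cookie` header by hand for an external HTTP client. The helper name `cookie_header_for` and the sample cookie values are hypothetical. The dict keys (`name`, `value`, `domain`, `path`) follow the shape of CDP cookie objects, and the matching rules are deliberately simplified.

```python
from urllib.parse import urlsplit


def cookie_header_for(url: str, cookies: list[dict]) -> str:
    """Build a Cookie header value for `url` from browser-exported cookies.

    Simplified matching: domain suffix and path prefix only (no Secure,
    expiry, or SameSite handling).
    """
    parts = urlsplit(url)
    host = parts.hostname or ''
    path = parts.path or '/'
    pairs = []
    for cookie in cookies:
        domain = cookie['domain'].lstrip('.')
        domain_ok = host == domain or host.endswith('.' + domain)
        path_ok = path.startswith(cookie.get('path', '/'))
        if domain_ok and path_ok:
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return '; '.join(pairs)


# Hypothetical cookies, shaped like the objects a browser exports.
exported = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': '.my-site.com', 'path': '/'},
    {'name': 'csrftoken', 'value': 'xyz789', 'domain': 'my-site.com', 'path': '/api'},
    {'name': 'tracker', 'value': 'nope', 'domain': 'ads.example.net', 'path': '/'},
]

header = cookie_header_for('https://my-site.com/api/user/profile', exported)
print(header)
```

Even this simplified version has to reason about domains and paths, and it still ignores expiry and header state. That bookkeeping is what inheriting the browser session avoids.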
2. Total Network Control: Monitor & Intercept Traffic
Take full control of the network stack. Pydoll allows you to not only monitor traffic for reverse-engineering APIs but also to intercept requests in real time.
Use this to block ads, trackers, images, or CSS to dramatically speed up your scraping and save bandwidth, or even to modify headers and mock API responses for testing.

    import asyncio

    from pydoll.browser.chromium import Chrome
    from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
    from pydoll.protocol.network.types import ErrorReason


    async def block_images():
        async with Chrome() as browser:
            tab = await browser.start()

            async def block_resource(event: RequestPausedEvent):
                request_id = event['params']['requestId']
                resource_type = event['params']['resourceType']
                url = event['params']['request']['url']

                # Block images and stylesheets
                if resource_type in ['Image', 'Stylesheet']:
                    await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
                else:
                    # Continue other requests
                    await tab.continue_request(request_id)

            await tab.enable_fetch_events()
            await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

            await tab.go_to('https://example.com')
            await asyncio.sleep(3)
            await tab.disable_fetch_events()


    asyncio.run(block_images())

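The block-or-continue decision inside the handler can be pulled into a pure function, which makes it testable without launching a browser. This is a sketch only: `BLOCKED_TYPES` and `should_block` are hypothetical names, and the event dictionaries below are hand-written with just the keys the handler above reads.

```python
# Resource types to drop; 'Image' and 'Stylesheet' mirror the handler above.
BLOCKED_TYPES = frozenset({'Image', 'Stylesheet'})


def should_block(event: dict, blocked_types: frozenset = BLOCKED_TYPES) -> bool:
    """Return True when a paused request should be failed instead of continued."""
    return event['params']['resourceType'] in blocked_types


# Hand-written sample events (hypothetical values).
image_event = {
    'params': {
        'requestId': 'req-1',
        'resourceType': 'Image',
        'request': {'url': 'https://example.com/logo.png'},
    }
}
document_event = {
    'params': {
        'requestId': 'req-2',
        'resourceType': 'Document',
        'request': {'url': 'https://example.com/'},
    }
}

print(should_block(image_event))
print(should_block(document_event))
```

Inside a real callback, the result would choose between failing and continuing the request, as the handler above does.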
3. Deep Browser Control: The Fingerprint Evasion Manual
A User-Agent isn't enough. Pydoll gives you granular control over Browser Preferences, allowing you to modify hundreds of internal Chrome settings to build a robust and consistent fingerprint.
Our documentation doesn't just give you the tool; it explains in detail how canvas, WebGL, font, and timezone fingerprinting work, and how to use these preferences to defend your automation.

    from pydoll.browser.options import ChromiumOptions

    options = ChromiumOptions()

    # Create a realistic and clean browser profile
    options.browser_preferences = {
        'profile': {
            'default_content_setting_values': {
                'notifications': 2,  # Block notification popups
                'geolocation': 2,  # Block location requests
            },
            'password_manager_enabled': False,  # Disable "save password" prompt
        },
        'intl': {
            'accept_languages': 'en-US,en',  # Make consistent with your proxy IP
        },
        'browser': {
            'check_default_browser': False,  # Don't ask to be default browser
        },
    }

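Because the preferences are nested dictionaries, layering per-session overrides on a shared base profile needs a recursive merge; a plain `dict.update()` would replace whole sub-dictionaries. The sketch below assumes a hypothetical helper, `merge_preferences`, which is not part of Pydoll. Its result is an ordinary dict that could then be assigned to `options.browser_preferences`.

```python
import copy


def merge_preferences(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_preferences(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base_prefs = {
    'profile': {
        'default_content_setting_values': {'notifications': 2, 'geolocation': 2},
        'password_manager_enabled': False,
    },
    'intl': {'accept_languages': 'en-US,en'},
}

# Hypothetical override for a session routed through a German proxy.
german_overrides = {'intl': {'accept_languages': 'de-DE,de,en'}}

prefs = merge_preferences(base_prefs, german_overrides)
print(prefs['intl']['accept_languages'])
print(prefs['profile']['password_manager_enabled'])
```

The base profile is left unmodified, so the same base can be reused for several differently configured sessions.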
4. Built for Scale: Concurrency, Contexts & Remote Connections
Pydoll is built for scale. Its async architecture allows you to manage multiple tabs and browser contexts (isolated sessions) in parallel.
Furthermore, Pydoll excels in production architectures. You can run your browser in a Docker container and connect to it remotely from your Python script, decoupling the controller from the worker. Our documentation includes guides on how to set up your own remote server.

    import asyncio

    from pydoll.browser.chromium import Chrome


    # Example: Scrape 2 sites in parallel
    async def scrape_page(url, tab):
        await tab.go_to(url)
        return await tab.title


    async def concurrent_scraping():
        async with Chrome() as browser:
            tab_google = await browser.start()
            tab_ddg = await browser.new_tab()  # Create a new tab

            # Execute both scraping tasks concurrently
            tasks = [
                scrape_page('https://google.com/', tab_google),
                scrape_page('https://duckduckgo.com/', tab_ddg),
            ]
            results = await asyncio.gather(*tasks)
            print(results)

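With more than a handful of URLs, it usually helps to cap how many pages are in flight at once. The sketch below shows one common way to do that with `asyncio.Semaphore`. It uses a stand-in coroutine, `fake_scrape`, in place of real tab navigation, so the names and URLs here are illustrative assumptions rather than Pydoll API.

```python
import asyncio


async def fake_scrape(url: str) -> str:
    """Stand-in for a real page visit; sleeps briefly and returns a fake title."""
    await asyncio.sleep(0.01)
    return f'title of {url}'


async def scrape_all(urls: list[str], max_concurrent: int = 2) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        # At most `max_concurrent` coroutines are inside this block at once.
        async with semaphore:
            return await fake_scrape(url)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(url) for url in urls))


urls = [f'https://example.com/page/{n}' for n in range(5)]
results = asyncio.run(scrape_all(urls))
print(results)
```

In a real script, `fake_scrape` would be replaced by a coroutine that opens or reuses a tab, and `max_concurrent` would be tuned to what the machine and the target site tolerate.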
5. Robust Engineering: `@retry` Decorator & 100% Typed
Reliable Engineering: Pydoll is fully typed, providing a fantastic Developer Experience (DX) with full autocompletion in your IDE and error-checking before you even run your code. Read about our Type System.
Robust-by-Design: The `@retry` decorator turns fragile scripts into production-ready automations. It doesn't just "try again"; it lets you execute custom recovery logic (`on_retry`), like refreshing the page or rotating a proxy, before the next attempt.

    from pydoll.decorators import retry
    from pydoll.exceptions import ElementNotFound, NetworkError


    @retry(
        max_retries=3,
        exceptions=[ElementNotFound, NetworkError],  # Only retry on these specific errors
        on_retry=my_recovery_function,  # Run your custom recovery logic
        exponential_backoff=True,  # Wait 2s, 4s, 8s...
    )
    async def scrape_product(self, url: str):
        # ... your scraping logic ...
        ...

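To make the retry semantics concrete, here is a minimal `asyncio` sketch of a decorator with a recovery hook and exponential backoff. It is an illustration of the general pattern, not Pydoll's implementation: `simple_retry`, `base_delay`, and the choice to treat `max_retries` as the total number of attempts are all assumptions made for this example.

```python
import asyncio
import functools


def simple_retry(max_retries=3, exceptions=(Exception,), on_retry=None,
                 base_delay=0.01):
    """Illustrative retry decorator; not Pydoll's actual implementation."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except tuple(exceptions):
                    if attempt == max_retries:
                        raise  # Out of attempts: surface the last error
                    if on_retry is not None:
                        await on_retry()  # Recovery hook, e.g. refresh the page
                    # Exponential backoff: base_delay, then 2x, 4x, ...
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {'attempts': 0, 'recoveries': 0}


async def recover():
    calls['recoveries'] += 1


@simple_retry(max_retries=3, exceptions=[ConnectionError], on_retry=recover)
async def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls['attempts'] += 1
    if calls['attempts'] < 3:
        raise ConnectionError('temporary failure')
    return 'ok'


result = asyncio.run(flaky())
print(result, calls)
```

Running the recovery hook before the backoff sleep means the next attempt starts from a known state, which is the point of `on_retry` as described above.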
Pydoll is not a black box. We believe that to defeat anti-bot systems, you must understand them. Our documentation is one of the most comprehensive public resources on the subject, teaching you not just the "how," but the "why."
Understand how bots are detected and how Pydoll is designed to win.
- Evasion Techniques Guide: Our unified 3-layer evasion strategy.
- Network Fingerprinting: How your IP, TTL, and TLS (JA3) fingerprint give you away.
- Browser Fingerprinting: How `canvas`, WebGL, and fonts create your unique ID.
- Behavioral Fingerprinting: Why mouse/keyboard telemetry is the new front line of detection.
Proxies are more than just `--proxy-server`.
- HTTP vs. SOCKS5: Why SOCKS5 is superior (it solves DNS leaks).
- Proxy Detection: How sites know you're using a proxy (WebRTC Leaks).
- Build Your Own Proxy: Yes, we even teach you how to build your own SOCKS5 proxy server in Python.
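The DNS-leak point follows from the protocol itself: a SOCKS5 CONNECT request (RFC 1928) can carry the target as a domain name (address type `0x03`), so the proxy performs the lookup instead of the client. The sketch below only builds the request bytes and opens no connection. The function name `socks5_connect_request` is hypothetical.

```python
import struct


def socks5_connect_request(hostname: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that carries the hostname unresolved."""
    host_bytes = hostname.encode('idna')
    if len(host_bytes) > 255:
        raise ValueError('hostname too long for SOCKS5')
    return (
        b'\x05'      # VER: SOCKS version 5
        b'\x01'      # CMD: CONNECT
        b'\x00'      # RSV: reserved
        b'\x03'      # ATYP: domain name (the proxy resolves it)
        + bytes([len(host_bytes)])  # Length prefix for the domain name
        + host_bytes
        + struct.pack('!H', port)   # Port in network byte order
    )


request = socks5_connect_request('example.com', 443)
print(request.hex())
```

Because the hostname travels inside the request, no local DNS query is needed for it when the client uses this address type.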
Software engineering you can trust.
- Domain-Driven Design (OOP): The clean architecture behind `Browser`, `Tab`, and `WebElement`.
- The FindElements Mixin: The magic behind the intuitive `find()` API.
- The Connection Layer: How Pydoll manages `asyncio` and the CDP.
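For orientation, a CDP command on the wire is a JSON object with an `id`, a `method`, and `params`, and the browser's response echoes the same `id`. The sketch below shows one common way to pair the two using `asyncio` futures. It simulates the browser's reply instead of opening a WebSocket, and it is not a description of Pydoll's internals: `CommandTracker` and the frame id value are hypothetical.

```python
import asyncio
import itertools
import json


class CommandTracker:
    """Pairs outgoing CDP commands with incoming responses by `id`."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending = {}

    def send(self, method, params=None):
        """Serialize a command and return (json_text, future_for_result)."""
        command_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[command_id] = future
        message = {'id': command_id, 'method': method, 'params': params or {}}
        return json.dumps(message), future

    def receive(self, raw: str) -> None:
        """Resolve the future whose id matches an incoming response."""
        message = json.loads(raw)
        # Events carry no `id`, so they match nothing and are ignored here.
        future = self._pending.pop(message.get('id'), None)
        if future is not None:
            future.set_result(message.get('result', {}))


async def demo():
    tracker = CommandTracker()
    wire_text, future = tracker.send('Page.navigate', {'url': 'https://example.com'})
    # Simulate the browser's reply instead of opening a real WebSocket.
    tracker.receive(json.dumps({'id': 1, 'result': {'frameId': 'FRAME-1'}}))
    return json.loads(wire_text), await future


sent, result = asyncio.run(demo())
print(sent)
print(result)
```

Matching by `id` is what lets many commands be in flight concurrently over a single connection.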
We would love your help to make Pydoll even better! Check out our contribution guidelines to get started.
If you find Pydoll useful, consider sponsoring my work on GitHub. Every contribution helps keep the project alive and drives new features!
Pydoll is licensed under the MIT License.
Pydoll — Web automation, taken seriously.