Description
Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.
For instance, when using an authenticated proxy with a tool like seleniumwire to scrape a Cloudflare-protected website such as G2.com, a non-SSL connection typically results in being blocked.
To illustrate, run this code:
First, install the required packages:
python -m pip install selenium_wire chromedriver_autoinstaller
Then, execute this Python script:
from seleniumwire import webdriver
from chromedriver_autoinstaller import install

# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
        'https': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
    }
}

# Install and set up the driver
driver_path = install()
driver = webdriver.Chrome(driver_path, seleniumwire_options=proxy_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()
You'll likely be blocked by Cloudflare.
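Whether a run was blocked can also be checked programmatically instead of by eye. The following is a minimal sketch, not part of either package: `looks_blocked` and `BLOCK_MARKERS` are hypothetical names, and the marker strings are an assumption based on page titles commonly seen on Cloudflare challenge and block pages, so they may need adjusting.

```python
# Heuristic title fragments (assumed for illustration, not an official Cloudflare API).
BLOCK_MARKERS = (
    "just a moment",       # title commonly seen on the challenge page
    "attention required",  # title commonly seen on the block page
)


def looks_blocked(title: str) -> bool:
    """Return True if the page title contains one of the assumed block markers."""
    lowered = title.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    # In the script, this would be called as looks_blocked(driver.title)
    # once the target page has finished loading.
    print(looks_blocked("Just a moment..."))
    print(looks_blocked("GitHub Reviews | G2"))
```

Since the script navigates via `window.location.href`, the title should be read only after the new page has loaded, otherwise it will still be the title of the previous page.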
However, using botasaurus_proxy_authentication with proxies circumvents this problem. Notice the difference by running the following code.
First, install the required package:
python -m pip install botasaurus-proxy-authentication
Then, execute this Python script:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from chromedriver_autoinstaller import install
from botasaurus_proxy_authentication import add_proxy_options

# Define the proxy settings
proxy = 'http://username:password@proxy-provider-domain:port'  # Replace with your proxy

# Set Chrome options
chrome_options = Options()
add_proxy_options(chrome_options, proxy)

# Install and set up the driver
driver_path = install()
driver = webdriver.Chrome(driver_path, options=chrome_options)

# Navigate to the desired URL
link = 'https://www.g2.com/products/github/reviews'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Wait for user input
input("Press Enter to exit...")

# Clean up
driver.quit()
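Both scripts pass the proxy as a single URL string of the form `http://username:password@host:port`. A small helper can assemble and sanity-check that string before a browser is launched. This is only a sketch using the standard library: `build_proxy_url` and `split_proxy_url` are hypothetical names, and `proxy.example.com` is a placeholder host. Percent-encoding is the standard URL treatment for reserved characters such as `@` or `:` in credentials, but whether seleniumwire or botasaurus_proxy_authentication decode such credentials is an assumption to verify with your provider.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(username, password, host, port, scheme="http"):
    """Assemble scheme://user:pass@host:port with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


def split_proxy_url(proxy_url):
    """Split a proxy URL into its parts, decoding the credentials."""
    parts = urlsplit(proxy_url)
    return {
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own proxy details.
    proxy = build_proxy_url("username", "p@ss:word", "proxy.example.com", 8080)
    print(proxy)
    print(split_proxy_url(proxy))
```

The resulting string can be passed wherever the scripts above use the literal proxy URL.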
I suggest using botasaurus_proxy_authentication for its SSL support for authenticated proxies, improving the success rate of scraping Cloudflare-protected websites and thus increasing revenue for Oxylabs.
Also, thanks to Oxylabs for your great work on proxies.
Good luck to the team.