Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.
amalrajan/learncpp-download
Please support here: https://www.learncpp.com/about/
Get the image
docker pull amalrajan/learncpp-download:latest
And run the container
docker run --rm --name=learncpp-download --mount type=bind,destination=/app/learncpp,source=/home/amalr/temp/downloads amalrajan/learncpp-download
Replace /home/amalr/temp/downloads with a local path on your system where you want the files to be downloaded.
You need Python 3.10 and wkhtmltopdf installed on your system.
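Before cloning, it can help to confirm both prerequisites are in place. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and it assumes that Python 3.10 or newer is acceptable and that the converter is available on `PATH` under the name `wkhtmltopdf`.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 10), tool="wkhtmltopdf"):
    """Return a list of problems found; an empty list means nothing is missing.

    Hypothetical helper for illustration only. It assumes "3.10 or newer"
    satisfies the Python requirement and that the tool is looked up on PATH.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the file prints one line per missing prerequisite and nothing at all when both are present.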
Clone the repository
git clone https://github.com/amalrajan/learncpp-download.git
Install Python dependencies
cd learncpp-download
pip install -r requirements.txt
Run the script
scrapy crawl learncpp
You'll find the downloaded files inside the learncpp directory under the repository root.
Rate Limit Errors:
- Modify settings.py.
- Increase DOWNLOAD_DELAY (default: 0) to 0.2.
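As a sketch of the change described above, the relevant line in settings.py might look like the fragment below. `DOWNLOAD_DELAY` is a standard Scrapy setting measured in seconds. The rest of the project's settings file is not shown here, and 0.2 is simply the value suggested above, so treat it as a starting point.

```python
# settings.py (fragment; the other settings in the file stay unchanged)

# Seconds to wait between consecutive requests to the same site.
# With Scrapy's default RANDOMIZE_DOWNLOAD_DELAY behaviour, the actual wait
# varies between 0.5x and 1.5x this value.
DOWNLOAD_DELAY = 0.2
```

If rate limit errors persist, raising the value further is the obvious next step, at the cost of a slower crawl.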
High CPU Usage:
- Adjust max_workers in learncpp.py.
- Decrease from the default of 192 to reduce CPU load.
self.executor = ThreadPoolExecutor(max_workers=192)  # Limit to 192 concurrent PDF conversions
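One way to choose a smaller value is to scale the pool with the number of CPU cores instead of hard-coding it. This is a minimal, self-contained sketch and not the repository's code: `pick_max_workers`, the `per_core` multiplier of 4, and `fake_convert` are hypothetical names and values chosen only for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_max_workers(cap=192, per_core=4):
    """Return a worker count scaled to the machine, never above `cap`.

    Hypothetical helper: `per_core=4` is an arbitrary illustrative multiplier,
    and `cap=192` mirrors the default mentioned above.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cap, cores * per_core))


def fake_convert(page_id):
    """Stand-in for a PDF conversion; the real work happens in learncpp.py."""
    return f"{page_id}.pdf"


# The pool size is fixed when the executor is created, so this is the
# only place the limit needs to change.
with ThreadPoolExecutor(max_workers=pick_max_workers()) as executor:
    results = list(executor.map(fake_convert, range(5)))
```

In the scraper itself, the equivalent change would be to pass the smaller number as `max_workers` where `self.executor` is created.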
Further Issues:
- Report at https://github.com/amalrajan/learncpp-download/issues. Attach console logs.