Download your daily free Packt Publishing eBook: https://www.packtpub.com/packt/offers/free-learning
Download FREE eBook every day from www.packtpub.com
This crawler automates the following steps:
- access your private account
- claim the daily free eBook
- parse title, description and useful information
- download your favorite format: .pdf, .epub, .mobi
- download source code and book cover
- upload files to Google Drive or via scp
- store data on Firebase
- notify via email
- schedule daily job on Heroku or with Docker
```bash
# upload pdf to drive, store data and notify via email
python script/spider.py -c config/prod.cfg -u drive -s firebase -n

# download all formats
python script/spider.py --config config/prod.cfg --all

# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf

# download also additional material: source code (if exists) and book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default is pdf)
python script/spider.py -c config/prod.cfg -e

# download and then upload to Drive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload drive
python script/spider.py --config config/prod.cfg --all --extras --upload drive
```
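The flags in the commands above map onto a conventional command-line parser. A minimal sketch of such a setup with `argparse` (flag names taken from the commands above; defaults and help strings are assumptions, not the project's actual code):

```python
import argparse

def build_parser():
    # Mirrors the CLI flags shown in the usage examples above.
    parser = argparse.ArgumentParser(description="packtpub-crawler")
    parser.add_argument("-c", "--config", required=True,
                        help="path to the .cfg file")
    parser.add_argument("-t", "--type", default="pdf",
                        choices=["pdf", "epub", "mobi"],
                        help="eBook format to download (default: pdf)")
    parser.add_argument("--all", dest="all_formats", action="store_true",
                        help="download every available format")
    parser.add_argument("-e", "--extras", action="store_true",
                        help="also download source code and book cover")
    parser.add_argument("-u", "--upload", choices=["drive", "scp"],
                        help="upload the downloaded files")
    parser.add_argument("-s", "--store", choices=["firebase"],
                        help="store eBook details")
    parser.add_argument("-n", "--notify", action="store_true",
                        help="send a notification email")
    parser.add_argument("--dev", action="store_true",
                        help="run against the local dev server")
    return parser

# example: parse one of the command lines shown above
args = build_parser().parse_args(
    ["-c", "config/prod.cfg", "-t", "epub", "--upload", "drive"])
print(args.config, args.type, args.upload)
# → config/prod.cfg epub drive
```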
Before you start you should

- Verify that your currently installed version of Python is 2.x with `python --version`
- Clone the repository: `git clone https://github.com/niqdev/packtpub-crawler.git`
- Install all the dependencies (you might need `sudo` privilege): `pip install -r requirements.txt`
- Create a config file: `cp config/prod_example.cfg config/prod.cfg`
- Change your Packtpub credentials in the config file:

```
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
```

Now you should be able to claim and download your first eBook:

```bash
python script/spider.py --config config/prod.cfg
```

From the documentation, the Drive API requires OAuth 2.0 for authentication, so to upload files you should:
- Go to Google APIs Console and create a new Drive project named PacktpubDrive
- On the API manager > Overview menu
  - Enable Google Drive API
- On the API manager > Credentials menu
  - In the OAuth consent screen tab set PacktpubDrive as the product name shown to users
  - In the Credentials tab create credentials of type OAuth client ID, choose Application type Other and name them PacktpubDriveCredentials
- Click Download JSON and save the file as config/client_secrets.json
- Change your Drive credentials in the config file:

```
[drive]
...
drive.client_secrets=config/client_secrets.json
drive.gmail=GOOGLE_DRIVE@gmail.com
```

Now you should be able to upload your eBook to Drive:

```bash
python script/spider.py --config config/prod.cfg --upload drive
```

Only the first time, you will be prompted to log in in a browser with JavaScript enabled (no text-based browser) to generate config/auth_token.json. You should also copy and paste the FOLDER_ID into the config, otherwise a new folder with the same name will be created every time.

```
[drive]
...
drive.default_folder=packtpub
drive.upload_folder=FOLDER_ID
```

Documentation: OAuth, Quickstart, example and permissions
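The folder-reuse behaviour described above can be illustrated without touching the real Drive API: an upload sends file metadata whose `parents` list names either the configured FOLDER_ID or a freshly created folder. A minimal sketch of building that metadata (the function and parameter names are hypothetical, not the project's actual code):

```python
import mimetypes
import os

def build_drive_metadata(local_path, upload_folder=None):
    """Build the metadata dict a Drive upload would send.

    If upload_folder (the FOLDER_ID from the config) is set, the file is
    parented under the existing folder, so no duplicate folder is created
    on each run.
    """
    mime, _ = mimetypes.guess_type(local_path)
    metadata = {
        "name": os.path.basename(local_path),
        "mimeType": mime or "application/octet-stream",
    }
    if upload_folder:
        metadata["parents"] = [upload_folder]
    return metadata

# example: a pdf parented under an already-existing folder id
meta = build_drive_metadata("download/some-book.pdf", upload_folder="FOLDER_ID")
```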
To upload your eBook via scp on a remote server, update the configs:

```
[scp]
scp.host=SCP_HOST
scp.user=SCP_USER
scp.password=SCP_PASSWORD
scp.path=SCP_UPLOAD_PATH
```

Now you should be able to upload your eBook:

```bash
python script/spider.py --config config/prod.cfg --upload scp
```

Note:

- the destination folder scp.path on the remote server must exist in advance
- the option --upload scp is incompatible with --store and --notify
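Each uploaded file lands under the configured scp.path on the remote server. Since the remote side is assumed to be POSIX, the destination path should be joined with `posixpath` rather than `os.path` (which would emit backslashes on Windows). A small sketch of that detail (helper name is illustrative; the actual transfer would go over an scp/SFTP client such as paramiko):

```python
import posixpath

def remote_destination(scp_path, filename):
    # Join the configured scp.path with a bare file name, POSIX-style.
    if not filename or "/" in filename:
        raise ValueError("expected a bare file name")
    return posixpath.join(scp_path, filename)

dest = remote_destination("/home/user/packtpub", "some-book.pdf")
# → "/home/user/packtpub/some-book.pdf"
```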
Create a new Firebase project, copy the database secret from your settings

```
https://console.firebase.google.com/project/PROJECT_NAME/settings/database
```

and update the configs:

```
[firebase]
firebase.database_secret=DATABASE_SECRET
firebase.url=https://PROJECT_NAME.firebaseio.com
```

Now you should be able to store your eBook details on Firebase:

```bash
python script/spider.py --config config/prod.cfg --upload drive --store firebase
```

To send a notification via email using Gmail you should:
- Allow "less secure apps" and "DisplayUnlockCaptcha" on your account
- Troubleshoot sign-in problems and examples
- Change your Gmail credentials in the config file:

```
[notify]
...
notify.username=EMAIL_USERNAME@gmail.com
notify.password=EMAIL_PASSWORD
notify.from=FROM_EMAIL@gmail.com
notify.to=TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com
```

Now you should be able to notify your accounts:

```bash
python script/spider.py --config config/prod.cfg --upload drive --notify
```

Create a new branch:
```bash
git checkout -b heroku-scheduler
```

Update the .gitignore and commit your changes:

```
# remove
config/prod.cfg
config/client_secrets.json
config/auth_token.json

# add
dev/
config/dev.cfg
config/prod_example.cfg
```
Create, config and deploy the scheduler:

```bash
heroku login

# create a new app
heroku create APP_NAME
# or if you already have an existing app
heroku git:remote -a APP_NAME

# deploy your app
git push -u heroku heroku-scheduler:master
heroku ps:scale clock=1

# useful commands
heroku ps
heroku logs --ps clock.1
heroku logs --tail
heroku run bash
```
Update script/scheduler.py with your own preferences.

More info about Heroku Scheduler, Clock Processes, Add-on and APScheduler
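A clock process like script/scheduler.py boils down to computing the delay until the next daily run, sleeping, then invoking the spider. A sketch of that delay calculation using only the standard library (the real script relies on APScheduler; the function below is illustrative, not the project's code):

```python
from datetime import datetime, timedelta

def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute (UTC)."""
    now = now or datetime.utcnow()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # that time has already passed today, so schedule for tomorrow
        target += timedelta(days=1)
    return (target - now).total_seconds()

# a clock process would sleep this long, run the spider, then repeat
delay = seconds_until(9, 0, now=datetime(2017, 1, 1, 8, 0))
# → 3600.0 (one hour until 09:00)
```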
Build your image:

```bash
docker build -t niqdev/packtpub-crawler:1.4.0 .
```

Run manually:

```bash
docker run \
  --rm \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:1.4.0 \
  python script/spider.py --config config/prod.cfg --upload drive
```

Run the scheduled crawler in background:

```bash
docker run \
  --detach \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:1.4.0

# useful commands
docker exec -i -t my-packtpub-crawler bash
docker logs -f my-packtpub-crawler
```

Run a simple static server with

```bash
node dev/server.js
```

and test the crawler with

```bash
python script/spider.py --dev --config config/dev.cfg --all
```

This project is just a Proof of Concept and not intended for any illegal usage. I'm not responsible for any damage or abuse, use it at your own risk.