niqdev/packtpub-crawler


Download FREE eBook every day from www.packtpub.com

This crawler automates the following steps:

  • log in to your private account
  • claim the daily free eBook
  • parse the title, description and other useful information
  • download your favorite format: .pdf, .epub, .mobi
  • download the source code and book cover
  • upload files to Google Drive
  • store data on Firebase
  • notify via email
  • schedule a daily job on Heroku or with Docker
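At a high level, the steps above form a pipeline: claim, download, then optionally upload, store and notify. A minimal sketch of that flow (the function names here are illustrative placeholders, not the crawler's actual API):

```python
# Illustrative sketch of the daily pipeline; the real script/spider.py
# wires these steps together based on the CLI flags described below.

def run_daily(claim, download, upload=None, store=None, notify=None):
    """Run the pipeline; upload/store/notify are optional, like the CLI flags."""
    book = claim()            # log in and claim the free eBook
    files = download(book)    # fetch the chosen formats (+ extras)
    if upload:
        upload(files)         # e.g. push to Google Drive
    if store:
        store(book)           # e.g. save metadata to Firebase
    if notify:
        notify(book)          # e.g. send an email summary
    return book, files
```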

Default command

```shell
# upload pdf to drive, store data and notify via email
python script/spider.py -c config/prod.cfg -u drive -s firebase -n
```

Other options

```shell
# download all formats
python script/spider.py --config config/prod.cfg --all

# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf

# also download additional material: source code (if it exists) and book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default format is pdf)
python script/spider.py -c config/prod.cfg -e

# download and then upload to Drive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload drive
python script/spider.py --config config/prod.cfg --all --extras --upload drive
```
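The flag combinations above can be mirrored with argparse. This is a hypothetical reconstruction of the interface inferred from the examples, not the crawler's actual parser (check script/spider.py for the real definitions):

```python
import argparse

# Hypothetical mirror of spider.py's CLI, inferred from the usage examples.
def build_parser():
    p = argparse.ArgumentParser(prog="spider.py")
    p.add_argument("-c", "--config", required=True, help="path to .cfg file")
    p.add_argument("-t", "--type", choices=["pdf", "epub", "mobi"], default="pdf")
    p.add_argument("--all", action="store_true", help="download every format")
    p.add_argument("-e", "--extras", action="store_true",
                   help="also fetch source code and book cover")
    p.add_argument("-u", "--upload", choices=["drive"], help="upload service")
    p.add_argument("-s", "--store", choices=["firebase"], help="storage backend")
    p.add_argument("-n", "--notify", action="store_true", help="send email")
    p.add_argument("--dev", action="store_true", help="run against the dev server")
    return p
```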

Basic setup

Before you start you should:

  • Verify that your currently installed version of Python is 2.x with `python --version`
  • Clone the repository: `git clone https://github.com/niqdev/packtpub-crawler.git`
  • Install all the dependencies (you might need `sudo` privileges): `pip install -r requirements.txt`
  • Create a config file: `cp config/prod_example.cfg config/prod.cfg`
  • Change your Packtpub credentials in the config file:

```
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
```

Now you should be able to claim and download your first eBook

python script/spider.py --config config/prod.cfg
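Config values like the credentials above can be read with the stdlib configparser module. A minimal sketch, assuming the file layout shown in the setup steps (the real crawler may load its config differently):

```python
import configparser

def load_credentials(path):
    """Read Packtpub credentials from a config file laid out as above."""
    cfg = configparser.ConfigParser()
    with open(path) as fh:
        cfg.read_file(fh)  # fail loudly if the file is missing
    return (cfg.get("credential", "credential.email"),
            cfg.get("credential", "credential.password"))
```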

Upload setup

As described in the documentation, the Drive API requires OAuth 2.0 for authentication, so to upload files you should:

  • Go to the Google APIs Console and create a new Drive project named PacktpubDrive
  • On the API manager > Overview menu
    • Enable Google Drive API
  • On the API manager > Credentials menu
    • In the OAuth consent screen tab set PacktpubDrive as the product name shown to users
    • In the Credentials tab create credentials of type OAuth client ID, choose Application type Other, and name them PacktpubDriveCredentials
  • Click Download JSON and save the file as config/client_secrets.json
  • Change your Drive credentials in the config file:

```
[drive]
...
drive.client_secrets=config/client_secrets.json
drive.gmail=GOOGLE_DRIVE@gmail.com
```

Now you should be able to upload your eBook to Drive

python script/spider.py --config config/prod.cfg --upload drive

Only the first time will you be prompted to log in via a browser with JavaScript enabled (no text-based browser) to generate config/auth_token.json. You should also copy the FOLDER_ID into the config, otherwise a new folder with the same name will be created every time.

```
[drive]
...
drive.default_folder=packtpub
drive.upload_folder=FOLDER_ID
```

Documentation: OAuth, Quickstart, example and permissions
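The one-time browser login above caches a token in config/auth_token.json so later runs stay non-interactive. A minimal sketch of the reuse check, assuming the token is stored as plain JSON (the real flow goes through Google's OAuth client library):

```python
import json
import os

def load_cached_token(path="config/auth_token.json"):
    """Return the cached OAuth token, or None if a browser login is needed."""
    if not os.path.exists(path):
        return None  # first run: fall back to the interactive browser flow
    with open(path) as fh:
        return json.load(fh)
```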

Database setup

Create a new Firebase project, copy the database secret from your settings

https://console.firebase.google.com/project/PROJECT_NAME/settings/database

and update the configs

```
[firebase]
firebase.database_secret=DATABASE_SECRET
firebase.url=https://PROJECT_NAME.firebaseio.com
```

Now you should be able to store your eBook details on Firebase

python script/spider.py --config config/prod.cfg --upload drive --store firebase
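Writes like this can go through Firebase's legacy REST API, which accepts the database secret as an `auth` query parameter on a `.json` endpoint. A sketch of the URL construction (the `/books` path is a hypothetical example, not necessarily the path the crawler uses):

```python
from urllib.parse import urlencode

def firebase_post_url(base_url, path, database_secret):
    """Build a legacy REST endpoint like
    https://PROJECT_NAME.firebaseio.com/books.json?auth=DATABASE_SECRET"""
    return "{}/{}.json?{}".format(
        base_url.rstrip("/"),
        path.strip("/"),
        urlencode({"auth": database_secret}),
    )
```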

Notification setup

To send a notification via email using Gmail you should update the config:

```
[notify]
...
notify.username=EMAIL_USERNAME@gmail.com
notify.password=EMAIL_PASSWORD
notify.from=FROM_EMAIL@gmail.com
notify.to=TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com
```

Now you should be able to notify your accounts

python script/spider.py --config config/prod.cfg --upload drive --notify
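With the Gmail settings above, a notification can be assembled with the stdlib email module and sent over smtplib. A sketch, not the crawler's actual notifier; the send itself is commented out so the example stays offline (it would need a Gmail app password):

```python
import smtplib
from email.mime.text import MIMEText

def build_notification(cfg, title, url):
    """Assemble the email from [notify] values like those shown above."""
    msg = MIMEText("Today's free eBook: {}\n{}".format(title, url))
    msg["Subject"] = "[packtpub-crawler] " + title
    msg["From"] = cfg["notify.from"]
    msg["To"] = ", ".join(cfg["notify.to"])
    return msg

# To actually send it:
# with smtplib.SMTP_SSL("smtp.gmail.com", 465) as s:
#     s.login(cfg["notify.username"], cfg["notify.password"])
#     s.send_message(msg)
```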

Heroku setup

Create a new branch

git checkout -b heroku-scheduler

Update the .gitignore and commit your changes

```
# remove
config/prod.cfg
config/client_secrets.json
config/auth_token.json

# add
dev/
config/dev.cfg
config/prod_example.cfg
```

Create, config and deploy the scheduler

```shell
heroku login

# create a new app
heroku create APP_NAME
# or if you already have an existing app
heroku git:remote -a APP_NAME

# deploy your app
git push -u heroku heroku-scheduler:master
heroku ps:scale clock=1

# useful commands
heroku ps
heroku logs --ps clock.1
heroku logs --tail
heroku run bash
```

Update script/scheduler.py with your own preferences.

More info about Heroku Scheduler, Clock Processes, Add-ons and APScheduler
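The clock process boils down to "once a day, at a fixed time, run the spider". The timing logic can be sketched with the stdlib alone (the 9:00 run time and the subprocess call in the comment are illustrative assumptions; script/scheduler.py uses APScheduler instead):

```python
from datetime import datetime, timedelta

def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next daily run at hour:minute."""
    now = now or datetime.now()
    nxt = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if nxt <= now:
        nxt += timedelta(days=1)  # today's slot has already passed
    return (nxt - now).total_seconds()

# A clock process would sleep for this long, run the spider, and repeat:
# while True:
#     time.sleep(seconds_until(9, 0))
#     subprocess.call(["python", "script/spider.py", "-c", "config/prod.cfg"])
```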

Docker setup

Build your image

docker build -t niqdev/packtpub-crawler:1.3.0 .

Run manually

```shell
docker run \
  --rm \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:1.3.0 \
  python script/spider.py --config config/prod.cfg --upload drive
```

Run scheduled crawler in background

```shell
docker run \
  --detach \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:1.3.0

# useful commands
docker exec -i -t my-packtpub-crawler bash
docker logs -f my-packtpub-crawler
```

Development (only for spidering)

Run a simple static server with

node dev/server.js

and test the crawler with

python script/spider.py --dev --config config/dev.cfg --all

Disclaimer

This project is just a Proof of Concept and is not intended for any illegal usage. I'm not responsible for any damage or abuse; use it at your own risk.
