Demonstration of a distributed web scraper for https://www.techinasia.com/jobs and its analytics data pipeline.
pythonjokeun/web-scraping-data-pipeline
This guide assumes you have a working installation of:
- Docker Compose
- Postgres
- Redis
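A quick way to confirm the prerequisites is to check that the relevant commands are on your `PATH`. The sketch below is only an illustration: `check_tools` is a hypothetical helper that is not part of this repo, and it assumes you reach your installations through the `docker-compose`, `psql`, and `redis-cli` programs. If Postgres or Redis run on another machine, a missing local client does not necessarily mean the service is unavailable.

```shell
# check_tools: print each named command that is not found on PATH.
# Returns non-zero if at least one is missing.
# (Hypothetical helper for illustration, not part of the repo.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools docker-compose psql redis-cli` prints one line per missing program and prints nothing when all three are found.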
- Create 2 databases in your Postgres instance, named `scraper` and `etl`.
- Clone this repo by running `git clone https://github.com/pythonjokeun/web-scraper-data-pipeline`.
- Enter the cloned repo.
- Open `docker-compose.yml` in your favorite editor.
- Configure all the environment variable values accordingly.
- Run `docker-compose up --scale scraper-consumer=2`.
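The steps above can be sketched as a pair of shell functions. This is a minimal sketch under a few assumptions: the Postgres `createdb` client utility is on your `PATH` and picks up your connection settings from its usual defaults or environment variables, and the helper names `run`, `prepare_pipeline`, and `start_pipeline` are hypothetical and not part of the repo. Setting `DRY_RUN=1` only prints the commands instead of executing them.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the two databases and clone the repo.
# Assumes createdb can connect to your Postgres instance with default settings.
prepare_pipeline() {
    run createdb scraper
    run createdb etl
    run git clone https://github.com/pythonjokeun/web-scraper-data-pipeline
}

# Start the stack; the optional argument is the number of
# scraper-consumer instances (defaults to 2, as in the step above).
# Run this from inside the cloned repo, after editing docker-compose.yml.
start_pipeline() {
    run docker-compose up --scale "scraper-consumer=${1:-2}"
}
```

A session might preview the commands with `DRY_RUN=1 prepare_pipeline`, run `prepare_pipeline` for real, edit `docker-compose.yml` inside the cloned repo, and then call `start_pipeline 2` from that directory.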
Notes:
- If your working installation of Postgres or Redis is running on your local machine, you can use `host.docker.internal` as the `[host]` value.
- The scraper and ETL data pipeline execution schedules are defined in the `CRON_SCHEDULE` variables using `cron` expressions.
- The `--scale scraper-consumer=2` argument sets the number of `scraper-consumer` instances.
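As a rough aid for the `CRON_SCHEDULE` note, the sketch below checks that an expression has the shape of a standard five-field `cron` expression (minute, hour, day of month, month, day of week) before you paste it into `docker-compose.yml`. This README does not say which `cron` dialect the scheduler accepts, so the five-field format is an assumption, and `is_five_field_cron` is a hypothetical helper that only counts fields; it does not validate their values.

```shell
# is_five_field_cron: succeed only if the expression has exactly five
# whitespace-separated fields. The body runs in a subshell so that
# disabling globbing (needed because of the literal "*") does not leak
# into the calling shell.
# (Hypothetical helper; assumes the standard five-field cron format.)
is_five_field_cron() (
    set -f          # keep "*" from expanding to file names
    set -- $1       # split the expression on whitespace
    [ "$#" -eq 5 ]
)
```

For example, `is_five_field_cron "0 * * * *"` (minute 0 of every hour) succeeds, while `is_five_field_cron "0 * * *"` fails because a field is missing.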