pythonjokeun/web-scraping-data-pipeline

Demonstration of a distributed web scraper for https://www.techinasia.com/jobs and its analytics data pipeline.

Folders and files

etl
scraper
.gitignore
README.md
design-img.png
docker-compose.yml

Overview

Demonstration of a distributed web scraper for https://www.techinasia.com/jobs and its analytics data pipeline.

Prerequisite

This guide assumes you have a working installation of:

Docker Compose
Postgres
Redis
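
Before starting, it can help to confirm that the prerequisites are reachable. The sketch below is only an illustration: the function names are hypothetical, and it assumes Postgres and Redis listen on their default ports (5432 and 6379) on the host you pass in.

```python
import shutil
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prerequisites(host: str = "localhost") -> dict:
    """Report which prerequisites look available.

    Assumption: 5432 and 6379 are the Postgres and Redis default ports;
    adjust them if your instances listen elsewhere.
    """
    return {
        "docker-compose": shutil.which("docker-compose") is not None,
        "postgres": port_open(host, 5432),
        "redis": port_open(host, 6379),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'not found'}")
```

An open port only shows that something is listening there. It does not verify credentials or that the service is actually Postgres or Redis.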

Guide

Create two databases in your Postgres instance, named scraper and etl.
Clone this repo by running git clone https://github.com/pythonjokeun/web-scraping-data-pipeline
Change into the cloned directory.
Open docker-compose.yml in your favorite editor.
Set the environment variable values to match your setup.
Run docker-compose up --scale scraper-consumer=2
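
The first step of the guide (creating the two databases) can be scripted. The following is a minimal sketch, not part of the repository: it only generates the SQL statements, and the function name `create_database_sql` is hypothetical.

```python
import re

# Database names taken from the first step of the guide.
DATABASES = ("scraper", "etl")


def create_database_sql(names=DATABASES):
    """Build one CREATE DATABASE statement per name."""
    statements = []
    for name in names:
        # Accept only simple identifiers so the name can be quoted safely.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsupported database name: {name!r}")
        statements.append(f'CREATE DATABASE "{name}";')
    return statements


if __name__ == "__main__":
    # Print the statements so they can be piped into a SQL client.
    print("\n".join(create_database_sql()))
```

Assuming the script is saved as create_databases.py (a hypothetical file name) and your Postgres superuser is postgres, the output could be piped into psql, for example: python create_databases.py | psql -h localhost -U postgres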

Notes:

If your Postgres or Redis installation is running on your local machine, you can use host.docker.internal as the [host] value.
The scraper and ETL data pipeline execution schedules are defined in the CRON_SCHEDULE variables using cron expressions.
The --scale scraper-consumer=2 argument sets the number of scraper-consumer instances.
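
To illustrate how a cron expression in a CRON_SCHEDULE variable selects run times, here is a simplified sketch of cron matching. It assumes the standard five-field format (minute, hour, day of month, month, day of week) and supports only numbers, `*`, lists, ranges, and steps. Whether the scheduler in this repo accepts exactly this dialect is an assumption, and the function names and example expressions are hypothetical.

```python
from datetime import datetime

# (min, max) allowed for each of the five standard cron fields.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set:
    """Expand one cron field ('*', '5', '1-5', '*/15', '1,3') into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if not (low <= start <= end <= high) or step < 1:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if the datetime falls on a minute selected by the expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if 7 in dow:
        dow.add(0)  # cron accepts both 0 and 7 for Sunday
    weekday = when.isoweekday() % 7  # Sunday -> 0, Monday -> 1, ...
    dom_ok = when.day in dom
    dow_ok = weekday in dow
    # Standard cron rule: if both day fields are restricted, either may match.
    if fields[2] != "*" and fields[4] != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (
        when.minute in minute
        and when.hour in hour
        and when.month in month
        and day_ok
    )


if __name__ == "__main__":
    # Hypothetical schedule: every 15 minutes, 09:00-17:59, Monday to Friday.
    print(cron_matches("*/15 9-17 * * 1-5", datetime.now()))
```

Month and weekday names (such as JAN or MON) and non-standard extensions are deliberately left out to keep the sketch short.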

Design

The following image (design-img.png in the repository) gives an overview of the system:
