Demonstration of a distributed web scraper for https://www.techinasia.com/jobs and its analytics data pipeline.


pythonjokeun/web-scraping-data-pipeline


Prerequisite

This guide assumes you have a working installation of:

  • Docker Compose
  • Postgres
  • Redis

Guide

  • Create 2 databases in your Postgres instance named scraper and etl.
  • Clone this repo by running git clone https://github.com/pythonjokeun/web-scraper-data-pipeline
  • Enter the cloned repo.
  • Open docker-compose.yml in your favorite editor.
  • Set the environment variable values to match your setup.
  • Run docker-compose up --scale scraper-consumer=2
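
The environment variable names and value formats are defined in docker-compose.yml and are not reproduced here. As a rough sketch only, assuming the services accept standard connection URLs, the values for the two databases and for Redis could be assembled as below. The helper names, the default ports, and the postgres/postgres credentials are illustrative assumptions, not taken from the repo.

```python
from urllib.parse import quote


def postgres_url(database, host="host.docker.internal", port=5432,
                 user="postgres", password="postgres"):
    """Build a Postgres connection URL (hypothetical helper, assumed format)."""
    # Percent-encode credentials so characters such as "@" or ":" stay valid.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"


def redis_url(host="host.docker.internal", port=6379, db=0):
    """Build a Redis connection URL (hypothetical helper, assumed format)."""
    return f"redis://{host}:{port}/{db}"


if __name__ == "__main__":
    # The guide asks for two databases named scraper and etl.
    for name in ("scraper", "etl"):
        print(postgres_url(name))
    print(redis_url())
```

Compare the printed values with what docker-compose.yml actually expects before copying them in; the file may split host, port, and credentials into separate variables.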

Notes:

  • If your Postgres or Redis installation is running on your local machine, you can use host.docker.internal as the [host] value.
  • The scraper and ETL data pipeline execution schedules are defined in the CRON_SCHEDULE variables using cron expressions.
  • The --scale scraper-consumer=2 argument sets the number of scraper-consumer instances.
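
For readers unfamiliar with cron expressions, here is a minimal sketch that splits one into its named fields. It assumes the common five-field format (minute, hour, day of month, month, day of week); the scheduler used in this repo may accept a different variant. The describe_cron helper and the example schedule are hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name.

    This only checks the field count; it does not validate value ranges.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # Example value for a CRON_SCHEDULE variable: minute 0 of every 6th hour.
    print(describe_cron("0 */6 * * *"))
```

A check like this can catch a malformed CRON_SCHEDULE value before the containers start.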

Design

The following image gives an overview of the system.

[Image: design]
