ArshKA/LinkedIn-Job-Scraper


LinkedIn scraper to retrieve and store a live stream of job postings



LinkedIn Job Scraper

Program to scrape and store a constant stream of LinkedIn job postings and dozens of their respective attributes

Download the polished dataset and view insights at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

User Configurations

Required

logins.csv
- Populate with multiple LinkedIn logins
- Specify the purpose of each login (search or details retriever)
- I recommend 1-3 logins for search and the rest for the more expensive attribute retrieval
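As a rough illustration of splitting logins by purpose, here is a minimal sketch. The column names (`email`, `password`, `purpose`) and the purpose values are assumptions made for the example; the actual layout is defined by logins.csv.template, which is not reproduced here.

```python
import csv
import io
from collections import defaultdict


def group_logins_by_purpose(csv_text, purpose_column="purpose"):
    """Group login rows by their stated purpose (column name is hypothetical)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[purpose_column].strip().lower()].append(row)
    return dict(groups)


# Hypothetical file contents; check logins.csv.template for the real columns.
SAMPLE = """email,password,purpose
a@example.com,pw1,search
b@example.com,pw2,details
c@example.com,pw3,details
"""

groups = group_logins_by_purpose(SAMPLE)
for purpose, rows in groups.items():
    print(purpose, len(rows))
```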

Optional

details_retriever.py
- MAX_UPDATES: Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
- SLEEP_TIME: Seconds to sleep between every iteration (default = 60)
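To show how these two settings interact, here is a minimal sketch of a batch-then-sleep loop. It is not the code from details_retriever.py; `process_in_batches` and `fetch_details` are hypothetical names, and only the two constants and their defaults come from the description above.

```python
import time

MAX_UPDATES = 25  # job postings to look up before sleeping
SLEEP_TIME = 60   # seconds to sleep between iterations


def process_in_batches(job_ids, fetch_details,
                       max_updates=MAX_UPDATES,
                       sleep_time=SLEEP_TIME,
                       sleep=time.sleep):
    """Look up job details in batches, pausing between batches.

    fetch_details is a hypothetical callable that takes a job ID and
    returns its attributes. sleep is injectable so the loop can be
    exercised without actually waiting.
    """
    results = {}
    for start in range(0, len(job_ids), max_updates):
        for job_id in job_ids[start:start + max_updates]:
            results[job_id] = fetch_details(job_id)
        # Pause only if another batch is still waiting.
        if start + max_updates < len(job_ids):
            sleep(sleep_time)
    return results
```

With more accounts or proxies, a larger `max_updates` means more lookups per pause; a longer `sleep_time` trades throughput for a lower request rate.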

Running

This program consists of two main scripts that run in parallel.

python search_retriever.py - discovers new job postings and inserts the most recent IDs and minimal attributes into the database

python details_retriever.py - populates tables with complete job attributes
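Running the two commands in separate terminals is the simplest way to keep them in parallel. As an alternative, a small launcher along the following lines could start both from one process. This is a sketch, not part of the repository; `build_commands` and `run_in_parallel` are hypothetical helper names.

```python
import subprocess
import sys

SCRIPTS = ["search_retriever.py", "details_retriever.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Build one command line per script, using the current interpreter."""
    return [[python, script] for script in scripts]


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them to exit."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# From the repository root:
#     exit_codes = run_in_parallel(build_commands())
```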

It's important to note that while search_retriever.py typically runs smoothly, even through your personal IP and a single account, details_retriever.py can be a bit finicky. Each search generates approximately 25-50 results, all of which must be individually queried to obtain their attributes. To enhance its performance, I recommend the following strategies:

- Utilize multiple proxies and accounts when running details_retriever.py.
- Experiment with different time delays to find the optimal settings.
- Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, to catch up with the progress of search_retriever.py. This will ensure that both processes remain synchronized and up to date.
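One simple way to spread lookups across several accounts or proxies is round-robin assignment. The sketch below illustrates that idea only. It is not how details_retriever.py distributes requests, and `assign_round_robin` and the identity labels are hypothetical.

```python
from itertools import cycle


def assign_round_robin(job_ids, identities):
    """Pair each job ID with the next account/proxy identity in rotation."""
    if not identities:
        raise ValueError("at least one account/proxy identity is required")
    rotation = cycle(identities)
    return [(job_id, next(rotation)) for job_id in job_ids]


# Hypothetical identities; in practice each could be an (account, proxy) pair.
plan = assign_round_robin([101, 102, 103, 104, 105], ["acct-A", "acct-B"])
for job_id, identity in plan:
    print(job_id, identity)
```

Each identity then handles roughly an equal share of the 25-50 results per search, which lowers the request rate seen from any single account.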

Converting Database to CSV

python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>

Creates a CSV file for each database, applying minimal preprocessing along the way
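For readers curious what such an export can look like, here is a minimal sketch. It assumes the database is a SQLite file (suggested by the `.db` name, but not confirmed here) and writes one CSV per table. It is not the code in to_csv.py and performs no preprocessing; `export_tables_to_csv` is a hypothetical name.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(database, folder):
    """Write every user table in a SQLite database to <folder>/<table>.csv."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(database)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [column[0] for column in cursor.description]
            path = out_dir / f"{table}.csv"
            with path.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)
                writer.writerows(cursor)
            written.append(path)
    finally:
        conn.close()
    return written
```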

Database Structure

You can find the structure of the database in DatabaseStructure.md


Languages

Python 100.0%
