Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

LinkedIn scraper to retrieve and store a live stream of job postings

NotificationsYou must be signed in to change notification settings

ArshKA/LinkedIn-Job-Scraper

Repository files navigation

Program to scrape and store a constant stream of LinkedIn job postings and dozens of their respective attributes

Download the polished dataset and view insights at -https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

User Configurations

Required

  • logins.csv
    • Populate with multiple LinkedIn logins
    • Specify the purpose of the login (search or detail retreiever)
    • I recommend 1-3 logins for search and the remaining for more expensive attribute retrieval

Optional

  • details_retriever.py
    • MAX_UPDATES: - Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
    • SLEEP_TIME: - Seconds to sleep between every iteration (default = 60)

Running

This program consists of 2 main scripts, running in parallel.

python search_retriever.py - discovers new job postings and insert the most recent IDs and minimal attributes into the database

python details_retriever.py - populates tables with complete job attributes

It's important to note that whilesearch_retriever.py typically runs smoothly, even through your personal IP and a singular account,details_retriever.py can be a bit finicky. Each search generates approximately 25-50 results, all of which must be individually queried to obtain their attributes. To enhance its performance, I recommend the following strategies:

  • Utilize multiple proxies and accounts when running details_retriever.py.
  • Experiment with different time delays to find the optimal settings.
  • Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, to catch up with the progress of search_retriever.py. This will ensure that both processes remain synchronized and up to date.

Converting Database to CSV

python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>

Creates a CSV file for each database, along with minimal preprocessing

Database Structure

You can find the structure of the database here

About

LinkedIn scraper to retrieve and store a live stream of job postings

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors2

  •  
  •  

Languages


[8]ページ先頭

©2009-2025 Movatter.jp