ArshKA/LinkedIn-Job-Scraper
Program to scrape and store a constant stream of LinkedIn job postings and dozens of their respective attributes
Download the polished dataset and view insights at https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
logins.csv
- Populate with multiple LinkedIn logins
- Specify the purpose of each login (search or details retriever)
- I recommend 1-3 logins for search and the rest for the more expensive attribute retrieval
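As a rough illustration of splitting logins by purpose, here is a minimal sketch. The column names `email`, `password` and `purpose`, the purpose values `search` and `details`, and the helper `load_logins` are all assumptions made for this example; check the repository's own logins.csv for the real header.

```python
import csv
import io
from collections import defaultdict


def load_logins(csv_file):
    """Group login rows by purpose (hypothetical column names)."""
    by_purpose = defaultdict(list)
    for row in csv.DictReader(csv_file):
        purpose = row["purpose"].strip().lower()
        by_purpose[purpose].append((row["email"].strip(), row["password"]))
    return dict(by_purpose)


# Made-up sample data standing in for logins.csv
sample = io.StringIO(
    "email,password,purpose\n"
    "a@example.com,pw1,search\n"
    "b@example.com,pw2,details\n"
    "c@example.com,pw3,details\n"
)
logins = load_logins(sample)
print(len(logins["search"]), "search login(s),", len(logins["details"]), "details login(s)")
```

To read a real file, pass an open file handle instead, for example `open("logins.csv", newline="")`.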
details_retriever.py
- MAX_UPDATES: Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
- SLEEP_TIME: Seconds to sleep between every iteration (default = 60)
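The way these two settings interact can be sketched as a simple batch loop. This is not the code from details_retriever.py; `process_in_batches` and the `fetch` callback are hypothetical names, and only the two constants and their defaults come from the list above.

```python
import time

MAX_UPDATES = 25  # job postings to look up before sleeping
SLEEP_TIME = 60   # seconds to sleep between iterations


def process_in_batches(job_ids, fetch, max_updates=MAX_UPDATES,
                       sleep_time=SLEEP_TIME, sleep=time.sleep):
    """Look up jobs in batches of max_updates, pausing between batches."""
    results = {}
    for start in range(0, len(job_ids), max_updates):
        for job_id in job_ids[start:start + max_updates]:
            results[job_id] = fetch(job_id)
        # Pause only when another batch remains
        if start + max_updates < len(job_ids):
            sleep(sleep_time)
    return results


# Demo with a stand-in fetch function and a no-op sleep
demo = process_in_batches(list(range(60)), fetch=lambda job_id: {"id": job_id},
                          sleep=lambda seconds: None)
print(len(demo))
```

Raising MAX_UPDATES makes each batch larger, which is only sensible when the lookups are spread over more accounts or proxies.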
This program consists of 2 main scripts, running in parallel.
python search_retriever.py
- discovers new job postings and inserts the most recent IDs and minimal attributes into the database
python details_retriever.py
- populates tables with complete job attributes
It's important to note that while search_retriever.py typically runs smoothly, even on your personal IP with a single account, details_retriever.py can be a bit finicky. Each search returns approximately 25-50 results, each of which must be queried individually to obtain its attributes. To improve its performance, I recommend the following strategies:
- Utilize multiple proxies and accounts when running details_retriever.py.
- Experiment with different time delays to find the optimal settings.
- Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, so it can catch up with search_retriever.py. This helps keep both processes synchronized and up to date.
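One simple way to spread detail lookups across several accounts and proxies is round-robin assignment, sketched below. This illustrates the first strategy above and is not how the repository necessarily does it; `assign_sessions`, the account names and the proxy addresses are all made up for the example.

```python
from itertools import cycle


def assign_sessions(job_ids, accounts, proxies):
    """Pair each job ID with the next account and proxy in rotation."""
    if not accounts or not proxies:
        raise ValueError("need at least one account and one proxy")
    account_cycle = cycle(accounts)
    proxy_cycle = cycle(proxies)
    return [(job_id, next(account_cycle), next(proxy_cycle))
            for job_id in job_ids]


# Hypothetical accounts and proxies
plan = assign_sessions(
    job_ids=[101, 102, 103, 104, 105],
    accounts=["account_a", "account_b"],
    proxies=["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"],
)
for job_id, account, proxy in plan:
    print(job_id, account, proxy)
```

Because the two lists rotate independently, each account ends up paired with different proxies over time; if an account must always use the same proxy, zip them into pairs first and cycle over the pairs.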
python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>
Creates a CSV file for each database, with minimal preprocessing applied
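For readers who want to see the general shape of such an export, here is a minimal sketch. It assumes the .db file is a SQLite database and writes one CSV per table with no preprocessing; `export_tables` is a hypothetical helper, not the implementation in to_csv.py.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables(database, folder):
    """Write every user table in a SQLite file to <folder>/<table>.csv."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(database)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [column[0] for column in cursor.description]
            path = out_dir / f"{table}.csv"
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then all rows
            written.append(path)
    finally:
        conn.close()
    return written
```

Calling `export_tables("linkedin_jobs.db", "exports")` would return the list of CSV paths it wrote.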