DPWasserman/indeed-jobs
The goal of any job seeker is to land a position. Data science is a very hot area for job prospects, so there are many open positions and many candidates looking to fill them. The aim of this project is to assist job seekers by analyzing Data Scientist job postings on Indeed.com in six cities across the United States. Those cities are:
- Charlotte, NC
- Chicago, IL
- Los Angeles, CA
- New York, NY
- Phoenix, AZ
- San Francisco, CA
Additional cities for later consideration:
- Atlanta, GA
- Austin, TX
- Boston, MA
- Seattle, WA
- Washington, DC
For detailed insights, please see the PPTX files in this repo.
- Enter the desired locations and a desirable proxy in the `config.py` file in the `indeed` sub-folder.
  - For a good, free proxy, refer to https://www.us-proxy.org/
- Run the main scraper to get the job postings: `scrapy crawl indeed_spider`
  - Results will be placed in the `data` sub-folder as `indeed_spider.csv`
- Run the secondary scraper to resolve the original posting URLs: `scrapy crawl redirect_spider`
  - Results will be placed in the `data` sub-folder as `redirect_spider.csv`
- Open the three Jupyter Notebooks to analyze the results:
  - `Job_Description_Word_Cloud.ipynb`: Produces a naïve word cloud
  - `Job_Statistics_Calculations.ipynb`: Produces statistical analysis from the posting metadata
  - `Job_Text_Analysis.ipynb`: Produces analysis of the job descriptions using natural language processing
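For orientation, the `config.py` settings mentioned above might look roughly like the following. This is a minimal sketch: the variable names `LOCATIONS` and `PROXY`, and the proxy address, are illustrative assumptions, not the project's actual identifiers — check the file in the `indeed` sub-folder for the real names.

```python
# Hypothetical config.py sketch — LOCATIONS and PROXY are assumed names
# for illustration; the real config.py may differ.
LOCATIONS = [
    "Charlotte, NC",
    "Chicago, IL",
    "Los Angeles, CA",
    "New York, NY",
    "Phoenix, AZ",
    "San Francisco, CA",
]

# A free proxy picked from https://www.us-proxy.org/ (placeholder address).
PROXY = "http://203.0.113.10:8080"
```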
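Once both spiders have run, the CSVs in the `data` sub-folder can be loaded with the standard library before any notebook analysis. The sketch below uses an in-memory sample so it is self-contained; the column names (`title`, `company`, `location`) are assumptions about the scraper's output, not confirmed field names.

```python
import csv
import io

# Sample row in the shape the scraper output might take
# (column names are assumptions, not the spider's actual fields).
sample = "title,company,location\nData Scientist,Acme Corp,New York NY\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# The real file would be read the same way:
# with open("data/indeed_spider.csv", newline="", encoding="utf-8") as f:
#     rows = list(csv.DictReader(f))

print(rows[0]["title"])  # → Data Scientist
```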
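At its core, a naïve word cloud like the one in `Job_Description_Word_Cloud.ipynb` is driven by word frequencies in the scraped job descriptions. A minimal stdlib sketch of that counting step is shown below; the sample text and stop-word list are illustrative assumptions, and the notebook itself may use a dedicated word-cloud library instead.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real analysis would use a fuller set.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "for", "with"}

def word_frequencies(text: str) -> Counter:
    """Lower-case, tokenize, and count words, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# Sample description text (an assumption, not real scraped data).
freqs = word_frequencies(
    "Experience with Python and machine learning. Python preferred."
)
print(freqs.most_common(1))  # → [('python', 2)]
```

These counts are exactly what a word-cloud renderer consumes: the more frequent the word, the larger it is drawn.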