This repository contains a comprehensive study on employee attrition analysis using data mining techniques. It includes data preprocessing, visualization, and predictive modeling (with algorithms such as Decision Tree, Random Forest, and Logistic Regression) to identify key factors influencing attrition, using the IBM HR dataset.
Run-d1/Employee-Attrition-Analysis
This project leverages data mining techniques to analyze employee attrition using the IBM HR Analytics dataset. The goal is to identify key factors influencing attrition and build predictive models to aid human resource departments in improving employee retention strategies.
- `HR-Employee-Attrition.csv`: Original dataset before preprocessing.
- `HR-Employee-Attrition-Updated.csv`: Dataset after preprocessing.
- `HR-employee-attrition.ipynb`: The Jupyter notebook (Python) for data preprocessing.
- `paper/EmployeeAttrition_Paper_Group1.pdf`: The final project paper detailing the methodology and findings.
The dataset is a fictional IBM HR Analytics dataset designed to simulate employee attrition scenarios.
- Data Preprocessing: Cleaning, encoding, and feature selection.
- Modeling: Decision Tree, Random Forest, and Logistic Regression.
- Evaluation: Logistic Regression was selected as the best model based on recall.
- Monthly income and overtime work significantly impact attrition.
- Logistic Regression achieved the highest recall of the three models, at about 59%.
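
The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the preprocessed CSV, and the sample size, class weights, hyperparameters, and variable names are all hypothetical. Recall here is the share of employees who actually left that the model flags, which is why it is a natural selection metric when missing a likely leaver is the costly error.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data (assumption: NOT the IBM dataset).
# The positive class (1 = "left the company") is deliberately the minority.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=6,
    weights=[0.85, 0.15],
    random_state=0,
)

# Stratify so the train and test splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three model families named in the README; hyperparameters are illustrative.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model and record recall on the positive (attrition) class.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = recall_score(y_test, model.predict(X_test))

# Select the model with the highest recall.
best = max(results, key=results.get)

for name, recall in results.items():
    print(f"{name}: recall = {recall:.2f}")
print(f"Selected by recall: {best}")
```

On synthetic data the ranking of the models will not necessarily match the one reported for the IBM HR dataset; the sketch only shows the selection procedure.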