A machine learning project in which I apply my pipeline and scripting skills to identify the factors that influence whether HR calls a candidate and whether candidates accept offers for Data Science roles.
I completed Kaggle certifications in Machine Learning and Feature Engineering and wanted to use those skills to optimize my code and build an automated preprocessing and modelling pipeline on the HR dataset, especially as a candidate who wants to join the analytics and data industry.
- Explored the data to find null values, shape, distribution
- Used seaborn for the exploratory visualizations in this project
- Used seaborn's heatmap to locate columns with null values and removed the columns whose share of nulls exceeded a chosen threshold; used countplot and distplot to understand the distribution of categories within each column
- Wrote concise code using list comprehensions to wrangle the data
- Used SimpleImputer to impute missing values, and ColumnTransformer and Pipeline to build an ML pipeline from preprocessing to model fit
- Employed mean_absolute_error, accuracy_score, confusion_matrix, classification_report, and mean_squared_error to validate the model
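
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the synthetic data, the 50% null threshold, the imputation strategies, and the choice of `LogisticRegression` are all assumptions made for the example. It shows a list comprehension that drops sparse columns, a `ColumnTransformer` with `SimpleImputer` inside a `Pipeline`, and a few of the classification metrics listed above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NULL_THRESHOLD = 0.5  # assumed cutoff; the project's real threshold is not stated


def drop_sparse_columns(df, threshold=NULL_THRESHOLD):
    """Keep only columns whose fraction of missing values is <= threshold."""
    keep = [col for col in df.columns if df[col].isnull().mean() <= threshold]
    return df[keep]


def build_pipeline(numeric_cols, categorical_cols):
    """Chain imputation, encoding, and a classifier into one estimator."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),  # hypothetical model choice
    ])


# Synthetic stand-in for the HR dataset (all column names are hypothetical).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "experience_years": rng.integers(0, 20, n).astype(float),
    "training_hours": rng.integers(1, 300, n).astype(float),
    "education_level": rng.choice(["Graduate", "Masters", "PhD"], n).astype(object),
    "mostly_missing": np.nan,  # entirely null, so it should be dropped
})
df.loc[rng.choice(n, 20, replace=False), "experience_years"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "education_level"] = np.nan
df["target"] = (df["training_hours"] > 150).astype(int)

features = drop_sparse_columns(df.drop(columns="target"))
X_train, X_test, y_train, y_test = train_test_split(
    features, df["target"], test_size=0.25, random_state=0, stratify=df["target"]
)

pipeline = build_pipeline(
    numeric_cols=["experience_years", "training_hours"],
    categorical_cols=["education_level"],
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions)
print("accuracy:", accuracy)
print(matrix)
print(classification_report(y_test, predictions))
```

Keeping the imputers inside the pipeline means they are fitted on the training split only, so no statistics from the test split leak into preprocessing.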