PhuongFX/HeartR
A Machine Learning Tool for Early Intervention
Heart disease is a leading cause of death worldwide, and accurate prediction of heart disease remains a significant challenge.
This project aims to develop a machine learning model capable of predicting heart disease using a comprehensive dataset of key indicators.
- A dataset of over 400,000 adult profiles, capturing the diverse health status of individuals across various demographics and risk factors
- A range of machine learning models, including Decision Trees, Random Forests, Gradient Boosting, and more
- Hyperparameter tuning using BayesSearchCV and RandomizedSearchCV
- Model evaluation using classification reports and accuracy scores
- Prediction on new, unseen patient data from a random sample from the test set
- Dataset URL: 💖 Indicators of Heart Disease
- License: CC0-1.0
- Number of samples: 400,000
- Number of factors: 40
Category | Number of Images |
---|---|
Training | 12594 |
Validation | 500 |
Testing | 500 |
- Python 3.x
- XGBoost
- Keras
- Scikit-learn
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Plotly
- Data Scaling: Apply PCA to the training features, normalize the categorical labels, and shuffle the dataset to increase randomness and reduce bias. 🔀
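The preprocessing step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the number of components (5) is arbitrary, the `"Yes"`/`"No"` labels are a stand-in for the real target column, and standardizing before PCA is my assumption rather than something the project confirms.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.utils import shuffle

rng = np.random.default_rng(0)

# Synthetic stand-in for the real training features and labels (assumption).
X_train = rng.normal(size=(200, 10))
y_train = rng.choice(["No", "Yes"], size=200)

# Encode the string labels as integers (classes are sorted alphabetically).
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_train)

# Standardize the features, then fit PCA on the training features only.
scaler = StandardScaler()
pca = PCA(n_components=5, random_state=0)
X_reduced = pca.fit_transform(scaler.fit_transform(X_train))

# Shuffle features and labels together so rows stay aligned.
X_shuffled, y_shuffled = shuffle(X_reduced, y_encoded, random_state=0)

print(X_shuffled.shape, pca.explained_variance_ratio_.sum())
```

For validation or test data, the already fitted `scaler` and `pca` would be reused through `transform` rather than refitted, so no information leaks from those splits.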
The following models are implemented and compared:
- DecisionTreeClassifier
- RandomForestClassifier
- ExtraTreesClassifier
- GradientBoostingClassifier
- HistGradientBoostingClassifier
- XGBClassifier
- LGBMClassifier
- CatBoostClassifier
- SVC
- LogisticRegression
- MLPClassifier
- AdaBoostClassifier
- GaussianNB
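A comparison like the one listed above might look something like this minimal sketch. It uses a small synthetic dataset from `make_classification` and only four of the listed scikit-learn models with near-default settings; the split ratio, sample size, and hyperparameters are illustrative assumptions, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A subset of the models from the list, with illustrative settings.
models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
```

The other listed models (for example `XGBClassifier`, `LGBMClassifier`, or `CatBoostClassifier`) expose the same `fit`/`predict` interface, so they could be added to the `models` dictionary in the same way.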
The MLPClassifier model achieves a test accuracy of 94.66%, a strong result considering the complexity of the dataset! 🎉 I have also identified the best hyperparameters for the RandomForestClassifier and XGBClassifier models using BayesSearchCV and RandomizedSearchCV.
- Training accuracy: 0.9996
- Validation accuracy: 0.9420
- Test accuracy: 0.9600
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 232 | 12 |
| Actual Negative | 15 | 213 |
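As a quick check, the usual summary metrics can be derived directly from the four confusion matrix counts above. The sketch below uses plain Python; the variable names are my own, and the README does not state which data split the matrix comes from.

```python
# Counts taken from the confusion matrix above.
tp = 232  # actual positive, predicted positive
fn = 12   # actual positive, predicted negative
fp = 15   # actual negative, predicted positive
tn = 213  # actual negative, predicted negative

total = tp + fn + fp + tn

# Share of all predictions that are correct.
accuracy = (tp + tn) / total
# Share of predicted positives that are truly positive.
precision = tp / (tp + fp)
# Share of actual positives that the model finds (sensitivity).
recall = tp / (tp + fn)
# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.4f}")
```

The matrix covers 472 samples, of which 445 are classified correctly, which gives an accuracy of about 0.943.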
- GridSearchCV
- RandomizedSearchCV
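A randomized search over RandomForestClassifier hyperparameters might be set up along these lines. This is a sketch on synthetic data: the search ranges, `n_iter=5`, and `cv=3` are small illustrative values chosen so it runs quickly, not the settings used in this project.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for the real training set (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Distributions to sample hyperparameters from (illustrative ranges).
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": randint(2, 8),
    "min_samples_leaf": randint(1, 5),
}

# Try 5 random combinations, each scored with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 4))
```

`GridSearchCV` takes a `param_grid` of explicit value lists instead of distributions and evaluates every combination, which is more thorough but slower on large search spaces.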
- Kaggle dataset: 💖 Indicators of Heart Disease (2022 UPDATE)
- Scikit-learn and XGBoost libraries for model training
- Matplotlib and Seaborn libraries for data visualization
This project is licensed under the AGPL-3.0 License. It is intended for personal use only and should not be used for commercial purposes. The pre-trained model may not always produce accurate results.
This project demonstrates the potential of machine learning for heart disease prediction. The model achieves high accuracy and can be used as a starting point for further research and development in this field.
I hope you found this project informative and engaging! 😊
If you're interested in collaborating and contributing to the project, please let me know! I'd love to hear from you.
To get started with this project, you'll need to:
- Install the required libraries, including pandas, numpy, scikit-learn, tensorflow, xgboost, lightgbm, and catboost 📦
pip install pandas numpy scikit-learn tensorflow xgboost lightgbm catboost
- Download the dataset from Kaggle 📈
- Run the code to train and evaluate the model 🤖
Enjoy working with the content! 😊