bsameera/Metis_Classification

Stroke Prediction

Folders and files

README.md
Stroke_Prediction.ipynb
Stroke_Prediction.pdf
Stroke_Prediction_Ensemble.ipynb
Stroke_Prediction_MVP.ipynb
Stroke_Prediction_SMOTE.ipynb

Metis_Classification

Stroke Prediction

Classification Project Write-up

Abstract -

The goal of the project was to predict stroke in individuals using binary classification models for the UCSF stroke clinic. The best model will be applied to help reduce the number of strokes in patients, or to alert high-risk patients to get tests done.

Design -

The data was obtained from Kaggle: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset. Rows with null values in bmi were dropped rather than imputed, since a person's weight depends on many factors and cannot be reliably predicted. The data included categorical and continuous numerical values. The categorical features were binarized using one-hot encoding. Various classification algorithms were trained, and the final model was built with an ensemble voting classifier.
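The cleaning and encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame is made up, and the column names are assumed to follow the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; column names are assumed
# to match the Kaggle stroke dataset.
df = pd.DataFrame({
    "age": [67.0, 61.0, 80.0, 49.0],
    "bmi": [36.6, np.nan, 32.5, 34.4],
    "gender": ["Male", "Female", "Male", "Female"],
    "smoking_status": ["formerly smoked", "never smoked", "never smoked", "smokes"],
    "stroke": [1, 1, 1, 0],
})

# Drop rows where bmi is missing instead of imputing a value.
cleaned = df.dropna(subset=["bmi"]).reset_index(drop=True)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(cleaned, columns=["gender", "smoking_status"])

print(encoded.columns.tolist())
```

Each category present in the cleaned data becomes its own indicator column, which is how a handful of raw predictors can expand into a larger feature count.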

Data -

• There were around 5000 entries of data.
• There are 16 features, including the binarized predictors created with the get_dummies method.
• The predictors were id, age, gender, hypertension, heart disease, smoking status, glucose levels, bmi, work type, residence type and ever married.

Algorithms -

Models used were Logistic Regression, kNN, Decision Tree, Random Forest and XGBoost.

The data was divided into two parts, train (80%) and test (20%), using a stratified split. The models were trained on the training data using the above algorithms and GridSearchCV. The final results were evaluated on the test data. Among all the algorithms used, logistic regression, random forest and XGBoost looked the most promising based on the best AUC score and log loss of each model. The final model was built with a soft ensemble voting classifier.
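The workflow above (stratified 80/20 split, grid search per model, soft voting) might look like the sketch below. It runs on synthetic data from make_classification, the parameter grids are illustrative guesses, and XGBoost is left out so that only scikit-learn is required; none of this is taken from the project's notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the stroke data (roughly 5% positives).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.95, 0.05], random_state=0
)

# 80/20 split, stratified so both parts keep a similar class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Tune each base model with a small, illustrative grid scored by ROC AUC.
lr_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=3,
)
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [3, 5]},
    scoring="roc_auc",
    cv=3,
)
lr_search.fit(X_train, y_train)
rf_search.fit(X_train, y_train)

# Soft voting averages the predicted class probabilities of the tuned models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", lr_search.best_estimator_),
        ("rf", rf_search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probability of the positive class for each held-out sample.
proba = ensemble.predict_proba(X_test)[:, 1]
```

Soft voting needs every base model to expose predict_proba, which both estimators used here do.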

The results are as follows:
• Recall – 0.95
• Precision – 0.1
• Accuracy – 0.92
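For reference, the three metrics reported above are all derived from confusion-matrix counts. The helper below is a hypothetical sketch, and the counts passed to it are made up for illustration; they are not the project's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)            # share of actual positives that were caught
    precision = tp / (tp + fp)         # share of predicted positives that were correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


# Made-up counts for an imbalanced problem, not taken from the notebooks.
metrics = classification_metrics(tp=8, fp=40, fn=2, tn=950)
print(metrics)
```

With few positives in the data, a model can flag most true cases (high recall) while still raising many false alarms (low precision), and accuracy stays high because negatives dominate.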

Tools -

• Pandas – Clean, explore and feature engineering
• Scikit-Learn – Build different classification models and perform cross validation, variable selection and regularization
• Matplotlib/Seaborn – Visualizing data exploration, modeling and results
• Python 3.8.5 – To run all of the above
