
bsameera/Metis_Classification

Stroke Prediction

Classification Project Write-up

Abstract -

The goal of this project was to predict stroke in individuals using binary classification models for the UCSF stroke clinic. The chosen model is intended to help reduce the number of strokes by alerting high-risk patients to get tests done.

Design -

The data was obtained from Kaggle: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset. Rows with null bmi values were dropped rather than imputed, since a person's weight depends on many factors and cannot be reliably predicted from the other features. The data included categorical and continuous numerical values. The categorical features were binarized using one-hot encoding. Various classification algorithms were trained, and the best model was chosen with an ensemble voting classifier.
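The cleaning and encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the small DataFrame is a hypothetical stand-in for the Kaggle CSV, and the column names used here are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kaggle data; column names are assumed
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [67.0, 61.0, 80.0, 49.0],
    "gender": ["Male", "Female", "Male", "Female"],
    "bmi": [36.6, np.nan, 32.5, 34.4],
    "smoking_status": ["formerly smoked", "never smoked", "never smoked", "smokes"],
    "stroke": [1, 1, 1, 0],
})

# Drop rows whose bmi is missing instead of imputing a value
cleaned = df.dropna(subset=["bmi"]).reset_index(drop=True)

# One-hot encode the categorical columns with get_dummies
categorical_cols = ["gender", "smoking_status"]
encoded = pd.get_dummies(cleaned, columns=categorical_cols)

print(encoded.columns.tolist())
```

Each categorical column is replaced by one indicator column per category value, which is how a handful of raw predictors can expand into a larger feature count.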

Data -

•There were around 5000 entries of data.
•There were 16 features, including the predictors binarized with the get_dummies method.
•The predictors were id, age, gender, hypertension, heart disease, smoking status, glucose levels, bmi, work type, residence type and ever married.

Algorithms -

Models used were Logistic Regression, kNN, Decision Tree, Random Forest and XGBoost.

The data was divided into two parts, train (80%) and test (20%), using a stratified split. The models were fit on the train data using the above algorithms and tuned with GridSearchCV. The final results were evaluated using the test data. Among all the algorithms used, Logistic Regression, Random Forest and XGBoost were the most promising, considering the best AUC score and log loss for each model. The best model was chosen using an ensemble voting classifier (soft).
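The workflow above (stratified 80/20 split, grid search, soft voting) might look something like the sketch below. It makes several assumptions: the data is synthetic, the parameter grids and the `roc_auc` scoring choice are illustrative, and only Logistic Regression and Random Forest are included so the example depends on scikit-learn alone. An XGBoost model could be added as another estimator in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the cleaned stroke features
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# 80/20 split, stratified so both parts keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Tune each candidate model with a small, illustrative grid
log_reg_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=3,
)
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [25, 50], "max_depth": [3, 5]},
    scoring="roc_auc",
    cv=3,
)
log_reg_search.fit(X_train, y_train)
forest_search.fit(X_train, y_train)

# Soft voting averages the predicted class probabilities of the tuned models
ensemble = VotingClassifier(
    estimators=[
        ("log_reg", log_reg_search.best_estimator_),
        ("forest", forest_search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probability of the positive class for each held-out sample
test_probabilities = ensemble.predict_proba(X_test)[:, 1]
```

Soft voting requires every member to expose `predict_proba`, which is why it pairs naturally with probability-based scores such as AUC and log loss.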

The results are as follows:
•Recall - 0.95
•Precision - 0.1
•Accuracy - 0.92
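For reference, these three metrics can be computed with scikit-learn as in the short sketch below. The label vectors are made up for illustration and are not the project's predictions; they only show how a classifier can catch every positive case (high recall) while still raising many false alarms (lower precision).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 2 true positives, 2 false positives, 6 true negatives
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 2
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 4
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 8 / 10

print(recall, precision, accuracy)
```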

Tools -

•Pandas - Clean, explore and feature engineering
•Scikit-Learn - Build different classification models and perform cross-validation, variable selection and regularization
•Matplotlib / Seaborn - Visualize data exploration, modeling and results
•Python 3.8.5 - Run all of the above
