gowthaman25/Data-Preprocessing-Preparation-and-Feature-reduction

This project builds an AI solution for learning data preprocessing, preparation, and feature reduction using the UCI Communities & Crime dataset.

Data preprocessing, preparation, and feature reduction are among the most critical steps before applying any machine learning (ML) model; as a rough rule of thumb, they are often credited with 70–80% of a model's eventual performance.

- Preprocessing: essential to ensure data quality and consistency
- Preparation: critical for representativeness and feature engineering
- Feature Reduction: important for efficiency and avoiding overfitting

Data-Preprocessing-Preparation-and-Feature-reduction

This project builds an AI solution for learning data preprocessing, preparation, and feature reduction with Singular Value Decomposition (SVD), using the UCI Communities & Crime dataset.

The dataset has 128 columns in total:
- 122 predictive features
- 5 non-predictive features
- 1 goal/target variable

🧹 Data Preprocessing

Load the dataset
Identify:
- Numeric and non-numeric columns
- Predictive and non-predictive attributes
Exclude non-predictive attributes such as:
- state, county, community, communityname
Split predictive columns by data type:
- Numeric
- Categorical
  (these columns will be used in later processing)
Filter and retain only numeric columns
Encode categorical columns
Handle missing values
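The steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name `preprocess`, the `"?"` missing-value marker, median imputation, and one-hot encoding are all assumptions, and the `region` column in the demo frame is hypothetical. The sketch starts from an already loaded DataFrame.

```python
import numpy as np
import pandas as pd

# Non-predictive identifier columns named in the list above.
NON_PREDICTIVE = ["state", "county", "community", "communityname"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifiers, impute missing values, and encode categoricals."""
    # Assumption: "?" marks a missing value in the raw data.
    df = df.replace("?", np.nan)

    # Exclude whichever non-predictive attributes are present.
    df = df.drop(columns=[c for c in NON_PREDICTIVE if c in df.columns])

    # Numbers stored as text become numeric again; real text is left alone.
    for col in df.columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().sum() == df[col].notna().sum():
            df[col] = converted

    # Split the remaining columns by data type.
    numeric_cols = df.select_dtypes(include="number").columns
    categorical_cols = df.columns.difference(numeric_cols)

    # Assumption: fill numeric gaps with the column median.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Assumption: one-hot encode any categorical columns that remain.
    if len(categorical_cols):
        df = pd.get_dummies(df, columns=list(categorical_cols), dtype=float)
    return df


# Tiny made-up frame, only to show the call.
raw = pd.DataFrame({
    "state": [1, 2, 3, 4],
    "communityname": ["A", "B", "C", "D"],
    "population": ["0.1", "0.2", "?", "0.6"],
    "region": ["north", "south", "north", "south"],  # hypothetical column
    "ViolentCrimesPerPop": [0.2, 0.4, 0.1, 0.9],
})
clean = preprocess(raw)
```

Median imputation is only one option; dropping sparse columns or using a model-based imputer may suit the real data better.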

🧩 Data Preparation

Identify key predictive factors using correlation analysis
Compute correlation between features and the target variable
Analyze both positively and negatively correlated columns
- Positive correlation → Features that increase with the target
- Negative correlation → Features that decrease with the target

The top 5 positively correlated features are chosen, as these are the ones most strongly associated with the target.
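One way the correlation ranking might look in pandas is sketched below. The helper name `top_correlated` and the demo columns are hypothetical, and Pearson correlation is assumed because it is the pandas default.

```python
import pandas as pd


def top_correlated(df, target="ViolentCrimesPerPop", k=5):
    """Return the k most positively and k most negatively correlated features."""
    # Pearson correlation of every numeric feature with the target.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    positive = corr[corr > 0].sort_values(ascending=False).head(k)
    negative = corr[corr < 0].sort_values().head(k)
    return positive, negative


# Made-up data: one feature rises with the target, one falls, one is weak.
demo = pd.DataFrame({
    "feat_up": [1, 2, 3, 4, 5],
    "feat_down": [5, 4, 3, 2, 1],
    "feat_weak": [1, 3, 2, 3, 2],
    "ViolentCrimesPerPop": [0.1, 0.2, 0.3, 0.4, 0.5],
})
positive, negative = top_correlated(demo)
```

Correlation only captures linear association, so a feature with a strong non-linear relationship to the target can still rank low here.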

Random Forest

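The data is split into training and testing sets and a random forest is fitted.

R² score: 0.9999600609068787

The model reports the following key predictive features (feature importance):
- ViolentCrimesPerPop: 0.999918
- LemasPctOfficDrugUn: 0.000003
- racepctblack: 0.000003
- population: 0.000003
- PctTeen2Par: 0.000003
- PctBSorMore: 0.000003
- PctYoungKids2Par: 0.000003
- PctKids2Par: 0.000003
- NumInShelters: 0.000003
- MedRentPctHousInc: 0.000002
- MalePctDivorce: 0.000002
- PctNotHSGrad: 0.000002
- PctWOFullPlumb: 0.000002
- racePctWhite: 0.000002
- TotalPctDiv: 0.000002

Note: ViolentCrimesPerPop is the dataset's goal variable. Its presence in this list with an importance of 0.999918 suggests the target was left among the input features, so the near-perfect R² most likely reflects target leakage rather than genuine predictive power.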
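A minimal sketch of the split-and-fit step, assuming scikit-learn, is shown below. It drops the target column from the inputs, because ViolentCrimesPerPop appearing in the importance list above suggests the target was included as a feature. The helper name `fit_forest`, the 80/20 split, the hyperparameters, and the synthetic data are all assumptions, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_forest(df, target="ViolentCrimesPerPop", seed=0):
    """Fit a random forest and return (test R², sorted feature importances)."""
    # Keep the target out of the feature matrix to avoid leakage.
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return score, importances


# Synthetic stand-in data: the target depends on "signal" only.
rng = np.random.default_rng(0)
signal = rng.uniform(0, 1, size=200)
demo = pd.DataFrame({
    "signal": signal,
    "noise": rng.uniform(0, 1, size=200),
    "ViolentCrimesPerPop": signal + rng.normal(0, 0.05, size=200),
})
score, importances = fit_forest(demo)
```

With the target excluded, the R² on the real dataset would be expected to come out well below 1, and the importance ranking then reflects the actual predictors.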

⚙️ Feature Reduction

Prepare data for Singular Value Decomposition (SVD)
Perform SVD to decompose the dataset into components
Analyze the obtained components and interpret their significance for target prediction
SVD helps identify which features contribute most to each component
Higher (absolute) component values indicate stronger feature contribution
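The decomposition can be illustrated with NumPy as below. This is a sketch under stated assumptions: columns are mean-centered before the SVD (which makes the result comparable to PCA), the helper name `svd_components` is hypothetical, and the data is synthetic.

```python
import numpy as np


def svd_components(X, n_components=2):
    """Return the top right-singular vectors and their share of variance."""
    # Assumption: center each column so components describe variation
    # around the mean.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Squared singular values are proportional to explained variance.
    explained = (S ** 2) / np.sum(S ** 2)
    return Vt[:n_components], explained[:n_components]


# Synthetic data: columns 0 and 1 share one signal, column 2 is small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
X = np.column_stack([
    base,
    2 * base + 0.1 * rng.normal(size=100),
    0.1 * rng.normal(size=100),
])
components, explained = svd_components(X)

# Each row of `components` holds one weight (loading) per feature; the
# feature with the largest absolute weight contributes most to it.
top_feature = int(np.argmax(np.abs(components[0])))
```

Because the sign of a singular vector is arbitrary, the absolute value of each weight is what matters when ranking feature contributions.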
