gowthaman25/Data-Preprocessing-Preparation-and-Feature-reduction
Data preprocessing, preparation, and feature reduction are among the most critical steps before applying any Machine Learning (ML) model; they are often said to account for 70–80% of a model's eventual performance.
- Preprocessing: essential to ensure data quality and consistency
- Preparation: critical for representativeness and feature engineering
- Feature Reduction: important for efficiency and avoiding overfitting
This work builds an AI solution for learning data preprocessing, preparation, and Singular Value Decomposition (SVD) for feature reduction, using the UCI Communities & Crime dataset.
The dataset has 128 columns in total:
- 122 predictive features
- 5 non-predictive features
- 1 goal/target variable
- Load the dataset
- Identify:
  - Numeric and non-numeric columns
  - Predictive and non-predictive attributes
- Exclude non-predictive attributes such as state, county, community, communityname
- Split predictive columns by data type (these columns are used in later processing):
  - Numeric
  - Categorical
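
The column identification and split described above can be sketched with pandas. This is a minimal illustration, not the repository's actual code: the toy DataFrame, its values, the `hypothetical_category` column, and the `split_columns` helper are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the real dataset; all values are made up for illustration.
df = pd.DataFrame({
    "state": [1, 2, 3, 4],
    "county": [None, 5.0, None, 7.0],
    "community": [None, 10.0, None, 20.0],
    "communityname": ["A", "B", "C", "D"],
    "population": [0.19, 0.00, 0.04, 0.01],
    "racepctblack": [0.02, 0.12, 0.49, 1.00],
    "hypothetical_category": ["a", "b", "a", "c"],  # hypothetical column
    "ViolentCrimesPerPop": [0.20, 0.67, 0.43, 0.12],
})

NON_PREDICTIVE = ["state", "county", "community", "communityname"]
TARGET = "ViolentCrimesPerPop"


def split_columns(frame, non_predictive, target):
    """Drop non-predictive columns and the target, then split by dtype."""
    predictive = frame.drop(columns=non_predictive + [target], errors="ignore")
    numeric = predictive.select_dtypes(include="number").columns.tolist()
    categorical = predictive.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


numeric_cols, categorical_cols = split_columns(df, NON_PREDICTIVE, TARGET)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Keeping the target out of the predictive set at this stage makes it harder to feed it back in as a feature by accident later on.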
- Filter and retain only numeric columns
- Encode categorical columns
- Handle missing values
- Identify key predictive factors using correlation analysis
- Compute the correlation between each feature and the target variable
- Analyze both positively and negatively correlated columns:
  - Positive correlation → features that increase with the target
  - Negative correlation → features that decrease with the target
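
The missing-value handling and correlation steps above might look roughly like the following. The data here is synthetic, the column names other than `ViolentCrimesPerPop` are invented, and median imputation is an assumed choice; the source does not say which strategy was used.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(features, target):
    """Pearson correlation of each feature with the target, high to low."""
    return features.corrwith(target).sort_values(ascending=False)


# Synthetic data: one feature rises with the target, one falls, one is noise.
rng = np.random.default_rng(0)
n = 200
y = pd.Series(rng.random(n), name="ViolentCrimesPerPop")
X = pd.DataFrame({
    "rises_with_target": y + 0.1 * rng.random(n),
    "falls_with_target": -y + 0.1 * rng.random(n),
    "unrelated": rng.random(n),
})

# Introduce some gaps, then fill them (median imputation is an assumption).
X.loc[::10, "unrelated"] = np.nan
X = X.fillna(X.median())

corr = rank_by_correlation(X, y)
top_positive = corr.head(1)   # most positively correlated feature
top_negative = corr.tail(1)   # most negatively correlated feature
print(corr)
```

On the real data, `corr.head(5)` would give the five most positively correlated features and `corr.tail(5)` the five most negatively correlated ones.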

The top 5 positively correlated features are chosen, since they are the ones most strongly associated with the target.
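The data is then split into training and test sets and a random forest is fitted.

R² score: 0.9999600609068787

The model gives the following key predictive features:

| Feature | Importance |
|---|---|
| ViolentCrimesPerPop | 0.999918 |
| LemasPctOfficDrugUn | 0.000003 |
| racepctblack | 0.000003 |
| population | 0.000003 |
| PctTeen2Par | 0.000003 |
| PctBSorMore | 0.000003 |
| PctYoungKids2Par | 0.000003 |
| PctKids2Par | 0.000003 |
| NumInShelters | 0.000003 |
| MedRentPctHousInc | 0.000002 |
| MalePctDivorce | 0.000002 |
| PctNotHSGrad | 0.000002 |
| PctWOFullPlumb | 0.000002 |
| racePctWhite | 0.000002 |
| TotalPctDiv | 0.000002 |

ViolentCrimesPerPop is the dataset's goal attribute, so its appearance in this list with an importance of 0.999918 suggests the target was included among the input features. The near-perfect R² most likely reflects that leakage rather than genuine predictive power.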
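
A minimal sketch of the train/test split and random forest step, using scikit-learn on synthetic data. The feature names `f0` to `f3`, the 80/20 split, and the hyperparameters are assumptions for illustration; the target is deliberately kept out of the feature matrix so the score reflects the features alone.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic features; the target depends mostly on f0, slightly on f1.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.random((n, 4)), columns=["f0", "f1", "f2", "f3"])
y = 0.7 * X["f0"] + 0.3 * X["f1"] + 0.05 * rng.standard_normal(n)
y.name = "target"  # the target is not a column of X

# Hold out 20% of the rows for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score on unseen rows and rank features by impurity-based importance.
r2 = r2_score(y_test, model.predict(X_test))
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"R2 on test set: {r2:.3f}")
print(importances)
```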
- Prepare data for Singular Value Decomposition (SVD)
- Perform SVD to decompose the dataset into components
- Analyze the obtained components and interpret their significance for target prediction
- SVD helps identify which features contribute most to each component
- Higher absolute component values indicate a stronger feature contribution
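
The SVD steps above can be sketched with NumPy alone. This is an illustrative version under stated assumptions: the features are standardized first, the data is synthetic, and `svd_components` and the feature names `a`, `b`, `c` are hypothetical.

```python
import numpy as np


def svd_components(X, k):
    """Standardize X, run SVD, and keep the first k components.

    Returns the reduced data (n_samples x k), the component loadings
    (k x n_features), and the share of variance each component explains.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :k] * s[:k], Vt[:k], explained[:k]


# Synthetic data: features a and b are nearly identical, c is independent.
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 1))
X = np.hstack([
    base + 0.05 * rng.standard_normal((100, 1)),
    base + 0.05 * rng.standard_normal((100, 1)),
    rng.standard_normal((100, 1)),
])
feature_names = ["a", "b", "c"]

reduced, loadings, explained = svd_components(X, k=2)

# For each component, the feature with the largest absolute loading.
top_feature_per_component = [
    feature_names[i] for i in np.abs(loadings).argmax(axis=1)
]
print("explained variance share:", explained)
print("dominant feature per component:", top_feature_per_component)
```

Because `a` and `b` carry almost the same information, the first component absorbs both of them, which is the kind of redundancy feature reduction is meant to remove.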
