This work creates an AI solution for learning data preprocessing, preparation, and feature reduction using the UCI Communities & Crime dataset.


gowthaman25/Data-Preprocessing-Preparation-and-Feature-reduction


Data preprocessing, preparation, and feature reduction are among the most critical steps before applying any machine learning (ML) model. As a common rule of thumb, they are said to determine 70–80% of a model's eventual performance.

  • Preprocessing: essential to ensure data quality and consistency
  • Preparation: critical for representativeness and feature engineering
  • Feature Reduction: important for efficiency and avoiding overfitting

The project walks through data preprocessing, data preparation, and feature reduction with Singular Value Decomposition (SVD), using the UCI Communities & Crime dataset.

The dataset has 128 columns in total:

  • 122 predictive features
  • 5 non-predictive features
  • 1 goal/target variable

🧹 Data Preprocessing

  • Load the dataset
  • Identify:
    • Numeric and non-numeric columns
    • Predictive and non-predictive attributes
  • Exclude non-predictive attributes such as:
    • state, county, community, communityname
  • Split predictive columns by data type:
    • Numeric
    • Categorical
      (these columns will be used in later processing)
  • Filter and retain only numeric columns
  • Encode categorical columns
  • Handle missing values
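
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the repository's actual code: the `preprocess` helper, the `region_label` column, one-hot encoding, and median imputation are all assumptions made for the example, since the README does not say which encoding or imputation strategy is used.

```python
import numpy as np
import pandas as pd

# Non-predictive attributes named in the list above.
NON_PREDICTIVE = ["state", "county", "community", "communityname"]


def preprocess(df, non_predictive=NON_PREDICTIVE):
    """Hypothetical helper: drop identifiers, split by dtype, encode, impute."""
    # Exclude identifier-like columns that should not be used for prediction.
    predictive = df.drop(columns=[c for c in non_predictive if c in df.columns])

    # Split the remaining columns by data type.
    numeric_cols = predictive.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in predictive.columns if c not in numeric_cols]

    # Retain the numeric columns and fill gaps with the column median
    # (median imputation is an assumption, not the README's stated method).
    numeric = predictive[numeric_cols]
    numeric = numeric.fillna(numeric.median())

    # One-hot encode the categorical columns (also an assumed strategy).
    encoded = pd.get_dummies(predictive[categorical_cols])
    return numeric, encoded


# Toy data; "region_label" is a made-up categorical column for illustration.
toy = pd.DataFrame({
    "state": [1, 2, 3, 4],
    "communityname": ["a", "b", "c", "d"],
    "region_label": ["n", "s", "n", "s"],
    "population": [0.2, np.nan, 0.4, 0.6],
    "PctNotHSGrad": [0.1, 0.3, 0.5, 0.7],
    "ViolentCrimesPerPop": [0.2, 0.4, 0.6, 0.8],
})
numeric, encoded = preprocess(toy)
print(numeric)
print(encoded)
```

Returning the numeric and encoded frames separately keeps the "numeric only" path used later in the workflow distinct from the categorical columns set aside for later processing.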

🧩 Data Preparation

  • Identify key predictive factors using correlation analysis
  • Compute correlation between features and the target variable
  • Analyze both positively and negatively correlated columns
    • Positive correlation → Features that increase with the target
    • Negative correlation → Features that decrease with the target

The top 5 positively correlated features are chosen, since they show the strongest positive association with the target.
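
As a rough sketch of this correlation step, the snippet below ranks features by their Pearson correlation with the target and keeps the top positive ones. The `rank_by_correlation` helper and the `feat_*` column names are hypothetical, and the toy values are invented purely so the example runs on its own.

```python
import pandas as pd


def rank_by_correlation(df, target):
    """Hypothetical helper: Pearson correlation of each feature with the target."""
    features = df.drop(columns=[target])
    # corrwith computes the column-wise correlation against a single Series.
    return features.corrwith(df[target]).sort_values(ascending=False)


# Toy data with made-up feature names.
toy = pd.DataFrame({
    "feat_up": [1, 2, 3, 4, 5],      # rises with the target
    "feat_down": [5, 4, 3, 2, 1],    # falls as the target rises
    "feat_mixed": [1, 3, 2, 5, 4],   # loosely rises with the target
    "target": [0.1, 0.2, 0.3, 0.4, 0.5],
})

corr = rank_by_correlation(toy, "target")
top_positive = corr[corr > 0].head(5)   # strongest positive correlations first
top_negative = corr[corr < 0].tail(5)   # strongest negative correlations last
print(top_positive)
print(top_negative)
```

On the real dataset the same call would be made with `target="ViolentCrimesPerPop"` on the numeric frame produced during preprocessing.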

Random Forest

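The data is split into training and testing sets, and a random forest is fitted.

R² Score: 0.9999600609068787

The model also reports the key predictive features (feature importances):

  • ViolentCrimesPerPop: 0.999918
  • LemasPctOfficDrugUn: 0.000003
  • racepctblack: 0.000003
  • population: 0.000003
  • PctTeen2Par: 0.000003
  • PctBSorMore: 0.000003
  • PctYoungKids2Par: 0.000003
  • PctKids2Par: 0.000003
  • NumInShelters: 0.000003
  • MedRentPctHousInc: 0.000002
  • MalePctDivorce: 0.000002
  • PctNotHSGrad: 0.000002
  • PctWOFullPlumb: 0.000002
  • racePctWhite: 0.000002
  • TotalPctDiv: 0.000002

Note that ViolentCrimesPerPop is the dataset's target variable. Its appearance in this list with an importance of 0.999918 indicates that the target column was left in the feature matrix, which explains the near-perfect R² score. The target should be dropped from the features before training for the importances to be meaningful.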
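
A minimal sketch of this step with scikit-learn is shown below. It is not the repository's code: the `fit_forest` helper, the 80/20 split, the number of trees, and the toy columns are assumptions. The sketch drops the target column from the feature matrix so the model cannot read the answer directly from its inputs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_forest(df, target, seed=0):
    """Hypothetical helper: train a forest and return R^2 plus importances."""
    X = df.drop(columns=[target])   # features only; the target is excluded
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return score, importances


# Toy data: "signal" drives the target, "noise" is unrelated (names are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "signal": rng.uniform(size=200),
    "noise": rng.uniform(size=200),
})
toy["target"] = 2.0 * toy["signal"] + 0.05 * rng.normal(size=200)

score, importances = fit_forest(toy, "target")
print(f"R^2 on held-out data: {score:.3f}")
print(importances)
```

On the real data the call would be `fit_forest(numeric, "ViolentCrimesPerPop")`, where `numeric` stands for the preprocessed numeric frame.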

⚙️ Feature Reduction

  • Prepare data for Singular Value Decomposition (SVD)
  • Perform SVD to decompose the dataset into components
  • Analyze the obtained components and interpret their significance for target prediction
  • SVD helps identify which features contribute most to each component
  • Larger absolute component values indicate a stronger feature contribution
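
The SVD steps above might look roughly like the following with NumPy. This is a sketch under assumptions: the `svd_components` helper and the `f0`/`f1`/`f2` feature names are hypothetical, and the data is only mean-centered before decomposition, which the README does not specify.

```python
import numpy as np


def svd_components(X, k):
    """Hypothetical helper: center X, run SVD, keep the first k components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # share of variance per component
    reduced = U[:, :k] * s[:k]                   # samples projected onto k components
    return reduced, Vt[:k], explained[:k]


# Toy matrix: f1 is roughly twice f0, f2 is almost constant.
feature_names = ["f0", "f1", "f2"]
X = np.array([
    [1.0, 2.0, 0.1],
    [2.0, 4.1, 0.0],
    [3.0, 5.9, 0.1],
    [4.0, 8.0, 0.0],
    [5.0, 10.1, 0.1],
])

reduced, components, explained = svd_components(X, k=1)

# Each row of `components` holds one weight per feature; the feature with the
# largest absolute weight contributes most to that component.
top_feature = feature_names[int(np.argmax(np.abs(components[0])))]
print("explained variance share:", explained)
print("feature weights:", dict(zip(feature_names, components[0])))
print("strongest contributor:", top_feature)
```

Comparing absolute weights matters because the sign of a singular vector is arbitrary: a large negative weight signals as strong a contribution as a large positive one.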
