Movatterモバイル変換


[0]ホーム

URL:


Menu
×
Sign In
+1 Get Certified For Teachers Spaces Plus Get Certified For Teachers Spaces Plus
   ❮     
     ❯   

Python Tutorial

Python HOMEPython IntroPython Get StartedPython SyntaxPython CommentsPython VariablesPython Data TypesPython NumbersPython CastingPython StringsPython BooleansPython OperatorsPython ListsPython TuplesPython SetsPython DictionariesPython If...ElsePython MatchPython While LoopsPython For LoopsPython FunctionsPython LambdaPython ArraysPython OOPPython Classes/ObjectsPython InheritancePython IteratorsPython PolymorphismPython ScopePython ModulesPython DatesPython MathPython JSONPython RegExPython PIPPython Try...ExceptPython String FormattingPython User InputPython VirtualEnv

File Handling

Python File HandlingPython Read FilesPython Write/Create FilesPython Delete Files

Python Modules

NumPy TutorialPandas TutorialSciPy TutorialDjango Tutorial

Python Matplotlib

Matplotlib IntroMatplotlib Get StartedMatplotlib PyplotMatplotlib PlottingMatplotlib MarkersMatplotlib LineMatplotlib LabelsMatplotlib GridMatplotlib SubplotMatplotlib ScatterMatplotlib BarsMatplotlib HistogramsMatplotlib Pie Charts

Machine Learning

Getting StartedMean Median ModeStandard DeviationPercentileData DistributionNormal Data DistributionScatter PlotLinear RegressionPolynomial RegressionMultiple RegressionScaleTrain/TestDecision TreeConfusion MatrixHierarchical ClusteringLogistic RegressionGrid SearchCategorical DataK-meansBootstrap AggregationCross ValidationAUC - ROC CurveK-nearest neighbors

Python DSA

Python DSALists and ArraysStacksQueuesLinked ListsHash TablesTreesBinary TreesBinary Search TreesAVL TreesGraphsLinear SearchBinary SearchBubble SortSelection SortInsertion SortQuick SortCounting SortRadix SortMerge Sort

Python MySQL

MySQL Get StartedMySQL Create DatabaseMySQL Create TableMySQL InsertMySQL SelectMySQL WhereMySQL Order ByMySQL DeleteMySQL Drop TableMySQL UpdateMySQL LimitMySQL Join

Python MongoDB

MongoDB Get StartedMongoDB Create DBMongoDB CollectionMongoDB InsertMongoDB FindMongoDB QueryMongoDB SortMongoDB DeleteMongoDB Drop CollectionMongoDB UpdateMongoDB Limit

Python Reference

Python OverviewPython Built-in FunctionsPython String MethodsPython List MethodsPython Dictionary MethodsPython Tuple MethodsPython Set MethodsPython File MethodsPython KeywordsPython ExceptionsPython Glossary

Module Reference

Random ModuleRequests ModuleStatistics ModuleMath ModulecMath Module

Python How To

Remove List DuplicatesReverse a StringAdd Two Numbers

Python Examples

Python ExamplesPython CompilerPython ExercisesPython QuizPython ServerPython SyllabusPython Study PlanPython Interview Q&APython BootcampPython CertificatePython Training

Machine Learning - Scale


Scale Features

When your data has different values, and even different measurement units, it can be difficult to compare them. What is kilograms compared to meters? Or altitude compared to time?

The answer to this problem is scaling. We can scale data into new values that are easier to compare.

Take a look at the table below, it is the same data set that we used in themultiple regression chapter, but this time thevolume column contains values in liters instead ofcm3 (1.0 instead of 1000).

CarModelVolumeWeightCO2
ToyotaAygo1.079099
MitsubishiSpace Star1.2116095
SkodaCitigo1.092995
Fiat5000.986590
MiniCooper1.51140105
VWUp!1.0929105
SkodaFabia1.4110990
MercedesA-Class1.5136592
FordFiesta1.5111298
AudiA11.6115099
HyundaiI201.198099
SuzukiSwift1.3990101
FordFiesta1.0111299
HondaCivic1.6125294
HundaiI301.6132697
OpelAstra1.6133097
BMW11.6136599
Mazda32.21280104
SkodaRapid1.61119104
FordFocus2.01328105
FordMondeo1.6158494
OpelInsignia2.0142899
MercedesC-Class2.1136599
SkodaOctavia1.6141599
VolvoS602.0141599
MercedesCLA1.51465102
AudiA42.01490104
AudiA62.01725114
VolvoV701.61523109
BMW52.01705114
MercedesE-Class2.11605115
VolvoXC702.01746117
FordB-Max1.61235104
BMW21.61390108
OpelZafira1.61405109
MercedesSLK2.51395120

It can be difficult to compare the volume 1.0 with the weight 790, but if we scale them both into comparable values, we can easily see how much one value is compared to the other.

There are different methods for scaling data, in this tutorial we will use a method called standardization.

The standardization method uses this formula:

z = (x - u) / s

Wherez is the new value,x is the original value,u is the mean ands is the standard deviation.

If you take theweight column from the data set above, the first value is 790, and the scaled value will be:

(790 -1292.23) /238.74 = -2.1

If you take thevolume column from the data set above, the first value is 1.0, and the scaled value will be:

(1.0 -1.61) /0.38 = -1.59

Now you can compare -2.1 with -1.59 instead of comparing 790 with 1.0.

You do not have to do this manually,the Python sklearn module has a method calledStandardScaler()which returns a Scaler object with methods for transforming data sets.

Example

Scale all values in the Weight and Volume columns:

import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

df = pandas.read_csv("data.csv")

X = df[['Weight', 'Volume']]

scaledX = scale.fit_transform(X)

print(scaledX)

Result:

Note that the first two values are -2.1 and -1.59, which corresponds to our calculations:

[[-2.10389253 -1.59336644] [-0.55407235 -1.07190106] [-1.52166278 -1.59336644] [-1.78973979 -1.85409913] [-0.63784641 -0.28970299] [-1.52166278 -1.59336644] [-0.76769621 -0.55043568] [ 0.3046118  -0.28970299] [-0.7551301  -0.28970299] [-0.59595938 -0.0289703 ] [-1.30803892 -1.33263375] [-1.26615189 -0.81116837] [-0.7551301  -1.59336644] [-0.16871166 -0.0289703 ] [ 0.14125238 -0.0289703 ] [ 0.15800719 -0.0289703 ] [ 0.3046118  -0.0289703 ] [-0.05142797  1.53542584] [-0.72580918 -0.0289703 ] [ 0.14962979  1.01396046] [ 1.2219378  -0.0289703 ] [ 0.5685001   1.01396046] [ 0.3046118   1.27469315] [ 0.51404696 -0.0289703 ] [ 0.51404696  1.01396046] [ 0.72348212 -0.28970299] [ 0.8281997   1.01396046] [ 1.81254495  1.01396046] [ 0.96642691 -0.0289703 ] [ 1.72877089  1.01396046] [ 1.30990057  1.27469315] [ 1.90050772  1.01396046] [-0.23991961 -0.0289703 ] [ 0.40932938 -0.0289703 ] [ 0.47215993 -0.0289703 ] [ 0.4302729   2.31762392]]

Run example »



Predict CO2 Values

The task in theMultiple Regression chapter was to predict the CO2 emission from a car when you only knew its weight and volume.

When the data set is scaled, you will have to use the scale when you predict values:

Example

Predict the CO2 emission from a 1.3 liter car that weighs 2300 kilograms:

import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

df = pandas.read_csv("data.csv")

X = df[['Weight', 'Volume']]
y = df['CO2']

scaledX = scale.fit_transform(X)

regr = linear_model.LinearRegression()
regr.fit(scaledX, y)

scaled = scale.transform([[2300, 1.3]])

predictedCO2 = regr.predict([scaled[0]])
print(predictedCO2)

Result:

[107.2087328]

Run example »


 
Track your progress - it's free!
 

×

Contact Sales

If you want to use W3Schools services as an educational institution, team or enterprise, send us an e-mail:
sales@w3schools.com

Report Error

If you want to report an error, or if you want to make a suggestion, send us an e-mail:
help@w3schools.com

W3Schools is optimized for learning and training. Examples might be simplified to improve reading and learning.
Tutorials, references, and examples are constantly reviewed to avoid errors, but we cannot warrant full correctness
of all content. While using W3Schools, you agree to have read and accepted ourterms of use,cookie and privacy policy.

Copyright 1999-2025 by Refsnes Data. All Rights Reserved.W3Schools is Powered by W3.CSS.


[8]ページ先頭

©2009-2025 Movatter.jp