Commit c3a0abc

minjk-bl authored and gitbook-bot committed

GITBOOK-69: Data Prep

1 parent 890f3ec, commit c3a0abc

File tree

6 files changed

+69

-10

lines changed

docs
- .gitbook/assets
- machine-learning
  - 3.-data-prep.md

`‎docs/.gitbook/assets/image (322).png‎`

236 KB

`‎docs/.gitbook/assets/image (323).png‎`

48.9 KB

`‎docs/.gitbook/assets/image (324).png‎`

66.2 KB

`‎docs/.gitbook/assets/image (325).png‎`

63.6 KB

`‎docs/.gitbook/assets/image (326).png‎`

77.6 KB

`‎docs/machine-learning/3.-data-prep.md‎`

Lines changed: 69 additions & 10 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,20 +1,79 @@`
`1`		`-# 3. Data Prep`
`2`		`-`
`3`		`-`
	`1`	`+---`
	`2`	`+description: Tools for Preprocessing (Encoding/Scaling)`
	`3`	`+---`
`4`	`4`
`5`		`-<figure><img src="../.gitbook/assets/image (148).png" alt="" width="211"><figcaption></figcaption></figure>`
`6`		`-`
`7`		`-1. Click on Data Prep in the Machine Learning category.`
	`5`	`+# 3. Data Prep`
`8`	`6`
	`7`	`+<figure><img src="../.gitbook/assets/image (322).png" alt="" width="529"><figcaption></figcaption></figure>`
`9`	`8`
	`9`	`+1. Click on Data Prep in the Machine Learning category.`
`10`	`10`
`11`		`-<figure><img src="../.gitbook/assets/image (149).png" alt="" width="563"><figcaption></figcaption></figure>`
	`11`	`+<figure><img src="../.gitbook/assets/image (323).png" alt="" width="563"><figcaption></figcaption></figure>`
`12`	`12`
`13`	`13`	`2. _Model Type_: You can perform various preprocessing tasks:`
`14`		`-* Encoding`
`15`		`-* Scaling`
`16`		`-* ETC`
	`14`	`+[Encoding*](3.-data-prep.md#encoding)`
	`15`	`+[Scaling*](3.-data-prep.md#scaling)`
	`16`	`+[ETC*](3.-data-prep.md#etc-simpleimputer-smote-makecolumntransformer)`
`17`	`17`	`3. _Allocate to_: Assign variable names for the model to perform the selected preprocessing tasks.`
`18`	`18`	`4. _Code View_: Preview the code that will be output.`
`19`	`19`	`5. _Run_: Execute the code.`
`20`	`20`
	`21`	`+`
	`22`	`+`
	`23`	`+***`
	`24`	`+`
	`25`	`+## Encoding`
	`26`	`+`
	`27`	`+<figure><img src="../.gitbook/assets/image (324).png" alt="" width="563"><figcaption></figcaption></figure>`
	`28`	`+`
	`29`	`+1. _Sparse (OneHotEncoder)_: If _true_, returns the encoding result as a sparse matrix.`
	`30`	`+2. _Handle unknown (OneHotEncoder, OrdinalEncoder)_: Controls what happens at encoding time when a category appears that was not present in the training data (for example, one that occurs only in the test data). If _ignore_ is selected, the unknown category is encoded as all zeros; if _error_ is selected, a ValueError is raised.`
	`31`	`+3. _Unknown values (OrdinalEncoder)_: The specific value assigned to unknown categories, as an alternative to ignoring them or raising an error.`
	`32`	`+4. _Cols (TargetEncoder)_: Select the columns to encode.`
	`33`	`+5. _Handle missing (TargetEncoder)_: Choose how to handle missing values.`
	`34`	`+6. _Smoothing (TargetEncoder)_: Controls how strongly each category's target mean is blended with the overall target mean. Larger values pull categories with few samples closer to the overall mean, which helps prevent overfitting.`
	`35`	`+`
	`36`	`+`
	`37`	`+`
	`38`	`+***`
	`39`	`+`
	`40`	`+## Scaling`
	`41`	`+`
	`42`	`+<figure><img src="../.gitbook/assets/image (325).png" alt="" width="563"><figcaption></figcaption></figure>`
	`43`	`+`
	`44`	`+1. _With mean (StandardScaler)_: Centers the data so that its mean is zero.`
	`45`	`+2. _With std (StandardScaler)_: Scales the data so that its standard deviation is 1.`
	`46`	`+3. _With centering (RobustScaler)_: Centers the data by subtracting the median from each attribute (column).`
	`47`	`+4. _With scaling (RobustScaler)_: Scales each attribute by dividing it by its interquartile range (IQR).`
	`48`	`+5. _Feature range (MinMaxScaler)_: Sets the minimum and maximum values for the scaled result.`
	`49`	`+6. _Norm (Normalizer)_:`
	`50`	`+    1. _L1_: Scales each sample (row) so that the sum of the absolute values of its attributes is 1.`
	`51`	`+    2. _L2_: Scales each sample (row) so that its Euclidean norm (length) is 1.`
	`52`	`+    3. _Max Norm_: Divides each sample (row) by its largest absolute value, so that no scaled value exceeds 1 in magnitude.`
	`53`	`+7. _N bins (KBinsDiscretizer)_: Determines how many bins to divide the variable into.`
	`54`	`+8. _Strategy (KBinsDiscretizer)_:`
	`55`	`+    1. _uniform_: Divides the range into bins of equal width.`
	`56`	`+    2. _quantile_: Divides the data so that each bin holds approximately the same number of data points.`
	`57`	`+9. _Encode (KBinsDiscretizer)_: Specify the encoding method.`
	`58`	`+    1. _ordinal_: Encodes each bin as an integer.`
	`59`	`+    2. _onehot_: Encodes each bin as a one-hot (binary) vector.`
	`60`	`+`
	`61`	`+`
	`62`	`+`
	`63`	`+***`
	`64`	`+`
	`65`	`+## ETC (SimpleImputer / SMOTE / MakeColumnTransformer)`
	`66`	`+`
	`67`	`+<figure><img src="../.gitbook/assets/image (326).png" alt="" width="563"><figcaption></figcaption></figure>`
	`68`	`+`
	`69`	`+1. _Missing values (SimpleImputer)_: Treats the entered value as the marker for missing data.`
	`70`	`+2. _Fill value (SimpleImputer)_: Replaces the missing values with the entered value.`
	`71`	`+3. _Copy (SimpleImputer)_: If _true_, the imputation is applied to a copy of the data, leaving the original data unchanged.`
	`72`	`+4. _Add indicator (SimpleImputer)_: Adds indicator columns of 0s and 1s, with a 1 where a value was missing and a 0 where it was not.`
	`73`	`+5. _K neighbors (SMOTE)_: Specifies the number of nearest neighbors of each minority-class sample that are used to generate synthetic samples.`
	`74`	`+6. _Sampling strategy (SMOTE)_:`
	`75`	`+    1. _auto_: Resamples every class except the majority class so that each one matches the size of the majority class.`
	`76`	`+    2. _minority_: Resamples only the minority class so that it matches the size of the majority class.`
	`77`	`+    3. _float_: Specifies the desired ratio of minority to majority class samples after resampling. For example, setting it to 0.5 makes the minority class dataset half the size of the majority class dataset.`
	`78`	`+7. _Estimator (MakeColumnTransformer)_: Specify a different preprocessing model for each group of columns. The model selected here is applied to the columns selected in _Columns_ below.`
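
To illustrate the _Handle unknown_ and _Unknown values_ options described in the Encoding section of this diff, here is a minimal pure-Python sketch. It does not call scikit-learn: the helper names `one_hot_encode` and `ordinal_encode`, the color categories, and the choice of `-1` as the unknown value are all hypothetical and chosen only for illustration. It assumes the options act on a category that appears at encoding time but was absent from the training data.

```python
def one_hot_encode(values, categories, handle_unknown="error"):
    """Encode each value as a 0/1 list with one slot per known category."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        elif handle_unknown == "error":
            raise ValueError(f"unknown category: {value!r}")
        # With "ignore", an unknown category keeps the all-zero row.
        rows.append(row)
    return rows


def ordinal_encode(values, categories, unknown_value=None):
    """Encode each value as its category index, or as unknown_value if unseen."""
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        if value in index:
            encoded.append(index[value])
        elif unknown_value is None:
            raise ValueError(f"unknown category: {value!r}")
        else:
            encoded.append(unknown_value)
    return encoded


# Categories learned from (hypothetical) training data.
train_categories = ["blue", "green", "red"]
# "purple" never appeared during training.
test_values = ["red", "purple"]

one_hot = one_hot_encode(test_values, train_categories, handle_unknown="ignore")
ordinal = ordinal_encode(test_values, train_categories, unknown_value=-1)

print(one_hot)  # [[0, 0, 1], [0, 0, 0]]
print(ordinal)  # [2, -1]
```

With `handle_unknown="error"` the same input raises a `ValueError` instead of producing the all-zero row.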
	`79`	`+`
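
The arithmetic behind the Scaling options listed in this diff can be worked through on a small example. The sketch below uses only the Python standard library rather than the scalers themselves; the sample `column` and `row` values are made up, and it assumes the population standard deviation for standardization and linearly interpolated quartiles for the IQR.

```python
import math
import statistics

# One attribute (column) with an outlier, and one sample (row).
column = [1.0, 2.0, 3.0, 4.0, 10.0]
row = [3.0, -4.0]

# With mean / With std: subtract the mean, divide by the standard deviation.
mean = statistics.fmean(column)
std = statistics.pstdev(column)  # population standard deviation (assumption)
standardized = [(x - mean) / std for x in column]

# With centering / With scaling: subtract the median, divide by the IQR.
median = statistics.median(column)
q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
iqr = q3 - q1
robust = [(x - median) / iqr for x in column]

# Feature range: map the column linearly onto [low, high].
low, high = 0.0, 1.0
col_min, col_max = min(column), max(column)
min_max = [(x - col_min) / (col_max - col_min) * (high - low) + low for x in column]

# Norm: rescale a single sample (row) by one of three norms.
l1 = [v / sum(abs(u) for u in row) for v in row]
l2 = [v / math.sqrt(sum(u * u for u in row)) for v in row]
max_norm = [v / max(abs(u) for u in row) for v in row]

print(robust)    # [-1.0, -0.5, 0.0, 0.5, 3.5]
print(l2)        # [0.6, -0.8]
print(max_norm)  # [0.75, -1.0]
```

Note how the outlier `10.0` inflates the mean and standard deviation used for standardization, while the median and IQR used for robust scaling are unaffected by it.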

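The SimpleImputer options in the ETC section (_Missing values_, _Fill value_, _Copy_, _Add indicator_) can be pictured with a small pure-Python sketch. This is not the library's implementation: the function `impute_constant`, the sample rows, and the use of `-1` as the missing-value marker are hypothetical. It assumes that indicator columns are added only for columns that actually contain a missing value.

```python
def impute_constant(rows, missing_value, fill_value, add_indicator=False):
    """Return new rows with missing_value replaced by fill_value.

    The input rows are never modified, which mirrors Copy = true.
    Note: a NaN marker would need math.isnan instead of == (not handled here).
    """
    n_cols = len(rows[0])
    # Columns that contain at least one missing entry get an indicator column.
    flagged = [c for c in range(n_cols)
               if any(row[c] == missing_value for row in rows)]
    result = []
    for row in rows:
        filled = [fill_value if v == missing_value else v for v in row]
        if add_indicator:
            filled += [1 if row[c] == missing_value else 0 for c in flagged]
        result.append(filled)
    return result


# Hypothetical data in which -1 marks a missing entry.
data = [[1, -1], [2, 5], [3, -1]]

imputed = impute_constant(data, missing_value=-1, fill_value=0, add_indicator=True)

print(imputed)  # [[1, 0, 1], [2, 5, 0], [3, 0, 1]]
print(data)     # [[1, -1], [2, 5], [3, -1]]
```

Only the second column contained the marker, so a single indicator column is appended, and the original `data` is left untouched.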
0 commit comments
