Commit c3a0abc

minjk-bl authored and gitbook-bot committed
GITBOOK-69: Data Prep
1 parent 890f3ec commit c3a0abc

File tree

6 files changed

+69
-10
lines changed

Lines changed: 69 additions & 10 deletions
@@ -1,20 +1,79 @@
1-
# 3. Data Prep
2-
3-
1+
---
2+
description: Tools for Preprocessing (Encoding/Scaling)
3+
---
44

5-
<figure><img src="../.gitbook/assets/image (148).png" alt="" width="211"><figcaption></figcaption></figure>
6-
7-
1. Click on Data Prep in the Machine Learning category.
5+
# 3. Data Prep
86

7+
<figure><img src="../.gitbook/assets/image (322).png" alt="" width="529"><figcaption></figcaption></figure>
98

9+
1. Click on **Data Prep** in the **Machine Learning** category.
1010

11-
<figure><img src="../.gitbook/assets/image (149).png" alt="" width="563"><figcaption></figcaption></figure>
11+
<figure><img src="../.gitbook/assets/image (323).png" alt="" width="563"><figcaption></figcaption></figure>
1212

1313
2. _**Model Type**_: Select the preprocessing task to perform:
14-
* Encoding
15-
* Scaling
16-
* ETC
14+
    * [**Encoding**](3.-data-prep.md#encoding)
15+
    * [**Scaling**](3.-data-prep.md#scaling)
16+
    * [**ETC**](3.-data-prep.md#etc-simpleimputer-smote-makecolumntransformer)
1717
3. _**Allocate to**_: Assign the variable name that will hold the model performing the selected preprocessing task.
1818
4. _**Code View**_: Preview the code that will be generated.
1919
5. _**Run**_: Execute the code.
2020

21+
22+
23+
***
24+
25+
## Encoding
26+
27+
<figure><img src="../.gitbook/assets/image (324).png" alt="" width="563"><figcaption></figcaption></figure>
28+
29+
1. _**Sparse (OneHotEncoder)**_: If _**true**_, returns the encoding result as a sparse matrix; otherwise a dense array is returned.
30+
2. _**Handle unknown (OneHotEncoder, OrdinalEncoder)**_: Controls what happens when the data being encoded contains a category that was not present in the training data. If _**ignore**_ is selected, the unknown category is encoded as all zeros; if _**error**_ is selected, a ValueError is raised.
31+
3. _**Unknown values (OrdinalEncoder)**_: Encodes unknown categories with the specified value instead of ignoring them or raising an error.
32+
4. _**Cols (TargetEncoder)**_: Select the columns to encode.
33+
5. _**Handle missing (TargetEncoder)**_: Choose how to handle missing values.
34+
6. _**Smoothing (TargetEncoder)**_: When a category has only a few samples, its target mean is blended with the overall mean, weighted by the entered value, to reduce overfitting.
35+
36+
37+
38+
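
The **Handle unknown** and **Unknown values** options above can be tried directly in scikit-learn. The following is a minimal sketch rather than the code that Code View produces; the color data and variable names are made up for illustration, and it assumes a scikit-learn version in which `OrdinalEncoder` accepts `handle_unknown`.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Illustrative data: "purple" appears only at transform time, never during fit.
train = [["red"], ["green"], ["blue"]]
test = [["green"], ["purple"]]

# Handle unknown = ignore: the unseen category becomes an all-zero row.
# The default output is sparse, so convert it to a dense array for display.
onehot = OneHotEncoder(handle_unknown="ignore").fit(train)
onehot_rows = onehot.transform(test).toarray().tolist()

# Handle unknown = error: the unseen category raises ValueError.
strict = OneHotEncoder(handle_unknown="error").fit(train)
try:
    strict.transform(test)
    raised = False
except ValueError:
    raised = True

# Unknown values: OrdinalEncoder maps unseen categories to a chosen code (-1 here).
ordinal = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1).fit(train)
ordinal_rows = ordinal.transform(test).tolist()

print(onehot_rows)   # columns are ordered blue, green, red
print(raised)
print(ordinal_rows)
```

Choosing a code such as -1 keeps unknown categories distinguishable from every category seen during training.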
***
39+
40+
## Scaling
41+
42+
<figure><img src="../.gitbook/assets/image (325).png" alt="" width="563"><figcaption></figcaption></figure>
43+
44+
1. _**With mean (StandardScaler)**_: Centers the data so that each attribute has a mean of zero.
45+
2. _**With std (StandardScaler)**_: Scales the data so that each attribute has a standard deviation of 1.
46+
3. _**With centering (RobustScaler)**_: Centers each attribute (column) by subtracting its median.
47+
4. _**With scaling (RobustScaler)**_: Scales each attribute by dividing it by its interquartile range (IQR).
48+
5. _**Feature range (MinMaxScaler)**_: Sets the minimum and maximum values for the scaled result.
49+
6. _**Norm (Normalizer)**_:
50+
    1. _**L1**_: Scales each sample (row) so that the sum of its absolute values is 1.
51+
    2. _**L2**_: Scales each sample (row) so that its Euclidean norm (length) is 1.
52+
    3. _**Max Norm**_: Divides each sample (row) by its maximum absolute value, so that no scaled value exceeds 1 in magnitude.
53+
7. _**N bins (KBins Discretizer)**_: Determines how many bins to divide the variable into.
54+
8. _**Strategy (KBins Discretizer)**_:
55+
    1. _**uniform**_: Divides the value range into bins of equal width.
56+
    2. _**quantile**_: Divides the data so that each bin holds roughly the same number of samples.
57+
9. _**Encode (KBins Discretizer)**_: Specify the encoding method.
58+
    1. _**ordinal**_: Encodes each interval as an integer.
59+
    2. _**onehot**_: Encodes each interval as a binary vector.
60+
61+
62+
63+
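
To make the scaling options above concrete, here is a small sketch of the arithmetic behind several of them, using scikit-learn directly. The sample values are assumptions chosen so the results are easy to check by hand; this is not the code that Code View generates.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler, Normalizer, RobustScaler

# One column with an outlier (100.0): median is 3, quartiles are 2 and 4, so IQR is 2.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# With centering + With scaling: (x - median) / IQR, here (x - 3) / 2.
robust = RobustScaler(with_centering=True, with_scaling=True).fit_transform(X)

# Feature range: map the column minimum to -1 and the column maximum to 1.
minmax = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# Norm: Normalizer rescales each row (sample), not each column.
row = np.array([[3.0, 4.0]])
l1 = Normalizer(norm="l1").fit_transform(row)   # divide by |3| + |4| = 7
l2 = Normalizer(norm="l2").fit_transform(row)   # divide by sqrt(9 + 16) = 5
mx = Normalizer(norm="max").fit_transform(row)  # divide by max(|3|, |4|) = 4

# N bins / Strategy / Encode: two equal-width bins returned as integer codes.
# The bin edges are 1, 50.5 and 100, so only the outlier lands in bin 1.
uniform_bins = KBinsDiscretizer(
    n_bins=2, strategy="uniform", encode="ordinal"
).fit_transform(X)

print(robust.ravel())
print(l1, l2, mx)
print(uniform_bins.ravel())
```

With the uniform strategy the outlier stretches the bin width, which is one reason to consider the quantile strategy for skewed data.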
***
64+
65+
## ETC (SimpleImputer / SMOTE / MakeColumnTransformer)
66+
67+
<figure><img src="../.gitbook/assets/image (326).png" alt="" width="563"><figcaption></figcaption></figure>
68+
69+
1. _**Missing values (SimpleImputer)**_: Treats every occurrence of the entered value as missing.
70+
2. _**Fill value (SimpleImputer)**_: Replaces missing values with the entered value.
71+
3. _**Copy (SimpleImputer)**_: Leaves the original data unchanged and returns the imputed result as new data.
72+
4. _**Add indicator (SimpleImputer)**_: Adds indicator columns of 0s and 1s, with a 1 for rows where a value was missing and a 0 for rows where it was not.
73+
5. _**K neighbors (SMOTE)**_: Specifies the number of nearest neighbors of each minority-class point used to generate synthetic samples.
74+
6. _**Sampling strategy (SMOTE)**_:
75+
    1. _**auto**_: Automatically resamples every class except the majority class to balance out class imbalance.
76+
    2. _**minority**_: Resamples only the minority class, making it the same size as the majority class.
77+
    3. _**float**_: You can specify the desired class ratio. For example, setting it to 0.5 makes the minority class dataset half the size of the majority class dataset.
78+
7. _**Estimator (MakeColumnTransformer)**_: Specify which model (transformer) to apply to which columns, so that different columns can be processed differently. The model selected here is applied to the columns selected in _Columns_ below.
79+
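
The SimpleImputer and MakeColumnTransformer options above can be sketched with scikit-learn as follows. This is an illustrative example under assumptions: the arrays are made up, the constant strategy is assumed so that **Fill value** takes effect, and columns are selected by index rather than by name.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Missing values / Fill value / Copy / Add indicator.
X = np.array([[1.0, np.nan], [2.0, 5.0]])
imputer = SimpleImputer(
    missing_values=np.nan,  # value treated as missing
    strategy="constant",    # fill_value is only used with this strategy
    fill_value=0.0,
    copy=True,              # X itself is left untouched
    add_indicator=True,     # append a 0/1 column for each feature that had missing values
)
imputed = imputer.fit_transform(X)

# Estimator + Columns: a different transformer for each selected column.
data = np.array([[1.0, 10.0], [3.0, 20.0]])
transformer = make_column_transformer(
    (StandardScaler(), [0]),  # column 0: zero mean, unit standard deviation
    (MinMaxScaler(), [1]),    # column 1: rescaled to the range 0 to 1
)
transformed = transformer.fit_transform(data)

print(imputed)      # last column is the missing-value indicator
print(transformed)
```

SMOTE is not shown here because it lives in the separate imbalanced-learn package rather than in scikit-learn.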
