|
1 | 1 | # 4-5. Frame - Data Cleaning |
2 | 2 |
|
| 3 | +<figure><img src="../../.gitbook/assets/image (217).png" alt="" width="470"><figcaption></figcaption></figure> |
3 | 4 |
|
4 | | - |
5 | | -<figure><img src="../../.gitbook/assets/image (24).png" alt=""><figcaption></figcaption></figure> |
6 | | - |
7 | | -1. _**Fill NA**_: Replace NA with another value. |
8 | | -2. _**Drop NA**_: Remove rows or columns containing NA. |
9 | | -3. _**Fill Outlier**_: Replace outliers in specific columns. |
10 | | -4. _**Drop Outlier**_: Remove outliers in specific columns. |
| 5 | +1. _**Fill NA**_: Replace NA values with another value. |
| 6 | +2. _**Drop NA**_: Remove rows or columns that contain NA values. |
| 7 | +3. _**Fill Outlier**_: Replace outliers in a specific column. |
| 8 | +4. _**Drop Outlier**_: Remove outliers in a specific column. |
11 | 9 | 5. _**Drop Duplicates**_: Remove duplicate values. |
12 | 10 |
|
13 | 11 |
|
|
16 | 14 |
|
17 | 15 | ### Fill NA |
18 | 16 |
|
| 17 | +<figure><img src="../../.gitbook/assets/image (218).png" alt="" width="388"><figcaption></figcaption></figure> |
19 | 18 |
|
20 | | - |
21 | | -<figure><img src="../../.gitbook/assets/image (25).png" alt="" width="375"><figcaption></figcaption></figure> |
22 | | - |
23 | | -1. _**Method**_: Choose the filling method. |
24 | | - |
25 | | -  1-1. _**Value**_: Replace NA with the specified input value. |
26 | | - |
27 | | -  1-2. _**Forward/Back Fill**_: Replace NA with values from the front/back. If NA is consecutive, you can set the '_**Limit**_' to determine how many values to fill. |
28 | | - |
29 | | -  1-3. _**Statistics**_: Fill in with statistical properties. |
| 19 | +1. _**Method**_: Select a fill method. |
| 20 | +   1. _**Value**_: Replace NA with the input value. |
| 21 | +   2. _**Forward/Back Fill**_: Replace NA with the value before or after it. If there are consecutive NAs, set _**Limit**_ to cap how many of them are filled. |
| 22 | +   3. _**Statistics**_: Replace NA with a statistical value of the data. |
30 | 23 |
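The fill methods above resemble pandas' `fillna`, `ffill`, and `bfill`. The page does not state what code the dialog produces, so the following is only a minimal sketch under that assumption; the column name `score`, the sample data, and the choice of the mean as the statistic are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with two consecutive NA values.
df = pd.DataFrame({"score": [1.0, np.nan, np.nan, 4.0]})

# Value: replace every NA with a fixed input value.
by_value = df["score"].fillna(0)

# Forward fill with a limit of 1: only the first NA of the run is filled.
forward = df["score"].ffill(limit=1)

# Back fill without a limit: each NA takes the next valid value.
backward = df["score"].bfill()

# Statistics: replace NA with a statistic of the column (the mean is assumed here).
by_mean = df["score"].fillna(df["score"].mean())

print(pd.DataFrame({"value": by_value, "ffill": forward,
                    "bfill": backward, "mean": by_mean}))
```

With a limit of 1, the second NA in the run stays NA, which is the behavior the Limit setting is described as controlling.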
|
31 | 24 |
|
32 | 25 |
|
33 | 26 | *** |
34 | 27 |
|
35 | 28 | ### Drop NA |
36 | 29 |
|
37 | | - |
38 | | - |
39 | | -<figure><img src="../../.gitbook/assets/image (28).png" alt="" width="375"><figcaption></figcaption></figure> |
| 30 | +<figure><img src="../../.gitbook/assets/image (219).png" alt="" width="398"><figcaption></figcaption></figure> |
40 | 31 |
|
41 | 32 | 1. _**How**_ |
42 | | - |
43 | | -  1-1. _**Select Options**_: Keep only rows with the number of non-NA values set by the _**threshold**_, and delete the rest. |
44 | | - |
45 | | -  1-2. _**Any**_: Delete rows if there is at least one NA in the row. |
46 | | - |
47 | | -  1-3. _**All**_: Delete rows if all values in the row are NA. |
48 | | - |
49 | | -2. _**Ignore Index**_: Choose whether to reset the index after row deletion. |
| 33 | +   1. _**Select Options**_: Delete any row whose number of non-NA values is less than the value set in _**Threshold**_. |
| 34 | +   2. _**Any**_: Delete a row if it contains at least one NA. |
| 35 | +   3. _**All**_: Delete a row if all of its values are NA. |
| 36 | +2. _**Ignore Index**_: Choose whether to reset the index after the operation. |
50 | 37 |
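The How and Threshold options above line up with the `how` and `thresh` parameters of pandas' `DataFrame.dropna`. Assuming that correspondence (the page does not confirm it), a small sketch of the three row-dropping modes follows; the column names and data are hypothetical, and Ignore Index is modeled here with `reset_index(drop=True)`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one non-NA value,
# row 2 is entirely NA, row 3 has two non-NA values.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, np.nan, np.nan, 5.0],
})

# Any: drop a row if it contains at least one NA.
drop_any = df.dropna(how="any")

# All: drop a row only if every value in it is NA.
drop_all = df.dropna(how="all")

# Threshold of 2: drop rows with fewer than 2 non-NA values.
drop_thresh = df.dropna(thresh=2)

# Ignore Index (assumed equivalent): renumber the remaining rows from 0.
drop_thresh_reset = drop_thresh.reset_index(drop=True)

print(drop_any.index.tolist(), drop_all.index.tolist(),
      drop_thresh.index.tolist(), drop_thresh_reset.index.tolist())
```

Without the index reset, the surviving rows keep their original labels, so gaps appear where rows were deleted.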
|
51 | 38 |
|
52 | 39 |
|
53 | 40 | *** |
54 | 41 |
|
55 | 42 | ### Drop Duplicates |
56 | 43 |
|
| 44 | +<figure><img src="../../.gitbook/assets/image (220).png" alt="" width="371"><figcaption></figcaption></figure> |
57 | 45 |
|
58 | | - |
59 | | -<figure><img src="../../.gitbook/assets/image (29).png" alt="" width="375"><figcaption></figcaption></figure> |
60 | | - |
61 | | -1. _**Keep**_: Choose the value to retain among the duplicate values. Selecting _**False**_ will result in the deletion of all duplicate values. |
62 | | -2. _**Ignore Index**_: Choose whether to reset the index after duplicate values deletion. |
| 46 | +1. _**Keep**_: Select which of the duplicate values to keep. If you select _**False**_, all duplicate values are deleted. |
| 47 | +2. _**Ignore Index**_: Choose whether to reset the index after the operation. |
63 | 48 |
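The Keep and Ignore Index options match the `keep` and `ignore_index` parameters of pandas' `DataFrame.drop_duplicates`. Assuming the dialog maps onto that method (an assumption, not something the page states), this sketch shows how the three Keep choices differ; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are duplicates, as are rows 3 and 4.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "v": ["x", "x", "y", "z", "z"],
})

# Keep the first occurrence of each duplicated row.
keep_first = df.drop_duplicates(keep="first")

# Keep the last occurrence instead.
keep_last = df.drop_duplicates(keep="last")

# keep=False: delete every row that has a duplicate, keeping none of them.
keep_none = df.drop_duplicates(keep=False, ignore_index=True)

print(keep_first.index.tolist(), keep_last.index.tolist(),
      keep_none["id"].tolist())
```

Only the row with `id` 2 survives `keep=False`, because it is the only row that never repeats.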
|