|
1 | 1 | # 2. Data Split |
2 | 2 |
|
| 3 | + |
| 4 | + |
| 5 | +<figure><img src="../.gitbook/assets/image (146).png" alt="" width="211"><figcaption></figcaption></figure> |
| 6 | + |
| 7 | +1. Click on _**Data Split**_ in the _**Machine Learning**_ category. |
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | +<figure><img src="../.gitbook/assets/image (147).png" alt="" width="563"><figcaption></figcaption></figure> |
| 12 | + |
| 13 | +2. _**Input Data**_: Choose whether the target data is included in the input data. If it is, select _**Feature Data**_ and _**Target Data**_ separately. You can also select specific columns from one dataset using the _**funnel icon**_. |
| 14 | +3. _**Test Size**_: Select the percentage of the input data to hold out for testing. |
| 15 | +4. _**Random State**_: Set a fixed random seed so that the data is split the same way on every run. (If not set, the data is split differently each time.) |
| 16 | +5. _**Shuffle**_: Shuffle the data randomly before splitting so that the split does not depend on the original order of the rows, which reduces bias and improves generalization performance. |
| 17 | +6. _**Stratify**_: Keep the class ratios the same in each split so that no class is over-represented (classification only). |
| 18 | +7. _**Allocate to**_: Assign variable names to the split data. |
| 19 | +8. _**Code View**_: Preview the code that will be generated. |
| 20 | +9. _**Run**_: Execute the code. |
| 21 | + |
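
The options above can be sketched in code. This is only an illustration: it assumes the settings correspond to the parameters of scikit-learn's `train_test_split`, which this page does not confirm, and the toy data and variable names are hypothetical. The code shown in **Code View** may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, 2 features, and a balanced binary target.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# Assumed mapping of the options to train_test_split parameters:
#   Test Size    -> test_size     (fraction of rows held out for testing)
#   Random State -> random_state  (fixed seed, so the split is repeatable)
#   Shuffle      -> shuffle       (shuffle the rows before splitting)
#   Stratify     -> stratify      (keep class ratios equal in both splits)
#   Allocate to  -> the four variable names on the left-hand side
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    shuffle=True,
    stratify=y,
)

# Running the split again with the same seed returns the same rows.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y
)
```

In scikit-learn, stratification requires shuffling, so `stratify` cannot be combined with `shuffle=False`.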