Working with numerical data
Page Summary
This module focuses on preparing numerical data, such as temperature or weight, for use in machine learning models.
Machine learning practitioners spend significant time on data preparation tasks like cleaning and transformation.
The module covers techniques like feature scaling, outlier detection, and binning to improve data quality for model training.
Learners should have a basic understanding of machine learning concepts before starting this module.
Categorical data, like postal codes, will be addressed in a separate module due to its distinct characteristics and handling requirements.
- Understand feature vectors.
- Explore your dataset's potential features visually and mathematically.
- Identify outliers.
- Understand four different techniques to normalize numerical data.
- Understand binning and develop strategies for binning numerical data.
- Understand the characteristics of good continuous numerical features.
This module assumes you are familiar with the concepts covered in the following module:
ML practitioners spend far more time evaluating, cleaning, and transforming data than building models. Data is so important that this course devotes three entire units to the topic:
- Working with numerical data (this unit)
- Working with categorical data
- Datasets, generalization, and overfitting
This unit focuses on numerical data, meaning integers or floating-point values that behave like numbers. That is, they are additive, countable, ordered, and so on. The next unit focuses on categorical data, which can include numbers that behave like categories. The third unit focuses on how to prepare your data to ensure high-quality results when training and evaluating your model.
Examples of numerical data include:
- Temperature
- Weight
- The number of deer wintering in a nature preserve
In contrast, US postal codes, despite being five-digit or nine-digit numbers, don't behave like numbers or represent mathematical relationships. Postal code 40004 (in Nelson County, Kentucky) is not twice the quantity of postal code 20002 (in Washington, D.C.). These numbers represent categories, specifically geographic areas, and are considered categorical data.
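As a rough illustration of this distinction, the following Python sketch contrasts the two kinds of data. The sample values and the helper names `summarize_numerical` and `summarize_categorical` are hypothetical, chosen only for this example; counting category frequencies is just one simple way to treat categorical values, not the method this course prescribes.

```python
from collections import Counter

# Hypothetical sample values, chosen only for illustration.
deer_counts = [12, 15, 18]                  # numerical: counts of deer
postal_codes = ["40004", "20002", "40004"]  # categorical: stored as strings


def summarize_numerical(values):
    """Return (total, mean); both are meaningful because the values are additive."""
    total = sum(values)
    return total, total / len(values)


def summarize_categorical(labels):
    """Return how often each category occurs, with no arithmetic on the labels."""
    return Counter(labels)


total, mean = summarize_numerical(deer_counts)
code_frequencies = summarize_categorical(postal_codes)

# Arithmetic on postal codes runs without error, but the result carries no
# meaning: the ratio is 2.0, yet one area is not "twice" the other.
meaningless_ratio = int("40004") / int("20002")

print(total, mean)              # 45 15.0
print(code_frequencies)         # Counter({'40004': 2, '20002': 1})
print(meaningless_ratio)        # 2.0
```

Keeping postal codes as strings, as in this sketch, also makes it harder to apply sums or averages to them by accident.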
Last updated 2025-08-25 UTC.