Data traps
Learning objectives
In this module, you will learn to:
- Investigate potential issues underlying raw or processed datasets, including collection and quality issues.
- Identify biases, invalid inferences, and rationalizations.
- Find common issues in data analysis, including correlation, relatedness, and irrelevance.
- Examine a chart for common problems, misperceptions, and misleading display and design choices.
ML motivation
While not as glamorous as model architectures and other downstream model work, data exploration, documentation, and preprocessing are critical to ML work. ML practitioners can fall into what Nithya Sambasivan et al. called "data cascades" in their 2021 ACM paper if they do not deeply understand:
- the conditions under which their data is collected
- the quality, characteristics, and limitations of the data
- what the data can and can't show
It's very expensive to train models on bad data and discover the data problems only when the outputs turn out to be low quality. Likewise, failing to grasp the limitations of data, overlooking human biases in data collection, or mistaking correlation for causation can result in over-promising and under-delivering, which can lead to a loss of trust.
This course walks through common but subtle data traps that ML and data practitioners may encounter in their work.
Last updated 2025-08-25 UTC.