AutoML: Getting started

  • AutoML automates the process of developing machine learning models. Some tools require little or no coding, while others offer more flexibility through APIs and CLIs for advanced users.

  • The AutoML workflow follows similar steps to traditional machine learning, including problem definition, data gathering, preparation, model development, evaluation, and potential retraining.

  • Data preparation remains crucial for AutoML, involving labeling, cleaning, formatting, and potentially feature transformations to ensure optimal model training.

  • No-code AutoML tools guide users through model development with steps like data import, analysis, refinement, and configuration of training parameters before initiating the automated training process.

  • After training, users can evaluate model performance, feature importance, and underlying architecture, with some AutoML systems even supporting model deployment and retraining.

If you are thinking about using AutoML, you may have questions about how it works and what steps you should take to get started. This section dives deeper into common AutoML patterns, explores how AutoML works, and examines what steps you may need to take before you begin using AutoML for your project.

AutoML tools

AutoML tools fall into two main categories:

  • Tools that require no coding typically take the form of web applications that let you configure and run experiments through a user interface to find the best model for your data without writing any code.
  • API and CLI tools provide advanced automation features, but require more (sometimes significantly more) programming and ML expertise.

AutoML tools that require coding can be more powerful and more flexible than no-code tools, but they can also be more difficult to use. This module focuses on the no-code options for model development, but be aware that API and CLI options can help if you require customized automation.

AutoML workflow

Let's walk through a typical ML workflow and see how things work when you use AutoML. The high-level steps in the workflow are the same as those you use for custom training; the main difference is that AutoML handles some tasks for you.

Problem definition

The first step in any ML workflow is to define your problem. When you are using AutoML, ensure that the tool you choose can support the objectives of your ML project. Most AutoML tools support a variety of supervised machine learning algorithms and input data types.

For more information about problem framing, take a look at the Introduction to Machine Learning Problem Framing module.

Data gathering

Before you can start working with an AutoML tool, you need to collect your data into a single data source. Check the product documentation to make sure that your tool supports:

  • your data source
  • the data types in your dataset
  • the size of your dataset
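
If your examples currently live in several exports, you need to combine them before import. The following Python sketch assumes the exports are CSV files that share the same header; the column names (house_id, sqft, price) and the function name combine_csv_sources are hypothetical, and in-memory strings stand in for real files. It returns the row count so you can compare it with the size limits in your tool's documentation.

```python
import csv
import io
from typing import Iterable, TextIO


def combine_csv_sources(sources: Iterable[TextIO], out: TextIO) -> int:
    """Concatenate CSV sources that share one header into a single CSV.

    Returns the number of data rows written.
    """
    header = None
    writer = None
    rows_written = 0
    for src in sources:
        reader = csv.DictReader(src)
        if header is None:
            # The first source defines the schema for the combined file.
            header = reader.fieldnames
            writer = csv.DictWriter(out, fieldnames=header)
            writer.writeheader()
        elif reader.fieldnames != header:
            # Mismatched schemas need to be reconciled by hand first.
            raise ValueError(f"header mismatch: {reader.fieldnames} != {header}")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Hypothetical exports from two systems, kept in memory for the example.
export_a = io.StringIO("house_id,sqft,price\n1,1200,250000\n2,900,180000\n")
export_b = io.StringIO("house_id,sqft,price\n3,1500,320000\n")

combined = io.StringIO()
n_rows = combine_csv_sources([export_a, export_b], combined)
print(n_rows)  # 3
```

Failing loudly on a header mismatch is a deliberate choice: silently aligning columns with different names is a common source of hard-to-spot data errors.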

Data preparation

Data preparation is an area where AutoML tools can help you, but no tool can do everything automatically, so expect to do some work before you can import your data into the tool. Data preparation for AutoML is similar to what you would need to do to train a model manually. If you need to know more about how to prepare your data for training, take a look at the Data Preparation section.

For more information on preparing your data, see the Working with numerical data and Working with categorical data modules.

Before importing your data for AutoML training, you need to complete these steps:

  • Label your data

    Every example in your dataset needs a label.

  • Clean and format data

    Real-world data tends to be messy, so expect to clean your data before using it. Even with AutoML, you need to determine the best treatments for your particular dataset and problem. This might require some exploration and potentially multiple AutoML runs before you get the best results.

  • Perform feature transformations

    Some AutoML tools handle certain feature transformations for you. But if the tool you are using does not support a feature transformation that you need, or does not support it well, you may need to perform the transformation ahead of time.
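
The labeling and cleaning steps above can be sketched in Python. This is a minimal illustration, assuming rows arrive as dictionaries of strings (for example, from csv.DictReader); the column names and the cleaning rules (strip whitespace, treat blanks as missing, remove thousands separators from numeric columns, drop unlabeled rows) are assumptions chosen for the example, not treatments every dataset needs. Dropping unlabeled rows is only one option; you could instead send them for labeling.

```python
def clean_rows(rows, label_col, numeric_cols):
    """Clean raw string rows and drop examples that have no label.

    Returns a tuple (cleaned_rows, n_dropped_unlabeled).
    """
    cleaned = []
    n_dropped = 0
    for row in rows:
        out = {}
        for col, raw in row.items():
            value = raw.strip() if isinstance(raw, str) else raw
            if value is None or value == "":
                out[col] = None  # treat blanks as missing
            elif col in numeric_cols:
                # Remove thousands separators such as "1,200" before parsing.
                out[col] = float(str(value).replace(",", ""))
            else:
                out[col] = value
        if out.get(label_col) is None:
            n_dropped += 1  # every training example needs a label
            continue
        # Missing feature values are kept as None for a later decision.
        cleaned.append(out)
    return cleaned, n_dropped


raw_rows = [
    {"sqft": " 1,200 ", "city": "Springfield ", "price": "250000"},
    {"sqft": "900", "city": "", "price": "180000"},
    {"sqft": "1500", "city": "Shelbyville", "price": ""},  # no label
]
rows, n_dropped = clean_rows(raw_rows, label_col="price", numeric_cols={"sqft", "price"})
print(n_dropped)  # 1
print(rows[0])    # {'sqft': 1200.0, 'city': 'Springfield', 'price': 250000.0}
```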

Model development (with a no-code AutoML)

AutoML does the work for you during training. However, before you start training, you need to configure your experiment. To set up an AutoML training run, you typically complete these high-level steps:

  1. Import your data

    To import your data, specify your data source. During the import process, the AutoML tool assigns a semantic data type to each feature (column) in your dataset.

  2. Analyze your data

    AutoML products usually provide tools to analyze your dataset before and after training. As a best practice, you may want to use these analysis tools to understand and verify your data before starting an AutoML run.

  3. Refine your data

    AutoML tools often provide mechanisms to help you refine your data after importing and before training. Here are a few tasks you may want to complete to refine your data:

    • Semantic Checking: During import, AutoML tools try to determine the correct semantic type for each feature, but these are only guesses. You should check the types assigned to all features and change them if they were assigned incorrectly.

      For example, you may have postal codes stored as numbers in a column in your database. Most AutoML systems would detect the data as continuous numeric data. This would be incorrect for a postal code, and the user would probably want to change the semantic type to categorical rather than continuous for this feature column.

    • Transformations: Some tools allow users to customize data transformations as part of the refinement process. Sometimes this is needed when a dataset has potentially predictive features that need to be transformed or combined in a way that is difficult for AutoML tools to determine without help.

      For example, consider a housing dataset that you are using to predict the sale price for a house. Suppose there is a feature called description that holds the text of each house listing, and you would like to use this data to create a new feature called description_length. Some AutoML systems offer ways to use custom transformations. For this example, there might be a LENGTH function to generate the new description length feature like this: LENGTH(description).

  4. Configure AutoML run parameters

    The last step before running your training experiment is to choose a few configuration settings to tell the tool how you want it to train your model. Though each AutoML tool has its own unique set of configuration options, here are a few of the significant configuration tasks you may need to complete:

    • Select the ML problem type you plan to solve. For example, are you solving a classification or regression problem?
    • Select which column in your dataset is the label.
    • Select the set of features to use to train the model.
    • Select the set of ML algorithms AutoML considers in the model search.
    • Select the evaluation metric AutoML uses to choose the best model.
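
To make the analysis and refinement steps concrete, here is a Python sketch of a naive column profile, the postal code mistake it produces, a manual semantic-type override, and a derived feature equivalent to LENGTH(description). The type-guessing rule and the names profile, guess_semantic_type, and overrides are hypothetical; real AutoML tools use their own, more sophisticated heuristics and usually let you apply overrides through their user interface rather than in code.

```python
def _parses_as_number(text):
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def is_missing(value):
    return value is None or value == ""


def guess_semantic_type(values):
    """Naive rule: numeric if every non-missing value parses as a number."""
    present = [v for v in values if not is_missing(v)]
    if present and all(_parses_as_number(v) for v in present):
        return "numeric"
    # This rule also labels free text as categorical; real tools may
    # offer a separate text type.
    return "categorical"


def profile(rows):
    """Summarize each column: missing count, distinct count, guessed type."""
    report = {}
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        present = [v for v in values if not is_missing(v)]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "type": guess_semantic_type(values),
        }
    return report


rows = [
    {"postal_code": "94043", "description": "Sunny 2-bed near park", "price": "850000"},
    {"postal_code": "10001", "description": "Loft", "price": "1200000"},
    {"postal_code": "94043", "description": "", "price": "910000"},
]

report = profile(rows)
print(report["postal_code"]["type"])  # numeric: the wrong guess for postal codes

# Semantic check: postal codes are identifiers, so override the guess.
overrides = {"postal_code": "categorical"}
semantic_types = {col: overrides.get(col, info["type"]) for col, info in report.items()}

# Custom transformation, analogous to LENGTH(description).
for r in rows:
    r["description_length"] = len(r["description"] or "")
```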

After configuring your AutoML experiment, you are ready to start the training run. Training may take a while to complete (on the order of hours).
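
Conceptually, the training run trains each candidate algorithm you selected, scores it on held-out data with the metric you chose, and keeps the best one. The Python sketch below is a deliberately tiny stand-in for that search: the RunConfig fields mirror the configuration tasks listed above, but the class, the two candidate algorithms (a mean baseline and one-feature least squares), and the metric names are hypothetical and not any product's API. Real tools search far larger spaces of algorithms and hyperparameters.

```python
from dataclasses import dataclass, field


@dataclass
class RunConfig:
    """Hypothetical run configuration mirroring the configuration tasks."""
    problem_type: str  # "regression" or "classification"
    label: str         # label column
    features: list     # feature columns to use
    algorithms: list = field(default_factory=lambda: ["mean_baseline", "linear_1d"])
    metric: str = "mae"  # metric used to pick the winner


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean_baseline(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean  # ignores the feature entirely


def fit_linear_1d(xs, ys):
    # Ordinary least squares on a single feature: y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


TRAINERS = {"mean_baseline": fit_mean_baseline, "linear_1d": fit_linear_1d}
METRICS = {"mae": mae}


def run_search(cfg, train, valid):
    """Train every candidate, score it on validation data, return (best, scores)."""
    if cfg.problem_type != "regression":
        raise NotImplementedError("this toy only covers regression")
    feat = cfg.features[0]  # the toy trainers use one feature only
    xs = [r[feat] for r in train]
    ys = [r[cfg.label] for r in train]
    y_valid = [r[cfg.label] for r in valid]
    scores = {}
    for name in cfg.algorithms:
        model = TRAINERS[name](xs, ys)
        scores[name] = METRICS[cfg.metric](y_valid, [model(r[feat]) for r in valid])
    best = min(scores, key=scores.get)  # lower MAE is better
    return best, scores


# Synthetic data where price is an exact linear function of sqft.
train = [{"sqft": s, "price": 200 * s + 10_000} for s in (800, 1000, 1200, 1500)]
valid = [{"sqft": s, "price": 200 * s + 10_000} for s in (900, 1300)]
cfg = RunConfig(problem_type="regression", label="price", features=["sqft"])
best, scores = run_search(cfg, train, valid)
print(best)  # linear_1d
```

On this synthetic data the linear candidate fits exactly (validation MAE of 0), while the mean baseline scores 40000, so the search keeps the linear model.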

Evaluate model

After training, you can examine the results by using the tools your AutoML product provides to help you:

  • Evaluate your features by examining feature importance metrics.
  • Understand your model by examining the architecture and hyperparameters used to build it.
  • Evaluate top-level model performance with plots and metrics collected during training for the output model.
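
Feature importance can be computed in several ways, and your AutoML product may use a different method than the one shown here. As one common, model-agnostic example, permutation importance shuffles one feature at a time and measures how much the error grows. In this Python sketch, the fixed predict function stands in for a trained model and the data is synthetic; both are assumptions for illustration.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, label, features, n_repeats=5, seed=0):
    """Importance of a feature = average increase in MAE when that feature's
    values are shuffled across rows, breaking its link to the label."""
    rng = random.Random(seed)
    y = [r[label] for r in rows]
    base = mae(y, [predict(r) for r in rows])
    importance = {}
    for feat in features:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[feat] for r in rows]
            rng.shuffle(shuffled)
            perturbed = [{**r, feat: v} for r, v in zip(rows, shuffled)]
            increases.append(mae(y, [predict(r) for r in perturbed]) - base)
        importance[feat] = sum(increases) / n_repeats
    return importance


def predict(r):
    # Stand-in for a trained model: uses sqft, ignores year_built.
    return 200 * r["sqft"] + 10_000


pairs = [(800, 1990), (1000, 2005), (1200, 1978), (1500, 2015), (900, 2001), (1300, 1969)]
rows = [{"sqft": s, "year_built": yb, "price": 200 * s + 10_000} for s, yb in pairs]

imp = permutation_importance(predict, rows, "price", ["sqft", "year_built"])
print(imp)
```

Because the model ignores year_built, shuffling it leaves every prediction unchanged and its importance is exactly 0, while shuffling sqft increases the error.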

Productionization

Though it is outside the scope of this module, some AutoML systems can help you test and deploy your model.

Retrain model

You might need to retrain the model with new data. This might happen after you evaluate your AutoML training run or after your model has been in production for some time. Either way, AutoML systems can help with retraining too. It is not uncommon to take another look at your data after an AutoML run and retrain with an improved dataset.

What's next

Congratulations on finishing this module!

We encourage you to explore the various MLCC modules at your own pace and interest. If you'd like to follow a recommended order, we suggest that you move to the following module next: ML Fairness.


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-25 UTC.