About Vertex AI Neural Architecture Search

With Vertex AI Neural Architecture Search, you can search for optimal neural architectures in terms of accuracy, latency, memory, a combination of these, or a custom metric.

Determine whether Vertex AI Neural Architecture Search is the best tool for you

Important: Read this section carefully. To determine whether Vertex AI Neural Architecture Search is a good fit for you:
  • Vertex AI Neural Architecture Search is a high-end optimization tool used to find the best neural architectures in terms of accuracy, with or without constraints such as latency, memory, or a custom metric. The search space of possible neural architecture choices can be as large as 10^20. It is based on a technique that has successfully generated several state-of-the-art computer vision models in the past years, including NasNet, MNasNet, EfficientNet, NAS-FPN, and SpineNet.
  • Neural Architecture Search isn't a solution where you can just bring your data and expect a good result without experimentation. It is an experimentation tool.
  • Neural Architecture Search isn't for hyperparameter tuning, such as tuning the learning rate or optimizer settings. It is only meant for an architecture search. You shouldn't combine hyperparameter tuning with Neural Architecture Search.
  • Neural Architecture Search is not recommended with limited training data or for highly imbalanced datasets where some classes are very rare. If you are already using heavy augmentations for your baseline training due to lack of data, then Neural Architecture Search is not recommended.
  • You should first try other traditional and conventional machine learning methods and techniques, such as hyperparameter tuning. You should use Neural Architecture Search only if you don't see further gain with such traditional methods.
  • You should have an in-house model-tuning team that has some basic idea about which architecture parameters to modify and try. These architecture parameters can include the kernel size, number of channels, or connections, among many other possibilities. If you have a search space in mind to explore, then Neural Architecture Search is highly valuable and can save at least approximately six months of engineering time in exploring a large search space: up to 10^20 architecture choices.
  • Neural Architecture Search is meant for enterprise customers who can spend several thousand dollars on an experiment.
  • Neural Architecture Search isn't limited to vision-only use cases. Currently, only vision-based prebuilt search spaces and prebuilt trainers are provided, but customers can bring their own non-vision search spaces and trainers as well.
  • Neural Architecture Search doesn't use a supernet (one-shot NAS or weight-sharing based NAS) approach where you just bring your own data and use it as a solution. Customizing a supernet is non-trivial (months of effort). Unlike a supernet, Neural Architecture Search is highly customizable to define custom search spaces and rewards. The customization can be done in approximately one to two days.
  • Neural Architecture Search is supported in 8 regions across the world. Check the availability in your region.

You should also read the following section on expected cost, result gains, and GPU quota requirements before using Neural Architecture Search.

Expected cost, result gains, and GPU quota requirements

NAS search.

The figure shows a typical Neural Architecture Search curve. The Y-axis shows the trial rewards, and the X-axis shows the number of trials launched. As the number of trials increases, the controller starts finding better models. Therefore, the reward starts increasing; later, the reward variance and the reward growth start decreasing and show the convergence. The number of trials at the point of convergence can vary based on the search-space size, but it is of the order of approximately 2,000 trials. Each trial is designed to be a smaller version of full training, called a proxy task, which runs for approximately one to two hours on two Nvidia V100 GPUs. The customer can stop the search manually at any point and might find higher-reward models compared to their baseline before the point of convergence occurs, but it might be better to wait until the point of convergence to choose the best results. After the search, the next stage is to pick the top 10 trials (models) and run a full training on them.

(Optional) Test drive the prebuilt MNasNet search space and trainer

In this mode, you observe the search curve for a few trials (approximately 25) by doing a test drive with a prebuilt MNasNet search space and trainer.

MnasNet toy run.

In the figure, the best stage-1 reward starts to climb from ~0.30 at trial 1 to ~0.37 at trial 17. Your exact run may look slightly different due to sampling randomness, but you should see some small increase in the best reward. Note that this is still a toy run and doesn't represent any proof-of-concept or a public benchmark validation.

The cost for this run is detailed as follows:

  • Stage-1:
    • Number of trials: 25
    • Number of GPUs per trial: 2
    • GPU type: TESLA_T4
    • Number of CPUs per trial: 1
    • CPU type: n1-highmem-16
    • Avg single trial training time: 3 hours
    • Number of parallel trials: 6
    • GPU quota used: (num-gpus-per-trial * num-parallel-trials) = 12 GPUs. Use the us-central1 region for the test drive and host training data in the same region. No extra quota is needed.
    • Time to run: (total-trials * training-time-per-trial)/(num-parallel-trials) = 12.5 hours
    • GPU hours: (total-trials * training-time-per-trial * num-gpus-per-trial) = 150 T4 GPU hours
    • CPU hours: (total-trials * training-time-per-trial * num-cpus-per-trial) = 75 n1-highmem-16 hours
    • Cost: Approximately $185. You can stop the job earlier to reduce the cost. Refer to the pricing page to calculate the exact price.
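The quota, runtime, and GPU-hour lines above all follow the same simple formulas. As a quick sanity check, here is a short Python sketch of those formulas applied to the test-drive numbers (the function name is just for illustration; dollar cost is intentionally omitted because rates come from the pricing page):

```python
# Sketch of the stage-1 resource formulas used throughout this page.

def stage1_estimate(total_trials, gpus_per_trial, cpus_per_trial,
                    hours_per_trial, parallel_trials):
    """Apply the quota / wall-clock / GPU-hour formulas from this section."""
    gpu_quota = gpus_per_trial * parallel_trials
    wall_clock_hours = total_trials * hours_per_trial / parallel_trials
    gpu_hours = total_trials * hours_per_trial * gpus_per_trial
    cpu_hours = total_trials * hours_per_trial * cpus_per_trial
    return gpu_quota, wall_clock_hours, gpu_hours, cpu_hours

# MNasNet test drive: 25 trials, 2 T4 GPUs and 1 CPU per trial,
# ~3 hours per trial, 6 trials in parallel.
quota, wall, gpu_h, cpu_h = stage1_estimate(25, 2, 1, 3, 6)
print(quota, wall, gpu_h, cpu_h)  # 12 GPUs, 12.5 h, 150 GPU-h, 75 CPU-h
```

The same helper reproduces the larger runs on this page, for example the 2,000-trial MNasNet search (20-GPU quota, 12,000 T4 GPU hours).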

Because this is a toy run, there is no need to run a full stage-2 training for models from stage-1. To learn more about running stage-2, see tutorial 3.

The MnasNet notebook is used for this run.

(Optional) Proof-of-concept (POC) run of the prebuilt MNasNet search space and trainer

If you are interested in almost replicating a published MNasNet result, you can use this mode. According to the paper, MnasNet achieves 75.2% top-1 accuracy with 78 ms latency on a Pixel phone, which is 1.8x faster than MobileNetV2 with 0.5% higher accuracy and 2.3x faster than NASNet with 1.2% higher accuracy. However, this example uses GPUs instead of TPUs for training and uses a cloud CPU (n1-highmem-8) to evaluate latency. With this example, the expected stage-2 top-1 accuracy on MNasNet is 75.2% with 50 ms latency on a cloud CPU (n1-highmem-8).

The cost for this run is detailed as follows:

  • Stage-1 search:

    • Number of trials: 2000
    • Number of GPUs per trial: 2
    • GPU type: TESLA_T4
    • Avg single trial training time: 3 hours
    • Number of parallel trials: 10
    • GPU quota used: (num-gpus-per-trial * num-parallel-trials) = 20 T4 GPUs. Because this number is above the default quota, create a quota request from your project UI. For more information, see setting_up_path.
    • Time to run: (total-trials * training-time-per-trial)/(num-parallel-trials)/24 = 25 days. Note: The job terminates after 14 days. After that time, you can resume the search job easily with one command for another 14 days. If you have higher GPU quota, then the runtime decreases proportionately.
    • GPU hours: (total-trials * training-time-per-trial * num-gpus-per-trial) = 12000 T4 GPU hours.
    • Cost: ~$15,000
  • Stage-2 full-training with top 10 models:

    • Number of trials: 10
    • Number of GPUs per trial: 4
    • GPU type: TESLA_T4
    • Avg single trial training time: ~9 days
    • Number of parallel trials: 10
    • GPU quota used: (num-gpus-per-trial * num-parallel-trials) = 40 T4 GPUs. Because this number is above the default quota, create a quota request from your project UI. For more information, see setting_up_path. You can also run this with 20 T4 GPUs by running the job twice, with five models at a time instead of all 10 in parallel.
    • Time to run: (total-trials * training-time-per-trial)/(num-parallel-trials)/24 = ~9 days
    • GPU hours: (total-trials * training-time-per-trial * num-gpus-per-trial) = 8960 T4 GPU hours.
    • Cost: ~$8,000

Total cost: Approximately $23,000. Refer to the pricing page to calculate the exact price. Note: This example isn't an average regular training job. The full training runs for approximately nine days on four TESLA_T4 GPUs.

The MnasNet notebook is used for this run.

Using your search space and trainers

We provide an approximate cost for an average custom user. Your needs can vary depending on your training task and the GPUs and CPUs used. You need at least a 20-GPU quota for an end-to-end run, as documented here. Note: The performance gain is completely dependent on your task. We can only provide examples like MNasNet as reference examples for performance gain.

The cost for this hypothetical custom run is detailed as follows:

  • Stage-1 search:

    • Number of trials: 2,000
    • Number of GPUs per trial: 2
    • GPU type: TESLA_T4
    • Avg single trial training time: 1.5 hours
    • Number of parallel trials: 10
    • GPU quota used: (num-gpus-per-trial * num-parallel-trials) = 20 T4 GPUs. Because this number is above the default quota, you need to create a quota request from your project UI. For more information, see Request additional device quota for the project.
    • Time to run: (total-trials * training-time-per-trial)/(num-parallel-trials)/24 = 12.5 days
    • GPU hours: (total-trials * training-time-per-trial * num-gpus-per-trial) = 6000 T4 GPU hours.
    • Cost: approximately $7,400
  • Stage-2 full training with top 10 models:

    • Number of trials: 10
    • Number of GPUs per trial: 2
    • GPU type: TESLA_T4
    • Average single trial training time: approximately 4 days
    • Number of parallel trials: 10
    • GPU quota used: (num-gpus-per-trial * num-parallel-trials) = 20 T4 GPUs. Because this number is above the default quota, you need to create a quota request from your project UI. For more information, see Request additional device quota for the project. Refer to the same documentation for custom quota needs.
    • Time to run: (total-trials * training-time-per-trial)/(num-parallel-trials)/24 = approximately 4 days
    • GPU hours: (total-trials * training-time-per-trial * num-gpus-per-trial) = 1920 T4 GPU hours.
    • Cost: approximately $2,400
  • For more information on proxy-task design cost, see Proxy task design. The cost is similar to training 12 models (stage-2 in the figure uses 10 models):

    • GPU quota used: Same as stage-2 run in the figure.
    • Cost: (12/10) * stage-2-cost-for-10-models = ~$2,880

Total cost: approximately $12,680. Refer to the pricing page to calculate the exact price.

These stage-1 search costs apply when searching until the convergence point is reached, for maximum performance gain. However, you don't have to wait until the search converges. If the search-reward curve has started growing, you can expect a smaller performance gain at a smaller search cost by running stage-2 full training with the best model found so far. For example, for the search plot shown earlier, don't wait for the 2,000 trials needed for convergence: you might have found better models at 700 or 1,200 trials and can run stage-2 full training for those. You can always stop the search earlier to reduce the cost. You can also do stage-2 full training in parallel while the search is running, but make sure that you have GPU quota to support an extra parallel job.

Summary of performance and cost

The following table summarizes some data points with different use cases and the associated performance and cost.

Note: Some of the data points shown here are for very large models, with full training approaching 10 days with multiple GPUs. The cost and performance vary based on your data and model size. Refer to Using your search space and trainers to determine the average cost.

Summary.

Use cases and features

Neural Architecture Search features are both flexible and easy to use. A novice user can use prebuilt search spaces, a prebuilt trainer, and notebooks without any further setup to start exploring Vertex AI Neural Architecture Search for their dataset. At the same time, an expert user can use Neural Architecture Search with their custom trainer, custom search space, and custom inference device, and can even extend architecture search to non-vision use cases.

Neural Architecture Search offers prebuilt trainers and search spaces to be run on GPUs for the following use cases:

  • TensorFlow trainers with public-dataset-based results published in a notebook
    • Image object detection with end-to-end (SpineNet) search spaces
    • Classification with prebuilt backbone (MnasNet) search spaces
    • LiDAR 3D point cloud object detection with prebuilt end-to-end search spaces
    • Latency- and memory-constrained search for target devices
  • PyTorch trainers to be used only as tutorial examples
    • PyTorch 3D medical image segmentation search space example
    • PyTorch-based MNasNet classification
    • Latency- and memory-constrained search for target devices
  • Additional TensorFlow-based prebuilt state-of-the-art search spaces with code
    • Model scaling
    • Data augmentation

The full set of features that Neural Architecture Search offers can also be used easily for customized architectures and use cases:

  • A Neural Architecture Search language to define a custom search space over possible neural architectures and integrate this search space with custom trainer code.
  • Ready-to-use prebuilt state-of-the-art search spaces with code.
  • A ready-to-use prebuilt trainer, with code, which runs on GPUs.
  • A managed service for architecture search, including:
    • A Neural Architecture Search controller, which samples the search space to find the best architecture.
    • Prebuilt dockers/libraries, with code, to calculate latency/FLOPs/memory on custom hardware.
  • Tutorials to teach NAS usage.
  • A set of tools to design proxy tasks.
  • Guidance and examples for efficient PyTorch training with Vertex AI.
  • Library support for custom-metrics reporting and analysis.
  • Google Cloud console UI to monitor and manage jobs.
  • Easy-to-use notebooks to kick-start the search.
  • Library support for GPU/CPU resource usage management at a per-project or per-job level of granularity.
  • A Python-based NAS client to build dockers, launch NAS jobs, and resume a previous search job.
  • Google Cloud console UI-based customer support.
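To make the search-space idea from the list above concrete, here is a minimal, hypothetical sketch in plain Python (not the actual Neural Architecture Search language) of how a custom search space over kernel sizes, channel counts, and skip connections might be defined and sampled, and why such spaces get large quickly:

```python
import random

# Hypothetical search space: each block independently chooses a kernel
# size, a channel count, and whether to add a skip connection. The real
# Neural Architecture Search language is far richer; this only
# illustrates the combinatorics.
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "channels": [16, 32, 64, 128],
    "skip_connection": [True, False],
}
NUM_BLOCKS = 5  # choices multiply per block, so the space grows fast

def sample_architecture(rng=random):
    """Sample one candidate model: one independent choice per block."""
    return [
        {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        for _ in range(NUM_BLOCKS)
    ]

choices_per_block = 1
for options in SEARCH_SPACE.values():
    choices_per_block *= len(options)
print(choices_per_block ** NUM_BLOCKS)  # 24^5 = 7962624 candidate models
```

Even this toy space has ~8 million candidates at only five blocks; production spaces with more blocks and more options per block reach the 10^20 scale mentioned earlier.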

Background

Neural Architecture Search is a technique for automating the design of neural networks. It has successfully generated several state-of-the-art computer vision models in the past years, including NasNet, MNasNet, EfficientNet, NAS-FPN, and SpineNet.

These resulting models are leading the way in all three key classes of computer vision problems: image classification, object detection, and segmentation.

With Neural Architecture Search, engineers can optimize models for accuracy, latency, and memory in the same trial, reducing the time needed to deploy models. Neural Architecture Search explores many different types of models: the controller proposes ML models, then trains and evaluates them, iterating more than 1,000 times to find the best solutions under latency and/or memory constraints on target devices. The following figure shows the key components of the architecture-search framework:

Components of a Neural Architecture Search framework.

  • Model: A neural architecture with operations and connections.
  • Search space: The space of possible models (operations and connections) that can be designed and optimized.
  • Trainer docker: User-customizable trainer code to train and evaluate a model and compute its accuracy.
  • Inference device: A hardware device, such as a CPU or GPU, on which the model latency and memory usage are computed.
  • Reward: A combination of model metrics, such as accuracy, latency, and memory, used for ranking the models as better or worse.
  • Neural Architecture Search controller: The orchestrating algorithm that (a) samples the models from the search space, (b) receives the model rewards, and (c) provides the next set of model suggestions to evaluate to find the most optimal models.
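As one concrete example of a reward that combines the metrics above, the MnasNet paper (cited earlier on this page) ranks models with a soft latency constraint: reward = accuracy × (latency / target_latency)^w, with w ≈ -0.07. A sketch:

```python
def mnasnet_style_reward(accuracy, latency_ms, target_latency_ms, w=-0.07):
    """Soft latency-constrained reward from the MnasNet paper: accuracy
    is discounted when latency exceeds the target, and mildly boosted
    when the model is faster than the target."""
    return accuracy * (latency_ms / target_latency_ms) ** w

# A model at exactly the latency target keeps its raw accuracy...
print(mnasnet_style_reward(0.75, 78.0, 78.0))   # 0.75
# ...while a 2x-slower model with the same accuracy is penalized.
print(mnasnet_style_reward(0.75, 156.0, 78.0))  # ~0.714
```

A custom reward can combine any reported metrics this way (for example, adding a memory term); the controller only sees the final scalar when ranking models.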

User setup tasks

Neural Architecture Search offers a prebuilt trainer integrated with prebuilt search spaces, which can be used easily with the provided notebooks without any further setup.

However, most users need to use their own custom trainer, custom search spaces, custom metrics (for example, memory, latency, and training time), and custom reward (a combination of metrics such as accuracy and latency). For this, you need to:

  • Define a custom search space using the provided Neural Architecture Search language.
  • Integrate the search space definition into the trainer code.
  • Add custom metrics reporting to the trainer code.
  • Add custom reward to the trainer code.
  • Build the training container and use it to start Neural Architecture Search jobs.

The following diagram illustrates this:

Neural Architecture Search setup in user environment.
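The metrics- and reward-reporting steps in the setup list can be sketched as follows. This is a hypothetical illustration: `report_metrics`, the output location, and the file format are assumptions for the sketch, not the actual reporting library that the service provides. It only shows the shape of what a trainer sends back: the raw metrics plus a single scalar reward.

```python
import json
import os

def report_metrics(accuracy, latency_ms, output_dir):
    """Hypothetical reward-reporting helper: writes the trial's metrics
    and combined reward to a JSON file that the search service would
    read back (the real service ships its own reporting library)."""
    # Example reward: accuracy with a soft 50 ms latency constraint.
    reward = accuracy * (latency_ms / 50.0) ** -0.07
    payload = {"accuracy": accuracy, "latency_ms": latency_ms, "reward": reward}
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "metrics.json"), "w") as f:
        json.dump(payload, f)
    return payload

result = report_metrics(0.75, 50.0, "/tmp/nas_trial_demo")
print(result["reward"])  # 0.75 at the 50 ms target
```

The key point is that the trainer, not the service, computes the reward, which is what makes custom metric combinations possible.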

Neural Architecture Search service in operation

After you set up the training container, the Neural Architecture Search service launches multiple training containers in parallel on multiple GPU devices. You can control how many trials run in parallel and how many total trials to launch. Each training container is provided a suggested architecture from the search space. The training container builds the suggested model, trains and evaluates it, and then reports the reward back to the Neural Architecture Search service. As this process progresses, the Neural Architecture Search service uses the reward feedback to find better and better model architectures. After the search, you have access to the reported metrics for further analysis.

The Neural Architecture Search service in operation.
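The sample → train → report-reward loop described above can be sketched in a few lines. This is a conceptual stand-in only: the real controller learns from rewards rather than sampling at random, trials run in parallel containers rather than in a loop, and `train_and_eval` is a stub for the proxy-task training.

```python
import random

def sample_model(rng):
    """Stand-in for a controller suggestion drawn from the search space."""
    return {"kernel_size": rng.choice([3, 5, 7]),
            "channels": rng.choice([16, 32, 64, 128])}

def train_and_eval(model, rng):
    """Stub for the proxy-task training inside one trial container;
    returns the reward the container would report back."""
    return rng.random()

def run_search(total_trials, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(total_trials):
        model = sample_model(rng)            # suggestion from the controller
        reward = train_and_eval(model, rng)  # trial reports the reward back
        history.append((reward, model))
    # After the search: pick the top trials for stage-2 full training.
    top10 = sorted(history, key=lambda r: r[0], reverse=True)[:10]
    return top10

top10 = run_search(total_trials=100)
print(len(top10))  # 10
```

Replacing random sampling with a reward-driven controller is exactly what moves the search curve from flat to the rising shape shown earlier.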

Overview of user journey for Neural Architecture Search

The high-level steps for performing a Neural Architecture Search experiment are as follows:

  • Setups and definitions:

    • Identify the labeled dataset and specify the task type (for example, detection or segmentation).
    • Customize trainer code:
      • Use a prebuilt search space or define a custom search space using the Neural Architecture Search language.
      • Integrate the search-space definition into the trainer code.
      • Add custom metrics reporting to the trainer code.
      • Add custom reward to the trainer code.
    • Build a trainer container.
    • Set up search-trial parameters for partial training (proxy task). The search training should ideally finish fast (for example, in 30-60 minutes) to partially train the models:
      • The minimum epochs needed for sampled models to gather reward (the minimum epochs don't need to ensure model convergence).
      • Hyperparameters (for example, the learning rate).
  • Run search locally to ensure the search space integrated container can run properly.

  • Start the Google Cloud search (stage-1) job with five test trials and verify that the search trials meet the runtime and accuracy goals.

  • Start the Google Cloud search (stage-1) job with +1k trials.

    • As part of the search, also set a regular interval to train (stage-2) the top N models:

      • Hyperparameters and the algorithm for hyperparameter search. Stage-2 normally uses a similar configuration to stage-1, but with higher settings for certain parameters, such as training steps/epochs and the number of channels.
      • Stop criteria (the number of epochs).
  • Analyze the reported metrics and/or visualize architectures for insights.
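The stage-1-to-stage-2 relationship described in the steps above (reuse the proxy-task configuration, but raise a few parameters) can be sketched as a small config transform. The parameter names and scaling factors here are illustrative assumptions, not the actual configuration schema:

```python
# Hypothetical proxy-task (stage-1) configuration.
STAGE1_CONFIG = {"epochs": 5, "channels_multiplier": 1.0, "learning_rate": 0.1}

def make_stage2_config(stage1, epoch_factor=10, width_factor=2.0):
    """Derive a stage-2 full-training config from the stage-1 config:
    same settings, except a few parameters are scaled up."""
    cfg = dict(stage1)  # keep everything else identical to stage-1
    cfg["epochs"] = stage1["epochs"] * epoch_factor
    cfg["channels_multiplier"] = stage1["channels_multiplier"] * width_factor
    return cfg

stage2 = make_stage2_config(STAGE1_CONFIG)
print(stage2["epochs"], stage2["channels_multiplier"])  # 50 2.0
```

Keeping the two configs mechanically linked like this helps ensure that stage-2 rankings stay consistent with the proxy-task rewards that drove the search.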

An architecture-search experiment can be followed by a scaling-search experiment and then by an augmentation-search experiment as well.

Documentation reading order

  1. (Required) Set up your environment
  2. (Required) Tutorials
  3. (Required only for PyTorch customers) PyTorch efficient training with cloud data
  4. (Required) Best practices and suggested workflow
  5. (Required) Proxy task design
  6. (Required only when using prebuilt trainers) How to use prebuilt search spaces and a prebuilt trainer

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.