Hyperparameter tuning using Ray Tune#

Created On: Aug 31, 2020 | Last Updated: Jan 08, 2026 | Last Verified: Nov 05, 2024

Author: Ricardo Decal

This tutorial shows how to integrate Ray Tune into your PyTorch training workflow to perform scalable and efficient hyperparameter tuning.

What you will learn
  • How to modify a PyTorch training loop for Ray Tune

  • How to scale a hyperparameter sweep to multiple nodes and GPUs without code changes

  • How to define a hyperparameter search space and run a sweep with tune.Tuner

  • How to use an early-stopping scheduler (ASHA) and report metrics/checkpoints

  • How to use checkpointing to resume training and load the best model

Prerequisites
  • PyTorch v2.9+ and torchvision

  • Ray Tune (ray[tune]) v2.52.1+

  • GPU(s) are optional, but recommended for faster training

Ray, a project of the PyTorch Foundation, is an open source unified framework for scaling AI and Python applications. It helps run distributed jobs by handling the complexity of distributed computing. Ray Tune is a library built on Ray for hyperparameter tuning that enables you to scale a hyperparameter sweep from your machine to a large cluster with no code changes.

This tutorial adapts the PyTorch tutorial for training a CIFAR10 classifier to run multi-GPU hyperparameter sweeps with Ray Tune.

Setup#

To run this tutorial, install the following dependencies:

pip install "ray[tune]" torchvision

Then start with the imports:

from functools import partial
import os
import tempfile
from pathlib import Path

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms

# New: imports for Ray Tune
import ray
from ray import tune
from ray.tune import Checkpoint
from ray.tune.schedulers import ASHAScheduler

Data loading#

Wrap the data loaders in a constructor function. In this tutorial, a global data directory is passed to the function to enable reusing the dataset across different trials. In a cluster environment, you can use shared storage, such as network file systems, to prevent each node from downloading the data separately.

def load_data(data_dir="./data"):
    # Mean and standard deviation of the CIFAR10 training subset.
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.48216, 0.44653), (0.2022, 0.19932, 0.20086)),
        ]
    )

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )

    return trainset, testset

Model architecture#

This tutorial searches for the best sizes for the fully connected layers and the learning rate. To enable this, the Net class exposes the layer sizes l1 and l2 as configurable parameters that Ray Tune can search over:

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Define the search space#

Next, define the hyperparameters to tune and how Ray Tune samples them. Ray Tune offers a variety of search space distributions to suit different parameter types: loguniform, uniform, choice, randint, grid, and more. You can also express complex dependencies between parameters with conditional search spaces or sample from arbitrary functions.

Here is the search space for this tutorial:

config = {
    "l1": tune.choice([2**i for i in range(9)]),
    "l2": tune.choice([2**i for i in range(9)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16]),
}

tune.choice() accepts a list of values that are uniformly sampled from. In this example, the l1 and l2 parameter values are powers of 2 between 1 and 256, and the learning rate is sampled on a log scale between 0.0001 and 0.1. Sampling on a log scale spreads exploration across orders of magnitude, on a relative rather than an absolute scale.
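
To build intuition for what log-uniform sampling does, here is a small illustrative sketch in plain Python (not part of the tutorial code, and not how Ray Tune is implemented internally): draw a value uniformly in log space, then map it back to the original scale.

import math
import random

def sample_loguniform(low=1e-4, high=1e-1):
    # Draw uniformly in log10 space, then exponentiate back to the original scale.
    return 10 ** random.uniform(math.log10(low), math.log10(high))

# Each decade ([1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1)) receives roughly the same
# share of samples, unlike a plain uniform draw over [1e-4, 1e-1].
print([round(sample_loguniform(), 6) for _ in range(5)])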

Training function#

Ray Tune requires a training function that accepts a configuration dictionary and runs the main training loop. As Ray Tune runs trials, it passes each one its own configuration dictionary.

Here is the full training function, followed by explanations of the key Ray Tune integration points:

def train_cifar(config, data_dir=None):
    net = Net(config["l1"], config["l2"])
    device = config["device"]
    net = net.to(device)
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Load checkpoint if resuming training
    checkpoint = tune.get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
            checkpoint_state = torch.load(checkpoint_path)
            start_epoch = checkpoint_state["epoch"]
            net.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, _testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / epoch_steps)
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Save checkpoint and report metrics
        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
            torch.save(checkpoint_data, checkpoint_path)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            tune.report(
                {"loss": val_loss / val_steps, "accuracy": correct / total},
                checkpoint=checkpoint,
            )

    print("Finished Training")

Key integration points#

Using hyperparameters from the configuration dictionary#

Ray Tune updates the config dictionary with the hyperparameters for each trial. In this example, the model architecture and optimizer receive the hyperparameters from the config dictionary:

net = Net(config["l1"], config["l2"])
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

Reporting metrics and saving checkpoints#

The most important integration point is communicating with Ray Tune. Ray Tune uses the validation metrics to determine the best hyperparameter configuration and to stop underperforming trials early, saving resources.

Checkpointing enables you to load trained models later, resume hyperparameter searches, and recover from failures. It is also required by some Ray Tune schedulers, such as Population Based Training, that pause and resume trials during the search.

This code from the training function loads the model and optimizer state at the start if a checkpoint exists:

checkpoint = tune.get_checkpoint()
if checkpoint:
    with checkpoint.as_directory() as checkpoint_dir:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
        checkpoint_state = torch.load(checkpoint_path)
        start_epoch = checkpoint_state["epoch"]
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])

At the end of each epoch, save a checkpoint and report the validation metrics:

checkpoint_data = {
    "epoch": epoch,
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
    checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
    torch.save(checkpoint_data, checkpoint_path)

    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    tune.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )

Ray Tune checkpointing supports local file systems, cloud storage, and distributed file systems. For more information, see the Ray Tune storage documentation.
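
As a rough sketch of how that looks in practice (the bucket name below is hypothetical, and the RunConfig import location has moved between Ray versions, so check the storage documentation for your release), you point the Tuner's run configuration at a shared storage path:

# Hypothetical example: persist trial results and checkpoints to shared cloud
# storage so every node in the cluster can read them. The bucket name is made up.
tuner = tune.Tuner(
    tune.with_resources(
        partial(train_cifar, data_dir=data_dir),
        resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial},
    ),
    run_config=tune.RunConfig(storage_path="s3://my-tune-results/cifar10"),
    param_space=config,
)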

Multi-GPU support#

Image classification models can be greatly accelerated by using GPUs. The training function supports multi-GPU training by wrapping the model in nn.DataParallel:
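
device = config["device"]
net = net.to(device)
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)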

This training function supports training on CPUs, a single GPU, multiple GPUs, or multiple nodes without code changes. Ray Tune automatically distributes the trials across the nodes according to the available resources. Ray Tune also supports fractional GPUs, so that one GPU can be shared among multiple trials, provided that the models, optimizers, and data batches fit into the GPU memory.

Validation split#

The original CIFAR10 dataset only has train and test subsets. This is sufficient for training a single model, but hyperparameter tuning also requires a validation subset. The training function creates one by reserving 20% of the training subset, as shown below. The test subset is used to evaluate the best model's generalization error after the search completes.
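
For reference, these are the corresponding lines from the training function above:

test_abs = int(len(trainset) * 0.8)
train_subset, val_subset = random_split(
    trainset, [test_abs, len(trainset) - test_abs]
)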

Evaluation function#

After finding the optimal hyperparameters, test the model on a held-out test set to estimate the generalization error:

def test_accuracy(net, device="cpu", data_dir=None):
    _trainset, testset = load_data(data_dir)

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2
    )

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            image_batch, labels = data
            image_batch, labels = image_batch.to(device), labels.to(device)
            outputs = net(image_batch)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

Configure and run Ray Tune#

With the training and evaluation functions defined, configure Ray Tune to run the hyperparameter search.

Scheduler for early stopping#

Ray Tune provides schedulers to improve the efficiency of the hyperparameter search by detecting underperforming trials and stopping them early. The ASHAScheduler uses the Asynchronous Successive Halving Algorithm (ASHA) to aggressively terminate low-performing trials:

scheduler = ASHAScheduler(
    max_t=max_num_epochs,
    grace_period=1,
    reduction_factor=2,
)

Ray Tune also provides advanced search algorithms that pick the next set of hyperparameters based on previous results, instead of relying only on random or grid search. Examples include Optuna and BayesOpt.
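
As a rough sketch (assuming Optuna is installed with pip install optuna and that the import path matches your Ray version), a search algorithm is passed to TuneConfig alongside the scheduler:

from ray.tune.search.optuna import OptunaSearch

tune_config = tune.TuneConfig(
    metric="loss",
    mode="min",
    search_alg=OptunaSearch(),  # proposes new configs based on completed trials
    scheduler=scheduler,        # ASHA can still stop underperforming trials early
    num_samples=num_trials,
)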

Resource allocation#

Tell Ray Tune what resources to allocate for each trial by passing a resources dictionary to tune.with_resources:

tune.with_resources(
    partial(train_cifar, data_dir=data_dir),
    resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial},
)

Ray Tune automatically manages the placement of these trials and ensures that the trials run in isolation, so you don't need to manually assign GPUs to processes.

For example, if you are running this experiment on a cluster of 20 machines, each with 8 GPUs, you can set gpus_per_trial=0.5 to schedule two concurrent trials per GPU. This configuration runs 320 trials in parallel across the cluster.
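
The arithmetic behind that number, as a quick sketch:

num_machines = 20
gpus_per_machine = 8
gpus_per_trial = 0.5

total_gpus = num_machines * gpus_per_machine          # 160 GPUs in the cluster
concurrent_trials = int(total_gpus / gpus_per_trial)  # 160 / 0.5 = 320 trials
print(concurrent_trials)  # 320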

Note

To run this tutorial without GPUs, set gpus_per_trial=0 and expect significantly longer runtimes.

To avoid long runtimes during development, start with a small number of trials and epochs, for example:
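
# A quick development run; the main() function is defined in the full script below.
main(num_trials=2, max_num_epochs=2, gpus_per_trial=0)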

Creating the Tuner#

The Ray Tune API is modular and composable. Pass your configuration to the tune.Tuner class to create a tuner object, then run tuner.fit() to start training:

tuner = tune.Tuner(
    tune.with_resources(
        partial(train_cifar, data_dir=data_dir),
        resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial},
    ),
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        scheduler=scheduler,
        num_samples=num_trials,
    ),
    param_space=config,
)
results = tuner.fit()

After training completes, retrieve the best performing trial, load its checkpoint, and evaluate it on the test set.

Putting it all together#

def main(num_trials=10, max_num_epochs=10, gpus_per_trial=0, cpus_per_trial=2):
    print("Starting hyperparameter tuning.")
    ray.init(include_dashboard=False)

    data_dir = os.path.abspath("./data")
    load_data(data_dir)  # Pre-download the dataset

    device = "cuda" if torch.cuda.is_available() else "cpu"

    config = {
        "l1": tune.choice([2**i for i in range(9)]),
        "l2": tune.choice([2**i for i in range(9)]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16]),
        "device": device,
    }

    scheduler = ASHAScheduler(
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2,
    )

    tuner = tune.Tuner(
        tune.with_resources(
            partial(train_cifar, data_dir=data_dir),
            resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial},
        ),
        tune_config=tune.TuneConfig(
            metric="loss",
            mode="min",
            scheduler=scheduler,
            num_samples=num_trials,
        ),
        param_space=config,
    )
    results = tuner.fit()

    best_result = results.get_best_result("loss", "min")

    print(f"Best trial config: {best_result.config}")
    print(f"Best trial final validation loss: {best_result.metrics['loss']}")
    print(f"Best trial final validation accuracy: {best_result.metrics['accuracy']}")

    best_trained_model = Net(best_result.config["l1"], best_result.config["l2"])
    best_trained_model = best_trained_model.to(device)
    if gpus_per_trial > 1:
        best_trained_model = nn.DataParallel(best_trained_model)

    best_checkpoint = best_result.checkpoint
    with best_checkpoint.as_directory() as checkpoint_dir:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
        best_checkpoint_data = torch.load(checkpoint_path)

        best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
        test_acc = test_accuracy(best_trained_model, device, data_dir)
        print(f"Best trial test set accuracy: {test_acc}")


if __name__ == "__main__":
    # Set the number of trials, epochs, and GPUs per trial here:
    main(num_trials=10, max_num_epochs=10, gpus_per_trial=1)
Starting hyperparameter tuning.
2026-02-19 16:52:31,073 INFO worker.py:2023 -- Started a local Ray instance.
100%|██████████| 170M/170M [00:04<00:00, 37.1MB/s]

╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     train_cifar_2026-02-19_16-52-39   │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator             │
│ Scheduler                        AsyncHyperBandScheduler           │
│ Number of trials                 10                                │
╰────────────────────────────────────────────────────────────────────╯

View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2026-02-19_16-52-29_424498_3915/artifacts/2026-02-19_16-52-39/train_cifar_2026-02-19_16-52-39/driver_artifacts`

Trial status: 10 PENDING
Current time: 2026-02-19 16:52:39. Total running time: 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name                status       l1     l2            lr     batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_669ef_00000   PENDING      32      4   0.00732106               2 │
│ train_cifar_669ef_00001   PENDING     128      8   0.0139934                8 │
│ train_cifar_669ef_00002   PENDING       4    256   0.00049422              16 │
│ train_cifar_669ef_00003   PENDING       4     64   0.00662052              16 │
│ train_cifar_669ef_00004   PENDING       8      1   0.000384563              2 │
│ train_cifar_669ef_00005   PENDING       8     32   0.0489862                4 │
│ train_cifar_669ef_00006   PENDING     256    128   0.000152455              2 │
│ train_cifar_669ef_00007   PENDING       1     64   0.00384703              16 │
│ train_cifar_669ef_00008   PENDING       1      4   0.073969                 4 │
│ train_cifar_669ef_00009   PENDING       1     32   0.00174425              16 │
╰───────────────────────────────────────────────────────────────────────────────╯

Trial train_cifar_669ef_00000 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00000 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     2 │
│ device                                      cuda │
│ l1                                            32 │
│ l2                                             4 │
│ lr                                       0.00732 │
╰──────────────────────────────────────────────────╯
(func pid=5037) [1,  2000] loss: 2.309
(func pid=5037) [1,  4000] loss: 1.152
...
(func pid=5037) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00000_0_batch_size=2,l1=32,l2=4,lr=0.0073_2026-02-19_16-52-39/checkpoint_000000)
...
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-19 17:01:10. Total running time: 8min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 669ef_00000 with loss=2.310666144037247 and params={'l1': 32, 'l2': 4, 'lr': 0.0073210577383846474, 'batch_size': 2, 'device': 'cuda'}
...
Total running time: 9min 1sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00000 with loss=2.310666144037247 and params={'l1': 32, 'l2': 4, 'lr': 0.0073210577383846474, 'batch_size': 2, 'device': 'cuda'}╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00000   RUNNING      32      4   0.00732106               2        8            487.426   2.31067       0.0998 ││ train_cifar_669ef_00001   PENDING     128      8   0.0139934                8                                                    ││ train_cifar_669ef_00002   PENDING       4    256   0.00049422              16                                                    ││ train_cifar_669ef_00003   PENDING       4     64   0.00662052              16                                                    ││ train_cifar_669ef_00004   PENDING       8      1   0.000384563              2                                                    ││ train_cifar_669ef_00005   PENDING       8     32   0.0489862                4                                                    ││ train_cifar_669ef_00006   PENDING     256    128   0.000152455              2                                                    ││ train_cifar_669ef_00007   PENDING       1     64   0.00384703              16                                                    ││ train_cifar_669ef_00008   PENDING       1      4   0.073969                 4                                                    ││ train_cifar_669ef_00009   PENDING       1     32   0.00174425              16                                                    │╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯(func pid=5037) [9, 20000] loss: 0.231(func pid=5037) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00000_0_batch_size=2,l1=32,l2=4,lr=0.0073_2026-02-19_16-52-39/checkpoint_000008)(func pid=5037) [10,  2000] loss: 2.310(func pid=5037) [10,  4000] loss: 1.154(func pid=5037) [10,  6000] loss: 0.770Trial status: 1 RUNNING | 9 PENDINGCurrent time: 2026-02-19 17:02:11. 
Total running time: 9min 31sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00000 with loss=2.313345506286621 and params={'l1': 32, 'l2': 4, 'lr': 0.0073210577383846474, 'batch_size': 2, 'device': 'cuda'}╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00000   RUNNING      32      4   0.00732106               2        9            547.575   2.31335       0.0982 ││ train_cifar_669ef_00001   PENDING     128      8   0.0139934                8                                                    ││ train_cifar_669ef_00002   PENDING       4    256   0.00049422              16                                                    ││ train_cifar_669ef_00003   PENDING       4     64   0.00662052              16                                                    ││ train_cifar_669ef_00004   PENDING       8      1   0.000384563              2                                                    ││ train_cifar_669ef_00005   PENDING       8     32   0.0489862                4                                                    ││ train_cifar_669ef_00006   PENDING     256    128   0.000152455              2                                                    ││ train_cifar_669ef_00007   PENDING       1     64   0.00384703              16                                                    ││ train_cifar_669ef_00008   PENDING       1      4   0.073969                 4                                                    ││ train_cifar_669ef_00009   PENDING       1     32   0.00174425              16                                                    │╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯(func pid=5037) [10,  8000] loss: 0.578(func pid=5037) [10, 10000] loss: 0.462(func pid=5037) [10, 12000] loss: 0.385(func pid=5037) [10, 14000] loss: 0.330(func pid=5037) [10, 16000] loss: 0.289(func pid=5037) [10, 18000] loss: 0.257Trial status: 1 RUNNING | 9 PENDINGCurrent time: 2026-02-19 17:02:41. 
Total running time: 10min 1sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00000 with loss=2.313345506286621 and params={'l1': 32, 'l2': 4, 'lr': 0.0073210577383846474, 'batch_size': 2, 'device': 'cuda'}╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00000   RUNNING      32      4   0.00732106               2        9            547.575   2.31335       0.0982 ││ train_cifar_669ef_00001   PENDING     128      8   0.0139934                8                                                    ││ train_cifar_669ef_00002   PENDING       4    256   0.00049422              16                                                    ││ train_cifar_669ef_00003   PENDING       4     64   0.00662052              16                                                    ││ train_cifar_669ef_00004   PENDING       8      1   0.000384563              2                                                    ││ train_cifar_669ef_00005   PENDING       8     32   0.0489862                4                                                    ││ train_cifar_669ef_00006   PENDING     256    128   0.000152455              2                                                    ││ train_cifar_669ef_00007   PENDING       1     64   0.00384703              16                                                    ││ train_cifar_669ef_00008   PENDING       1      4   0.073969                 4                                                    ││ train_cifar_669ef_00009   PENDING       1     32   0.00174425              16                                                    │╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯(func pid=5037) [10, 20000] loss: 0.231Trial train_cifar_669ef_00000 completed after 10 iterations at 2026-02-19 17:02:52. 
Total running time: 10min 12s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00000 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                  60.72094 │
│ time_total_s                                     608.29549 │
│ training_iteration                                      10 │
│ accuracy                                            0.0973 │
│ loss                                               2.31656 │
╰────────────────────────────────────────────────────────────╯
(func pid=5037) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00000_0_batch_size=2,l1=32,l2=4,lr=0.0073_2026-02-19_16-52-39/checkpoint_000009)

Trial train_cifar_669ef_00001 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00001 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     8 │
│ device                                      cuda │
│ l1                                           128 │
│ l2                                             8 │
│ lr                                       0.01399 │
╰──────────────────────────────────────────────────╯
(func pid=5806) [1,  2000] loss: 2.097
(func pid=5806) [1,  4000] loss: 1.032
(func pid=5806) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00001_1_batch_size=8,l1=128,l2=8,lr=0.0140_2026-02-19_16-52-39/checkpoint_000000)
(func pid=5806) [2,  2000] loss: 2.062
(func pid=5806) [2,  4000] loss: 1.096
(func pid=5806) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00001_1_batch_size=8,l1=128,l2=8,lr=0.0140_2026-02-19_16-52-39/checkpoint_000001)
(func pid=5806) [3,  2000] loss: 2.307
(func pid=5806) [3,  4000] loss: 1.153
(func pid=5806) [10,  2000] loss: 2.307
(func pid=5806) [10,  4000] loss: 1.153
Trial train_cifar_669ef_00001 completed after 10 iterations at 2026-02-19 17:05:40.
Total running time: 13min 1s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00001 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                  16.14774 │
│ time_total_s                                     164.41085 │
│ training_iteration                                      10 │
│ accuracy                                            0.0997 │
│ loss                                               2.30709 │
╰────────────────────────────────────────────────────────────╯
(func pid=5806) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00001_1_batch_size=8,l1=128,l2=8,lr=0.0140_2026-02-19_16-52-39/checkpoint_000009)

Trial train_cifar_669ef_00002 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00002 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                    16 │
│ device                                      cuda │
│ l1                                             4 │
│ l2                                           256 │
│ lr                                       0.00049 │
╰──────────────────────────────────────────────────╯
(func pid=6528) [1,  2000] loss: 2.025
(func pid=6528) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00002_2_batch_size=16,l1=4,l2=256,lr=0.0005_2026-02-19_16-52-39/checkpoint_000000)
(func pid=6528) [2,  2000] loss: 1.644
(func pid=6528) [3,  2000] loss: 1.527
(func pid=6528) [4,  2000] loss: 1.447
(func pid=6528) [5,  2000] loss: 1.392
(func pid=6528) [6,  2000] loss: 1.346
(func pid=6528) [7,  2000] loss: 1.305
(func pid=6528) [8,  2000] loss: 1.285
(func pid=6528) [9,  2000] loss: 1.254
(func pid=6528) [10,  2000] loss: 1.236
Trial train_cifar_669ef_00002 completed after 10 iterations at 2026-02-19 17:07:14.
Total running time: 14min 34s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00002 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                   8.74424 │
│ time_total_s                                      88.99429 │
│ training_iteration                                      10 │
│ accuracy                                            0.5276 │
│ loss                                               1.27744 │
╰────────────────────────────────────────────────────────────╯
(func pid=6528) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00002_2_batch_size=16,l1=4,l2=256,lr=0.0005_2026-02-19_16-52-39/checkpoint_000009)

Trial train_cifar_669ef_00003 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00003 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                    16 │
│ device                                      cuda │
│ l1                                             4 │
│ l2                                            64 │
│ lr                                       0.00662 │
╰──────────────────────────────────────────────────╯
(func pid=7242) [1,  2000] loss: 1.892
(func pid=7242) [2,  2000] loss: 1.641
(func pid=7242) [3,  2000] loss: 1.574
(func pid=7242) [4,  2000] loss: 1.548
(func pid=7242) [5,  2000] loss: 1.531
(func pid=7242) [6,  2000] loss: 1.521
(func pid=7242) [7,  2000] loss: 1.515
(func pid=7242) [8,  2000] loss: 1.528
(func pid=7242) [9,  2000] loss: 1.525
(func pid=7242) [10,  2000] loss: 1.517
(func pid=7242) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00003_3_batch_size=16,l1=4,l2=64,lr=0.0066_2026-02-19_16-52-39/checkpoint_000009)
Trial train_cifar_669ef_00003 completed after 10 iterations at 2026-02-19 17:08:48.
Total running time: 16min 8s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00003 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000009 │
│ time_this_iter_s                                   8.77005 │
│ time_total_s                                      89.74023 │
│ training_iteration                                      10 │
│ accuracy                                            0.4545 │
│ loss                                               1.50707 │
╰────────────────────────────────────────────────────────────╯

Trial train_cifar_669ef_00004 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00004 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     2 │
│ device                                      cuda │
│ l1                                             8 │
│ l2                                             1 │
│ lr                                       0.00038 │
╰──────────────────────────────────────────────────╯
(func pid=7957) [1,  2000] loss: 2.425
(func pid=7957) [1,  4000] loss: 1.163
(func pid=7957) [1,  6000] loss: 0.770
(func pid=7957) [1,  8000] loss: 0.576
pid=7957) [2026-02-19 17:09:19,449 E 7957 7992] core_worker_process.cc:837: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14(func pid=7957) [1, 10000] loss: 0.461(func pid=7957) [1, 12000] loss: 0.384(func pid=7957) [1, 14000] loss: 0.329(func pid=7957) [1, 16000] loss: 0.288(func pid=7957) [1, 18000] loss: 0.256Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDINGCurrent time: 2026-02-19 17:09:41. Total running time: 17min 2sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00002 with loss=1.2774354283332825 and params={'l1': 4, 'l2': 256, 'lr': 0.00049422038084244, 'batch_size': 16, 'device': 'cuda'}╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00004   RUNNING         8      1   0.000384563              2                                                    ││ train_cifar_669ef_00000   TERMINATED     32      4   0.00732106               2       10           608.295    2.31656       0.0973 ││ train_cifar_669ef_00001   TERMINATED    128      8   0.0139934                8       10           164.411    2.30709       0.0997 ││ train_cifar_669ef_00002   TERMINATED      4    256   0.00049422              16       10            88.9943   1.27744       0.5276 ││ train_cifar_669ef_00003   TERMINATED      4     64   0.00662052              16       10            89.7402   1.50707       0.4545 ││ train_cifar_669ef_00005   PENDING         8     32   0.0489862                4                                                    ││ train_cifar_669ef_00006   PENDING       256    128   0.000152455              2                                                    ││ train_cifar_669ef_00007   PENDING         1     64   0.00384703              16                                                    ││ train_cifar_669ef_00008   PENDING         1      4   0.073969                 4                                                    ││ train_cifar_669ef_00009   PENDING         1     32   0.00174425              16                                                    │╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯(func pid=7957) [1, 20000] loss: 0.230Trial train_cifar_669ef_00004 completed after 1 iterations at 2026-02-19 17:09:56. 
Total running time: 17min 16s

╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00004 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  63.47155 │
│ time_total_s                                      63.47155 │
│ training_iteration                                       1 │
│ accuracy                                            0.0992 │
│ loss                                               2.30331 │
╰────────────────────────────────────────────────────────────╯

(func pid=7957) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00004_4_batch_size=2,l1=8,l2=1,lr=0.0004_2026-02-19_16-52-39/checkpoint_000000)

Trial train_cifar_669ef_00005 started with configuration:

╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00005 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     4 │
│ device                                      cuda │
│ l1                                             8 │
│ l2                                            32 │
│ lr                                       0.04899 │
╰──────────────────────────────────────────────────╯

(func pid=8093) [1,  2000] loss: 2.331
(func pid=8093) [1,  4000] loss: 1.164
(func pid=8093) [1,  6000] loss: 0.777
(func pid=8093) [1,  8000] loss: 0.582
(func pid=8093) [1, 10000] loss: 0.466
(func pid=8093) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00005_5_batch_size=4,l1=8,l2=32,lr=0.0490_2026-02-19_16-52-39/checkpoint_000000)

Trial train_cifar_669ef_00005 completed after 1 iterations at 2026-02-19 17:10:33. Total running time: 17min 53s

╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00005 result                       │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name                      checkpoint_000000 │
│ time_this_iter_s                                  33.14767 │
│ time_total_s                                      33.14767 │
│ training_iteration                                       1 │
│ accuracy                                            0.0991 │
│ loss                                               2.32344 │
╰────────────────────────────────────────────────────────────╯

Trial train_cifar_669ef_00006 started with configuration:

╭──────────────────────────────────────────────────╮
│ Trial train_cifar_669ef_00006 config             │
├──────────────────────────────────────────────────┤
│ batch_size                                     2 │
│ device                                      cuda │
│ l1                                           256 │
│ l2                                           128 │
│ lr                                       0.00015 │
╰──────────────────────────────────────────────────╯
(func pid=8225) [1,  2000] loss: 2.294
(func pid=8225) [1,  4000] loss: 1.082
(func pid=8225) [1,  6000] loss: 0.655
(func pid=8225) [1,  8000] loss: 0.467
(func pid=8225) [1, 10000] loss: 0.359
(func pid=8225) [1, 12000] loss: 0.286
(func pid=8225) [1, 14000] loss: 0.239
(func pid=8225) [1, 16000] loss: 0.203
(func pid=8225) [1, 18000] loss: 0.176
(func pid=8225) [1, 20000] loss: 0.154
(func pid=8225) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00006_6_batch_size=2,l1=256,l2=128,lr=0.0002_2026-02-19_16-52-39/checkpoint_000000)

...

Trial status: 6 TERMINATED | 1 RUNNING | 3 PENDING
Current time: 2026-02-19 17:15:12. Total running time: 22min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 669ef_00006 with loss=1.23761638380941 and params={'l1': 256, 'l2': 128, 'lr': 0.00015245510744902382, 'batch_size': 2, 'device': 'cuda'}

...

(func pid=8225) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00006_6_batch_size=2,l1=256,l2=128,lr=0.0002_2026-02-19_16-52-39/checkpoint_000008)
(func pid=8225) [10,  2000] loss: 0.744
(func pid=8225) [10,  4000] loss: 0.381
(func pid=8225) [10,  6000] loss: 0.259
(func pid=8225) [10,  8000] loss: 0.198
(func pid=8225) [10, 10000] loss: 0.162
(func pid=8225) [10, 12000] loss: 0.132
(func pid=8225) [10, 14000] loss: 0.117
(func pid=8225) [10, 16000] loss: 0.105
(func pid=8225) [10, 18000] loss: 0.091
(func pid=8225) [10, 20000] loss: 0.083

Trial train_cifar_669ef_00006 completed after 10 iterations at 2026-02-19 17:20:48.
Total running time: 28min 8s╭────────────────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00006 result                       │├────────────────────────────────────────────────────────────┤│ checkpoint_dir_name                      checkpoint_000009 ││ time_this_iter_s                                  60.87206 ││ time_total_s                                     610.67964 ││ training_iteration                                      10 ││ accuracy                                            0.6302 ││ loss                                               1.12487 │╰────────────────────────────────────────────────────────────╯(func pid=8225) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00006_6_batch_size=2,l1=256,l2=128,lr=0.0002_2026-02-19_16-52-39/checkpoint_000009)Trial train_cifar_669ef_00007 started with configuration:╭──────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00007 config             │├──────────────────────────────────────────────────┤│ batch_size                                    16 ││ device                                      cuda ││ l1                                             1 ││ l2                                            64 ││ lr                                       0.00385 │╰──────────────────────────────────────────────────╯(func pid=8991) [1,  2000] loss: 2.025(func pid=8991) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00007_7_batch_size=16,l1=1,l2=64,lr=0.0038_2026-02-19_16-52-39/checkpoint_000000)(func pid=8991) [2,  2000] loss: 1.913Trial train_cifar_669ef_00007 completed after 2 iterations at 2026-02-19 17:21:12. Total running time: 28min 32s╭────────────────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00007 result                       │├────────────────────────────────────────────────────────────┤│ checkpoint_dir_name                      checkpoint_000001 ││ time_this_iter_s                                   8.90776 ││ time_total_s                                      19.77789 ││ training_iteration                                       2 ││ accuracy                                            0.2036 ││ loss                                               1.87426 │╰────────────────────────────────────────────────────────────╯(func pid=8991) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00007_7_batch_size=16,l1=1,l2=64,lr=0.0038_2026-02-19_16-52-39/checkpoint_000001)Trial status: 8 TERMINATED | 2 PENDINGCurrent time: 2026-02-19 17:21:13. 
Total running time: 28min 33sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00006 with loss=1.1248697945517254 and params={'l1': 256, 'l2': 128, 'lr': 0.00015245510744902382, 'batch_size': 2, 'device': 'cuda'}╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00000   TERMINATED     32      4   0.00732106               2       10           608.295    2.31656       0.0973 ││ train_cifar_669ef_00001   TERMINATED    128      8   0.0139934                8       10           164.411    2.30709       0.0997 ││ train_cifar_669ef_00002   TERMINATED      4    256   0.00049422              16       10            88.9943   1.27744       0.5276 ││ train_cifar_669ef_00003   TERMINATED      4     64   0.00662052              16       10            89.7402   1.50707       0.4545 ││ train_cifar_669ef_00004   TERMINATED      8      1   0.000384563              2        1            63.4715   2.30331       0.0992 ││ train_cifar_669ef_00005   TERMINATED      8     32   0.0489862                4        1            33.1477   2.32344       0.0991 ││ train_cifar_669ef_00006   TERMINATED    256    128   0.000152455              2       10           610.68     1.12487       0.6302 ││ train_cifar_669ef_00007   TERMINATED      1     64   0.00384703              16        2            19.7779   1.87426       0.2036 ││ train_cifar_669ef_00008   PENDING         1      4   0.073969                 4                                                    ││ train_cifar_669ef_00009   PENDING         1     32   0.00174425              16                                                    │╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯Trial train_cifar_669ef_00008 started with configuration:╭──────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00008 config             │├──────────────────────────────────────────────────┤│ batch_size                                     4 ││ device                                      cuda ││ l1                                             1 ││ l2                                             4 ││ lr                                       0.07397 │╰──────────────────────────────────────────────────╯(func pid=9187) [1,  2000] loss: 2.347(func pid=9187) [1,  4000] loss: 1.173(func pid=9187) [1,  6000] loss: 0.781(func pid=9187) [1,  8000] loss: 0.586(func pid=9187) [2026-02-19 17:21:43,522 E 9187 9222] core_worker_process.cc:837: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14Trial status: 8 TERMINATED | 1 RUNNING | 1 PENDINGCurrent time: 2026-02-19 17:21:43. 
Total running time: 29min 3sLogical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)Current best trial: 669ef_00006 with loss=1.1248697945517254 and params={'l1': 256, 'l2': 128, 'lr': 0.00015245510744902382, 'batch_size': 2, 'device': 'cuda'}╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮│ Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy │├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤│ train_cifar_669ef_00008   RUNNING         1      4   0.073969                 4                                                    ││ train_cifar_669ef_00000   TERMINATED     32      4   0.00732106               2       10           608.295    2.31656       0.0973 ││ train_cifar_669ef_00001   TERMINATED    128      8   0.0139934                8       10           164.411    2.30709       0.0997 ││ train_cifar_669ef_00002   TERMINATED      4    256   0.00049422              16       10            88.9943   1.27744       0.5276 ││ train_cifar_669ef_00003   TERMINATED      4     64   0.00662052              16       10            89.7402   1.50707       0.4545 ││ train_cifar_669ef_00004   TERMINATED      8      1   0.000384563              2        1            63.4715   2.30331       0.0992 ││ train_cifar_669ef_00005   TERMINATED      8     32   0.0489862                4        1            33.1477   2.32344       0.0991 ││ train_cifar_669ef_00006   TERMINATED    256    128   0.000152455              2       10           610.68     1.12487       0.6302 ││ train_cifar_669ef_00007   TERMINATED      1     64   0.00384703              16        2            19.7779   1.87426       0.2036 ││ train_cifar_669ef_00009   PENDING         1     32   0.00174425              16                                                    │╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯(func pid=9187) [1, 10000] loss: 0.469Trial train_cifar_669ef_00008 completed after 1 iterations at 2026-02-19 17:21:50. 
Total running time: 29min 10s╭────────────────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00008 result                       │├────────────────────────────────────────────────────────────┤│ checkpoint_dir_name                      checkpoint_000000 ││ time_this_iter_s                                  33.70781 ││ time_total_s                                      33.70781 ││ training_iteration                                       1 ││ accuracy                                            0.0946 ││ loss                                               2.31397 │╰────────────────────────────────────────────────────────────╯(func pid=9187) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39/train_cifar_669ef_00008_8_batch_size=4,l1=1,l2=4,lr=0.0740_2026-02-19_16-52-39/checkpoint_000000)Trial train_cifar_669ef_00009 started with configuration:╭──────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00009 config             │├──────────────────────────────────────────────────┤│ batch_size                                    16 ││ device                                      cuda ││ l1                                             1 ││ l2                                            32 ││ lr                                       0.00174 │╰──────────────────────────────────────────────────╯(func pid=9320) [1,  2000] loss: 2.304Trial train_cifar_669ef_00009 completed after 1 iterations at 2026-02-19 17:22:05. Total running time: 29min 25s╭────────────────────────────────────────────────────────────╮│ Trial train_cifar_669ef_00009 result                       │├────────────────────────────────────────────────────────────┤│ checkpoint_dir_name                      checkpoint_000000 ││ time_this_iter_s                                  10.75416 ││ time_total_s                                      10.75416 ││ training_iteration                                       1 ││ accuracy                                            0.1018 ││ loss                                               2.30415 │╰────────────────────────────────────────────────────────────╯2026-02-19 17:22:05,367 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/var/lib/ci-user/ray_results/train_cifar_2026-02-19_16-52-39' in 0.0068s.Trial status: 10 TERMINATEDCurrent time: 2026-02-19 17:22:05. 
Trial status: 10 TERMINATED
Current time: 2026-02-19 17:22:05. Total running time: 29min 25s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)

Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy
-----------------------------------------------------------------------------------------------------------------------------------
train_cifar_669ef_00000   TERMINATED     32      4   0.00732106               2       10           608.295    2.31656       0.0973
train_cifar_669ef_00001   TERMINATED    128      8   0.0139934                8       10           164.411    2.30709       0.0997
train_cifar_669ef_00002   TERMINATED      4    256   0.00049422              16       10            88.9943   1.27744       0.5276
train_cifar_669ef_00003   TERMINATED      4     64   0.00662052              16       10            89.7402   1.50707       0.4545
train_cifar_669ef_00004   TERMINATED      8      1   0.000384563              2        1            63.4715   2.30331       0.0992
train_cifar_669ef_00005   TERMINATED      8     32   0.0489862                4        1            33.1477   2.32344       0.0991
train_cifar_669ef_00006   TERMINATED    256    128   0.000152455              2       10           610.68     1.12487       0.6302
train_cifar_669ef_00007   TERMINATED      1     64   0.00384703              16        2            19.7779   1.87426       0.2036
train_cifar_669ef_00008   TERMINATED      1      4   0.073969                 4        1            33.7078   2.31397       0.0946
train_cifar_669ef_00009   TERMINATED      1     32   0.00174425              16        1            10.7542   2.30415       0.1018

Best trial config: {'l1': 256, 'l2': 128, 'lr': 0.00015245510744902382, 'batch_size': 2, 'device': 'cuda'}
Best trial final validation loss: 1.1248697945517254
Best trial final validation accuracy: 0.6302
Best trial test set accuracy: 0.6197

Results#

Your Ray Tune trial summary output looks something like this. The text table summarizes the validation performance of the trials and highlights the best hyperparameter configuration:

Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... |   batch_size |   l1 |   l2 |          lr |   iter |    loss |   accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... |            2 |    1 |  256 | 0.000668163 |      1 | 2.31479 |     0.0977 |
| ... |            4 |   64 |    8 |   0.0331514 |      1 | 2.31605 |     0.0983 |
| ... |            4 |    2 |    1 | 0.000150295 |      1 | 2.30755 |     0.1023 |
| ... |           16 |   32 |   32 |   0.0128248 |     10 | 1.66912 |     0.4391 |
| ... |            4 |    8 |  128 |  0.00464561 |      2 |  1.7316 |     0.3463 |
| ... |            8 |  256 |    8 |  0.00031556 |      1 | 2.19409 |     0.1736 |
| ... |            4 |   16 |  256 |  0.00574329 |      2 | 1.85679 |     0.3368 |
| ... |            8 |    2 |    2 |  0.00325652 |      1 | 2.30272 |     0.0984 |
| ... |            2 |    2 |    2 | 0.000342987 |      2 | 1.76044 |      0.292 |
| ... |            4 |   64 |   32 |    0.003734 |      8 | 1.53101 |     0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+

Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737

Most trials were stopped early by the ASHA scheduler to conserve resources. The best-performing trial reached a validation accuracy of approximately 47%, a result confirmed on the test set.
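Beyond the printed summary, you can also inspect the results programmatically. The snippet below is a minimal sketch, assuming results is the ResultGrid returned by tuner.fit() earlier in the tutorial and that each trial reported "loss" and "accuracy" metrics:

# A minimal sketch of inspecting tuning results programmatically.
# Assumes `results = tuner.fit()` was run as shown earlier in this tutorial.
best_result = results.get_best_result(metric="loss", mode="min")

print("Best trial config:", best_result.config)
print("Best trial final validation loss:", best_result.metrics["loss"])
print("Best trial final validation accuracy:", best_result.metrics["accuracy"])
# The best trial's latest checkpoint is available as `best_result.checkpoint`.

# All trial results as a pandas DataFrame, handy for filtering or plotting.
# Config keys appear as "config/<name>" columns.
df = results.get_dataframe()
print(df[["config/l1", "config/l2", "config/lr", "loss", "accuracy"]])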

Observability#

Monitoring is critical when running large-scale experiments. Ray provides a dashboard that lets you view the status of your trials, check cluster resource use, and inspect logs in real time.
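As a small illustration (not part of the tutorial script), you can confirm the dashboard is available when Ray starts locally; the default address shown in the comment is an assumption about a standard local setup and may differ on your cluster:

import ray

# Start (or connect to) a local Ray instance with the dashboard enabled.
# By default the dashboard is served at http://127.0.0.1:8265.
context = ray.init(include_dashboard=True)
print("Ray dashboard available at:", context.dashboard_url)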

For debugging, Ray also offers distributed debugging tools that let you attach a debugger to running trials across the cluster.
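The exact workflow depends on your Ray version, but as a rough, hypothetical sketch: enable the debugger via the RAY_DEBUG environment variable and place a standard Python breakpoint() inside the training function, so the affected trial pauses until you attach with Ray's debugging tooling (consult the guide linked above for the details of your setup):

import os

# Hypothetical sketch only; see the Ray debugging guide for your version.
# Enable Ray's debugger before Ray is started and the sweep is launched.
os.environ["RAY_DEBUG"] = "1"

def train_cifar(config):
    # ... model, data, and optimizer setup as in the tutorial ...
    # Pause this trial here so a debugger can be attached to inspect state.
    breakpoint()
    # ... training loop continues after the debugger session ...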

Conclusion#

In this tutorial, you learned how to tune the hyperparameters of a PyTorch model using Ray Tune: integrating Ray Tune into your PyTorch training loop, defining a search space for your hyperparameters, using an early-stopping scheduler such as ASHAScheduler to terminate low-performing trials, reporting metrics and saving checkpoints, and running the hyperparameter search and analyzing its results.

Ray Tune makes it straightforward to scale your experiments from a single machine to a large cluster, helping you find the best model configuration efficiently.

Further reading#
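For more search algorithms, schedulers, and end-to-end examples, see the Ray Tune documentation: https://docs.ray.io/en/latest/tune/index.html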

Total running time of the script: (29 minutes 42.436 seconds)