RLlib: dist_class is missing when I try to use Policy.learn_on_batch() #47011

Open
Labels: P3 (issue moderate in impact or severity) · bug (something that is supposed to be working, but isn't) · rllib (RLlib related issues) · windows
@hazelyu0215

Description


What happened + What you expected to happen

I'm trying to train a policy using a batch collected from two envs. I defined two trainers to collect data, one per env, and concatenated the data from both. When I call policy.learn_on_batch(batch), it fails with the following error:
Traceback (most recent call last):
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1367, in _worker
    self.loss(model, self.dist_class, sample_batch)
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 85, in loss
    curr_action_dist = dist_class(logits, model)
TypeError: 'NoneType' object is not callable

In tower 0 on device cpu

The above exception was the direct cause of the following exception:

  File "C:\Users\me\Documents\my_proj\mix_environment.py", line 197, in
    trainer_1.workers.local_worker().learn_on_batch(minibatch)
ValueError: 'NoneType' object is not callable

I think the problem is caused by a missing dist_class: when I inspect my policy, its dist_class is None. However, I have no idea how to define a dist_class or how to retrieve the default one.
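One workaround I'm considering (unverified; it assumes the policies use the default model config) is to restore the default action distribution class through ModelCatalog before calling learn_on_batch:

from ray.rllib.models import ModelCatalog

policy = trainer_1.workers.local_worker().policy_map["policy_0"]
if policy.dist_class is None:
    # ModelCatalog.get_action_dist() returns the default action-distribution
    # class and the expected number of model outputs for this action space.
    dist_class, _ = ModelCatalog.get_action_dist(
        policy.action_space, policy.config["model"], framework="torch"
    )
    policy.dist_class = dist_class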

How I define the trainer:

from ray.rllib.algorithms.ppo import PPO, PPOConfig

config_1 = PPOConfig()
config_1["env"] = "tilt_with_noise"
config_1["env_config"] = ENV_CONFIG_1
config_1["num_rollout_workers"] = 2
config_1["train_batch_size"] = noisy_data
config_1["sgd_minibatch_size"] = 16
config_1["gamma"] = 0.25
config_1["lr"] = 0.001
# config_1["create_env_on_driver"] = True
config_1["policies"] = {f"policy_{i}": (None, observation_space, action_space, {}) for i in range(21)}
config_1["policy_mapping_fn"] = policy_mapping_fn
trainer_1 = PPO(env="tilt_with_noise", config=config_1)
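For comparison, here is the same configuration written with the fluent AlgorithmConfig API (a sketch of the equivalent setup; observation_space, action_space, policy_mapping_fn, ENV_CONFIG_1, and noisy_data are defined elsewhere in my script):

from ray.rllib.algorithms.ppo import PPOConfig

config_1 = (
    PPOConfig()
    .environment(env="tilt_with_noise", env_config=ENV_CONFIG_1)
    .rollouts(num_rollout_workers=2)
    .training(train_batch_size=noisy_data, sgd_minibatch_size=16, gamma=0.25, lr=0.001)
    .multi_agent(
        policies={f"policy_{i}": (None, observation_space, action_space, {}) for i in range(21)},
        policy_mapping_fn=policy_mapping_fn,
    )
)
trainer_1 = config_1.build()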

Versions / Dependencies

Ray == 2.8.1
Python == 3.9.13
OS == Windows 11

Reproduction script

from ray.rllib.algorithms.ppo import PPO, PPOConfig

config_1 = PPOConfig()
config_1["env"] = "tilt_with_noise"
config_1["env_config"] = ENV_CONFIG_1
config_1["num_rollout_workers"] = 2
config_1["train_batch_size"] = noisy_data
config_1["sgd_minibatch_size"] = 16
config_1["gamma"] = 0.25
config_1["lr"] = 0.001
# config_1["create_env_on_driver"] = True
config_1["policies"] = {f"policy_{i}": (None, observation_space, action_space, {}) for i in range(21)}
config_1["policy_mapping_fn"] = policy_mapping_fn
trainer_1 = PPO(env="tilt_with_noise", config=config_1)

The second trainer (trainer) is defined almost identically.

from ray.rllib.policy.sample_batch import concat_samples

# sync_policy_weights, collect_samples, and sample_minibatch are my own helpers.
sync_policy_weights(trainer_1, trainer)
sample_with_noise = collect_samples(trainer_1)
sample_no_noise = collect_samples(trainer)
mixed_batch = concat_samples([sample_with_noise, sample_no_noise])
minibatch = sample_minibatch(mixed_batch, 16)

results = {}
for policy_id, batch in minibatch.policy_batches.items():
    policy = trainer_1.workers.local_worker().policy_map[policy_id]
    results[policy_id] = policy.learn_on_batch(batch)
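For context, collect_samples can be approximated like this (a hypothetical sketch, not my exact helper):

from ray.rllib.policy.sample_batch import concat_samples

def collect_samples(trainer):
    # Pull one rollout batch from every remote worker and merge them
    # into a single (multi-agent) batch.
    batches = trainer.workers.foreach_worker(lambda w: w.sample(), local_worker=False)
    return concat_samples(batches)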

Issue Severity

Low: It annoys or frustrates me.
