Description
What happened + What you expected to happen
I'm trying to train a policy using a batch collected from two environments. I defined two trainers to collect data from each env and concatenated the batches from both. When I call policy.learn_on_batch(batch), I get the following error:
TypeError: 'NoneType' object is not callable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\me\Documents\my_proj\mix_environment.py", line 197, in
    trainer_1.workers.local_worker().learn_on_batch(minibatch)
ValueError: 'NoneType' object is not callable
tracebackTraceback (most recent call last):
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1367, in _worker
    self.loss(model, self.dist_class, sample_batch)
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 85, in loss
    curr_action_dist = dist_class(logits, model)
TypeError: 'NoneType' object is not callable
In tower 0 on device cpu
I think the problem is caused by a missing dist_class, since I checked and my policy's dist_class is None. However, I have no idea how to define a dist_class or how to get the default one.
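If it helps, this is the kind of workaround I was considering (just a sketch, not verified; it assumes the affected policy object and its model config are accessible, and uses ModelCatalog to look up the default Torch action distribution):

from ray.rllib.models import ModelCatalog

# Look up the default Torch action-distribution class for the policy's
# action space and assign it, since policy.dist_class is currently None.
dist_class, logit_dim = ModelCatalog.get_action_dist(
    policy.action_space, policy.config["model"], framework="torch"
)
policy.dist_class = dist_class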
How I define the trainer: see the reproduction script below.
Versions / Dependencies
Ray == 2.8.1
Python == 3.9.13
OS == Windows 11
Reproduction script
config_1 = PPOConfig()
config_1["env"] = "tilt_with_noise"
config_1["env_config"] = ENV_CONFIG_1
config_1["num_rollout_workers"] = 2
config_1["train_batch_size"] = noisy_data
config_1["sgd_minibatch_size"] = 16
config_1["gamma"] = 0.25
config_1["lr"] = 0.001
# config_1["create_env_on_driver"] = True
config_1["policies"] = {f"policy_{i}": (None, observation_space, action_space, {}) for i in range(21)}
config_1["policy_mapping_fn"] = policy_mapping_fn
trainer_1 = PPO(env="tilt_with_noise", config=config_1)
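For completeness, I believe the same settings expressed through the AlgorithmConfig builder API would look roughly like this (a sketch; noisy_data, ENV_CONFIG_1, observation_space, action_space, and policy_mapping_fn are defined elsewhere in my script):

from ray.rllib.algorithms.ppo import PPO, PPOConfig

# Same settings as above, written with the fluent config API (sketch).
config_1 = (
    PPOConfig()
    .environment(env="tilt_with_noise", env_config=ENV_CONFIG_1)
    .rollouts(num_rollout_workers=2)
    .training(
        train_batch_size=noisy_data,
        sgd_minibatch_size=16,
        gamma=0.25,
        lr=0.001,
    )
    .multi_agent(
        policies={
            f"policy_{i}": (None, observation_space, action_space, {})
            for i in range(21)
        },
        policy_mapping_fn=policy_mapping_fn,
    )
)
trainer_1 = PPO(config=config_1)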
The other trainer (trainer, which collects the no-noise samples) is configured almost identically.
sync_policy_weights(trainer_1, trainer)
sample_with_noise = collect_samples(trainer_1)
sample_no_noise = collect_samples(trainer)
mixed_batch = concat_samples([sample_with_noise, sample_no_noise])
minibatch = sample_minibatch(mixed_batch, 16)
results = {}
for policy_id, batch in minibatch.policy_batches.items():
    policy = trainer_1.workers.local_worker().policy_map[policy_id]
    results[policy_id] = policy.learn_on_batch(batch)
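For reproduction purposes, collect_samples can be approximated with RLlib's built-in sampling utility (a sketch, not my exact helper; sync_policy_weights and sample_minibatch are my own functions):

from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.policy.sample_batch import concat_samples

def collect_samples(trainer):
    # Sample synchronously from all rollout workers of the given trainer
    # and return the concatenated (multi-agent) batch.
    return synchronous_parallel_sample(worker_set=trainer.workers, concat=True)

sample_with_noise = collect_samples(trainer_1)
sample_no_noise = collect_samples(trainer)
mixed_batch = concat_samples([sample_with_noise, sample_no_noise])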
Issue Severity
Low: It annoys or frustrates me.