Description
What happened + What you expected to happen
I'm trying to train a policy using a batch collected from two environments. I defined two trainers to collect data from each env and concatenated the batches from both. When I call policy.learn_on_batch(batch), I get the following error:
TypeError: 'NoneType' object is not callable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\me\Documents\my_proj\mix_environment.py", line 197, in
    trainer_1.workers.local_worker().learn_on_batch(minibatch)
ValueError: 'NoneType' object is not callable
tracebackTraceback (most recent call last):
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1367, in _worker
    self.loss(model, self.dist_class, sample_batch)
  File "c:\Users\me\Documents\my_proj\env\lib\site-packages\ray\rllib\algorithms\ppo\ppo_torch_policy.py", line 85, in loss
    curr_action_dist = dist_class(logits, model)
TypeError: 'NoneType' object is not callable
In tower 0 on device cpu
I think the problem is caused by a missing dist_class, since I checked and my policy's dist_class is None. However, I have no idea how to define a dist_class or how to get the default one.
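If it helps, this is the kind of workaround I was considering (just a sketch, not verified; it assumes the affected policy object and its model config are accessible, and uses ModelCatalog to look up the default Torch action distribution):

from ray.rllib.models import ModelCatalog

# Look up the default Torch action-distribution class for the policy's
# action space and assign it, since policy.dist_class is currently None.
dist_class, logit_dim = ModelCatalog.get_action_dist(
    policy.action_space, policy.config["model"], framework="torch"
)
policy.dist_class = dist_class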
How I define the trainer: see the reproduction script below.
Versions / Dependencies
Ray == 2.8.1
Python == 3.9.13
OS == Windows 11
Reproduction script
config_1 = PPOConfig()
config_1["env"] = "tilt_with_noise"
config_1["env_config"] = ENV_CONFIG_1
config_1["num_rollout_workers"] = 2
config_1["train_batch_size"] = noisy_data
config_1["sgd_minibatch_size"] = 16
config_1["gamma"] = 0.25
config_1["lr"] = 0.001
# config_1["create_env_on_driver"] = True
config_1["policies"] = {f"policy_{i}": (None, observation_space, action_space, {}) for i in range(21)}
config_1["policy_mapping_fn"] = policy_mapping_fn
trainer_1 = PPO(env="tilt_with_noise", config=config_1)
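For completeness, I believe the same settings expressed through the AlgorithmConfig builder API would look roughly like this (a sketch; noisy_data, ENV_CONFIG_1, observation_space, action_space, and policy_mapping_fn are defined elsewhere in my script):

from ray.rllib.algorithms.ppo import PPO, PPOConfig

# Same settings as above, written with the fluent config API (sketch).
config_1 = (
    PPOConfig()
    .environment(env="tilt_with_noise", env_config=ENV_CONFIG_1)
    .rollouts(num_rollout_workers=2)
    .training(
        train_batch_size=noisy_data,
        sgd_minibatch_size=16,
        gamma=0.25,
        lr=0.001,
    )
    .multi_agent(
        policies={
            f"policy_{i}": (None, observation_space, action_space, {})
            for i in range(21)
        },
        policy_mapping_fn=policy_mapping_fn,
    )
)
trainer_1 = PPO(config=config_1)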
The other trainer (trainer, which collects the no-noise samples) is configured almost identically.
sync_policy_weights(trainer_1, trainer)
sample_with_noise = collect_samples(trainer_1)
sample_no_noise = collect_samples(trainer)
mixed_batch = concat_samples([sample_with_noise, sample_no_noise])
minibatch = sample_minibatch(mixed_batch, 16)
results = {}
for policy_id, batch in minibatch.policy_batches.items():
    policy = trainer_1.workers.local_worker().policy_map[policy_id]
    results[policy_id] = policy.learn_on_batch(batch)
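For reproduction purposes, collect_samples can be approximated with RLlib's built-in sampling utility (a sketch, not my exact helper; sync_policy_weights and sample_minibatch are my own functions):

from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.policy.sample_batch import concat_samples

def collect_samples(trainer):
    # Sample synchronously from all rollout workers of the given trainer
    # and return the concatenated (multi-agent) batch.
    return synchronous_parallel_sample(worker_set=trainer.workers, concat=True)

sample_with_noise = collect_samples(trainer_1)
sample_no_noise = collect_samples(trainer)
mixed_batch = concat_samples([sample_with_noise, sample_no_noise])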
Issue Severity
Low: It annoys or frustrates me.