What happened + What you expected to happen
Initializing `ImpalaTF2Policy` currently throws a `ValueError`, because `self.cur_lr` is a `tf.Variable`, while the Keras optimizer only accepts a float, a `LearningRateSchedule` instance, or a callable as `learning_rate`:
File "site-packages\ray\rllib\algorithms\impala\impala_tf_policy.py", line 316, in __init__ self.maybe_initialize_optimizer_and_loss() File "site-packages\ray\rllib\policy\eager_tf_policy_v2.py", line 462, in maybe_initialize_optimizer_and_loss optimizers = force_list(self.optimizer()) ^^^^^^^^^^^^^^^^ File "site-packages\ray\rllib\algorithms\impala\impala_tf_policy.py", line 230, in optimizer optim = tf.keras.optimizers.Adam(self.cur_lr) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "site-packages\keras\src\optimizers\adam.py", line 62, in __init__ super().__init__( File "site-packages\keras\src\backend\tensorflow\optimizer.py", line 22, in __init__ super().__init__(*args, **kwargs) File "site-packages\keras\src\optimizers\base_optimizer.py", line 124, in __init__ raise ValueError(ValueError: Argument `learning_rate` should be float, or an instance of LearningRateSchedule, or a callable (that takes in the current iteration value and returns the corresponding learning rate value). Received instead: learning_rate=<tf.Variable 'lr:0' shape=() dtype=float32, numpy=0.0005>
Versions / Dependencies
Ray == 2.10.0
Python == 3.11.9
OS == Win10
TensorFlow == 2.16.1
Reproduction script
```python
from ray.rllib.algorithms.impala.impala import ImpalaConfig
from ray.rllib.algorithms.impala.impala_tf_policy import ImpalaTF2Policy
import gymnasium as gym

obs_space = gym.spaces.Box(high=1, low=-1, shape=(10,))
action_space = gym.spaces.Box(high=1, low=-1, shape=(5,))

config = ImpalaConfig()
config.framework_str = "tf2"

policy = ImpalaTF2Policy(obs_space, action_space, config)
```
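A possible temporary workaround might be to override the policy's `optimizer()` method so that Adam receives a plain float instead of the `tf.Variable`. This is only an unverified sketch under my assumptions (the subclass name is mine, it ignores the `opt_type` / `_separate_vf_optimizer` branches of the original method, and converting to a float decouples the optimizer from later updates to `self.cur_lr`), not a proper fix:

```python
import gymnasium as gym
import tensorflow as tf

from ray.rllib.algorithms.impala.impala import ImpalaConfig
from ray.rllib.algorithms.impala.impala_tf_policy import ImpalaTF2Policy


class PatchedImpalaTF2Policy(ImpalaTF2Policy):
    """Sketch: hand Adam a plain float so the Keras 3 type check passes."""

    def optimizer(self):
        # self.cur_lr is a tf.Variable; Keras 3 only accepts a float,
        # a LearningRateSchedule, or a callable as `learning_rate`.
        return [tf.keras.optimizers.Adam(float(self.cur_lr.numpy()))]


obs_space = gym.spaces.Box(high=1, low=-1, shape=(10,))
action_space = gym.spaces.Box(high=1, low=-1, shape=(5,))

config = ImpalaConfig()
config.framework_str = "tf2"
policy = PatchedImpalaTF2Policy(obs_space, action_space, config)
```

Ultimately, though, `impala_tf_policy.py` itself probably needs to either cast `self.cur_lr` or wrap it in a callable when constructing the optimizer under Keras 3.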
Issue Severity
High: It blocks me from completing my task.