nemo_rl.environments.math_environment#
API#
- class nemo_rl.environments.math_environment.MathEnvConfig#

  Bases: typing.TypedDict

  - num_workers: int#
  - stop_strings: NotRequired[list[str] | None]#
  - verifier_type: NotRequired[str | None]#
  - math_verify_impl: NotRequired[str | None]#
- nemo_rl.environments.math_environment._mute_output()#
- class nemo_rl.environments.math_environment.HFVerifyWorker#

  Initialization

  - verify(pred_responses: list[str], ground_truths: list[str], return_extracted_answer: bool = False, **kwargs)#

    Verify the correctness of the predicted responses against the ground truth.

    Parameters:
    - pred_responses (list[str]) – The predicted responses from the LLM.
    - ground_truths (list[str]) – The ground truth responses.

    Returns:
    Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).
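The snippet below is a toy stand-in illustrating the `verify()` contract shared by the verifier workers in this module. It uses naive exact-match scoring with a hypothetical `Answer:` marker; the real workers perform proper answer extraction and equivalence checking, which is not reproduced here:

```python
def verify(pred_responses, ground_truths, return_extracted_answer=False, **kwargs):
    """Toy verifier matching the documented signature and return shape."""
    scores, extracted = [], []
    for pred, truth in zip(pred_responses, ground_truths):
        # Toy extraction: take the text after the last "Answer:" marker, if any.
        answer = pred.rsplit("Answer:", 1)[-1].strip() if "Answer:" in pred else None
        extracted.append(answer)
        scores.append(1.0 if answer is not None and answer == truth.strip() else 0.0)
    if return_extracted_answer:
        return scores, extracted
    return scores

scores = verify(["Let me think. Answer: 42"], ["42"])
scores_and_answers = verify(["Answer: 42"], ["7"], return_extracted_answer=True)
```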
- class nemo_rl.environments.math_environment.MultilingualMultichoiceVerifyWorker#

  - verify(pred_responses: list[str], ground_truths: list[str], return_extracted_answer: bool = False, **kwargs)#

    Verify the correctness of the predicted responses against the ground truth.

    Parameters:
    - pred_responses (list[str]) – The predicted responses from the LLM.
    - ground_truths (list[str]) – The ground truth responses.

    Returns:
    Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).
- class nemo_rl.environments.math_environment.EnglishMultichoiceVerifyWorker#

  - verify(pred_responses: list[str], ground_truths: list[str], return_extracted_answer: bool = False, **kwargs)#

    Verify the correctness of the predicted responses against the ground truth.

    Parameters:
    - pred_responses (list[str]) – The predicted responses from the LLM.
    - ground_truths (list[str]) – The ground truth responses.

    Returns:
    Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).
- class nemo_rl.environments.math_environment.MathEnvironmentMetadata#

  Bases: typing.TypedDict

  - ground_truth: str#
  - extracted_answer: str | None#
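As a `TypedDict`, metadata entries are plain dicts, one per rollout in the batch. The values below are hypothetical:

```python
# One metadata entry per rollout; extracted_answer is typically None until
# the verifier has run (illustrative values, not from a real rollout).
metadata = [
    {"ground_truth": "42", "extracted_answer": None},
    {"ground_truth": "3.14", "extracted_answer": "3.14"},
]
```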
- class nemo_rl.environments.math_environment.MathEnvironment( )#

  Bases: nemo_rl.environments.interfaces.EnvironmentInterface[nemo_rl.environments.math_environment.MathEnvironmentMetadata]

  - shutdown() → None#

  - step(message_log_batch: list[nemo_rl.data.interfaces.LLMMessageLogType], metadata: list[nemo_rl.environments.math_environment.MathEnvironmentMetadata], return_extracted_answer: bool = False)#

    Runs a step in the math environment.

    Parameters:
    - message_log_batch (list[list[dict[str, str]]]) – A batch of OpenAI-API-like message logs that represent interactions with the LLM.
    - metadata (list[MathEnvironmentMetadata]) – The grader uses the 'ground_truth' key to evaluate correctness. The extracted answer is stored to calculate cons@k.

    Returns:
    A tuple containing:
    - list[dict[str, str]]: observations/responses batch
    - list[dict]: updated metadata
    - list[str]: stop strings for the next turn
    - Tensor: rewards tensor
    - Tensor: done flags tensor
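The following framework-free sketch illustrates the shapes that flow through `step()`. The real `MathEnvironment` dispatches to verifier workers and returns torch Tensors for rewards and done flags; plain lists stand in for tensors here, and the grading logic and `Answer:` marker are toy assumptions:

```python
# Hypothetical batch of OpenAI-API-like message logs (batch size 2).
message_log_batch = [
    [{"role": "user", "content": "What is 6 * 7?"},
     {"role": "assistant", "content": "Answer: 42"}],
    [{"role": "user", "content": "What is 2 + 2?"},
     {"role": "assistant", "content": "Answer: 5"}],
]
metadata = [{"ground_truth": "42"}, {"ground_truth": "4"}]

# Toy grading of each last assistant message against metadata["ground_truth"].
rewards, observations, next_metadata = [], [], []
for log, meta in zip(message_log_batch, metadata):
    answer = log[-1]["content"].rsplit("Answer:", 1)[-1].strip()
    correct = answer == meta["ground_truth"]
    rewards.append(1.0 if correct else 0.0)
    observations.append({"role": "environment",
                         "content": "correct" if correct else "incorrect"})
    next_metadata.append({**meta, "extracted_answer": answer})

next_stop_strings = [None] * len(message_log_batch)  # no extra stop strings
dones = [True] * len(message_log_batch)              # grading is single-turn
```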
- global_post_process_and_metrics( ) → tuple[nemo_rl.distributed.batched_data_dict.BatchedDataDict[Any], dict[str, float | int]]#

  Computes metrics for this environment given a global rollout batch.

  Every rank will run this function, so you're free to use distributed calculations if you'd prefer for heavy metrics.
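A minimal sketch of the kind of post-processing this hook performs: it passes the batch through and derives scalar metrics from it. The `"rewards"` key, the `accuracy` metric, and the plain-dict batch are illustrative assumptions, not the real `BatchedDataDict` API:

```python
def global_post_process_and_metrics(batch):
    """Toy version: return the batch unchanged plus summary metrics."""
    rewards = batch["rewards"]  # assumed key holding per-sample rewards
    metrics = {
        "accuracy": sum(rewards) / len(rewards),  # mean reward over the batch
        "num_samples": len(rewards),
    }
    return batch, metrics

batch, metrics = global_post_process_and_metrics({"rewards": [1.0, 0.0, 1.0, 1.0]})
```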