nemo_rl.environments.math_environment#

Module Contents#

Classes#

Functions#

API#

class nemo_rl.environments.math_environment.MathEnvConfig#

Bases: typing.TypedDict

num_workers: int#

None

stop_strings: NotRequired[list[str] | None]#

None

verifier_type: NotRequired[str | None]#

None

math_verify_impl: NotRequired[str | None]#

None
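A `MathEnvConfig` is a plain dict conforming to the TypedDict above. The following is a hypothetical instance for illustration; the specific values (worker count, stop string, verifier name) are assumptions, not defaults from the library:

```python
# Hypothetical MathEnvConfig instance; keys mirror the TypedDict above.
# The values shown here are illustrative assumptions.
math_env_config = {
    "num_workers": 8,                # required: number of verifier workers
    "stop_strings": ["</answer>"],   # optional: strings that terminate generation
    "verifier_type": "math_verify",  # optional: which verifier backend to use
    "math_verify_impl": None,        # optional: implementation override
}
```

Since `MathEnvConfig` is a TypedDict, this dict can be passed anywhere the config type is expected without instantiating a class.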

nemo_rl.environments.math_environment._mute_output()#
class nemo_rl.environments.math_environment.HFVerifyWorker#

Initialization

verify(
  pred_responses: list[str],
  ground_truths: list[str],
  return_extracted_answer: bool = False,
  **kwargs,
) → Union[list[float], tuple[list[float], list[str | None]]]#

Verify the correctness of the predicted responses against the ground truth.

Parameters:
  • pred_responses – list[str]. The predicted responses from the LLM.

  • ground_truths – list[str]. The ground truth responses.

Returns:

Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).
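The return contract of `verify` can be illustrated with a toy stand-in. The function below is not the actual HFVerifyWorker logic (which uses a math verifier backend); it is a minimal sketch that treats the last line of each response as the extracted answer and scores by exact match, purely to show the two return shapes:

```python
from typing import Optional, Union

def toy_verify(
    pred_responses: list[str],
    ground_truths: list[str],
    return_extracted_answer: bool = False,
) -> Union[list[float], tuple[list[float], list[Optional[str]]]]:
    """Toy stand-in mirroring verify's return contract, not its real logic."""
    # Pretend the final line of each response is the extracted answer.
    extracted = [
        p.strip().splitlines()[-1] if p.strip() else None
        for p in pred_responses
    ]
    # Score 1.0 for an exact match against the ground truth, else 0.0.
    scores = [1.0 if e == gt else 0.0 for e, gt in zip(extracted, ground_truths)]
    if return_extracted_answer:
        return scores, extracted
    return scores
```

Callers that only need rewards can ignore the flag; callers computing consensus metrics such as cons@k set `return_extracted_answer=True` to also recover per-sample extracted answers.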

class nemo_rl.environments.math_environment.MultilingualMultichoiceVerifyWorker#
verify(
  pred_responses: list[str],
  ground_truths: list[str],
  return_extracted_answer: bool = False,
  **kwargs,
) → Union[list[float], tuple[list[float], list[str | None]]]#

Verify the correctness of the predicted responses against the ground truth.

Parameters:
  • pred_responses – list[str]. The predicted responses from the LLM.

  • ground_truths – list[str]. The ground truth responses.

Returns:

Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).

class nemo_rl.environments.math_environment.EnglishMultichoiceVerifyWorker#
verify(
  pred_responses: list[str],
  ground_truths: list[str],
  return_extracted_answer: bool = False,
  **kwargs,
) → Union[list[float], tuple[list[float], list[str | None]]]#

Verify the correctness of the predicted responses against the ground truth.

Parameters:
  • pred_responses – list[str]. The predicted responses from the LLM.

  • ground_truths – list[str]. The ground truth responses.

Returns:

Union[list[float], tuple[list[float], list[str | None]]]. If return_extracted_answer is False, returns only the scores. If return_extracted_answer is True, returns (scores, extracted_answers).

class nemo_rl.environments.math_environment.MathEnvironmentMetadata#

Bases: typing.TypedDict

ground_truth: str#

None

extracted_answer: str | None#

None

class nemo_rl.environments.math_environment.MathEnvironment(
  cfg: nemo_rl.environments.math_environment.MathEnvConfig,
)#

Bases: nemo_rl.environments.interfaces.EnvironmentInterface[nemo_rl.environments.math_environment.MathEnvironmentMetadata]

shutdown() → None#
step(
  message_log_batch: list[nemo_rl.data.interfaces.LLMMessageLogType],
  metadata: list[nemo_rl.environments.math_environment.MathEnvironmentMetadata],
  return_extracted_answer: bool = False,
) → nemo_rl.environments.interfaces.EnvironmentReturn[nemo_rl.environments.math_environment.MathEnvironmentMetadata]#

Runs a step in the math environment.

Parameters:
  • message_log_batch – list[list[dict[str, str]]]. A batch of OpenAI-API-like message logs that represent interactions with the LLM.

  • metadata – list[MathEnvironmentMetadata]. The grader uses the ‘ground_truth’ key to evaluate correctness. The extracted answer is stored to calculate cons@k.

Returns:

A tuple containing:

  • list[dict[str, str]]: Observations/responses batch

  • list[dict]: Updated metadata

  • list[str]: Next stop strings for the next turn

  • Tensor: Rewards tensor

  • Tensor: Done flags tensor

Return type:

EnvironmentReturn
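The inputs to `step` can be sketched as plain Python data. The example below builds a one-sample batch; the exact message content is invented for illustration, but the shapes follow the parameter descriptions above (a list of OpenAI-style message logs paired with a list of metadata dicts of equal length):

```python
# Hypothetical inputs for MathEnvironment.step. Each message log is a list of
# role/content dicts; each metadata entry is a MathEnvironmentMetadata dict.
message_log_batch = [
    [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4. The answer is 4."},
    ],
]
metadata = [
    # ground_truth is consulted by the grader; extracted_answer is filled in
    # by the environment to support cons@k calculation.
    {"ground_truth": "4", "extracted_answer": None},
]
```

Note that `message_log_batch` and `metadata` must be the same length: the grader scores the i-th message log against the i-th metadata entry.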

global_post_process_and_metrics(
  batch: nemo_rl.distributed.batched_data_dict.BatchedDataDict[Any],
) → tuple[nemo_rl.distributed.batched_data_dict.BatchedDataDict[Any], dict[str, float | int]]#

Computes metrics for this environment given a global rollout batch.

Every rank will run this function, so you’re free to use distributed calculations for heavy metrics if you’d prefer.
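To illustrate the metrics half of the return value, here is a sketch of the kind of aggregate a math environment typically reports. The function and key names (`compute_metrics`, `"accuracy"`, `"mean_reward"`) are assumptions for illustration, not the actual NeMo RL batch schema:

```python
# Sketch of an aggregate metric over per-sample rewards; in the real method the
# rewards would come out of the BatchedDataDict rollout batch.
def compute_metrics(rewards: list[float]) -> dict[str, float]:
    n = len(rewards)
    if n == 0:
        return {"accuracy": 0.0, "mean_reward": 0.0}
    return {
        # Fraction of samples with a positive reward (i.e. verified correct).
        "accuracy": sum(1.0 for r in rewards if r > 0) / n,
        # Average reward across the global rollout batch.
        "mean_reward": sum(rewards) / n,
    }
```

With binary 0/1 rewards the two metrics coincide; they diverge once the verifier emits partial credit.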