POCA trainer #5005
Merged: andrewcoh merged 289 commits into main from develop-coma2-trainer on Mar 12, 2021

andrewcoh (Contributor) commented on Feb 24, 2021 (edited)

Proposed change(s)

This PR adds the POCA trainer and associated tests. In addition it makes changes to the extrinsic reward provider to enable team-based rewards to work.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

PR for documentation (to be merged after this one): #5056
Explanation of some of the design choices:

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

Ervin Teng and others added 30 commits December 15, 2020 11:35
andrewcoh and others added 7 commits March 10, 2021 15:57
* simple rl multiagent env
* runs but does not train
* assemble terminal steps
* seems to train
* fix final reward
* Merge changes
* fix multiple discrete actions
* Lots of small fixes for multiagent env
* Fix just_died
* Add simple RL tests
* Add LSTM simple_rl for COMA
* adding comments to multiagent rl
* Address comments
Co-authored-by: Ervin Teng <ervin@unity3d.com>
Merge branch 'develop-poca-trainer' into develop-coma2-trainer
andrewcoh changed the title from "COMA2 trainer" to "POCA trainer" on Mar 10, 2021
ervteng left a comment

LGTM, will wait for at least one more reviewer

vincentpierre left a comment

I will approve once all comments have been resolved, including those that GitHub wrongly marked as outdated.

        )
        return value_outputs, critic_mem_out

    def forward(

Remove this method. It has no reason to be public.

# Convert to tensors
current_obs = [ModelUtils.list_to_tensor(obs) for obs in current_obs]
group_obs = GroupObsUtil.from_buffer(batch, n_obs)
group_obs = [

Some of my comments got lost. Please review them in the Conversation tab: #5005 (comment)

Comment on lines 28 to 32
if (
    BufferKey.GROUPMATE_REWARDS in mini_batch
    and BufferKey.GROUP_REWARD in mini_batch
):
    if self.add_groupmate_rewards:

Invert these two ifs. There is no need to check the first one if there are no groupmate rewards.

These if conditions could be better:

if self.add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch: do the groupmate reward
if BufferKey.GROUP_REWARD in mini_batch: do the group reward


Updated
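The restructuring suggested above can be sketched as a small, self-contained reward evaluator. Everything here is an illustrative assumption: `BufferKey` is a stand-in enum, `mini_batch` is a plain dict of Python lists rather than ML-Agents' own buffer type, and `ExtrinsicRewardSketch` is a hypothetical name, not the provider merged in this PR. It shows the two independent checks: the cheap flag is tested before the groupmate-reward lookup, and the group reward is added whenever it is present.

```python
from enum import Enum
from typing import Dict, List, Sequence


class BufferKey(Enum):
    # Hypothetical stand-in for the buffer keys named in the review.
    ENVIRONMENT_REWARDS = "environment_rewards"
    GROUPMATE_REWARDS = "groupmate_rewards"
    GROUP_REWARD = "group_reward"


class ExtrinsicRewardSketch:
    """Hypothetical reward provider using the suggested if-structure."""

    def __init__(self, add_groupmate_rewards: bool = False) -> None:
        self.add_groupmate_rewards = add_groupmate_rewards

    def evaluate(self, mini_batch: Dict[BufferKey, Sequence]) -> List[float]:
        # Start from each step's individual environment reward.
        total = [float(r) for r in mini_batch[BufferKey.ENVIRONMENT_REWARDS]]
        # Flag first: `and` short-circuits, so the buffer is only
        # inspected when groupmate rewards are actually wanted.
        if self.add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch:
            for i, mate_rewards in enumerate(mini_batch[BufferKey.GROUPMATE_REWARDS]):
                total[i] += sum(mate_rewards)  # all groupmates' rewards at step i
        # The shared group reward does not depend on the flag above.
        if BufferKey.GROUP_REWARD in mini_batch:
            for i, group_reward in enumerate(mini_batch[BufferKey.GROUP_REWARD]):
                total[i] += float(group_reward)
        return total


if __name__ == "__main__":
    batch = {
        BufferKey.ENVIRONMENT_REWARDS: [1.0, 0.0],
        BufferKey.GROUPMATE_REWARDS: [[0.5, 0.5], [0.0, 2.0]],
        BufferKey.GROUP_REWARD: [1.0, 1.0],
    }
    print(ExtrinsicRewardSketch(add_groupmate_rewards=True).evaluate(batch))  # -> [3.0, 3.0]
    print(ExtrinsicRewardSketch(add_groupmate_rewards=False).evaluate(batch))  # -> [2.0, 1.0]
```

With the flag on, each step's total is its own reward plus the sum of its groupmates' rewards plus the group reward; with the flag off, only the group reward is added on top of the individual reward.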

        return rsa, x_self_encoder

    @staticmethod
    def encode_observations(

I stand by my statement: make create_residual_self_attention a module, with encode_observations as its forward method.


Call it ObservationEncoder

Courtesy of @ervteng: #5093
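The refactor requested in this thread follows the usual PyTorch pattern: keep the built sub-components as module state and put the computation in `forward()`, so callers write `encoder(obs)` instead of calling a static helper. The sketch below shows only that shape. It uses a tiny stand-in `Module` base class so it runs without PyTorch (real code would subclass `torch.nn.Module`), and the per-observation encoders and concatenation step are illustrative assumptions, not the `ObservationEncoder` merged in #5093.

```python
from typing import Callable, List, Sequence


class Module:
    """Stand-in for torch.nn.Module's convention: calling the object
    dispatches to forward(). (Assumption: real code subclasses torch.nn.Module.)"""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class ObservationEncoder(Module):
    """Hypothetical: holds the encoders that a factory function such as
    create_residual_self_attention used to return separately."""

    def __init__(self, encoders: Sequence[Callable[[List[float]], List[float]]]) -> None:
        self.encoders = list(encoders)

    def forward(self, observations: Sequence[List[float]]) -> List[float]:
        # Plays the role of the former static encode_observations():
        # encode each observation with its own encoder, then concatenate.
        encoded: List[float] = []
        for encoder, obs in zip(self.encoders, observations):
            encoded.extend(encoder(obs))
        return encoded


if __name__ == "__main__":
    encoder = ObservationEncoder([lambda x: [2 * v for v in x], lambda x: [sum(x)]])
    print(encoder([[1.0, 2.0], [3.0, 4.0]]))  # -> [2.0, 4.0, 7.0]
```

Bundling state and computation in one object means the encoders are created once and travel with the network, rather than being threaded through a public static method.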

andrewcoh merged commit d63a9d7 into main on Mar 12, 2021
delete-merged-branch bot deleted the develop-coma2-trainer branch on March 12, 2021 01:48
github-actions bot locked as resolved and limited conversation to collaborators on Mar 12, 2022

Reviewers

dongruoping: awaiting requested review

ervteng approved these changes

awjuliani left review comments

vincentpierre approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone


5 participants: @andrewcoh, @chriselion, @ervteng, @awjuliani, @vincentpierre
