[RLlib] Optimize rnn_sequencing performance#46502

Open: cpnota wants to merge 1 commit into ray-project:master from cpnota:master

Conversation

cpnota

Why are these changes needed?

We found the performance of LSTMs in RLlib to be extremely slow compared to other methods: a single training iteration of PPO took 179 seconds, compared to ~9 seconds with a similarly sized MLP network. This made RNNs/LSTMs, as well as some transformer implementations, completely unusable for our purposes.

However, when profiling, we found this was primarily due to a very slow copy operation:

image
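For readers who want to collect a comparable profile, here is a minimal sketch using the standard-library cProfile module. The `training_step` function and the `infos` contents are hypothetical stand-ins, not RLlib code; the recursive copy is only there so that the report has something expensive to show.

```python
import copy
import cProfile
import io
import pstats


def training_step(infos):
    # Hypothetical stand-in for one training iteration: the expensive part
    # here is a recursive copy of a list of dicts.
    return copy.deepcopy(infos)


# Made-up per-timestep info dicts, for illustration only.
infos = [{"step": i, "extra": {"reward": float(i)}} for i in range(2_000)]

profiler = cProfile.Profile()
profiler.enable()
training_step(infos)
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time makes the call chain that dominates the step (here, the copy) appear at the top of the report.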

Further investigation revealed that most of this runtime was spent copying the infos dict. We determined that the root cause was inconsistent handling of the dictionary in rnn_sequencing. While the non-recurrent implementation stores the list of dictionaries as a NumPy array of objects, rnn_sequencing instead stores it as a Python list:

image

We applied a one-line fix to make this behavior consistent and store the list as a NumPy array:

    # old and slow
    f_pad = [None] * length
    # new and fast
    f_pad = np.full([length], None, dtype=f.dtype)

This causes the copy function to perform a shallow copy, improving performance by ~6x, to around 29 seconds:

image
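To illustrate why the container type matters, here is a minimal sketch. It is not RLlib's copy code: it assumes the slow path behaves like a recursive copy of a Python list of dicts (modelled here with `copy.deepcopy`), whereas copying an object-dtype NumPy array duplicates only the references. The `infos` contents are made up for illustration.

```python
import copy

import numpy as np

length = 1_000
# Made-up per-timestep info dicts, for illustration only.
infos = [{"step": i, "extra": {"reward": float(i)}} for i in range(length)]

# Python list: a recursive copy (an assumption about the slow path)
# rebuilds every dict and every nested dict.
as_list = list(infos)
deep = copy.deepcopy(as_list)

# Object-dtype array, allocated the same way as in the one-line fix.
as_array = np.full([length], None, dtype=object)
for idx, info in enumerate(infos):
    as_array[idx] = info

# Copying the array allocates a new buffer of references only;
# the dicts themselves are shared with the original.
shallow = np.array(as_array, copy=True)

print(deep[0] is infos[0])     # False: the dict was duplicated
print(shallow[0] is infos[0])  # True: only the reference was copied
```

The trade-off is that the shallow copy shares the dicts with the original, so mutating an info dict in place is visible through both arrays.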

However, we found that the training loop was still spending a lot of time in rnn_sequencing. We traced this down to a slow element-wise copy into an array. We instead replaced this with a vectorized copy:

    # old and slow
    for seq_offset in range(len_):
        f_pad[seq_base + seq_offset] = f[i]
        i += 1
    # new and fast
    f_pad[seq_base : seq_base + len_] = f[i : i + len_]
    i += len_

This further improved performance (for this profile we also removed the one-time summary logging, but that removal is not part of this PR):

image
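The element-wise and vectorized copies described above can be compared in a self-contained sketch. The function names, `seq_lens`, and `max_seq_len` are hypothetical and only loosely mirror the padding loop; the sketch also assumes `f` is an object-dtype array, since filling with `None` requires that.

```python
import numpy as np


def pad_elementwise(f, seq_lens, max_seq_len):
    """Pad each sequence to max_seq_len, copying one element at a time."""
    f_pad = np.full([len(seq_lens) * max_seq_len], None, dtype=f.dtype)
    seq_base = 0
    i = 0
    for len_ in seq_lens:
        for seq_offset in range(len_):
            f_pad[seq_base + seq_offset] = f[i]
            i += 1
        seq_base += max_seq_len
    return f_pad


def pad_sliced(f, seq_lens, max_seq_len):
    """Same padding, but each sequence is copied with one slice assignment."""
    f_pad = np.full([len(seq_lens) * max_seq_len], None, dtype=f.dtype)
    seq_base = 0
    i = 0
    for len_ in seq_lens:
        f_pad[seq_base : seq_base + len_] = f[i : i + len_]
        i += len_
        seq_base += max_seq_len
    return f_pad


# Five made-up info dicts split into two sequences of length 2 and 3.
f = np.empty(5, dtype=object)
for idx in range(5):
    f[idx] = {"step": idx}

seq_lens = [2, 3]
max_seq_len = 4

loop_result = pad_elementwise(f, seq_lens, max_seq_len)
slice_result = pad_sliced(f, seq_lens, max_seq_len)

# Both variants place the same objects at the same padded positions.
print(all(a is b for a, b in zip(loop_result, slice_result)))
```

The slice assignment moves the per-element work out of the Python loop and into NumPy, so only the loop over sequences remains in Python.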

Combined, these changes reduced the run time of the training step from 179 seconds to 15 seconds, approximately a 12x speedup and competitive with training an MLP.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

PhilippWillms reacted with thumbs up emoji
cpnota changed the title from "Optimize rnn_sequencing performance" to "[RLlib] Optimize rnn_sequencing performance" on Jul 10, 2024

stalebot commented on Feb 25, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stalebot added the "stale" label (The issue is stale. It will be closed within 7 days unless there are further conversation) on Feb 25, 2025
jcotant1 added the "rllib" label (RLlib related issues) on Mar 26, 2025
stalebot removed the "stale" label on Mar 26, 2025
hainesmichaelc added the "community-contribution" label (Contributed by the community) on Apr 4, 2025
Reviewers

sven1977: awaiting requested review (code owner)
ArturNiederfahrenhorst: awaiting requested review
simonsays1980: awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.

