[RLlib] Optimize rnn_sequencing performance #46502
Conversation
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
This pull request has been automatically closed because there has been no activity in the last 14 days. Please feel free to reopen or open a new pull request if you'd still like this to be addressed. Again, you can always ask for help on our discussion forum or Ray's public Slack channel. Thanks again for your contribution!
Why are these changes needed?
We found LSTMs in RLlib to be extremely slow to train compared to other methods, with a single PPO training iteration taking 179 seconds (compared to ~9 seconds with a similarly sized MLP network). This made RNNs/LSTMs, as well as some transformer implementations, completely unusable for our purposes.
However, when profiling, we found that this was primarily due to a very slow copy operation. Further investigation revealed that most of this runtime was spent copying the `infos` dict. We determined that the root cause was inconsistent handling of this field in `rnn_sequencing`: while the non-recurrent implementation stores the list of info dictionaries as a NumPy array of objects, `rnn_sequencing` stores it as a plain Python list. We applied a one-line fix to make the behavior consistent and store the list as a NumPy array.
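The actual diff isn't reproduced in this description; as a minimal sketch of the idea (variable names here are illustrative stand-ins, not the real `rnn_sequencing` internals):

```python
import numpy as np

# Hypothetical stand-in for the per-timestep "infos" column that
# rnn_sequencing builds up (illustrative names, not RLlib internals).
infos_column = [{"step": i, "score": float(i)} for i in range(4)]

# Before: the column was kept as a plain Python list of dicts.
feature = infos_column

# After (the one-line fix): store it as a NumPy object array instead,
# matching how the non-recurrent code path already stores it.
feature = np.array(infos_column)
print(feature.dtype)  # object
```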
With the list stored as an object array, the `copy` function performs a shallow copy instead of deep-copying every info dict, drastically improving performance by ~6x, to around 29 seconds per iteration.
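To see why this matters, here is a standalone micro-benchmark (synthetic data, not RLlib code) contrasting a deep copy of a Python list of dicts with the shallow `.copy()` of an object-dtype array, under the assumption that the slow path deep-copied each element:

```python
import copy
import time

import numpy as np

# Synthetic stand-in for an episode's "infos": one dict per timestep.
infos = [{"step": i, "reward": float(i)} for i in range(200_000)]

# Deep-copying a Python list recursively copies every nested dict.
t0 = time.perf_counter()
copy.deepcopy(infos)
print(f"deepcopy(list): {time.perf_counter() - t0:.3f}s")

# An object-dtype ndarray's .copy() duplicates only the array of
# references; the dicts themselves are shared (a shallow copy).
infos_arr = np.array(infos)
t0 = time.perf_counter()
infos_arr.copy()
print(f"ndarray.copy(): {time.perf_counter() - t0:.3f}s")
```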
However, we found that the training loop was still spending a lot of time in `rnn_sequencing`. We traced this down to a slow element-wise copy into an array, which we replaced with a vectorized copy.
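Again as a sketch rather than the exact diff (function and variable names are hypothetical), the change is the classic replacement of a per-element loop with a slice assignment:

```python
import numpy as np

def pad_sequences(flat: np.ndarray, seq_lens: list) -> np.ndarray:
    """Copy variable-length sequences from a flat buffer into a padded
    (num_seqs, max_seq_len) array. Hypothetical sketch, not RLlib code."""
    max_len = max(seq_lens)
    out = np.zeros((len(seq_lens), max_len) + flat.shape[1:], dtype=flat.dtype)
    start = 0
    for row, length in enumerate(seq_lens):
        # Before: one assignment per element.
        # for offset in range(length):
        #     out[row, offset] = flat[start + offset]
        # After: a single vectorized slice copy per sequence.
        out[row, :length] = flat[start:start + length]
        start += length
    return out

# Example: three sequences of lengths 2, 3, and 1 packed into a flat buffer.
flat = np.arange(6, dtype=np.float32)
print(pad_sequences(flat, [2, 3, 1]))
```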
This further improved performance. (In this profiling sample we had also removed the one-time summary logging; that change is not included in this PR.)
Combined, these changes reduced the runtime of the training step from 179 seconds to 15 seconds, approximately a 12x speedup, making LSTM training competitive with training an MLP.
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.