- Notifications
You must be signed in to change notification settings - Fork5.2k
Enumerable.ToArray performance improvement using InlineArray#90459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
use InlineArray byte[] sequence instead of LargeArrayBuilder
ghost commentedAug 12, 2023
Tagging subscribers to this area: @dotnet/area-system-collections Issue DetailsI've optimized the Benchmark
publicclassToArrayBenchmark{[Params(10,100,1000,10000,100000,1000000,10000000,100000000,1000000000)]publicintLength{get;set;}[Benchmark]publicList<int>ToList(){returnGenerateNumber(Length).ToList();}[Benchmark]publicint[]ToArray(){returnGenerateNumber(Length).ToArray();}[Benchmark]publicint[]PR_ToArray(){returnEnumerableHelpers.ToArray2(GenerateNumber(Length));}publicstaticIEnumerable<int>GenerateNumber(intcount){for(inti=0;i<count;i++){yieldreturni;}}} Implementation detailIn this scenario, there's a fixed maximum length for the list of arrays that should be allocated. OptionIt's also possible to use
|
| returnresult; | ||
| } | ||
| LargeArrayBuilder<T>builder=new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks. What would it take to augment your PR to the point where we could delete LargeArrayBuilder? That'd make me much more interested in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LargeArrayBuilder is used in many places, and given that it includes numerous methods specific to certain scenarios, the amount of rewrites, including from the user's side, would likely be extensive. Ideally, we'd like to replace it entirely, but we want to exclude that from the initial implementation.
Using the ArrayPool as suggested in the option might not be advisable if we replace the LargeArrayBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
My concern is that these code paths are used in a bunch of places on common paths, and adding more generic types will increase code size. It'd be nice to get rid of the LargeArrayBuilder so that it's a net win for both throughput and size, rather than improving throughput at the expense of size.
There's no rush here as it's not going to make .NET 8, anyway, so there's plenty of time to make and evaluate the larger change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank you.
If this implementation is acceptable as a basic policy,
I'll take on the challenge of replacing the LargeArrayBuilder.
(Indeed, I was hoping it might make it into .NET 8! That would be great!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
If this implementation is acceptable as a basic policy,
We'd support it in principle. Would you like to build on top of this PR or would you prefer to open a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I am willing to work on this PR.
NET 8 release seems to be coming soon, so I will do it from the branch there when it is released. ......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
main is the correct branch. Any such change will be for .NET 9.
neuecc commentedAug 12, 2023
@dotnet-policy-service agree |
Windows10CE commentedAug 12, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Your benchmarks show this being faster than ToList (almost) across the board, could it use the same (or similar) optimization? |
| publicT[]ToArray(intlastBlockCount) | ||
| { | ||
| T[]array=GC.AllocateUninitializedArray<T>(_count+lastBlockCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
AllocateUninitializedArray will introduce additional overhead for small arrays here.
reflectronicAug 13, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The code forAllocateUninitializedArray already has a special case for small-ish arrays:
runtime/src/coreclr/System.Private.CoreLib/src/System/GC.CoreCLR.cs
Lines 756 to 762 in616f758
| // small arrays are allocated using `new[]` as that is generally faster. | |
| #pragma warning disable8500// sizeof of managed types | |
| if(length<2048/sizeof(T)) | |
| #pragma warning restore8500 | |
| { | |
| returnnewT[length]; | |
| } |
Given that
AllocateUninitializedArray is marked asAggressiveInlining, there's likely no discernable perf impact here (modulo tuning on the threshold).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
when length(_count + lastBlockCount) == 0, returnArray.Empty<byte>().
| for(inti=0;i<_index;i++) | ||
| { | ||
| _blocks[i].CopyTo(dest); | ||
| dest=dest.Slice(_blocks[i].Length); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
| for(inti=0;i<_index;i++) | |
| { | |
| _blocks[i].CopyTo(dest); | |
| dest=dest.Slice(_blocks[i].Length); | |
| } | |
| ReadOnlySpan<T[]>blocks=_blocks; | |
| foreach(T[]blockinblocks.Slice(0,_index)) | |
| { | |
| block.CopyTo(dest); | |
| dest=dest.Slice(block.Length); | |
| } |
This should remove the bounds checks here too. Not sure whether this is the best syntax possible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
useref var item = ref _blocks[i].
reflectronic commentedAug 13, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Something that'd be neat to add here (though the feasibility is questionable) is a special case for small arrays (say, fewer than 32 elements) using an structArrayBuilder<T>{[InlineArray(32)]structElementsBlock{T_field;}[InlineArray(32)]structArraysBlock{T[]_field;}ElementsBlock_elements;ArraysBlock_arrays;// ...} Inside the implementation of The problem with this idea is that it'll chew through the stack for large structs. This could cause stack overflows when user-defined types come into the picture, even if the entire The solutions are not pretty. You'd have to defer the stack allocation of I'm curious to hear if there's a way to salvage this idea. |
ghost commentedNov 14, 2023
This pull request has been automatically marked |
ghost commentedNov 29, 2023
This pull request has been automatically marked |
ghost commentedDec 13, 2023
This pull request will now be closed since it had been marked |
I've optimized the
Enumerable.ToArraymethod for the case when the source is a pureIEnumerable<T>.Typically, when the buffer overflows, an array with double the previous capacity is created, and the copying process repeats.
In this PR, although we still create an array of double the size, instead of copying immediately, we add the array to a list and perform all the copying at the end.
By reducing the number of copy operations, we see a significant performance improvement.
Benchmark
Implementation detail
In this scenario, there's a fixed maximum length for the list of arrays that should be allocated.
Starting with 4 and doubling the array size each time, we reach the maximum length after 29 arrays.
Therefore, using
[InlineArray(29)], I've reserved a fixed-length space for theT[].Option
It's also possible to use
ArrayPool<T>.Shared.Rentinstead ofnew T[].In that case, by returning the array when calling ToArray, both copying and returning can be efficiently handled.
I wasn't sure if using ArrayPool was appropriate for this scenario, so in this PR, I've opted for creating new arrays.