Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Enumerable.ToArray performance improvement using InlineArray#90459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

@neuecc
Copy link

I've optimized theEnumerable.ToArray method for the case when the source is a pureIEnumerable<T>.
Typically, when the buffer overflows, an array with double the previous capacity is created, and the copying process repeats.
In this PR, although we still create an array of double the size, instead of copying immediately, we add the array to a list and perform all the copying at the end.
By reducing the number of copy operations, we see a significant performance improvement.

Benchmark

BenchmarkDotNet v0.13.7, Windows 10 (10.0.19045.3324/22H2/2022Update)AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores.NET SDK 8.0.100-preview.7.23376.3  [Host]     : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2  Job-NGJHUQ : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2IterationCount=1  WarmupCount=1
MethodLengthMeanErrorGen0Gen1Gen2Allocated
ToList1053.12 nsNA0.0153--256 B
ToArray1071.83 nsNA0.0153--256 B
PR_ToArray1038.40 nsNA0.0119--200 B
ToList100220.14 nsNA0.0730--1224 B
ToArray100326.60 nsNA0.0710--1192 B
PR_ToArray100180.50 nsNA0.0644--1080 B
ToList10001,679.32 nsNA0.50540.0057-8464 B
ToArray10002,081.20 nsNA0.5074--8536 B
PR_ToArray10001,384.58 nsNA0.4978--8336 B
ToList1000017,214.66 nsNA7.84301.2817-131440 B
ToArray1000021,014.18 nsNA6.3171--106224 B
PR_ToArray1000013,196.72 nsNA6.3171--105872 B
ToList100000338,986.18 nsNA285.6445285.6445285.64451049112 B
ToArray100000382,722.36 nsNA249.5117249.5117249.5117925132 B
PR_ToArray100000358,964.21 nsNA249.5117249.5117249.5117924780 B
ToList10000004,533,538.28 nsNA1984.37501984.37501984.37508389748 B
ToArray10000003,067,601.56 nsNA796.8750796.8750796.87508195588 B
PR_ToArray10000002,418,546.88 nsNA800.7813800.7813800.78138195040 B
ToList10000000197,990,525.00 nsNA3875.00003875.00003875.0000134219664 B
ToArray10000000229,693,275.00 nsNA1875.00001875.00001875.0000107110743 B
PR_ToArray1000000029,184,353.12 nsNA1968.75001968.75001968.7500107110478 B
ToList100000000583,235,200.00 nsNA6000.00006000.00006000.00001073744896 B
ToArray100000000501,066,800.00 nsNA3000.00003000.00003000.0000936873600 B
PR_ToArray100000000270,801,200.00 nsNA2000.00002000.00002000.0000936872432 B
ToList10000000004,677,241,700.00 nsNA9000.00009000.00009000.00008589938744 B
ToArray10000000005,229,577,600.00 nsNA4000.00004000.00004000.00008294970392 B
PR_ToArray10000000003,136,817,900.00 nsNA4000.00004000.00004000.00008294969760 B
publicclassToArrayBenchmark{[Params(10,100,1000,10000,100000,1000000,10000000,100000000,1000000000)]publicintLength{get;set;}[Benchmark]publicList<int>ToList(){returnGenerateNumber(Length).ToList();}[Benchmark]publicint[]ToArray(){returnGenerateNumber(Length).ToArray();}[Benchmark]publicint[]PR_ToArray(){returnEnumerableHelpers.ToArray2(GenerateNumber(Length));}publicstaticIEnumerable<int>GenerateNumber(intcount){for(inti=0;i<count;i++){yieldreturni;}}}

Implementation detail

In this scenario, there's a fixed maximum length for the list of arrays that should be allocated.
Starting with 4 and doubling the array size each time, we reach the maximum length after 29 arrays.
Therefore, using[InlineArray(29)], I've reserved a fixed-length space for theT[].

Option

It's also possible to useArrayPool<T>.Shared.Rent instead ofnew T[].
In that case, by returning the array when calling ToArray, both copying and returning can be efficiently handled.
I wasn't sure if using ArrayPool was appropriate for this scenario, so in this PR, I've opted for creating new arrays.

longfin reacted with thumbs up emojieiriktsarpalis, slang25, jodydonetti, breadnone, SugoiDev, IDisposable, akhanalcs, aeb-dev, longfin, and SirIntruder reacted with heart emojiZJKung, NatsuDragonEye, henrikwidlund, neon-sunset, nietras, Tan90909090, noseratio, royvou, expcat, ShreyasJejurkar, and 15 more reacted with rocket emoji
use InlineArray byte[] sequence instead of LargeArrayBuilder
@ghostghost added area-System.Collections community-contributionIndicates that the PR has been added by a community member labelsAug 12, 2023
@ghost
Copy link

Tagging subscribers to this area: @dotnet/area-system-collections
See info inarea-owners.md if you want to be subscribed.

Issue Details

I've optimized theEnumerable.ToArray method for the case when the source is a pureIEnumerable<T>.
Typically, when the buffer overflows, an array with double the previous capacity is created, and the copying process repeats.
In this PR, although we still create an array of double the size, instead of copying immediately, we add the array to a list and perform all the copying at the end.
By reducing the number of copy operations, we see a significant performance improvement.

Benchmark

BenchmarkDotNet v0.13.7, Windows 10 (10.0.19045.3324/22H2/2022Update)AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores.NET SDK 8.0.100-preview.7.23376.3  [Host]     : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2  Job-NGJHUQ : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2IterationCount=1  WarmupCount=1
MethodLengthMeanErrorGen0Gen1Gen2Allocated
ToList1053.12 nsNA0.0153--256 B
ToArray1071.83 nsNA0.0153--256 B
PR_ToArray1038.40 nsNA0.0119--200 B
ToList100220.14 nsNA0.0730--1224 B
ToArray100326.60 nsNA0.0710--1192 B
PR_ToArray100180.50 nsNA0.0644--1080 B
ToList10001,679.32 nsNA0.50540.0057-8464 B
ToArray10002,081.20 nsNA0.5074--8536 B
PR_ToArray10001,384.58 nsNA0.4978--8336 B
ToList1000017,214.66 nsNA7.84301.2817-131440 B
ToArray1000021,014.18 nsNA6.3171--106224 B
PR_ToArray1000013,196.72 nsNA6.3171--105872 B
ToList100000338,986.18 nsNA285.6445285.6445285.64451049112 B
ToArray100000382,722.36 nsNA249.5117249.5117249.5117925132 B
PR_ToArray100000358,964.21 nsNA249.5117249.5117249.5117924780 B
ToList10000004,533,538.28 nsNA1984.37501984.37501984.37508389748 B
ToArray10000003,067,601.56 nsNA796.8750796.8750796.87508195588 B
PR_ToArray10000002,418,546.88 nsNA800.7813800.7813800.78138195040 B
ToList10000000197,990,525.00 nsNA3875.00003875.00003875.0000134219664 B
ToArray10000000229,693,275.00 nsNA1875.00001875.00001875.0000107110743 B
PR_ToArray1000000029,184,353.12 nsNA1968.75001968.75001968.7500107110478 B
ToList100000000583,235,200.00 nsNA6000.00006000.00006000.00001073744896 B
ToArray100000000501,066,800.00 nsNA3000.00003000.00003000.0000936873600 B
PR_ToArray100000000270,801,200.00 nsNA2000.00002000.00002000.0000936872432 B
ToList10000000004,677,241,700.00 nsNA9000.00009000.00009000.00008589938744 B
ToArray10000000005,229,577,600.00 nsNA4000.00004000.00004000.00008294970392 B
PR_ToArray10000000003,136,817,900.00 nsNA4000.00004000.00004000.00008294969760 B
publicclassToArrayBenchmark{[Params(10,100,1000,10000,100000,1000000,10000000,100000000,1000000000)]publicintLength{get;set;}[Benchmark]publicList<int>ToList(){returnGenerateNumber(Length).ToList();}[Benchmark]publicint[]ToArray(){returnGenerateNumber(Length).ToArray();}[Benchmark]publicint[]PR_ToArray(){returnEnumerableHelpers.ToArray2(GenerateNumber(Length));}publicstaticIEnumerable<int>GenerateNumber(intcount){for(inti=0;i<count;i++){yieldreturni;}}}

Implementation detail

In this scenario, there's a fixed maximum length for the list of arrays that should be allocated.
Starting with 4 and doubling the array size each time, we reach the maximum length after 29 arrays.
Therefore, using[InlineArray(29)], I've reserved a fixed-length space for theT[].

Option

It's also possible to useArrayPool<T>.Shared.Rent instead ofnew T[].
In that case, by returning the array when calling ToArray, both copying and returning can be efficiently handled.
I wasn't sure if using ArrayPool was appropriate for this scenario, so in this PR, I've opted for creating new arrays.

Author:neuecc
Assignees:-
Labels:

area-System.Collections

Milestone:-
NijomonJoseM reacted with thumbs up emoji

returnresult;
}

LargeArrayBuilder<T>builder=new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks. What would it take to augment your PR to the point where we could delete LargeArrayBuilder? That'd make me much more interested in this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LargeArrayBuilder is used in many places, and given that it includes numerous methods specific to certain scenarios, the amount of rewrites, including from the user's side, would likely be extensive. Ideally, we'd like to replace it entirely, but we want to exclude that from the initial implementation.

Using the ArrayPool as suggested in the option might not be advisable if we replace the LargeArrayBuilder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My concern is that these code paths are used in a bunch of places on common paths, and adding more generic types will increase code size. It'd be nice to get rid of the LargeArrayBuilder so that it's a net win for both throughput and size, rather than improving throughput at the expense of size.

There's no rush here as it's not going to make .NET 8, anyway, so there's plenty of time to make and evaluate the larger change.

jkotas and akhanalcs reacted with thumbs up emoji
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you.
If this implementation is acceptable as a basic policy,
I'll take on the challenge of replacing the LargeArrayBuilder.
(Indeed, I was hoping it might make it into .NET 8! That would be great!)

akhanalcs reacted with thumbs up emoji

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If this implementation is acceptable as a basic policy,

We'd support it in principle. Would you like to build on top of this PR or would you prefer to open a new one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am willing to work on this PR.
NET 8 release seems to be coming soon, so I will do it from the branch there when it is released. ......

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

main is the correct branch. Any such change will be for .NET 9.

neuecc reacted with thumbs up emoji
@neuecc
Copy link
Author

@dotnet-policy-service agree

@Windows10CE
Copy link
Contributor

Windows10CE commentedAug 12, 2023
edited
Loading

Your benchmarks show this being faster than ToList (almost) across the board, could it use the same (or similar) optimization?


publicT[]ToArray(intlastBlockCount)
{
T[]array=GC.AllocateUninitializedArray<T>(_count+lastBlockCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

AllocateUninitializedArray will introduce additional overhead for small arrays here.

Copy link
Contributor

@reflectronicreflectronicAug 13, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The code forAllocateUninitializedArray already has a special case for small-ish arrays:

// small arrays are allocated using `new[]` as that is generally faster.
#pragma warning disable8500// sizeof of managed types
if(length<2048/sizeof(T))
#pragma warning restore8500
{
returnnewT[length];
}

Given thatAllocateUninitializedArray is marked asAggressiveInlining, there's likely no discernable perf impact here (modulo tuning on the threshold).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

when length(_count + lastBlockCount) == 0, returnArray.Empty<byte>().

Comment on lines +181 to +185
for(inti=0;i<_index;i++)
{
_blocks[i].CopyTo(dest);
dest=dest.Slice(_blocks[i].Length);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
for(inti=0;i<_index;i++)
{
_blocks[i].CopyTo(dest);
dest=dest.Slice(_blocks[i].Length);
}
ReadOnlySpan<T[]>blocks=_blocks;
foreach(T[]blockinblocks.Slice(0,_index))
{
block.CopyTo(dest);
dest=dest.Slice(block.Length);
}

This should remove the bounds checks here too. Not sure whether this is the best syntax possible here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

useref var item = ref _blocks[i].

@reflectronic
Copy link
Contributor

reflectronic commentedAug 13, 2023
edited
Loading

Something that'd be neat to add here (though the feasibility is questionable) is a special case for small arrays (say, fewer than 32 elements) using anInlineArray ofT. Something like:

structArrayBuilder<T>{[InlineArray(32)]structElementsBlock{T_field;}[InlineArray(32)]structArraysBlock{T[]_field;}ElementsBlock_elements;ArraysBlock_arrays;// ...}

Inside the implementation ofArrayBuilder, you'd first fill upElementsBlock before adding any new elements toArraysBlock. This eliminates all intermediate allocations for small arrays.

The problem with this idea is that it'll chew through the stack for large structs. This could cause stack overflows when user-defined types come into the picture, even if the entireElementsBlock is not filled.

The solutions are not pretty. You'd have to defer the stack allocation ofElementsBlock until you're sure that its size is reasonable. But, as far as I'm aware, the only way to reliably do this is with a non-inlineable method at every callsite ofArrayBuilder, plus a very unpleasant contortion of the callsite's actual logic. Doesn't seem very appealing.

I'm curious to hear if there's a way to salvage this idea.

@eiriktsarpaliseiriktsarpalis added this to the9.0.0 milestoneAug 13, 2023
@eiriktsarpaliseiriktsarpalis added the needs-author-actionAn issue or pull request that requires more info or actions from the author. labelOct 27, 2023
@ghostghost removed the needs-author-actionAn issue or pull request that requires more info or actions from the author. labelOct 31, 2023
@eiriktsarpaliseiriktsarpalis added the needs-author-actionAn issue or pull request that requires more info or actions from the author. labelOct 31, 2023
@ghostghost added the no-recent-activity labelNov 14, 2023
@ghost
Copy link

This pull request has been automatically markedno-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will removeno-recent-activity.

@eiriktsarpaliseiriktsarpalis self-assigned thisNov 15, 2023
@ghostghost removed the no-recent-activity labelNov 15, 2023
@ghostghost added the no-recent-activity labelNov 29, 2023
@ghost
Copy link

This pull request has been automatically markedno-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will removeno-recent-activity.

@ghost
Copy link

This pull request will now be closed since it had been markedno-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

This pull request wasclosed.
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@stephentoubstephentoubstephentoub left review comments

@eiriktsarpaliseiriktsarpaliseiriktsarpalis left review comments

+2 more reviewers

@reflectronicreflectronicreflectronic left review comments

@MichalPetrykaMichalPetrykaMichalPetryka left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

@eiriktsarpaliseiriktsarpalis

Labels

area-System.Collectionscommunity-contributionIndicates that the PR has been added by a community memberneeds-author-actionAn issue or pull request that requires more info or actions from the author.no-recent-activity

Projects

None yet

Milestone

9.0.0

Development

Successfully merging this pull request may close these issues.

6 participants

@neuecc@Windows10CE@reflectronic@stephentoub@eiriktsarpalis@MichalPetryka

[8]ページ先頭

©2009-2025 Movatter.jp