Don't box every single value type dictionary key. #46460


Merged
layomia merged 1 commit into dotnet:main from lezzi:json-dict-key-boxing
Mar 3, 2021

Conversation

@lezzi (Contributor) commented Dec 30, 2020 (edited):

Deserialization of dictionaries from `Stream` always boxes value-type keys, because `ReadStackFrame.DictionaryKey` is an `object`. Saving the key in the state (which boxes it) is only needed when the value can't be fully read (the `Read` call returns `false`).

In the best-case scenario, when the entire dictionary fits in a single buffer, this should completely remove the unnecessary heap allocations for keys.
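The allocation being removed can be illustrated with a minimal sketch. The type and field names below are hypothetical stand-ins, not the actual `ReadStackFrame` layout in System.Text.Json; the point is only the difference between an `object` slot (which boxes a value-type key on every assignment) and a strongly typed slot (which boxes only when state must be saved):

```csharp
using System;

// Hypothetical frames, not the real System.Text.Json internals.
var before = new FrameBefore();
before.DictionaryKey = 42;               // the int is boxed into the object slot

var after = new FrameAfter<int>();
after.DictionaryKey = 42;                // stays unboxed on the fast path
// after.SuspendedKey would only be populated when Read() returns false
// and the key must survive until more buffered data arrives.

Console.WriteLine(before.DictionaryKey); // 42
Console.WriteLine(after.DictionaryKey);  // 42

// Before the change: every value-type key lands in an object field.
struct FrameBefore
{
    public object DictionaryKey;
}

// After the change: the key stays strongly typed; an object slot is
// needed only when a partial read forces the state to be saved.
struct FrameAfter<TKey>
{
    public TKey DictionaryKey;
    public object SuspendedKey;
}
```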

@lezzi force-pushed the json-dict-key-boxing branch 2 times, most recently from da5acaa to 43ef1d7 on December 30, 2020 14:11
@lezzi marked this pull request as ready for review on December 30, 2020 16:00
@ahsonkhan (Contributor) commented:

This is great. How did you detect it?

> In the best-case scenario when an entire dictionary fits in a buffer - it should completely remove unnecessary heap allocations for keys.

Can we run some benchmarks to get a sense of how many heap allocations we save? It might help to modify and run this benchmark on a dictionary input that you expect to get performance wins on (with value-type keys). Please share the BenchmarkDotNet results when you have them:
https://github.com/dotnet/performance/blob/0b094042e7dc114c8cd038b1ae90dad0e5696fa2/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ReadJson.cs#L57-L64

@lezzi (Contributor, Author) commented:

> This is great. How did you detect it?
>
> Can we run some benchmarks to get a sense of how many heap allocations we save? It might help to modify and run this benchmark on a dictionary input that you expect to get performance wins on (with value-type keys). Please share the BenchmarkDotNet results when you have them:
> https://github.com/dotnet/performance/blob/0b094042e7dc114c8cd038b1ae90dad0e5696fa2/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ReadJson.cs#L57-L64

I am currently working on a high-performance DynamoDB library, which involves some low-level optimized JSON deserialization. I was able to detect several inefficiencies during internal benchmark comparisons.

Here is the benchmark result for `Dictionary<int, string>` with 100 items. I think the most important column for this case is `Allocated`, where we can see that the 2 KB difference disappears after the PR. It also becomes slightly faster.

```
BenchmarkDotNet=v0.12.1.1466-nightly, OS=Windows 10.0.19041.572 (2004/May2020Update/20H1)
Intel Core i7-7820X CPU 3.60GHz (Kaby Lake), 1 CPU, 6 logical and 3 physical cores
.NET SDK=6.0.100-alpha.1.20630.6
  [Host]     : .NET 6.0.0 (6.0.20.62903), X64 RyuJIT
  Job-NDNXAQ : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-EVXCXL : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
```

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeserializeFromString | Job-NDNXAQ | master | 19.96 μs | 0.156 μs | 0.138 μs | 19.93 μs | 19.77 μs | 20.20 μs | 1.00 | 0.00 | 3.2861 | 0.2347 | - | 19 KB |
| DeserializeFromString | Job-EVXCXL | PR | 20.05 μs | 0.304 μs | 0.312 μs | 19.94 μs | 19.77 μs | 20.84 μs | 1.01 | 0.02 | 3.2727 | 0.1596 | - | 19 KB |
| DeserializeFromUtf8Bytes | Job-NDNXAQ | master | 19.56 μs | 0.232 μs | 0.194 μs | 19.54 μs | 19.30 μs | 19.97 μs | 1.00 | 0.00 | 3.2396 | 0.1580 | - | 19 KB |
| DeserializeFromUtf8Bytes | Job-EVXCXL | PR | 19.52 μs | 0.138 μs | 0.122 μs | 19.52 μs | 19.34 μs | 19.78 μs | 1.00 | 0.01 | 3.2233 | 0.1572 | - | 19 KB |
| DeserializeFromStream | Job-NDNXAQ | master | 21.66 μs | 0.187 μs | 0.175 μs | 21.58 μs | 21.49 μs | 22.17 μs | 1.00 | 0.00 | 3.6509 | 0.2608 | - | 21 KB |
| DeserializeFromStream | Job-EVXCXL | PR | 21.25 μs | 0.201 μs | 0.178 μs | 21.27 μs | 20.90 μs | 21.57 μs | 0.98 | 0.01 | 3.2669 | 0.1675 | - | 19 KB |
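For reference, the scenario measured above can be reproduced with plain `System.Text.Json` calls. This is a minimal sketch of the workload, not the actual benchmark harness; the data shape (100 items, `int` keys) matches what the table describes:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// A Dictionary<int, string> with 100 items, deserialized from a string,
// from UTF-8 bytes, and from a Stream (the three benchmarked methods).
var data = Enumerable.Range(0, 100).ToDictionary(i => i, i => $"value{i}");
string json = JsonSerializer.Serialize(data);
byte[] utf8 = Encoding.UTF8.GetBytes(json);

var fromString = JsonSerializer.Deserialize<Dictionary<int, string>>(json);
var fromBytes = JsonSerializer.Deserialize<Dictionary<int, string>>(utf8);

using var stream = new MemoryStream(utf8);
var fromStream = JsonSerializer.DeserializeAsync<Dictionary<int, string>>(stream)
    .AsTask().GetAwaiter().GetResult();

Console.WriteLine(fromStream!.Count); // 100
```

The `Stream` overload is the path this PR touches: value-type keys there no longer need to be boxed when the whole payload is already buffered.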

It is also worth noting another potential optimization that I am considering a PR for. In the benchmark, you can see that deserialization from `Stream` is visibly slower than from a byte array, even when the entire JSON fits in a single buffer. I suspect this happens because we don't use `UseFastPath` for the single-buffer case. But it is unrelated and probably not worth discussing in this thread.


Commit: During slow-path deserialization, fall back to storing the dictionary key as `object` only when `Read` returns `false`.
@stephentoub added this to the 6.0.0 milestone Jan 15, 2021
@steveharter added the tenet-performance (Performance related issue) label Jan 15, 2021
@steveharter (Contributor) left a comment:

Thanks!

@ericstj (Member) commented:

Looks like we have a couple of approvals: is this ready to merge? We might want to get fresh test results (the current ones are from Jan 5).

Base automatically changed from master to main March 1, 2021 09:07
@layomia (Contributor) left a comment:

Thanks for the PR; this is good to merge. The perf results shouldn't be stale given the changes.

> It is also worth noting another potential optimization that I am considering a PR for. In the benchmark, you can see that deserialization from Stream is visibly slower than from a byte array, even when the entire JSON fits in a single buffer. I suspect this happens because we don't use UseFastPath for the single-buffer case. But it is unrelated and probably not worth discussing in this thread.

We'd be happy to take a PR exploring this if we can detect a single buffer case.

@layomia merged commit 6919fc2 into dotnet:main Mar 3, 2021
@ghost locked as resolved and limited conversation to collaborators Apr 2, 2021

Reviewers

- @stephentoub approved these changes
- @ahsonkhan left review comments
- @steveharter approved these changes
- @layomia approved these changes
- @eiriktsarpalis: awaiting requested review
- @jozkee: awaiting requested review

Assignees

@layomia

Labels

tenet-performance

Milestone

6.0.0

7 participants

@lezzi, @ahsonkhan, @ericstj, @stephentoub, @steveharter, @layomia, @Dotnet-GitSync-Bot
