NotificationsYou must be signed in to change notification settings
Fork5.2k
Star17.2k

Optimize asm size for the biggest corelib method - WebUtility..cctor#48906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

EgorBo merged 4 commits intodotnet:mainfromEgorBo:precalc-webutils-table

Mar 3, 2021

Merged

Optimize asm size for the biggest corelib method - WebUtility..cctor#48906

EgorBo merged 4 commits intodotnet:mainfromEgorBo:precalc-webutils-table

Mar 3, 2021

Conversation

Copy link

Member

EgorBo commentedMar 1, 2021•
edited
Loading

According to the jit-diff tool WebUtility..cctor is the largest method (with a noticeable delay to compile it) - it's triggered whenWebUtility.HtmlDecode public API is called.

This PR shrinkss_lookupTable codegen from 17605 bytes to~~10083~~ 277 bytes.

I could use source gen here since we already use them for eventpipe stuff but I don't think it worth the complexity.

Optimize asm size for the biggest corelib method - WebUtility..cctor

3b16266

ghost added the area-System.Net label

Mar 1, 2021

Copy link

ghost commentedMar 1, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

According to the jit-diff tool WebUtility..cctor is the largest method (with a noticeable delay to compile it) - it's triggered whenWebUtility.HtmlDecode public API is called.

This PR shrinkss_lookupTable codegen from to 17605 bytes to 10083 bytes which is still a lot for an immutable compile-time known dictionary - I wonder how we can improve the performance for such cases in general.

I could use source gen here since we already use them for eventpipe stuff but I don't think it worth the complexity.

Author:	EgorBo
Assignees:	-
Labels:	`area-System.Net`
Milestone:	-

Copy link

Member

MichalStrehovsky commentedMar 1, 2021•
edited
Loading

Splitting the data into two static ReadOnlySpans (one for keys, one for values) would probably be the most efficient, space-wise, and codegen-wise. Then just do a for loop, indexing into both spans and adding into dictionary.

Copy link

MemberAuthor

EgorBo commentedMar 1, 2021•
edited
Loading

Splitting the data into two static ReadOnlySpans (one for keys, one for values) would probably be the most efficient, space-wise, and codegen-wise. Then just do a for loop, indexing into both spans and adding into dictionary.

Nice idea, wonder if it's possible to hide those two spans under the hood of the source gen then and let the user to specify any type of such immutable dictionaries

Copy link

Member

MichalStrehovsky commentedMar 1, 2021

Ah, I just realized that the static ReadOnlySpan trick won't work because these areints (the static ReadOnlySpan currently only works for bytes due to endianness issues with anything bigger than that). Might as well just make them arrays. The storage for these will be similarly optimal but we'll have a useless allocation and a memcpy operation into the array.

Copy link

Member

stephentoub commentedMar 1, 2021

Yeah, we need#24961, if anyone wants to tackle that in the runtime and compiler. :-)

Copy link

Member

jkotas commentedMar 1, 2021

You can have just one Span of bytes with the data and do the endianness conversion yourself.

         ReadOnlySpan<byte> data = ...         var d = new Dictionary<long, char>(data.Length / (sizeof(long) + sizeof(char)));         while (!data.IsEmpty)         {              d[BinaryPrimitives.ReadInt64LittleEndian(data)] = (char)BinaryPrimitives.ReadInt16LittleEndian(data.Slice(sizeof(long)));              data = data.Slice(sizeof(long) + sizeof(char));         }

Copy link

Member

jkotas commentedMar 1, 2021•
edited
Loading

Or if you wanted to optimize the heck out of this, you can create a perfect or nearly perfect hash function for this set and use theReadOnlySpan<byte> directly without copying it into Dictionary. The lookup function would be then just:

         ReadOnlySpan<byte> data = ...         long key = ToUInt64Key(entity);         int index = (sizeof(long) + sizeof(char)) * PerfectHash(key);         if (BinaryPrimitives.ReadInt64LittleEndian(data.Slice(index)) == key) return BinaryPrimitives.ReadInt64LittleEndian(data.Slice(index + sizeof(long)));

danmoseley approved these changes

Mar 1, 2021

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Net/WebUtility.csShow resolvedHide resolved

Use Jan's snippet

afd4d25

danmoseley reviewed

Mar 1, 2021

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Net/WebUtility.cs

		0x73,0x6D,0x61,0x69,0x64,0x00,0x00,0x00/ToUInt64Key("diams")/,0x66,0x26,/'\x2666'/
		};

		vardictionary=newDictionary<ulong,char>(tableData.Length/(sizeof(ulong)+sizeof(char)));

Copy link

Member

danmoseleyMar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think Jan was suggesting not using Dictionary at all, and instead a perfect hash over a buffer.

If you use what you're doing here, looks like you can remove the byte in the 8th column as it's always zero.

Copy link

MemberAuthor

EgorBoMar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Well, this code is only executed once, my goal was to decrease the native size (from17605 bytes to only277 bytes!) using the first Jan's suggestion. If you insist on the second more performant one I can change!