This repository was archived by the owner on Jan 23, 2023. It is now read-only.

dotnet/corefx (public archive)

Improve performance of WebUtility.Decode for non-escaped strings #7671

Merged

davidsh merged 4 commits into dotnet:master from hughbe:webutility-decode

Apr 15, 2016

hughbe commented Apr 12, 2016

- Cuts allocations down to 1/3 of the original
- Doubles performance (in time)
- We still allocate a little bit, as removing all allocations would harm the performance of strings that actually need escaping

Benchmark - escaping needed

No performance regressions

Benchmark - no escaping needed

Benchmark Code

    static void Main(string[] args)
    {
        // Escaping
        TimeAction("Old: ", () => Old.UrlDecode("%ABabc"));
        TimeAction("New: ", () => New.UrlDecode("%ABabc"));

        // No escaping
        TimeAction("Old: ", () => Old.UrlDecode("abc"));
        TimeAction("New: ", () => New.UrlDecode("abc"));

        Console.ReadLine();
    }

    public static void TimeAction(string prefix, Action action)
    {
        var sw = new Stopwatch();
        for (int iter = 0; iter < 5; iter++)
        {
            int gen0 = GC.CollectionCount(0);
            sw.Restart();
            for (int i = 0; i < 10000000; i++)
            {
                action();
            }
            sw.Stop();
            Console.WriteLine($"{prefix}Time: {sw.Elapsed.TotalSeconds}\tGC0: {GC.CollectionCount(0) - gen0}");
        }
    }

/cc @stephentoub @jamesqo @davidsh
Fixes #6542 together with #7546

Improve performance of WebUtility.Decode for non-escaped strings

8cb7fc4

- Cuts allocations down to 1/3 of the original
- Doubles performance (in time)
- We still allocate a little bit, as removing all allocations would harm the performance of strings that actually need escaping

Fixes #6542

dnfclas added the cla-already-signed label

Apr 12, 2016

hughbe changed the title from "Improve performance of WebUtility.Decode for non-escaped string" to "Improve performance of WebUtility.Decode for non-escaped strings"

Apr 12, 2016

stephentoub reviewed

Apr 12, 2016

src/System.Runtime.Extensions/src/System/Net/WebUtility.cs Outdated

		private struct UrlDecoder
		{
			public bool _containsUnsafe;
			public bool _containsSpaces;

Member

Why store these as fields on the UrlDecoder? It seems they could just be a single local, needsDecoding; then, instead of doing return helper.GetString();, you'd do return needsDecoding ? helper.GetString() : value;.

Author

hughbe Apr 12, 2016

I agree with moving them out of UrlDecoder.
I would like to keep containsUnsafe and containsSpaces as separate variables, as I'm working on a PR that is going to heavily optimize UrlEncode and UrlDecode for strings that only need space encoding/decoding. Is that OK?

Member

stephentoub Apr 12, 2016

I'd prefer it if you kept them as a single local now. If a subsequent change needs them separated, then that change can do it. Otherwise, if for example that change never materializes or gets merged, we're left with debt.

Author

hughbe Apr 12, 2016

Sure thing!
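
The single-local pattern agreed on in this thread can be sketched as follows. This is a minimal illustration, not the corefx code: the class name FastPathSketch and the helper SlowDecode are hypothetical, and the slow path uses Uri.UnescapeDataString as a stand-in for the real UrlDecoder, so its handling of malformed UTF-8 may differ from WebUtility. It also scans first and decodes second, whereas the PR tracks the flag during a single pass.

```csharp
using System;

// Hypothetical sketch: illustrates returning the original string when no
// decoding is needed. Not the corefx implementation.
public static class FastPathSketch
{
    public static string UrlDecode(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        // One local flag instead of fields on a decoder struct.
        bool needsDecoding = false;
        for (int i = 0; i < value.Length && !needsDecoding; i++)
        {
            char ch = value[i];
            if (ch == '+')
            {
                // '+' decodes to a space.
                needsDecoding = true;
            }
            else if (ch == '%' && i < value.Length - 2)
            {
                // Only a '%' followed by two hex digits is a real escape;
                // an invalid sequence is copied through unchanged.
                needsDecoding = Uri.IsHexDigit(value[i + 1]) && Uri.IsHexDigit(value[i + 2]);
            }
        }

        // No escapes found: hand back the same instance, allocating nothing.
        return needsDecoding ? SlowDecode(value) : value;
    }

    // Stand-in slow path (assumption): replace '+' first so that an escaped
    // "%2B" still decodes to a literal plus sign.
    private static string SlowDecode(string value)
    {
        return Uri.UnescapeDataString(value.Replace('+', ' '));
    }
}
```

With this shape, an input such as "abc" or "100%" comes back as the same string instance, while "a+b" and "%41bc" take the slow path.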

davidsh added the area-System.Net label

Apr 12, 2016

stephentoub reviewed

Apr 12, 2016

src/System.Runtime.Extensions/src/System/Net/WebUtility.cs


		_charBuffer = new char[bufferSize];

		_charBuffer = null; // char buffer created on demand

Member

stephentoub Apr 12, 2016

Is this really useful? Empty string is already special cased to avoid decoding. So isn't this only valuable in the case where all of the inputs are bytes rather than chars, and to enable that you're then doing an extra null check branch on every AddChar? And even then all of the bytes except for those at the end will end up getting dumped into the char[], forcing it to be allocated. Doesn't seem like a good tradeoff.

Author

hughbe Apr 12, 2016

Well, I found that this halved allocations for the UrlDecode("noescaping") case, as it seems that ASCII chars are added to the decoder via AddByte, not AddChar. I found this behaviour weird, but lazily instantiating _charBuffer does help in the no-escaping case.

Author

hughbe Apr 12, 2016

Just a follow-up: it's due to this line of code.

I don't really understand why "// 7 bit have to go as bytes because of Unicode", but hey.

Member

stephentoub Apr 12, 2016

I see, ok. (Seems strange then that we're lazily allocating the byte[] array; doing so suggests we don't expect any ASCII chars in common input, but let's not change it.)

Author

hughbe Apr 12, 2016

Understood. Obviously I'm not planning to change any of that behaviour in this PR, but do you have a theory as to why we add ASCII chars as bytes, not chars?

Member

stephentoub Apr 12, 2016

> right but they allocate the array only when needed

No. In the code before this PR, the char[] is always allocated (unless the string is empty).

Member

tarekgh Apr 12, 2016

Right. Looking more closely, it looks like ASCII characters need to be encoded by the encoder, but a non-ASCII character is stored as is, without encoding. So I believe what they are doing is collecting ASCII characters in the array and, every time it needs to be flushed, running the encoder just once on the array before the flush. So it is just an optimization to avoid calling the encoder for every ASCII character.

Author

hughbe Apr 12, 2016

> so it is just optimizing to not calling the encoder with every ascii character.

But doesn't the implementation check whether there are any bytes in the buffer, and skip the flush if there are none?
So that means that each ASCII character wouldn't flush any bytes, only the first in a string of ASCII chars.

Member

tarekgh Apr 12, 2016

UrlDecoder collects the ASCII bytes until either AddChar or GetString is called. At that point it flushes the bytes after encoding them, then starts collecting newly encountered bytes again, and so on.
Imagine you have a string like aaaaaNaaaaNaaaaNaaa,
where a is an ASCII character and N is a non-ASCII character. The byte array gets filled 4 times and encoded 4 times. If you didn't collect the ASCII characters in a byte array as the code does, you would need to call the encoder once for every 'a' character in the string.
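
The batching behaviour described in this comment can be sketched roughly as below. This is an illustrative model only: BatchingDecoderSketch, its DecodeCalls counter, and the Process helper are hypothetical names, the output is gathered in a StringBuilder rather than a char[], and UTF-8 is assumed as the encoding. It shows only why collecting a run of bytes and converting it once is cheaper than converting per character.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the "collect bytes, flush once per run" strategy.
// Not the corefx UrlDecoder.
public sealed class BatchingDecoderSketch
{
    private readonly byte[] _pending;          // run of bytes not yet converted
    private int _pendingCount;
    private readonly StringBuilder _output = new StringBuilder();
    private readonly Encoding _encoding;

    // Number of times the encoding was invoked (for illustration only).
    public int DecodeCalls { get; private set; }

    public BatchingDecoderSketch(int capacity, Encoding encoding)
    {
        _pending = new byte[capacity];
        _encoding = encoding;
    }

    // Bytes are only buffered; nothing is converted yet.
    public void AddByte(byte b)
    {
        _pending[_pendingCount++] = b;
    }

    // A char ends the current byte run, so flush before appending it.
    public void AddChar(char ch)
    {
        Flush();
        _output.Append(ch);
    }

    public string GetString()
    {
        Flush();
        return _output.ToString();
    }

    private void Flush()
    {
        if (_pendingCount == 0)
        {
            return; // nothing buffered, so no conversion call
        }
        _output.Append(_encoding.GetString(_pending, 0, _pendingCount));
        _pendingCount = 0;
        DecodeCalls++;
    }

    // Feeds a string the way the thread describes: 7-bit chars as bytes,
    // everything else as chars.
    public static string Process(string input, out int decodeCalls)
    {
        var decoder = new BatchingDecoderSketch(input.Length, Encoding.UTF8);
        foreach (char ch in input)
        {
            if (ch < 0x80)
            {
                decoder.AddByte((byte)ch);
            }
            else
            {
                decoder.AddChar(ch);
            }
        }
        string result = decoder.GetString();
        decodeCalls = decoder.DecodeCalls;
        return result;
    }
}
```

For an input shaped like the example above, with four ASCII runs separated by three non-ASCII characters, this sketch converts four times instead of once per ASCII character.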

Author

hughbe Apr 12, 2016

understood, thanks

Address PR feedback

5f256f3

Author

hughbe commented Apr 12, 2016

Test Innerloop CentOS7.1 Release Build and Test
Test Innerloop Windows_NT Debug Build and Test

Member

stephentoub commented Apr 12, 2016

@dotnet-bot test code coverage please

Author

hughbe commented Apr 12, 2016

@stephentoub is this new? Can non-Microsoft people use that code coverage feature?

Member

stephentoub commented Apr 12, 2016

> is this new

nope

> can non-Microsoft people use that code coverage feature

yup

Author

hughbe commented Apr 12, 2016

> nope

That's cool, I've never noticed that before. Is the idea that it fails if there is an overall decrease in coverage or does it just publish a report? (I guess we'll find out!)

Member

stephentoub commented Apr 12, 2016

> Is the idea that it fails if there is an overall decrease in coverage or does it just publish a report? (I guess we'll find out!)

It doesn't perform any comparison, but it archives the report so you can click through to view it.

Invalid percent encoded sequences don't need decoding

ad9ec8b

stephentoub reviewed

Apr 13, 2016

src/System.Runtime.Extensions/src/System/Net/WebUtility.cs Outdated

		{
			// No decoding needed
			return value;
		}

Member

stephentoub Apr 13, 2016

If it were me, I'd have written this like:

		return needsDecoding ? helper.GetString() : value;

Author

hughbe commented Apr 13, 2016

Thanks, I agree it's a good idea to add some more tests to cover all the branches I've changed. I also added some more tests for invalid percent encoding, lowercase hex, space escaping and non-ASCII chars.
This brings coverage up to 100% for UrlDecodeInternal and almost 100% for UrlDecoder. I think the missing coverage may be dead code (line 683).
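
The kinds of cases listed here can be exercised against the public System.Net.WebUtility.UrlDecode API with a small table-driven check. This is a hedged sketch, not the tests added in the PR: the class name UrlDecodeCases and the specific inputs are illustrative choices, picked to touch each category mentioned (no escaping, space escaping, lowercase hex, multi-byte UTF-8, invalid percent sequences, and non-ASCII chars).

```csharp
using System;
using System.Net;

// Hypothetical table-driven check; the inputs are illustrative and are not
// copied from the corefx test suite.
public static class UrlDecodeCases
{
    // Each row is { input, expected }.
    private static readonly string[,] Cases =
    {
        { "abc", "abc" },               // nothing to decode
        { "a+b", "a b" },               // '+' decodes to a space
        { "%2b", "+" },                 // lowercase hex digits
        { "%e2%82%ac", "\u20ac" },      // three UTF-8 bytes for the euro sign
        { "%ZZ", "%ZZ" },               // invalid escape is left untouched
        { "100%", "100%" },             // trailing '%' is left untouched
        { "caf\u00e9", "caf\u00e9" },   // non-ASCII char passes through
    };

    // Returns the number of rows whose decoded output differs from the
    // expected value, printing each mismatch.
    public static int CountFailures()
    {
        int failures = 0;
        for (int i = 0; i < Cases.GetLength(0); i++)
        {
            string actual = WebUtility.UrlDecode(Cases[i, 0]);
            if (actual != Cases[i, 1])
            {
                Console.WriteLine($"FAIL: {Cases[i, 0]} -> {actual}");
                failures++;
            }
        }
        return failures;
    }
}
```

Calling UrlDecodeCases.CountFailures() is expected to report no mismatches; rows that hit the no-decoding path are the ones the PR's fast path targets.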

Add some more tests to WebUtility

9c7efbf

stephentoub added the netfx-port-consider label

Apr 13, 2016

AlexGhiondea added netfx-port-consider and removed netfx-port-consider labels

Apr 13, 2016

Member

stephentoub commented Apr 15, 2016

LGTM

@davidsh?

Contributor

davidsh commented Apr 15, 2016

LGTM.

Contributor

davidsh commented Apr 15, 2016

CI Failures:
http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/master/job/windows_nt_debug_prtest/2/

(SslStream)

Contributor

davidsh commented Apr 15, 2016 (edited)

@dotnet-bot Test Innerloop Windows_NT Debug Build and Test

Contributor

davidsh commented Apr 15, 2016

@dotnet-bot Test Innerloop CentOS7.1 Release Build and Test

davidsh merged commit be2360b into dotnet:master

Apr 15, 2016

Author

hughbe commented Apr 16, 2016

Glad to see this merged. Should I send over a similar PR to coreclr, or will this be automatically handled by the netfx-port-consider label?

hughbe deleted the webutility-decode branch

April 16, 2016 16:44

karelz added this to the 1.0.0-rtm milestone

Dec 3, 2016

picenka21 pushed a commit to picenka21/runtime that referenced this pull request

Feb 18, 2022

Merge pull request dotnet/corefx#7671 from hughbe/webutility-decode

ef6f6b5

Improve performance of WebUtility.Decode for non-escaped strings
Commit migrated from dotnet/corefx@be2360b

Labels

area-System.Net

8 participants
