Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings
This repository was archived by the owner on Jan 23, 2023. It is now read-only.
/corefxPublic archive

Improve performance of WebUtility.Decode for non-escaped strings#7671

Merged
davidsh merged 4 commits intodotnet:masterfromhughbe:webutility-decode
Apr 15, 2016
Merged

Improve performance of WebUtility.Decode for non-escaped strings#7671

davidsh merged 4 commits intodotnet:masterfromhughbe:webutility-decode
Apr 15, 2016

Conversation

@hughbe
Copy link

  • Cuts allocations down to 1/3 of the original
  • Doubles performance (in time)
  • We still allocate a little bit, as removing all allocations would harm the performance of strings that actually need escaping

Benchmark - escaping needed

  • No performance regressions

benchmark escaping

Benchmark - no escaping needed

benchmark no escaping

Benchmark Code

Click here

static void Main(string[] args){    // Escaping    TimeAction("Old: ", () => Old.UrlDecode("%ABabc"));    TimeAction("New: ", () => New.UrlDecode("%ABabc"));    // No escaping    TimeAction("Old: ", () => Old.UrlDecode("abc"));    TimeAction("New: ", () => New.UrlDecode("abc"));    Console.ReadLine();}public static void TimeAction(string prefix, Action action){    var sw = new Stopwatch();    for (int iter = 0; iter < 5; iter++)    {        int gen0 = GC.CollectionCount(0);        sw.Restart();        for (int i = 0; i < 10000000; i++)        {            action();        }        sw.Stop();        Console.WriteLine($"{prefix}Time: {sw.Elapsed.TotalSeconds}\tGC0: {GC.CollectionCount(0) - gen0}");    }}

/cc@stephentoub@jamesqo@davidsh
Fixes #6542 together with#7546

- Cuts allocations down to 1/3 of the original- Doubles performance (in time)- We still allocate a little bit, as removing all allocations would harmthe performance of strings that actually need escapingFixes #6542
@hughbehughbe changed the titleImprove performance of WebUtility.Decode for non-escaped stringImprove performance of WebUtility.Decode for non-escaped stringsApr 12, 2016
privatestructUrlDecoder
{
publicbool_containsUnsafe;
publicbool_containsSpaces;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why store these as fields on the UrlDecoder? Seems like they could just be a single localneedsDecoding, and then instead of doingreturn helper.GetString();, you'd doreturn needsDecoding ? helper.GetString() : value;.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree with moving them out of UrlDecoder.
I would like to keep the fieldscontainsUnsafe andcontainsSpaces as separate variables, as I'm working on a PR that's gonna heavily optimizeUrlEncode andUrlDecode for strings that only need space encoding/decoding. Is that OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd prefer it if you kept them as a single local now. If a subsequent change needs them separated, then that change can do it. Otherwise, if for example that change never materializes or gets merged, we're left with debt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sure thing!


_charBuffer=newchar[bufferSize];
_charBuffer=null;//char buffer created on demand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is this really useful? Empty string is already special cased to avoid decoding. So isn't this only valuable in the case where all of the inputs are bytes rather than chars, and to enable that you're then doing an extra null check branch on every AddChar? And even then all of the bytes except for those at the end will end up getting dumped into the char[], forcing it to be allocated. Doesn't seem like a good tradeoff.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Well I found that this halved allocations for theUrlDecode("noescaping") case, as it seems that ASCII chars are added to the decoder asAddByte not asAddChar. I found this behaviour weird, but lazilly instantiating_charBuffer does help in the no escaping case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just a follow up: its due tothis line of code.

I don't really understand why "// 7 bit have to go as bytes because of Unicode" but hey

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I see, ok. (Seems strange then that we're lazily allocating the byte[] array; doing so suggests we don't expect any ASCII chars in common input, but let's not change it.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

understood - obviously I'm not planning to change any of that behaviour in this PR, but do you have a theory as to why we add ascii chars as bytes not chars?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

right but they allocate the array only when needed

No. In the code before this PR, the char[] is always allocated (unless the string is empty).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

right. looking more, it looks when having ascii characters, it needs to be encoded by the encoder. bu tif the character is not ascii will be stored as it is without encoding. so I believe what they are doing is collecting ascii charcaters in the array, and everytime need to flush it, will run the encoder just once on the array before the flush. so it is just optimizing to not calling the encoder with every ascii character.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

so it is just optimizing to not calling the encoder with every ascii character.

but doesn't the implementation check if there are any bytes in the buffer, and if there are none in the buffer it doesn't flush any bytes
So that means that each ascii character wouldn't flush any bytes, only the first in a string of ascii chars

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

UrlDecoder is collecting the ascii bytes till either AddChar or GetString get called. at that time it will flush the bytes after encoding it. and then will start collecting the new encountered bytes again and so on.
imagine you have string like aaaaaNaaaaNaaaaNaaa
where a is ascii character and N is not ascii, this will have the byte array get filled 4 times and encoded 4 times. if you didn't collect the ascii in byte array as the code is doing, this mean you'll need to call the encoder the number of 'a' characters in the string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

understood, thanks

@hughbe
Copy link
Author

Test Innerloop CentOS7.1 Release Build and Test
Test Innerloop Windows_NT Debug Build and Test

@stephentoub
Copy link
Member

@dotnet-bot test code coverage please

@hughbe
Copy link
Author

@stephentoub is this new/can non-Microsoft people use that code coverage feature

@stephentoub
Copy link
Member

is this new

nope

can non-Microsoft people use that code coverage feature

yup

@hughbe
Copy link
Author

nope

That's cool, I've never noticed that before. Is the idea that it fails if there is an overall decrease in coverage or does it just publish a report? (I guess we'll find out!)

@stephentoub
Copy link
Member

Is the idea that it fails if there is an overall decrease in coverage or does it just publish a report? (I guess we'll find out!)

It doesn't perform any comparison, but it archives the report so you can click through to view it.

{
// No decoding needed
returnvalue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If it were me, I'd have written this like:

returnneedsDecoding?helper.GetString():value;

@hughbe
Copy link
Author

Thanks, I agree its a good idea to add some more tests to cover all the branches I've changed. I also added some more tests for invalid percent encoding, lowercase hex, space escaping and non-ASCII chars.
This brings coverage up to 100% for UrlDecodeInternal and almost 100% for UrlDecoder - I think the missing coverage may be dead code (line 683)

@stephentoub
Copy link
Member

LGTM

@davidsh?

@davidsh
Copy link
Contributor

LGTM.

@davidsh
Copy link
Contributor

@davidsh
Copy link
Contributor

davidsh commentedApr 15, 2016
edited
Loading

@dotnet-bot Test Innerloop Windows_NT Debug Build and Test

@davidsh
Copy link
Contributor

@dotnet-bot Test Innerloop CentOS7.1 Release Build and Test

@davidshdavidsh merged commitbe2360b intodotnet:masterApr 15, 2016
@hughbe
Copy link
Author

Glad to see this merged. Should I send over a similar PR to coreclr, or will this be automatically handled by the netfx-port-consider label

@hughbehughbe deleted the webutility-decode branchApril 16, 2016 16:44
@karelzkarelz added this to the1.0.0-rtm milestoneDec 3, 2016
@karelzkarelz added this to the1.0.0-rtm milestoneDec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull requestFeb 18, 2022
Improve performance of WebUtility.Decode for non-escaped stringsCommit migrated fromdotnet/corefx@be2360b
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

No reviews

Assignees

No one assigned

Projects

None yet

Milestone

1.0.0-rtm

Development

Successfully merging this pull request may close these issues.

WebUtility.UrlEncode/Decode methods allocate even when unneeded

8 participants

@hughbe@stephentoub@davidsh@jamesqo@tarekgh@karelz@AlexGhiondea@dnfclas

[8]ページ先頭

©2009-2025 Movatter.jp