- Notifications
You must be signed in to change notification settings - Fork4.9k
Improve performance of BigInteger.Pow/ModPow#2182
Uh oh!
There was an error while loading.Please reload this page.
Conversation
stephentoub commentedJun 27, 2015
Thanks,@axelheer. Will take a look, but in the meantime, do you know why the 64/10K case suffers a 33% regression? |
axelheer commentedJun 27, 2015
@stephentoub nope, that's a bit odd and needs further investigation. Maybe the Barrett reduction backfires for smaller numbers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Shouldn't this be in the ItemGroup above with the rest of the BigIntegerCalculator.* files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure. Will fix that.
axelheer commentedJun 29, 2015
@stephentoub is it of equal value to just "amend" further changes? Or has the push further commits / squash in the end process any pros? |
stephentoub commentedJun 29, 2015
In this case I'd suggesting adding additional commits and then squashing later. Personally, I prefer to just ammend the existing commit when the changes are small / trivial and I don't expect anyone to need to re-review, e.g. fixing up typos or making simple one-line fixes. For larger things, I prefer to have the additional commits so that it's clear to a reviewer what's changed. That's just me, though. |
axelheer commentedJun 30, 2015
@stephentoub@mellinoe the two internal helpers are structures now, the reducer seems to pay off already at 256 bits -- below that an ordinary remainder is used. But I still measure that regression at 64 bits, a bit smaller but still there. I have no idea what's wrong here... @jasonwilliams200OK a DNX console app doesn't produce an .exe. And a classic .NET 4.6 console application does not work since there are type forwards from |
ghost commentedJun 30, 2015
Ah, I thought you are using coreclr: |
axelheer commentedJul 1, 2015
@dotnet-bot test this please |
axelheer commentedJul 2, 2015
Update: it seems that the current Thus, I don't think right here is much more to optimize. But, I should review integer division for 64 bit numbers (for 64 bit divisors?) and prepare a separate PR when I figured out a solution. @stephentoub What do you think? |
stephentoub commentedJul 2, 2015
Just so I understand, you're saying we already took a regression in the previous change, and that's what you'll be looking to repair in a subsequent one? Can you quantify that regression? |
axelheer commentedJul 2, 2015
Yeah. If that's a problem we can rollback this stuff and wait until I figured out why it gets stuck at 64 bits.
It's the same situation for |
axelheer commentedJul 3, 2015
I finally rewrote the integer division, I just combined the algorithms of The performance regression seems to be fixed now: ModPow
I'm still not happy with the division code, maybe I tweak it a bit tomorrow. |
axelheer commentedJul 5, 2015
@stephentoub since the last commit the temporary memory of buffers can be shared between them, which means n / 2 - 1 less allocations (okay, just 3 instead of 4 in our case). I'm quite sure doing it with less allocations isn't possible. Agreed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
And in particular it's a mutable struct. Could you note that and move this warning to above the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The two Pow methods are almost identical. I wonder if they should be refactored to share most of the code, e.g.
publicstaticuint[]Pow(uintvalue,uintpower){intlength=PowBound(power,1,1);BitsBuffervalueBuffer=newBitsBuffer(length,value);returnPowCore(length,power,refvalueBuffer);}publicstaticuint[]Pow(uint[]value,uintpower){Debug.Assert(value!=null);intlength=PowBound(power,value.Length,1);BitsBuffervalueBuffer=newBitsBuffer(length,value);returnPowCore(length,power,refvalueBuffer);}privatestaticuint[]PowCore(intlength,uintpower,refBitsBuffervalueBuffer){BitsBuffertempBuffer=newBitsBuffer(length,0);BitsBufferresultBuffer=newBitsBuffer(length,1);PowCore(power,refvalueBuffer,refresultBuffer,reftempBuffer);returnresultBuffer.ToArray();}
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sure.
stephentoub commentedJul 7, 2015
Also, I know we don't have any formal performance harnesses set up, but are your perf tests checked in anywhere? If not, could you please add them? You could check them in as unit tests with an [ActiveIssue("PerfTest")] or something like that for now, until we have a better performance-measurement system in place. |
axelheer commentedJul 7, 2015
@stephentoub thanks, I pushed a few changes addressing your input so far. For your debug / release concern I don't came up with a solution, maybe you've an idea what I should do? The performance tests I've made are based on a very primitive console app (I linked to its gist in the PR message), it's far from a bunch of unit tests. If you want me to check in that code (although I don't think it's that exciting), where? If you want me to write some tests based on it, how? |
stephentoub commentedJul 7, 2015
What is the time distribution like amongst the various tests? Are there any we could move to be [OuterLoop] such that we'd be able to always use the 32 value, still have reasonably good inner-loop test coverage, have that inner-loop coverage run relatively quickly, but still have the full suite available to run as part of outer loop?
I was thinking you could do something like this:
[Fact][ActiveIssue("PerformanceTest")]publicstaticvoidModPow(){ ...// most of the contents of your existing console test for ModPow} Then when someone wants to run the test, they can just remove the [ActiveIssue] attribute, or something like that. And it should hopefully make it easier for us to promote these to real and harnessed performance tests when we have the infrastructure available publically. Is that reasonable? I was aiming for something easy for you to do and with minimal ceremony. |
axelheer commentedJul 7, 2015
Just
What does I might add, that the changed behavior should be very harmless, it's just a threshold when to switch to the "more sophisticated" code, which doesn't depend on that value. On the contrary, if it works even for smaller values, it's a good thing.
I see, I can add that for all those operations, but I'd prefer to do that separately on a subsequent date. By the way, I did more cleanup on the "modpow" code based on your inputs and will push that shortly. |
stephentoub commentedJul 7, 2015
We currently have two main classes of tests. The first, called "inner loop", are meant to be fast, do basic validation, and run whenever you do a build locally or as part of CI on the server. The second, called "outer loop", are meant to be more robust and are ok to take longer; these don't run by default when you build locally, nor as part of normal CI runs... we have a special job on the server that runs them a few times a day. You can also run them locally with xunit by not passing "-notrait category=outerloop" on the command-line (this argument is specified by default when you do a test run locally).
That feels like a gap... doesn't that mean that the reducer stuff won't be used at all when tests are run on release builds?
That also means that we're not testing the normal algorithm in debug for values between 8 and 32 bits, yeah that's the algorithm we'll be using on that range in production. That feels wrong. How important is it to performance that ReducerThreshold be a const? If it were, for example, a static int that was read each time it was needed, would that be measurable? If that wouldn't affect anything, we could potentially test things by mucking with the threshold value from the tests via reflection at runtime. Not ideal, for sure, but could be the least evil thing.
That's fine, thanks. |
axelheer commentedJul 7, 2015
Hm, we've that debug / release thing already three times: multiply, square, modpow. I can remove this "trick" and change the thresholds to static values as you've suggested, which should work since xunit runs the tests at class level single threaded, and add further tests / extend existing tests to change the thresholds accordingly. That should do it, yeah. Not ideal, but better than the current one. |
axelheer commentedJul 7, 2015
@stephentoub the last commit should test as discussed. I actually like that solution now. The (new) tests for three big values became |
stephentoub commentedJul 8, 2015
Left a few minor comments, but otherwise LGTM. Feel free to squash when ready. |
f240d8d toced9151Comparestephentoub commentedJul 8, 2015
LGTM. Thanks for doing this,@axelheer. @KrzysztofCwalina, look good to you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Feel free to commit the change as is, but it would be better if this file was called ...BitsBuffer.cs to match the type name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@KrzysztofCwalina, which name is more suitable in this case:BigIntegerCalculator.BitsBuffer.cs orBitsBuffer.cs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This project already uses the "outer_type.inner_type.cs" conventions, so I would stick to it. And in general, I like the convention, though I am not sure if it's standard over the whole CoreFx repo.@terrajobst, do we have a standard convention for naming files with partial types containing nested type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd preferBigIntegerCalculator.BitsBuffer.cs instead ofBitsBuffer.cs (lexicographical reasons). Since the other files have just abbreviations as suffix, I've chosen the shorterBigIntegerCalculator.Buffer.cs (long file names are so java...).
But I haven't that much emotion for these file names, just say and I amend. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Let's go with BigIntegerCalculator.BitsBuffer.cs.
KrzysztofCwalina commentedJul 8, 2015
Looks good. Thanks! |
- To introduce further performance tweaks, the exponentiation algorithms are rewritten and ported to `BigIntegerCalculator`.- To scale better for bigger numbers a `FastReducer` is in use instead of "ordinary" modulo operations, which is based on multiplications.- Furthermore the newly introduced `FastReducer` triggers a bad corner case within the division algorithm, which gets fixed too.- A performance regression at 64 bits within integer division was found, which gets fixed too (no more allocations within that code).- The test code for threshold values of square / multiply / modpow now modifies these thresholds for more thorough testing.
ced9151 to1d59ab4Compareaxelheer commentedJul 9, 2015
@dotnet-bot test this please |
stephentoub commentedJul 9, 2015
Thanks! |
Improve performance of BigInteger.Pow/ModPow
…manceImprove performance of BigInteger.Pow/ModPowCommit migrated fromdotnet/corefx@3279ca8
To introduce further performance tweaks, the exponentiation algorithms are ported to
BigIntegerCalculator. Furthermore the newly introducedFastReducertriggers a bad corner case within the division algorithm, which gets fixed too.A basic performance comparison based onthis code unveils the following results:
ModPow