Significant performance improvements to Futures #7663


Merged

Conversation

@viktorklang (Contributor)

  • Improvements to Batching Executor by separating async vs sync
  • Performance improvements to installation of BlockContext
  • Reusing EC if Batchable for some of the Future utilities
  • Using Semaphore instead of bespoke permit counter
  • Switching to batching only select Transformations which benefit
  • Updating the FutureBenchmark to accommodate incompatibilities

@scala-jenkins added this to the 2.13.0-RC1 milestone Jan 17, 2019
else {
  val newLen = if (curLen == 0) 4 else curLen << 1

  if (newLen <= curLen) throw new StackOverflowError("Space limit of asynchronous stack reached: " + curLen)
@viktorklang (Contributor, Author):

I'm a bit uncertain about this: on one hand we could throw a "normal" exception; on the other hand, we'll get an OOME (a fatal exception) if we are unable to allocate a large enough array. I'm leaning towards this as an acceptable solution.

Review comment (Contributor):

it makes sense to me to throw a VMError, but SOE doesn't feel like quite the right one to me. VirtualMachineError doesn't appear to be abstract; what do you think of throwing that instead?
also, I'm wondering if the max size should go closer to the limit, rather than stopping at Int.MaxValue / 2 + 1

Review comment (Contributor):

Unless I'm misunderstanding: is the batch only grown from within itself? In that case, SOE makes more sense to me, but not if you've just added too many Runnables from anywhere.

@viktorklang (Contributor, Author):

You can think of it like this: tasks added to the EC while executing a task of that EC (recursive task generation) are not run directly on the stack; instead they are added to the Batch, on the heap. If that batch grows too large, it really is a kind of StackOverflowError. And the reason it is a serious problem is that the program will not know how to deal with running out of resources.
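The mechanism described here, reentrant submissions parked on a heap batch instead of the thread stack, can be sketched in Java (illustrative names only; this is a sketch of the idea, not the scala-library implementation):

```java
import java.util.ArrayDeque;
import java.util.concurrent.Executor;

// Sketch: reentrant execute() calls are queued on the heap instead of
// growing the thread stack. Names are illustrative.
class TrampolineSketch implements Executor {
    private final ThreadLocal<ArrayDeque<Runnable>> batch = new ThreadLocal<>();

    @Override
    public void execute(Runnable task) {
        ArrayDeque<Runnable> current = batch.get();
        if (current != null) {
            current.add(task); // reentrant submission: park it on the heap
        } else {
            ArrayDeque<Runnable> queue = new ArrayDeque<>();
            batch.set(queue);
            try {
                task.run();
                Runnable next;
                while ((next = queue.poll()) != null) next.run();
            } finally {
                batch.remove();
            }
        }
    }

    public static void main(String[] args) {
        TrampolineSketch ex = new TrampolineSketch();
        int[] count = {0};
        Runnable[] task = new Runnable[1];
        task[0] = () -> { if (count[0]++ < 100000) ex.execute(task[0]); };
        ex.execute(task[0]); // deep logical recursion, constant stack depth
        System.out.println(count[0]); // prints 100001
    }
}
```

A chain of 100,000 recursive submissions completes without a StackOverflowError because the "stack" lives in the ArrayDeque; it is that heap structure whose growth the diff above bounds.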

val curLen = curOther.length
if (curSize <= curLen) curOther
else {
  val newLen = if (curLen == 0) 4 else curLen << 1
@viktorklang (Contributor, Author):

I investigated different growth strategies, but here's the thing: if tasks are added incrementally they tend to be added faster than they can be consumed, which means that we need to grow fast anyway.
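The doubling policy from the diff above, including its overflow guard, as a compact Java sketch (hypothetical helper name, assuming one capacity check per push):

```java
import java.util.Arrays;

// Doubling growth as in the diff above: start at 4, double each time,
// and fail loudly once the left shift wraps and can no longer grow.
class GrowthSketch {
    static Runnable[] ensureCapacity(Runnable[] current, int needed) {
        int curLen = current.length;
        if (needed <= curLen) return current;
        int newLen = (curLen == 0) ? 4 : curLen << 1;
        if (newLen <= curLen) // shift overflowed the Int range
            throw new StackOverflowError("Space limit of asynchronous stack reached: " + curLen);
        return Arrays.copyOf(current, newLen);
    }

    public static void main(String[] args) {
        System.out.println(ensureCapacity(new Runnable[0], 1).length); // prints 4
        System.out.println(ensureCapacity(new Runnable[4], 5).length); // prints 8
    }
}
```

Doubling keeps the amortized cost of a push constant, which matters when producers outpace the consumer as described above.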

private[concurrent] trait BatchingExecutor extends Executor {
-  private[this] final val _tasksLocal = new ThreadLocal[Batch]()
+  private[this] final val _tasksLocal = new ThreadLocal[AnyRef]()
@viktorklang (Contributor, Author):

Made this AnyRef so we can use a cheap sentinel (the empty string) to trigger Batch inflation and avoid allocating a batch for the simple case of adding a single runnable and subsequently just executing it, going from 0 to 1 to 0.
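The 0-1-0 fast path might look like this Java sketch (hypothetical names; a sketch of the trick, not the scala-library code): the thread-local holds a cheap sentinel while a single runnable executes, and a real queue is only allocated if a second, nested submission actually arrives.

```java
import java.util.ArrayDeque;
import java.util.concurrent.Executor;

// Sketch of the sentinel trick: no queue is allocated unless a nested
// submission happens while a task is already running.
class SentinelBatchingSketch implements Executor {
    private static final Object MARKER = ""; // interned literal, allocation-free
    private final ThreadLocal<Object> state = new ThreadLocal<>();

    @Override
    public void execute(Runnable task) {
        Object s = state.get();
        if (s instanceof ArrayDeque) {
            ((ArrayDeque<Runnable>) s).add(task);  // batch already inflated
        } else if (s != null) {
            ArrayDeque<Runnable> q = new ArrayDeque<>();
            q.add(task);
            state.set(q);                          // inflate: marker -> real batch
        } else {
            state.set(MARKER);                     // fast path: 0 -> 1
            try {
                task.run();
                Object after = state.get();
                if (after instanceof ArrayDeque) { // only drain if inflation happened
                    ArrayDeque<Runnable> q = (ArrayDeque<Runnable>) after;
                    Runnable next;
                    while ((next = q.poll()) != null) next.run();
                }
            } finally {
                state.remove();                    // back to 0
            }
        }
    }

    public static void main(String[] args) {
        SentinelBatchingSketch ex = new SentinelBatchingSketch();
        StringBuilder order = new StringBuilder();
        ex.execute(() -> {
            order.append('a');
            ex.execute(() -> order.append('b')); // nested: inflates the batch
            order.append('c');
        });
        System.out.println(order); // prints acb
    }
}
```

The common case (a single runnable, no nesting) touches only the thread-local and the interned marker, allocating nothing.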

  failure
}

@tailrec private[this] final def runWithoutResubmit(failure: Throwable): Throwable =
@viktorklang (Contributor, Author):

We run this for sync / trampolining BatchingExecutors.

private[this] final def handleRunFailure(cause: Throwable): Throwable =
  if (size > 0 && (NonFatal(cause) || cause.isInstanceOf[InterruptedException])) {
    try { unbatchedExecute(this); cause } catch {
if (resubmitOnBlock && size > 0) {
@viktorklang (Contributor, Author):

We only resubmit async BatchingExecutors, and only if there's anything to resubmit.

if (b.isInstanceOf[Batch]) b.asInstanceOf[Batch].push(runnable)
else submitAsync(new Batch(runnable, resubmitOnBlock = true))
} else submitAsync(runnable)
} else {
@viktorklang (Contributor, Author):

Delineation between async and sync implementations. We batch all sync Runnables, since not doing so would grow the actual thread stack (possibly infinitely).

@viktorklang (Contributor, Author):

Ping @SethTisue @NthPortal

else submitAsync(new Batch(runnable, resubmitOnBlock = true))
} else submitAsync(runnable)
} else {
  Objects.requireNonNull(runnable) // Make sure we're not trying to add nulls
Review comment (Contributor):

error messages

@viktorklang (Contributor, Author):

@hepin1989 You mean like "runnable is null"?

val b = _tasksLocal.get
if (b.isInstanceOf[Batch]) b.asInstanceOf[Batch].push(runnable)
else if (b ne null) {
  _tasksLocal.set("") // Set a marker to indicate that we are submitting synchronously
Review comment (Contributor):

change the "" to a constant?

Review comment (Contributor):

yeah, the meaning might be clearer if it was a constant called marker or sentinel

@viktorklang (Contributor, Author):

"" is a constant in the constant pool. I wanted something which is allocation-free and belongs to the root class loader in case it isn't cleared. (Since that is benign and does not leak memory)
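The point about "" can be checked directly: string literals are interned, so every occurrence of "" in any class resolves to the same object, and comparing against it as a sentinel allocates nothing. A small Java demonstration (hypothetical helper names):

```java
// String literals are interned in the shared constant pool, so using ""
// as a sentinel is allocation-free and a reference comparison suffices.
class InternDemo {
    static final String MARKER = "";

    static boolean isMarker(Object o) {
        return o == MARKER; // identity check, no equals() call needed
    }

    public static void main(String[] args) {
        System.out.println(isMarker(""));              // prints true: same interned literal
        System.out.println(isMarker(new String(""))); // prints false: a fresh allocation
    }
}
```

This is why the sentinel survives even if the thread-local isn't cleared: the interned literal belongs to the shared pool, so retaining it cannot leak application memory.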

in.iterator.foldLeft(successful(bf.newBuilder(in))) {
  (fr, fa) => fr.zipWith(fa)(Future.addToBuilderFun)
-}.map(_.result())(InternalCallbackExecutor)
+}.map(_.result())(if (executor.isInstanceOf[BatchingExecutor]) executor else InternalCallbackExecutor)
Review comment (Contributor):

extract to a method

@viktorklang (Contributor, Author):

@hepin1989 I might do that!

*
* WARNING: The underlying Executor's execute-method must not execute the submitted Runnable
* in the calling thread synchronously. It must enqueue/handoff the Runnable.
*/
Review comment (Contributor):

is this doc comment no longer accurate? or something else?

@viktorklang (Contributor, Author):

There are several inaccuracies in there—let me see if I can correct it.

}

@tailrec private[this] final def runWithoutResubmit(failure: Throwable): Throwable =
  if ((failure ne null) && (failure.isInstanceOf[InterruptedException] || NonFatal(failure))) {
Review comment (Contributor):

if we're being consistent with your above change,

Suggested change:
-  if ((failure ne null) && (failure.isInstanceOf[InterruptedException] || NonFatal(failure))) {
+  if ((failure != null) && (failure.isInstanceOf[InterruptedException] || NonFatal(failure))) {

@viktorklang (Contributor, Author):

Agreed!

else submitAsync(new Batch(runnable, resubmitOnBlock = true))
} else submitAsync(runnable)
} else {
  Objects.requireNonNull(runnable) // Make sure we're not trying to add nulls
Review comment (Contributor):

why is there only a null check if !isAsync?

@viktorklang (Contributor, Author):

Great question!

batchable would return false for a null parameter, which means that it will be submitted to submitAsync instead, which typically will call super.execute(runnable), and that call is specced to throw an exception if the Runnable is null: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html#execute(java.lang.Runnable)

I'll add a comment.
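The Executor contract referenced above can be demonstrated: a typical pool's execute rejects null itself, so the async path gets its null check "for free" from the underlying executor. An illustrative Java check:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// java.util.concurrent.Executor.execute is specified to throw
// NullPointerException for a null command, so only code paths that never
// reach the underlying pool need an explicit Objects.requireNonNull.
class NullCheckDemo {
    static boolean rejectsNull(ExecutorService pool) {
        try {
            pool.execute(null); // spec: must throw NullPointerException
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(rejectsNull(pool)); // prints true
        pool.shutdown();
    }
}
```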

@viktorklang force-pushed the wip-additional-perf-futures-√ branch from cdee3bd to fce2dd3 on January 20, 2019
@adriaanm (Contributor):

/nothingtoseehere

@tarsa commented Jan 25, 2019 (edited):

What about removing Promise linking altogether? It could be a significant performance win (though I don't know that it is). OTOH Promise linking is an undocumented (except for some blog posts) heuristic that is an implementation detail, so removing it won't break any guarantees.

@viktorklang (Contributor, Author) commented Jan 26, 2019 via email:

Trust me, if you remove it then things will break. If you can PR a working proposal which provably doesn't break then I am all ears. (Note: I have already tried it and it doesn't work, so YMMV)

@tarsa commented Jan 26, 2019 (edited):

Hi @viktorklang

I've removed promise linking and nothing broke, except of course a test that specifically tests linking (it stopped compiling because it was accessing non-public methods).

Here's my attempt: https://github.com/tarsa/scala/commits/promise-linking-removal

I've run all tests mentioned in main readme:

  • sbt test
  • sbt partest
  • sbt scalacheck/test

Also I've run the benchmark from scala.concurrent.FutureBenchmark.scala, but I've reduced the iteration count severely (I didn't want to spend 17 days benchmarking), so the results aren't fully reliable. From my quick benchmark results it seems that removing promise linking brings big wins (about 10x) in two benchmarks:

  • FirstCompletedOfFutureBenchmark.pre fjp
  • VariousFutureBenchmark.pre fie

In other tests our implementations were more or less trading blows. Maybe all of these differences are due to noise in the results (too short benchmarking).

@viktorklang (Contributor, Author) commented Jan 27, 2019 via email:

Hi Piotr, thanks! May I ask: have you had a look at memory usage (pressure and retention)? It is important to set recursion to 8192 or greater to assess the impact for long chains. Cheers, V

@tarsa:

Thanks! May I ask—have you had a look at memory usage (pressure and retention)?

How to measure it automatically? I've reduced memory from 1G to 100M in the benchmark and increased recursion level from 1024 to 8192. That didn't change much in relative performance.

I've found that DefaultPromise.completeWith is rather unnecessarily slow because of the extra dispatch to the executor service. I've optimized it by doing a synchronous DefaultPromise.completeWith, just like when you do DefaultPromise.complete: when doing DefaultPromise.complete the callbacks are traversed on the same thread that called DefaultPromise.complete. I've optimized it here: tarsa/scala@bfb96ea...tarsa:18e5676871928dafc5d2c13d3b36f09da32cf032 but this solution is probably prone to stack overflow. I'll come up with something stack-safe that should be the foundation for the main change, which is promise linking removal.

@viktorklang (Contributor, Author) commented Jan 28, 2019 via email:

I'll hopefully have time to look at this sometime this week with a memory profiler. It would be awesome if we could get rid of the linking, but we really need to make sure we don't break any user programs.
I've already attempted to address this on a private branch:

class DefaultPromise[T] private[this] (initial: AnyRef) extends AtomicReference[AnyRef](initial) with scala.concurrent.Promise[T] with scala.concurrent.Future[T] with (Try[T] => Unit) {

  /**
   * WARNING: the `resolved` value needs to have been pre-resolved using `resolve()`
   * INTERNAL API
   */
  override final def apply(resolved: Try[T]): Unit = tryComplete0(get(), resolved)

  override final def completeWith(other: Future[T]): this.type = {
    if (other ne this) {
      val state = get()
      if (!state.isInstanceOf[Try[T]]) {
        val resolved =
          if (other.isInstanceOf[DefaultPromise[T]]) other.asInstanceOf[DefaultPromise[T]].value0
          else other.value.orNull
        if (resolved ne null) tryComplete0(state, resolved)
        else other.onComplete(this)(InternalCallbackExecutor)
      }
    }
    this
  }

@tarsa commented Jan 28, 2019 (edited):

other.onComplete(this)(InternalCallbackExecutor)

Well, that still goes through the executor, which IIUC imposes some overhead of managing a task queue and thread synchronization. I'm not sure how that InternalCallbackExecutor works, though.

Promise linking has the advantage that it reduces the number of completeWith operations. If completeWith operations are slow then Promise linking can be a performance win. Therefore I'm working on some stack-safe optimization of completeWith.

Update:
Bypassing InternalCallbackExecutor (using an extra if in Transformation.submitWithValue) leads to worse performance on FlatMapFutureBenchmark, surprisingly. Hmm....
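For readers unfamiliar with the completeWith pattern being optimized, forwarding one promise's result into another can be sketched with java.util.concurrent.CompletableFuture (a sketch of the idea only, not the Scala implementation):

```java
import java.util.concurrent.CompletableFuture;

// Forward the result of `other` into `target`, analogous to
// Promise.completeWith. The callback runs on whichever thread completes
// `other` (no executor dispatch), which is fast but can deepen the stack
// if such forwardings are chained recursively.
class CompleteWithSketch {
    static <T> CompletableFuture<T> completeWith(CompletableFuture<T> target,
                                                 CompletableFuture<T> other) {
        other.whenComplete((value, error) -> {
            if (error != null) target.completeExceptionally(error);
            else target.complete(value);
        });
        return target;
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> target = new CompletableFuture<>();
        CompletableFuture<Integer> other = new CompletableFuture<>();
        completeWith(target, other);
        other.complete(42);
        System.out.println(target.join()); // prints 42
    }
}
```

The stack-safety concern discussed above arises exactly because this synchronous forwarding, chained N deep, completes N promises in one call stack.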

@tarsa commented Feb 2, 2019 (edited):

I have done some more changes (tarsa/scala@bfb96ea...tarsa:promise-linking-removal), benchmarks and tests, but the overall conclusion is that there's no significant performance increase from removal of promise linking, at least not when stacked on top of this pull request. In fact there seems to be some performance regression (I tested mainly with recursion depth = 8192 instead of 1024).

Various benign small changes had a big impact on performance, for better or worse (depending on the particular benchmark). It seems that Java's JIT is somewhat unpredictable. I didn't want to spend time analyzing the assembly code produced by the JIT and overtuning the source code to it, so I'll stop the experiment. There are no significant performance improvements (that I hoped for initially) to warrant further effort investment.

OTOH, straight removal of promise linking hasn't caused any correctness or stability problems. Only when I tried stack-unsafe complete/completeWith did things start to break (i.e. I got stack overflows), but stack-unsafe complete/completeWith is not tied to promise linking removal.

Overall, I would consider promise linking removal only in the (unlikely?) case where promise linking blocks other optimizations.

@NthPortal (Contributor):

@tarsa for unbatched execution, InternalCallbackExecutor just immediately invokes Runnable#run(); for batched execution, it acts like a normal BatchingExecutor
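The unbatched case NthPortal describes corresponds to a calling-thread executor; a minimal Java illustration (not the actual scala-library class):

```java
import java.util.concurrent.Executor;

// An executor that invokes run() immediately on the calling thread, like
// the unbatched behaviour described above: no queue, no thread handoff.
class CallingThreadDemo {
    static final Executor DIRECT = Runnable::run;

    public static void main(String[] args) {
        Thread caller = Thread.currentThread();
        final Thread[] ranOn = new Thread[1];
        DIRECT.execute(() -> ranOn[0] = Thread.currentThread());
        System.out.println(ranOn[0] == caller); // prints true: ran synchronously
    }
}
```

This explains tarsa's observation: there is no task queue or synchronization overhead in the unbatched path, so bypassing it buys little.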


/**
- * Returns the associaed `Future` with this `Promise`
+ * Returns the associated `Future` with this `Promise`
Review comment (Contributor):

should this say 'Future associated with'?

Suggested change:
- * Returns the associated `Future` with this `Promise`
+ * Returns the `Future` associated with this `Promise`

  size = n - 1
  ret

final def pop(): Runnable = {
  val sz = this.size
@anatse commented Feb 3, 2019 (edited):

I don't clearly understand; can you explain where the improvement is? It seems like additional instructions appear in the case when size equals 0.

@viktorklang (Contributor, Author):

The rationale is that pop() is always invoked after a size > 0 check, which means that the 0-case is never triggered. This change gives the method a single branch instead of potentially two, since we cannot know whether the JVM will generate anything more efficient.

@mkeskells (Contributor):

the JVM would only have one branch (because of @switch) ensuring that the bytecode was a tableswitch or lookupswitch etc. (which generates faster machine code on x64 than a series of if statements), but an if is still probably faster here :-)

It may be worth adding a doc comment on the method here to explain that size > 0 should be checked by the caller.

If size cannot be < 1, should the test be test == 1?

@viktorklang (Contributor, Author):

@mkeskells the size > 0 check is not mandatory; the logic will still work. AFAIK HotSpot passes both tableswitch and lookupswitch through the same codepath.

@mkeskells (Contributor):

@viktorklang they are different constraints that allow generation of different machine code

@viktorklang (Contributor, Author):

@mkeskells Agreed. I've changed the code to not do the size-check before and instead do the tableswitch.
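The shape of the pop under discussion can be sketched like so (hypothetical class; the caller is expected to have checked size > 0, so only minimal branching remains inside pop itself):

```java
import java.util.Arrays;

// Sketch of a task stack whose pop() relies on a caller-side size check.
class TaskStackSketch {
    private Runnable[] elems = new Runnable[4];
    private int size = 0;

    void push(Runnable r) {
        if (size == elems.length) elems = Arrays.copyOf(elems, elems.length << 1);
        elems[size++] = r;
    }

    // Precondition: size > 0 (checked by the caller).
    Runnable pop() {
        int n = size - 1;
        Runnable ret = elems[n];
        elems[n] = null; // clear the slot so the popped task can be collected
        size = n;
        return ret;
    }

    int size() { return size; }

    public static void main(String[] args) {
        TaskStackSketch s = new TaskStackSketch();
        s.push(() -> {});
        s.push(() -> {});
        if (s.size() > 0) s.pop(); // the branch lives at the call site
        System.out.println(s.size()); // prints 1
    }
}
```

Moving the emptiness check to the call site means the hot drain loop, which already knows the stack is non-empty, pays for only one branch per pop.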

@viktorklang force-pushed the wip-additional-perf-futures-√ branch 2 times, most recently from bf96184 to 2abdbcd on February 4, 2019
else {
  val i = if (b ne null) b.asInstanceOf[java.lang.Integer].intValue else 0
  if (i < BatchingExecutorStatics.syncPreBatchDepth) {
    tl.set(java.lang.Integer.valueOf(i + 1))
@viktorklang (Contributor, Author):

We use j.l.Integer here to control the bytecode generated: the Integers we will be using here are all internally cached inside Integer, so there are no allocations on this path.
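The caching relied on here is guaranteed by the JLS (5.1.7) for Integer.valueOf in the range -128..127, which comfortably covers a nesting counter capped at 16:

```java
// Integer.valueOf must cache values in -128..127, so boxing a small
// nesting counter allocates nothing and identity comparison holds.
class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(16);
        Integer b = Integer.valueOf(16);
        System.out.println(a == b); // prints true: same cached instance

        Integer x = Integer.valueOf(1000);
        Integer y = Integer.valueOf(1000);
        // Outside the mandatory cache range: typically distinct objects,
        // though -XX:AutoBoxCacheMax can widen the cache.
        System.out.println(x == y);
    }
}
```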

  }
}

private[this] final class SyncBatch(runnable: Runnable) extends AbstractBatch(runnable, BatchingExecutorStatics.emptyBatchArray, 1) with Runnable {
@viktorklang (Contributor, Author):

@tarsa check the "fie" pool benches with this commit. :D

@viktorklang force-pushed the wip-additional-perf-futures-√ branch from 5cf0d16 to 909d125 on February 12, 2019
@SethTisue (Member) commented Feb 13, 2019 (edited).

@jrudolph (Contributor) left a review comment:

Had only a quick look. It's looking good, looking forward to the performance improvements. :)

Generally, it would be good to break these kinds of changes up into smaller ones and accompany them with benchmarks to make it easier to verify if the added code complexity (in some parts) is actually worth it.

final val marker = ""

// Max number of Runnables executed nested before starting to batch (to prevent stack exhaustion)
final val syncPreBatchDepth = 16
Review comment (Contributor):

Making them configurable would be nice. In the best case per dispatcher, but if that's not possible, maybe by JVM property?
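The JVM-property suggestion could look like this sketch (the property name is illustrative, not one the Scala library defines):

```java
// Sketch: read a batching limit from a JVM property with a sane default.
class ConfigSketch {
    static int syncPreBatchDepth(String propertyValue) {
        if (propertyValue == null) return 16;   // default from the code above
        try {
            int parsed = Integer.parseInt(propertyValue);
            return parsed > 0 ? parsed : 16;    // ignore nonsensical values
        } catch (NumberFormatException nfe) {
            return 16;
        }
    }

    public static void main(String[] args) {
        // e.g. -Dscala.concurrent.sync-pre-batch-depth=32 (hypothetical property)
        String raw = System.getProperty("scala.concurrent.sync-pre-batch-depth");
        System.out.println(syncPreBatchDepth(raw)); // 16 unless the property is set
    }
}
```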

}
if ((this ne p) && compareAndSet(state, l)) {
  if (state ne Noop) p.dispatchOrAddCallbacks(p.get(), state.asInstanceOf[Callbacks[T]]) // Noop-check is important here
} else linkRootOf(p, l)
@jrudolph (Contributor):

Previously, for !(p ne this), linkRootOf was not called. Was that change intentional?

@viktorklang (Contributor, Author):

@jrudolph Yes, the change is intentional. Since the method is tail recursive it will be a no-op if it recurses: https://github.com/scala/scala/pull/7663/files#diff-e0c06059ba197fea5927c4e4b0f3d837R309

@viktorklang (Contributor, Author):

@jrudolph I agree that it is generally better to split this up into multiple small PRs, but this is a massive overhaul and some optimizations are invalidated by other optimizations, or turn out to introduce design constraints for future progress. So I anticipate further optimizations to this design to be much smaller and self-contained. I do believe that I'm already up against the boundaries of what this design can perform like, so a future design might look completely different. :S

@viktorklang (Contributor, Author) commented Feb 14, 2019 via email:

In this case, the on-stack execution improved performance by 50% for Future.InternalCallbackExecutor.

@viktorklang (Contributor, Author):

@SethTisue

queued:
https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/1787/

If only scala-xml failed then that should be fine, right?

@SethTisue (Member):

If only scala-xml failed then that should be fine, right?

scala-xml failing takes out many downstream projects. I'll have to figure out what happened there, and try again. doesn't seem like it could be related to your PR.

@SethTisue (Member):

@viktorklang could you rebase this onto current 2.13.x? otherwise I don't think we'll get usable community build results

  * Improvements to Batching Executor by separating async vs sync
  * Performance improvements to installation of BlockContext
  * Reusing EC if Batchable for some of the Future utilities
  * Using Semaphore instead of bespoke permit counter
  * Switching to batching only select Transformations which benefit
  * Updating the FutureBenchmark to accommodate incompatibilities
  * A regression was introduced since one of the tests in FutureSpec was comparing the wrong things; not noticing that regression led to design considerations which are in hindsight invalid. For that reason scala.concurrent.Transformation now stores its ExecutionContext for longer, in order to make sure that we can use ec.reportFailure on NonFatal Throwables thrown when executing Future.foreach and Future.onComplete. This is to live up to their contract, and to facilitate easier debugging of user code which uses Future.
  * I noticed that a typo had been introduced when changing between successful and failed branches when submitting Runnables to synchronous BatchingExecutors. This typo led to not activating the fast path of that code, which yields a significant performance gain by delaying the inflation of allocating a Batch.
  * Execution of batched Runnables for synchronous BatchingExecutors has now been slightly improved due to reducing reads.
  * Submitting Runnables to BatchingExecutor now null-checks all Runnables; before, there were cases where checks were not performed.
  * Improves performance of completeWith by avoiding allocating the function which would attempt to do the actual completion.
  * Splits the BatchingExecutor into Sync and Async Batches
  * Inlining logic in Promise.Transformation, and switching to Int for _xform ordinal
  * Allows for max 16 Runnables to execute nested on stack
  * Allows for max 1024 Runnables to be executed before resubmit
  * Makes synchronous BatchingExecutors also use submitForExecution since this makes things easier to instrument.
  * Complete with regression test verifying that they all throw NotSerializableException
@viktorklang force-pushed the wip-additional-perf-futures-√ branch from 909d125 to 50c6f94 on February 14, 2019
@viktorklang (Contributor, Author):

@SethTisue Rebased and pushed!

@SethTisue (Member) commented Feb 14, 2019 (edited):

go, run 1791! do the best you can!https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/1791/

events in France in 1791 included the Day of Daggers, the Champ de Mars Massacre, and the Massacres of La Glacière, so hopefully this goes at least that well

@tarsa commented Feb 15, 2019 (edited):

I've benchmarked and compared (eyeballed) results for "Further improves performance of Futures via BatchingExecutor" (Viktor Klang, 06.02.19 19:21, c83124acb5cbd2d37cf35ecfb700f344f320fe7e) and "Makes Promise.Transformation's special fields not serialized" (Viktor Klang, 11.02.19 15:01, 5cf0d1626725391754bbb0d2480a0b5e0b1d7985).
"fie" indeed gained strongly, while for other executors results were mixed (not sure if better or worse). Aggregate scores for each executor would make comparison easier.

@viktorklang (Contributor, Author):

@tarsa Executors other than fie would not benefit from the changes I made to sync execution. The changes made to fairness could impact performance slightly for the worse, but overall I think it's a good tradeoff to make sure that there is eventual progress for other tasks on the same pool.

@viktorklang (Contributor, Author):

@SethTisue I'm unsure how to parse/interpret the results of that community build. How do I know what is related to my changes?

@SethTisue (Member) commented Feb 15, 2019 (edited):

@SethTisue I'm unsure as how to parse/interpret the results of that community build. How do I know what is related to my changes?

we have to compare it to another recent run such as https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/1789/ , which shows the same failures. no failures appear attributable to this PR

I intend to address those failures soon. we needn't hold up this PR over it. there will be many further community build runs between now and 2.13.0 final

@SethTisue added the release-notes label (worth highlighting in next release notes) Feb 15, 2019
@SethTisue merged commit 1775dba into scala:2.13.x on Feb 15, 2019
@NthPortal (Contributor):

@viktorklang sorry I never got around to fully reviewing this - what I did get to looked great though!

@viktorklang (Contributor, Author) commented Feb 16, 2019 via email:

@NthPortal No worries, thanks for all your previous feedback!

@diesalbla added the library:concurrent label (Changes to the concurrency support in stdlib) Mar 15, 2019

Reviewers

@jrudolph, @He-Pin, @NthPortal, @anatse, @mkeskells left review comments

Assignees

@SethTisue

Labels

library:concurrent (Changes to the concurrency support in stdlib), release-notes (worth highlighting in next release notes)

Milestone

2.13.0-RC1

11 participants

@viktorklang @adriaanm @tarsa @NthPortal @SethTisue @jrudolph @He-Pin @anatse @mkeskells @scala-jenkins @diesalbla
