It's an open secret that I have some criticisms of async/await:
Fear of Threads
From the perspective of a newcomer who is looking to transfer existing
knowledge or code from another language to Nim, or someone who has zero
experience with concurrency in any language, I recommend threads.
Here follows the first example from the `threads.nim` documentation. I tweaked
the syntax a touch -- it was written in 2011 -- but that is all.
```nim
import locks

type
  Payload = tuple[a, b: int]

var
  thr: array[4, Thread[Payload]]
  L: Lock

proc threadFunc(interval: Payload) {.thread.} =
  for i in interval.a .. interval.b:
    withLock L:
      echo i

initLock L
for i in 0 .. thr.high:
  createThread(thr[i], threadFunc, (i*10, i*10+5))
joinThreads thr
deinitLock L
```
Here's the compilation output:
```
$ nim c --gc:arc --define:danger --threads:on -f th.nim
.....
CC: stdlib_assertions.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: stdlib_locks.nim
CC: th.nim
Hint: 22456 lines; 0.121s; 20.68MiB peakmem; Dangerous Release build; proj: /home/adavidoff/th.nim; out: /home/adavidoff/th [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                5             57             54           3733
```
The resulting binary is 52,472 bytes in size and links against:
```
-rwxr-xr-x. 1 root root 147232 Nov 2 22:20 /lib64/libpthread-2.32.so
```
Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 146%CPU (0avgtext+0avgdata 1592maxresident)k
0inputs+0outputs (0major+103minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000266 │ 0.009509 │ 0.000391 │ 0.000120 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==1915== HEAP SUMMARY:
==1915==     in use at exit: 0 bytes in 0 blocks
==1915==   total heap usage: 33 allocs, 33 frees, 3,164 bytes allocated
==1915==
==1915== All heap blocks were freed -- no leaks are possible
```
Macro Complexity
or, if you think you're smart,
Poor Documentation
This is my port of the above program to async/await:
```nim
import asyncdispatch

type
  Payload = tuple[a, b: int]

var thr: array[4, Future[void]]

proc asyncFunc(interval: Payload) {.async.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. thr.high:
  thr[i] = asyncFunc (i*10, i*10+5)
await all(thr)
```
Here's the compilation output:
```
$ nim c --gc:arc --define:danger -f aw.nim
....................................
/home/adavidoff/aw.nim(15, 7) template/generic instantiation of `await` from here
/home/adavidoff/nims/lib/pure/asyncmacro.nim(144, 3) Error: 'yield' only allowed in an iterator
```
Oops. I didn't `import asyncmacro` and I never used `yield`. Curious.
A quick look at the documentation for `asyncdispatch` yields a single
occurrence of "asyncmacro"; in fact, it's the last sentence of the
documentation, under a section helpfully titled "Limitations/Bugs":

> `asyncdispatch` module depends on the `asyncmacro` module to work properly.

I have no idea what this means. As far as I can tell, the `asyncdispatch`
module depends on the `asyncmacro` module for compilation failures. 😁
But I know the problem must be in `await`, right? There's no documentation for
the `await` template, and the documentation that does exist for using `await`
is remarkably unhelpful.
Take a critical read of the documentation for `asyncdispatch`
some time.
It's worthy of another long thread, so to speak, because it's pretty hard to
understand.
Thankfully, there are experienced users on IRC who point out that I should use
`waitFor` instead of `await`. I remember reading about `waitFor`, but since it
merely said it *blocks the current thread*, I figured it wasn't what I wanted
at all. I feel stupid, now.
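Applying that advice is a one-line change; here's a sketch of the same program
with `waitFor` in place of the top-level `await` (everything else is identical
to the version above):

```nim
import asyncdispatch

type
  Payload = tuple[a, b: int]

var thr: array[4, Future[void]]

proc asyncFunc(interval: Payload) {.async.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. thr.high:
  thr[i] = asyncFunc((i*10, i*10+5))

# waitFor runs the dispatcher, blocking the current thread until the
# future completes; await is only legal inside an {.async.} proc,
# which is why the previous version failed to compile
waitFor all(thr)
```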
But at least I can compile my program:
```
$ nim c --gc:arc --define:danger -f aw.nim
/home/adavidoff/nims/lib/pure/asyncdispatch.nim(1286, 39) Hint: passing 'adata.readList' to a sink parameter introduces an implicit copy; if possible, rearrange your program's control flow to prevent it [Performance]
CC: stdlib_assertions.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: stdlib_math.nim
CC: stdlib_strutils.nim
CC: stdlib_options.nim
CC: stdlib_times.nim
CC: stdlib_os.nim
CC: stdlib_heapqueue.nim
CC: stdlib_deques.nim
CC: stdlib_asyncfutures.nim
CC: stdlib_monotimes.nim
CC: stdlib_nativesockets.nim
CC: stdlib_selectors.nim
CC: stdlib_asyncdispatch.nim
CC: aw.nim
Hint: 56742 lines; 0.558s; 76.172MiB peakmem; Dangerous Release build; proj: /home/adavidoff/aw.nim; out: /home/adavidoff/aw [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               16            236            182          10840
```
The resulting binary is 125,704 bytes in size and links against:
```
-rwxr-xr-x. 1 root root  147232 Nov 2 22:20 /lib64/libpthread-2.32.so
-rwxr-xr-x. 1 root root 1263504 Nov 2 22:21 /lib64/libm-2.32.so
-rwxr-xr-x. 1 root root   35080 Nov 2 22:21 /lib64/librt-2.32.so
```
Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 93%CPU (0avgtext+0avgdata 1948maxresident)k
0inputs+0outputs (0major+99minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000242 │ 0.013270 │ 0.000323 │ 0.000190 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==653== HEAP SUMMARY:
==653==     in use at exit: 736 bytes in 12 blocks
==653==   total heap usage: 43 allocs, 31 frees, 3,056 bytes allocated
==653==
==653== LEAK SUMMARY:
==653==    definitely lost: 256 bytes in 4 blocks
==653==    indirectly lost: 192 bytes in 4 blocks
==653==      possibly lost: 288 bytes in 4 blocks
==653==    still reachable: 0 bytes in 0 blocks
==653==         suppressed: 0 bytes in 0 blocks
==653== Rerun with --leak-check=full to see details of leaked memory
```
CPS
This example uses a so-called `untyped` implementation from August; it's a bit
more verbose because it demonstrates the machinery explicitly, in case you are
new to continuation passing style.
The newer `typed` implementation is better, in terms of both syntax and
semantics (read: performance and performance), but it is currently blocked by
at least one compiler bug.
```nim
import cps

type
  Payload = tuple[a, b: int]
  C = ref object of RootObj              # a continuation
    fn: proc(c: C): C {.nimcall.}

proc dispatch(c: C) =                    # a dispatcher
  var c = c
  while c != nil and c.fn != nil:
    c = c.fn(c)

proc cpsEcho(i: int): C {.cpsMagic.} =   # i/o
  echo i
  return c

proc cpsFunc(interval: Payload) {.cps: C.} =
  var i: int = interval.a
  while i <= interval.b:
    cps cpsEcho(i)
    inc i

for i in 0 .. 3:
  dispatch cpsFunc((i*10, i*10+5))
```
Here's the compilation output:
```
$ nim c -r --define:danger --gc:arc -f cp.nim
# ...lots of debugging output omitted...
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: cp.nim
Hint: 40002 lines; 0.455s; 61.277MiB peakmem; Dangerous Release build; proj: /home/adavidoff/git/cps/cp.nim; out: /home/adavidoff/git/cps/cp [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                3             38             35           3106
```
The resulting binary is 46,000 bytes in size and links against no novel
libraries. Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 94%CPU (0avgtext+0avgdata 1772maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000077 │ 0.011548 │ 0.000267 │ 0.000145 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==32426== HEAP SUMMARY:
==32426==     in use at exit: 0 bytes in 0 blocks
==32426==   total heap usage: 29 allocs, 29 frees, 2,200 bytes allocated
==32426==
==32426== All heap blocks were freed -- no leaks are possible
```
Performance
Obviously, only the threaded version is actually asynchronous. 😁
And, I mean, sure, this isn't a real program. So why is it so large?
- The async/await version generated ~3x more code.
- The async/await version's binary is 2.5x the size.
- The threaded version is slower despite using 50% more CPU.
And CPS...
- Has no platform-specific code, no extra libs, and the least C output.
- Has no lock, no `yield`, `void`, or `discard` novelties, no `gcsafe`, no `var` or `closure` limits, etc.
- Has both a higher upper-bound to performance and less apparent jitter despite using the same amount of CPU.
- Requires compilation of twice as many lines of Nim (versus threads) and is ~4-5x slower to do so.
- Has lower memory usage here, though the `typed` implementation of CPS has fewer allocations still. In fact, more optimization is possible, but we currently favor supporting fewer continuation types instead -- laziness, really.
- Should be comparable in terms of readability; something like this example with a proc macro:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) {.cps.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc (i*10, i*10+5)
```
or this version with a callsite pragma:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc((i*10, i*10+5)) {.cps.}
```
or this version using an effect:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) {.tags: [Continue].} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc (i*10, i*10+5)
```
...which brings me to...
Composition
threads:on
I can't say that thread support under async/await is *arcane*, exactly, but
it's clearly not a natural extension of the design the way it is with CPS.
color:on
If you aren't familiar with function color, you'll enjoy the following:
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
leaks:on
Look, let's just assume for the sake of argument that async/await is leak-free
under ORC; though by the time you read this, there may be another leak. Or
maybe there isn't. It doesn't matter. *I don't care, and neither should you.*
gc:orc
Except that the position that "it's ORC's job to be leak-free under async" is
silly. We don't want to have to choose between deploying new change-the-world
features that improve virtually every aspect of the language... and supporting
a concurrency implementation (even if the code were good).
We want everything to work, all the time, everywhere, right? Right?
asyncInside™
This async implementation, which is either stable (except for bug fixes) or
stagnant ("feature-complete"), depending on who you ask, should not be the
gatekeeper for compiler development. This is an argument *against* tying the
implementation into the compiler. In fact, this tie was broken with `chronos`.
asyncOutside™
So, yeah, now the community has to support two different async implementations
which largely suffer from the same flaws, are incompatible in subtle ways, and
neither of which is reliable under the best memory manager for Nim.
except:
Exceptions are handled in separate and unequal ways...
- threads: Register a `handler = proc()` callback.
- CPS: Catch them in your dispatcher, what else?
- async/await: I mean, I can't even... Maybe someone else wants to try?
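For the CPS case, since the dispatcher is ordinary user code, an exception
escaping a continuation leg can be caught with a plain `try`. A hedged sketch:
the `C` type mirrors the untyped example earlier, and the error handling shown
is illustrative, not prescribed by the library.

```nim
type
  C = ref object of RootObj        # same continuation shape as the example above
    fn: proc(c: C): C {.nimcall.}

proc dispatch(c: C) =
  # a dispatcher that owns error handling: an exception raised in
  # any continuation leg unwinds to this ordinary try/except
  var c = c
  try:
    while c != nil and c.fn != nil:
      c = c.fn(c)
  except CatchableError as e:
    echo "continuation failed: ", e.msg
```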
What CPS *is*
CPS is research-grade; I would say it's above alpha-quality.
CPS is pure Nim. It probably compiles as slowly as it does because it uses
concepts, which allow you to specify the details of your continuation
implementation and those of your dispatcher. Everything is exposed to you, and
the deep access you have allows for novel forms of control flow, concurrency,
scope, and abstraction.
What CPS *is not*
I'm coming up on the 2 year anniversary of my async/await divorce 💔:
https://irclogs.nim-lang.org/29-04-2019.html#16:31:55
CPS is *not* CSP (communicating sequential processes), but it starts to
give us the bones with which we might build such an elegant beast.
To wit, I think I'm prepared to help with a CSP implementation to rival that
of Go's channels and Clojure's core.async, but as @zevv likes to point out,
it'd be a lot easier to implement CSP if we had a solid CPS working.
git blame
Threads
I'm really happy with threads. They work, they are cheap from a cognitive
perspective, the memory semantics are clear, and they are easier to use in
Nim than I expected. I regret not using threads for all my concurrent code.
The implementation is simple, it seems pretty stable, and threads will clearly
need to compose with future concurrency implementations for both hardware and
software reasons.
Async/Await
As @Araq says, *"I don't care."*
This is not the Nim concurrency story I want to tell to newcomers, and I think
if you're honest, it's not the one you want to share, either. Never has been.
It does not do justice to the elegance of the language, its platform support,
or its performance. It doesn't even have useful documentation.
It's kind of like the Porsche gearbox -- charming or even cute, from a
technical perspective, but to be pragmatic: easily the worst part of the car.
CPS
- @Araq shared a paper.
- @zevv implemented it.
- @Araq asked @disruptek to work on it.
If it makes you feel better, you can blame me for believing CPS is a
game-changer for Nim and you can blame me for criticizing async/await. Please
first spend a couple years helping (or even just watching) innumerable people
using software that is an objectively poor choice from the perspective of
performance, documentation, simplicity, and compromise. Now you can also blame
me for sending too many people down the wrong path.
Action Items
I've tried to convey a tone of authority but don't take it too seriously --
that's just how I talk about facts. 😉
I know how we implemented CPS (and in particular, what sucks about it) but I
really don't know much about how it should be exploited. I've tried to capture
some ideas in the repository, but @zevv has done most of the design work and
has a better grasp of the sorts of novel behaviors that CPS enables.
Most of my experience with coroutines is limited to Python, where they were
reliably a huge PITA. Go was great. Anyway, please take a look at the code:
https://github.com/disruptek/cps
Look at the examples (in `stash`). Look at the tests. Read the papers (the 1011
one is a thriller, I promise). Get excited about this: it's not science
fiction; it already works, today, and it can work even better tomorrow.
You are going to invent new ideas on how we can exploit this style of
programming. You are going to see problems with our work (especially @zevv's)
that you can report or fix. You may even be motivated to improve async/await.
All to the good.
You are going to complain about design choices and demand that they are fixed.
Bring it on. This software is ready to be taken further while there are no
users to worry about breaking. @zevv has even promised to do some typing for
you. At least, I think that's what he said. 🤔
I believe @Araq would prefer that CPS was implemented in the compiler. I don't
know if he is motivated by the fear that it will be stolen if it is exposed in
a library, or whether he simply wants the implementation to be unfettered by
limitations in macro spec or compilation bugs. I personally think these are
valid concerns, and of course there are probably other good reasons to target
the compiler directly, as well.
Problem is, more users are intimidated by compiler code, and they have every
justification to feel this way. Most of the compiler is undocumented and new
code rarely receives comments. It's hard to know where to begin and embedding
CPS in the compiler will cause it to require at least some compiler expertise
to service -- a commodity in poor supply.
Limiting CPS to the compiler may also make it less accessible to users of
earlier versions of Nim. It may even fracture CPS support across different
versions of the compiler. If you agree that CPS should not be implemented in
the compiler, please give the RFC a 👎.
While I feel CPS is quite powerful, I'd like the `typed` implementation to
work, as it promises a much better experience. I believe this is a practical
prerequisite of future work, so building a similarly powerful abstraction for
CSP is blocked until we choose the next step for CPS.