It's an open secret that I have some criticisms of async/await:
Fear of Threads
From the perspective of a newcomer who is looking to transfer existing
knowledge or code from another language to Nim, or someone who has zero
experience with concurrency in any language, I recommend threads.
Here follows the first example from the `threads.nim` documentation. I tweaked
the syntax a touch -- it was written in 2011 -- but that is all.
```nim
import locks

type
  Payload = tuple[a, b: int]

var
  thr: array[4, Thread[Payload]]
  L: Lock

proc threadFunc(interval: Payload) {.thread.} =
  for i in interval.a .. interval.b:
    withLock L:
      echo i

initLock L
for i in 0 .. thr.high:
  createThread(thr[i], threadFunc, (i*10, i*10+5))
joinThreads thr
deinitLock L
```
Here's the compilation output:
```
$ nim c --gc:arc --define:danger --threads:on -f th.nim
.....
CC: stdlib_assertions.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: stdlib_locks.nim
CC: th.nim
Hint: 22456 lines; 0.121s; 20.68MiB peakmem; Dangerous Release build; proj: /home/adavidoff/th.nim; out: /home/adavidoff/th [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                5             57             54           3733
```
The resulting binary is 52,472 bytes in size and links against:
```
-rwxr-xr-x. 1 root root 147232 Nov 2 22:20 /lib64/libpthread-2.32.so
```
Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 146%CPU (0avgtext+0avgdata 1592maxresident)k
0inputs+0outputs (0major+103minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000266 │ 0.009509 │ 0.000391 │ 0.000120 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==1915== HEAP SUMMARY:
==1915==     in use at exit: 0 bytes in 0 blocks
==1915==   total heap usage: 33 allocs, 33 frees, 3,164 bytes allocated
==1915==
==1915== All heap blocks were freed -- no leaks are possible
```
Macro Complexity
or, if you think you're smart,
Poor Documentation
This is my port of the above program to async/await:
```nim
import asyncdispatch

type
  Payload = tuple[a, b: int]

var thr: array[4, Future[void]]

proc asyncFunc(interval: Payload) {.async.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. thr.high:
  thr[i] = asyncFunc (i*10, i*10+5)
await all(thr)
```
Here's the compilation output:
```
$ nim c --gc:arc --define:danger -f aw.nim
....................................
/home/adavidoff/aw.nim(15, 7) template/generic instantiation of `await` from here
/home/adavidoff/nims/lib/pure/asyncmacro.nim(144, 3) Error: 'yield' only allowed in an iterator
```
Oops. I didn't `import asyncmacro` and I never used `yield`. Curious.
A quick look at the documentation for `asyncdispatch` yields a single
occurrence of "asyncmacro"; in fact, it's the last sentence of the
documentation, under a section helpfully titled "Limitations/Bugs":

> `asyncdispatch` module depends on the `asyncmacro` module to work properly.

I have no idea what this means. As far as I can tell, the `asyncdispatch`
module depends on the `asyncmacro` module for compilation failures. 😁
But I know the problem must be in `await`, right? There's no documentation for
the `await` template, and the documentation that does exist for using `await`
is remarkably unhelpful.
Take a critical read of the documentation for `asyncdispatch`
some time.
It's worthy of another long thread, so to speak, because it's pretty hard to
understand.
Thankfully, there are experienced users on IRC who point out that I should use
`waitFor` instead of `await`. I remember reading about `waitFor`, but since it
merely said it *blocks the current thread*, I figured it wasn't what I wanted
at all. I feel stupid, now.
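Applying that advice is a one-line change; here's a sketch of the same program
with `waitFor` in place of the top-level `await` (everything else is identical
to the version above):

```nim
import asyncdispatch

type
  Payload = tuple[a, b: int]

var thr: array[4, Future[void]]

proc asyncFunc(interval: Payload) {.async.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. thr.high:
  thr[i] = asyncFunc((i*10, i*10+5))

# waitFor runs the dispatcher, blocking the current thread until the
# future completes; await is only legal inside an {.async.} proc,
# which is why the previous version failed to compile
waitFor all(thr)
```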
But at least I can compile my program:
```
$ nim c --gc:arc --define:danger -f aw.nim
/home/adavidoff/nims/lib/pure/asyncdispatch.nim(1286, 39) Hint: passing 'adata.readList' to a sink parameter introduces an implicit copy; if possible, rearrange your program's control flow to prevent it [Performance]
CC: stdlib_assertions.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: stdlib_math.nim
CC: stdlib_strutils.nim
CC: stdlib_options.nim
CC: stdlib_times.nim
CC: stdlib_os.nim
CC: stdlib_heapqueue.nim
CC: stdlib_deques.nim
CC: stdlib_asyncfutures.nim
CC: stdlib_monotimes.nim
CC: stdlib_nativesockets.nim
CC: stdlib_selectors.nim
CC: stdlib_asyncdispatch.nim
CC: aw.nim
Hint: 56742 lines; 0.558s; 76.172MiB peakmem; Dangerous Release build; proj: /home/adavidoff/aw.nim; out: /home/adavidoff/aw [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               16            236            182          10840
```
The resulting binary is 125,704 bytes in size and links against:
```
-rwxr-xr-x. 1 root root  147232 Nov 2 22:20 /lib64/libpthread-2.32.so
-rwxr-xr-x. 1 root root 1263504 Nov 2 22:21 /lib64/libm-2.32.so
-rwxr-xr-x. 1 root root   35080 Nov 2 22:21 /lib64/librt-2.32.so
```
Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 93%CPU (0avgtext+0avgdata 1948maxresident)k
0inputs+0outputs (0major+99minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000242 │ 0.013270 │ 0.000323 │ 0.000190 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==653== HEAP SUMMARY:
==653==     in use at exit: 736 bytes in 12 blocks
==653==   total heap usage: 43 allocs, 31 frees, 3,056 bytes allocated
==653==
==653== LEAK SUMMARY:
==653==    definitely lost: 256 bytes in 4 blocks
==653==    indirectly lost: 192 bytes in 4 blocks
==653==      possibly lost: 288 bytes in 4 blocks
==653==    still reachable: 0 bytes in 0 blocks
==653==         suppressed: 0 bytes in 0 blocks
==653== Rerun with --leak-check=full to see details of leaked memory
```
CPS
This example uses a so-called `untyped` implementation from August; it's a bit
more verbose because it demonstrates the machinery explicitly, in case you are
new to continuation passing style.
The newer `typed` implementation is better, in terms of both syntax and
semantics (read: performance and performance), but it is currently blocked by
at least one compiler bug.
```nim
import cps

type
  Payload = tuple[a, b: int]
  C = ref object of RootObj              # a continuation
    fn: proc(c: C): C {.nimcall.}

proc dispatch(c: C) =                    # a dispatcher
  var c = c
  while c != nil and c.fn != nil:
    c = c.fn(c)

proc cpsEcho(i: int): C {.cpsMagic.} =   # i/o
  echo i
  return c

proc cpsFunc(interval: Payload) {.cps: C.} =
  var i: int = interval.a
  while i <= interval.b:
    cps cpsEcho(i)
    inc i

for i in 0 .. 3:
  dispatch cpsFunc((i*10, i*10+5))
```
Here's the compilation output:
```
$ nim c -r --define:danger --gc:arc -f cp.nim
# ...lots of debugging output omitted...
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: cp.nim
Hint: 40002 lines; 0.455s; 61.277MiB peakmem; Dangerous Release build; proj: /home/adavidoff/git/cps/cp.nim; out: /home/adavidoff/git/cps/cp [SuccessX]
```
Here's the codegen:
```
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                3             38             35           3106
```
The resulting binary is 46,000 bytes in size and links against no novel
libraries. Here's an invocation:
```
0.00user 0.00system 0:00.00elapsed 94%CPU (0avgtext+0avgdata 1772maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
```
A quick run of golden:
```
┌────────┬──────────┬──────────┬──────────┬──────────┐
│   Runs │      Min │      Max │     Mean │   StdDev │
├────────┼──────────┼──────────┼──────────┼──────────┤
│  10000 │ 0.000077 │ 0.011548 │ 0.000267 │ 0.000145 │
└────────┴──────────┴──────────┴──────────┴──────────┘
```
And a valgrind, which probably means nothing.
```
==32426== HEAP SUMMARY:
==32426==     in use at exit: 0 bytes in 0 blocks
==32426==   total heap usage: 29 allocs, 29 frees, 2,200 bytes allocated
==32426==
==32426== All heap blocks were freed -- no leaks are possible
```
Performance
Obviously, only the threaded version is actually asynchronous. 😁
And, I mean, sure, this isn't a real program. So why is it so large?
- The async/await version generated ~3x more code.
- The async/await version's binary is 2.5x the size.
- The threaded version is slower despite using 50% more CPU.
And CPS...
- Has no platform-specific code, no extra libs, and the least C output.
- Has no lock, no `yield`, `void`, or `discard` novelties, no `gcsafe`, no `var` or `closure` limits, etc.
- Has both a higher upper-bound to performance and less apparent jitter despite using the same amount of CPU.
- Requires compilation of twice as many lines of Nim (versus threads) and is ~4-5x slower to do so.
- Has lower memory usage here, though the `typed` implementation of CPS has fewer allocations still. In fact, more optimization is possible, but we currently favor supporting fewer continuation types instead -- laziness, really.
- Should be comparable in terms of readability; something like this example with a proc macro:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) {.cps.} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc (i*10, i*10+5)
```
or this version with a callsite pragma:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc((i*10, i*10+5)) {.cps.}
```
or this version using an effect:
```nim
import cps

type
  Payload = tuple[a, b: int]

proc cpsFunc(interval: Payload) {.tags: [Continue].} =
  for i in interval.a .. interval.b:
    echo i

for i in 0 .. 3:
  cpsFunc (i*10, i*10+5)
```
...which brings me to...
Composition
threads:on
I can't say that thread support under async/await is *arcane*, exactly, but
it's clearly not a natural extension of the design the way it is with CPS.
color:on
If you aren't familiar with function color, you'll enjoy the following:
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
leaks:on
Look, let's just assume for the sake of argument that async/await is leak-free
under ORC; though by the time you read this, there may be another leak. Or
maybe there isn't. It doesn't matter. *I don't care, and neither should you.*
gc:orc
Except that the position that "it's ORC's job to be leak-free under async" is
silly. We don't want to have to choose between deploying new change-the-world
features that improve virtually every aspect of the language... and supporting
a concurrency implementation (even if the code were good).
We want everything to work, all the time, everywhere, right? Right?
asyncInside™
This async implementation, which is either stable (except for bug fixes) or
stagnant ("feature-complete"), depending on who you ask, should not be the
gatekeeper for compiler development. This is an argument *against* tying the
implementation into the compiler. In fact, this tie was broken with `chronos`.
asyncOutside™
So, yeah, now the community has to support two different async implementations
which largely suffer from the same flaws, are incompatible in subtle ways, and
neither of which is reliable under the best memory manager for Nim.
except:
Exceptions are handled in separate and unequal ways...
- threads: Register a `handler = proc()` callback.
- CPS: Catch them in your dispatcher, what else?
- async/await: I mean, I can't even... Maybe someone else wants to try?
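For the CPS case, since the dispatcher is ordinary user code, an exception
escaping a continuation leg can be caught with a plain `try`. A hedged sketch:
the `C` type mirrors the untyped example earlier, and the error handling shown
is illustrative, not prescribed by the library.

```nim
type
  C = ref object of RootObj        # same continuation shape as the example above
    fn: proc(c: C): C {.nimcall.}

proc dispatch(c: C) =
  # a dispatcher that owns error handling: an exception raised in
  # any continuation leg unwinds to this ordinary try/except
  var c = c
  try:
    while c != nil and c.fn != nil:
      c = c.fn(c)
  except CatchableError as e:
    echo "continuation failed: ", e.msg
```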
What CPS *is*
CPS is research-grade; I would say it's above alpha-quality.
CPS is pure Nim. It probably compiles as slowly as it does because it uses
concepts, which allow you to specify the details of your continuation
implementation and those of your dispatcher. Everything is exposed to you, and
the deep access you have allows for novel forms of control flow, concurrency,
scope, and abstraction.
What CPS *is not*
I'm coming up on the 2 year anniversary of my async/await divorce 💔:
https://irclogs.nim-lang.org/29-04-2019.html#16:31:55
CPS is *not* CSP (communicating sequential processes), but it starts to
give us the bones with which we might build such an elegant beast.
To wit, I think I'm prepared to help with a CSP implementation to rival that
of Go's channels and Clojure's core.async, but as @zevv likes to point out,
it'd be a lot easier to implement CSP if we had a solid CPS working.
git blame
Threads
I'm really happy with threads. They work, they are cheap from a cognitive
perspective, the memory semantics are clear, and they are easier to use in
Nim than I expected. I regret not using threads for all my concurrent code.
The implementation is simple, it seems pretty stable, and threads will clearly
need to compose with future concurrency implementations for both hardware and
software reasons.
Async/Await
As @Araq says, *"I don't care."*
This is not the Nim concurrency story I want to tell to newcomers, and I think
if you're honest, it's not the one you want to share, either. Never has been.
It does not do justice to the elegance of the language, its platform support,
or its performance. It doesn't even have useful documentation.
It's kind of like the Porsche gearbox -- charming or even cute, from a
technical perspective, but to be pragmatic: easily the worst part of the car.
CPS
- @Araq shared a paper.
- @zevv implemented it.
- @Araq asked @disruptek to work on it.
If it makes you feel better, you can blame me for believing CPS is a
game-changer for Nim and you can blame me for criticizing async/await. Please
first spend a couple years helping (or even just watching) innumerable people
using software that is an objectively poor choice from the perspective of
performance, documentation, simplicity, and compromise. Now you can also blame
me for sending too many people down the wrong path.
Action Items
I've tried to convey a tone of authority but don't take it too seriously --
that's just how I talk about facts. 😉
I know how we implemented CPS (and in particular, what sucks about it) but I
really don't know much about how it should be exploited. I've tried to capture
some ideas in the repository, but @zevv has done most of the design work and
has a better grasp of the sorts of novel behaviors that CPS enables.
Most of my experience with coroutines is limited to Python, where they were
reliably a huge PITA. Go was great. Anyway, please take a look at the code:
https://github.com/disruptek/cps
Look at the examples (in `stash`). Look at the tests. Read the papers (the 1011
one is a thriller, I promise). Get excited about this: it's not science
fiction; it already works, today, and it can work even better tomorrow.
You are going to invent new ideas on how we can exploit this style of
programming. You are going to see problems with our work (especially @zevv's)
that you can report or fix. You may even be motivated to improve async/await.
All to the good.
You are going to complain about design choices and demand that they are fixed.
Bring it on. This software is ready to be taken further while there are no
users to worry about breaking. @zevv has even promised to do some typing for
you. At least, I think that's what he said. 🤔
I believe @Araq would prefer that CPS was implemented in the compiler. I don't
know if he is motivated by the fear that it will be stolen if it is exposed in
a library, or whether he simply wants the implementation to be unfettered by
limitations in macro spec or compilation bugs. I personally think these are
valid concerns, and of course there are probably other good reasons to target
the compiler directly, as well.
Problem is, more users are intimidated by compiler code, and they have every
justification to feel this way. Most of the compiler is undocumented and new
code rarely receives comments. It's hard to know where to begin and embedding
CPS in the compiler will cause it to require at least some compiler expertise
to service -- a commodity in poor supply.
Limiting CPS to the compiler may also make it less accessible to users of
earlier versions of Nim. It may even fracture CPS support across different
versions of the compiler. If you agree that CPS should not be implemented in
the compiler, please give the RFC a 👎.
While I feel CPS is quite powerful, I'd like the `typed` implementation to
work, as it promises a much better experience. I believe this is a practical
prerequisite of future work, so building a similarly powerful abstraction for
CSP is blocked until we choose the next step for CPS.