Deep Dive: Our workers' based synchronous stack #2317

WebReflection started this conversation in Show and tell

Have you ever wondered how it is possible that we don't block the main thread (the visited page) while executing remote code that can synchronously read or write any `window` reference? ... well, you are about to find out, at least at a high level 👍

Worker, Proxy, SharedArrayBuffer & Atomics

These are the ingredients that make it all happen, and the reason why, for intensive UI tasks, things might not feel as fast as when everything just runs on the main thread. The issue with the main-thread approach, though, is that if your code has a `while True:` loop, takes a long time to bootstrap, or asks for an `input`, there's not much we can do short of indirectly crashing the browser or its tab, blocking interactivity entirely, or "stopping the world" for a window `prompt` that must be answered or dismissed.

The solution to all this is to use workers, which operate in a separate thread, be it a hyper-thread or a whole different CPU core.

The roundtrip's Tango

In a worker we type, for example, `window.location.href` to retrieve the current page's URL (a minimal sketch of the handshake follows the list):

  • the `window` Proxy intercepts the need to access its `location` field
  • it creates a `SharedArrayBuffer` with 8 bytes (room for 2 `int32` values: one to notify and one to provide the resulting length of the outcome)
  • it `postMessage`s to the main thread, attaching as details what the Proxy needs to know, including that `SharedArrayBuffer`, then it waits synchronously for a notify operation (via `Atomics.wait`)
  • the main thread applies those Proxy details to its `window` and creates a unique identifier for that `location` object as the result
    • it serializes as binary the resulting "JSON" representation of that outcome (note: this is not the object itself, rather an abstraction of it that can be reused later on from the worker as a known identity)
    • it stores the length of that binary content as an `int32` at index `1` of the `Int32Array` viewing the `SharedArrayBuffer`
    • it notifies at index `0`, with a positive integer, that a length is known and the content can be grabbed
  • the worker unlocks itself on notify, reads the length of the binary result and, in the current implementation, `postMessage`s a new `SharedArrayBuffer` with enough room to store that binary data + 4 bytes to wait for the next notification
  • the main thread recognizes the follow-up request
    • it writes the previously serialized binary result into that `SharedArrayBuffer`, starting at byte offset `4`
    • it notifies at index `0` that the operation has been completed
  • the worker unlocks itself and then:
    • it grabs, from byte offset `4` to `4 + length` as previously communicated, the binary content that was returned
    • it parses, or deserializes, that binary content as a JS reference/value where, if the reference is not primitive (objects, arrays or functions), it maps that reference once as a Proxy, so that this whole dance can be performed again with `location.href`
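
To make the handshake concrete, here is a minimal sketch of the first phase only (the `serialize` helper and the message shape are placeholders, not coincident's actual code): the worker posts a tiny `SharedArrayBuffer` and blocks on `Atomics.wait`, while the main thread writes the result length at index `1` and wakes the worker up at index `0`.

```js
// worker side — sketch of the first roundtrip (length negotiation only)
const sab = new SharedArrayBuffer(8);      // 2 int32: [0] notify flag, [1] length
const i32 = new Int32Array(sab);
postMessage({ sab, path: ['location'] });  // the details the main thread needs
Atomics.wait(i32, 0, 0);                   // block until index 0 is no longer 0
const length = Atomics.load(i32, 1);       // how many bytes the result needs

// main side — reacting to that message
worker.addEventListener('message', ({ data: { sab, path } }) => {
  const i32 = new Int32Array(sab);
  const result = path.reduce((ref, key) => ref[key], window); // e.g. window.location
  const binary = serialize(result);        // placeholder for the binary "JSON" abstraction
  Atomics.store(i32, 1, binary.length);    // length at index 1
  Atomics.store(i32, 0, 1);                // positive integer at index 0 ...
  Atomics.notify(i32, 0);                  // ... and wake the worker up
});
```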

At the end of this convoluted orchestration we'll have the string value representing the current location `href` ... is this madness?

Well, somehow ... yes, but this dance is basically the same one Pyodide or MicroPython or any FFI usually performs: things are mapped bi-directionally, hooks into the Garbage Collector are created to avoid caching all possible references too heavily, and nothing is usually strongly referenced because these programming languages are strongly dynamic.
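
As an example of the kind of Garbage Collector hook mentioned above (an illustration, not the actual Pyodide or coincident implementation), a `FinalizationRegistry` on the worker side can tell the main thread to forget a remote identity once its local Proxy is no longer reachable. The `postToMain` and `askMain` helpers and the `id` protocol are made up for the sketch.

```js
// Hypothetical sketch: release remote references when local proxies are collected.
const registry = new FinalizationRegistry(id => {
  // the local Proxy was collected: the main thread can drop its reference too
  postToMain({ type: 'release', id });
});

function proxyFor(id) {
  const proxy = new Proxy(Object.create(null), {
    get(_, field) {
      return askMain({ type: 'get', id, field });  // synchronous roundtrip as shown above
    }
  });
  registry.register(proxy, id);
  return proxy;
}
```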

... but how can a `location.href` change over time on the same page?

That's a lovely question, and the simple answer is that we don't really know ahead of time what users are asking for; we can only guarantee that whatever they asked for produces a meaningful result, as fast as it can be.

Our polyfills' role

The `SharedArrayBuffer` primitive has, unfortunately, historic reasons not to be always available unless special headers, provided by the server or by a special Service Worker, are in place. We orchestrated 2 variants that solve the issue, one way or another, but both add some (low to high) overhead:

  • the Service Worker variant, mini-coi, enables `SharedArrayBuffer` by simulating the correct headers for any request:
    • it has the least overhead; it requires a tiny extra synchronous script on the page, see the project's page for extra details (a minimal sketch of the idea follows this list)
    • it plays almost natively well, yet it needs to intercept all network calls, and that's slightly slower than native network requests with the correct headers in place
  • the SharedArrayBuffer Always On variant (sabayon), which allows async interactions but still requires a Service Worker to grant synchronous interactions:
    • differently from mini-coi, it doesn't need to change all network requests' headers to work
    • sadly, compared to mini-coi, the synchronous interaction is 2X up to 10X slower for each single synchronous Proxy operation
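
To give an idea of what the Service Worker variant does (a minimal sketch of the header-forging idea, not mini-coi's actual code), it intercepts every fetch and re-issues the response with the cross-origin isolation headers that unlock `SharedArrayBuffer`:

```js
// sw.js — sketch: add COOP/COEP headers so the page becomes cross-origin
// isolated and SharedArrayBuffer is available.
self.addEventListener('fetch', event => {
  event.respondWith((async () => {
    const response = await fetch(event.request);
    const headers = new Headers(response.headers);
    headers.set('Cross-Origin-Opener-Policy', 'same-origin');
    headers.set('Cross-Origin-Embedder-Policy', 'require-corp');
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers,
    });
  })());
});
```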

When neither variant is fully available, you can see an error in PyScript devtools that states:

⚠️ unable to use window and document

This does not mean that PyScript won't work, though; it means that the whole magic provided by `SharedArrayBuffer` and `Atomics.wait` cannot practically be used, so it's not possible to simulate synchronous code at that point.
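
A quick way to check, from the page, whether that synchronous magic can possibly work (an illustrative check, not PyScript's exact one) is to test for cross-origin isolation and for `SharedArrayBuffer` itself:

```js
// Illustrative feature detection, not PyScript's actual check.
const canBeSynchronous =
  typeof SharedArrayBuffer === 'function' && globalThis.crossOriginIsolated === true;

if (!canBeSynchronous) {
  console.warn('⚠️ unable to use window and document: no cross-origin isolation');
}
```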

There is still room for improvement!

Nowadays, both `ArrayBuffer` and `SharedArrayBuffer` instances can grow in size over time; this wasn't true 2.5 years ago when we first sketched this whole worker/main dance.

On top of that, there are better primitives to deal with binary data, such as `DataView`, which helps avoid duplicating the amount of RAM needed to serialize or deserialize, while TypedArrays as views do a wonderful job of being thin abstraction layers, dropping any unnecessary bloat.
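
As an illustration of those primitives (the sizes are made up, and this is not Polyscript code), a worker can now allocate one growable buffer up front and read or write it through a `DataView` without intermediate copies:

```js
// Growable SharedArrayBuffer + DataView, with illustrative sizes.
const sab = new SharedArrayBuffer(64 * 1024, { maxByteLength: 4 * 1024 * 1024 });
const i32 = new Int32Array(sab);   // index 0 reserved for Atomics.wait / Atomics.notify
const view = new DataView(sab);    // binary payload starts at byte offset 4

// grow on demand when a result doesn't fit anymore
if (sab.byteLength < 128 * 1024) sab.grow(128 * 1024);

// write and read a 32-bit value in place, no intermediate copies
view.setUint32(4, 1234, true);
console.log(view.getUint32(4, true)); // 1234
```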

So let's see how our initial plan/dance can be improved now, keeping the `window.location.href` example as reference, from a worker (a sketch of this improved roundtrip follows after the summary below):

  • the worker allocates, once, a `SharedArrayBuffer` that can grow up to a few megabytes but starts as tiny as possible (64K or something similar; the upper bound should rarely be reached, or needed at all)
  • the `window` Proxy intercepts the need to access its `location` field
  • it `postMessage`s to the main thread, attaching as details what the Proxy needs to know, always with the same `SharedArrayBuffer`, then it waits synchronously for a notify operation (via `Atomics.wait`)
  • the main thread applies those Proxy details to its `window` and:
    • it serializes the result as binary directly into the `SharedArrayBuffer`, from byte offset `4`, keeping the ability to grow it on demand behind the scenes
    • it notifies at index `0`, with a positive integer, that everything is ready
  • the worker unlocks itself and then deserializes directly as a JS value from byte offset `4`

Done 🥳 ... or better yet, there is no longer any need to:

  • do the `postMessage` dance twice to get a length and then the content
  • duplicate the amount of RAM needed until the `window` can pass the serialized data to the second `SharedArrayBuffer`
  • stringify and parse content out of serialized data ...

In a few words, what required 7 steps (ask/serialize/binary/length -> retrieve/binary/deserialize) and 2X the RAM could instead use 3 steps (ask/binary-serialization -> binary-deserialization), reducing complexity and code bloat too, while improving performance by quite some margin.
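
Here is a minimal sketch of that improved roundtrip; the `deserializeFrom`, `serializeInto` and `resolveOnWindow` helpers are placeholders for the future binary serializer, not existing APIs:

```js
// worker side — hypothetical sketch of the improved, single-buffer roundtrip
const sab = new SharedArrayBuffer(64 * 1024, { maxByteLength: 4 * 1024 * 1024 });
const i32 = new Int32Array(sab);

function askMain(details) {
  Atomics.store(i32, 0, 0);                     // reset the notify flag
  postMessage({ sab, details });                // always the same buffer
  Atomics.wait(i32, 0, 0);                      // block until the main thread is done
  return deserializeFrom(new DataView(sab), 4); // placeholder deserializer
}

// main side
worker.addEventListener('message', ({ data: { sab, details } }) => {
  const i32 = new Int32Array(sab);
  const result = resolveOnWindow(details);      // placeholder: e.g. window.location.href
  serializeInto(new DataView(sab), 4, result);  // placeholder: writes binary, may grow `sab`
  Atomics.store(i32, 0, 1);                     // positive integer: everything is ready
  Atomics.notify(i32, 0);
});
```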

An extra detail ...

I need to figure out if using a SharedWorker to create a unique `SharedArrayBuffer` that can be used across all main threads and their workers would work; the goal is to have a predictable amount of RAM needed to orchestrate one to dozens of tabs running PyScript with one to many workers ... that also requires the Web Locks API, but if it can be done for SQLite, I believe it can be done for our project too ... still surfing the edge of modern APIs but hey, we just want the best by all means 😇

We are close but not there yet

If you have followed recent community calls, you are probably bored by me demoing and benchmarking all the things, but here is an update on our current state:

  • I have managed to create an ad-hoc binary serializer whose goal is simplicity and performance (a toy illustration of the idea follows this list)
  • I am planning to fully integrate it into coincident, our Proxy orchestrator, so that we can cut a lot of unnecessary steps in between communications
  • I am planning to port that into Polyscript, our interpreters engine used by PyScript, to then ...
  • use this new, simplified and (hopefully) faster stack in PyScript 🤩
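
As a toy illustration of what a direct binary serializer looks like (made-up type tags and layout; this is not the actual serializer), a string can be written into a `DataView` as a type byte, a length and its UTF-8 bytes:

```js
// Toy type-tagged (de)serialization over a plain ArrayBuffer; tags and layout are made up.
// (Decoding a view over a SharedArrayBuffer may require copying the bytes first.)
const STRING = 1;
const encoder = new TextEncoder();
const decoder = new TextDecoder();

function writeString(view, offset, value) {
  const bytes = encoder.encode(value);
  view.setUint8(offset, STRING);                    // type tag
  view.setUint32(offset + 1, bytes.length, true);   // payload length
  new Uint8Array(view.buffer, view.byteOffset + offset + 5, bytes.length).set(bytes);
  return offset + 5 + bytes.length;                 // next free offset
}

function readString(view, offset) {
  const length = view.getUint32(offset + 1, true);
  return decoder.decode(new Uint8Array(view.buffer, view.byteOffset + offset + 5, length));
}

const view = new DataView(new ArrayBuffer(256));
writeString(view, 0, 'https://example.com/');
console.log(readString(view, 0)); // 'https://example.com/'
```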

Are there unknowns?

At the logical level, I expect performance to improve: that's a natural consequence of removing intermediate steps and halving the time it takes to ask for and retrieve data. On the other hand, I cannot concretely measure improvements until this has been done ... I mean, I could hack something around, but that still wouldn't reflect real-world improvements, so maybe I should not focus on that.

Last, but not least, it's unclear if/how I can polyfill this new stack in sabayon too, but I also expect that dance to be even faster, because it won't need more than a single, synchronous XMLHttpRequest to retrieve data, as opposed to 2 plus the whole roundtrip currently done to figure out who asked for what (multiple tabs using PyScript, as an example).
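
For context, synchronous XMLHttpRequest is still allowed inside workers, which is what makes a Service-Worker-backed fallback possible at all. A minimal sketch follows, where the `/sabayon` endpoint and the payload shape are made up, not the actual protocol:

```js
// Inside a worker: a blocking request that a Service Worker could answer.
// The '/sabayon' endpoint and the payload shape are hypothetical.
function askMainViaServiceWorker(details) {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', '/sabayon', false);          // third argument `false` = synchronous
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(details));
  return JSON.parse(xhr.responseText);
}
```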

Conclusion

I am very glad I've managed to track down the performance issues that were hidden either in implementations (the serializer we currently use) or in logic (not taking advantage of the most modern APIs), and to be able to "draw" on the board what the current goal is. I hope everything will work as planned and that it won't take too long to have a release showing that performance is finally reasonable enough to stop relying on the main thread. Please follow this space for further progress, and don't be afraid to ask anything you'd like about these topics 👋
