Jwno

Working With Janet's Threads

219 days ago by Agent Kilo

This post is about some internal workings of Janet and Jwno, especially how threads work and communicate. You may as well take it as me mumbling to myself. I try to explain things, but you may still need some basic knowledge of Janet and systems programming to get through it.

I had never thought about how a virtual-machine-based, garbage-collected programming language manages its threads and memory until I started playing with Janet, so what I wrote here may be wrong, and I’d be grateful if someone could point out the mistakes.

A Tale of Two Event Loops

Jwno is a Windows™ application, so naturally it has a window message event loop. It’s crucial for most GUI-related stuff.

And Jwno is an application written in Janet. Janet also has its own event loop built into the core. Some of Janet’s cool features (e.g. fibers) depend on it.

In my limited programming experience, when two event loops clash, there’re generally these three ways to make them work together:

“Cascade” those two event loops. This way we can avoid multi-threading completely, and most parts of the application can work together seamlessly. A good example would be cascaded epoll instances in Linux. Windows has MsgWaitForMultipleObjects to combine the window message event loop with other I/O events, while on Windows Janet uses I/O Completion Ports at the heart of its event loop. There may be a way to connect these two and fold them into one thread. While this looks like a fun research topic, I’m not sure I know enough about Windows APIs and Janet internals to take this path.
Use a polling architecture. This can also be done in a single thread. We just ask our event sources, one by one, for new events, then sleep for a while, and repeat. The window message event loop and Janet’s event loop can both work in a “single step” mode, so this could work, but adopting it in an interactive application like Jwno just feels… wrong.
Use threads to isolate the two loops. To me this idea sounds boring and dangerous at the same time, but I know for sure it works, since it’s what Jwno is currently using. There are details to managing threads and communicating between them, and we’ll get to those in the remaining parts of this post.

Janet’s Threading Model

Simply put, Janet creates a separate virtual machine instance for every thread it spawns. A VM instance has its own heap and garbage collector, so garbage collection also mostly works on a per-thread basis. Unlike system threads, which share the same process memory address space, Janet threads by default can’t access data that’s managed by another Janet thread, since the data may suddenly get vaporized by the other garbage collector.

But consider this simple function:

(defn test-thread-isolation []
  # thread #1
  (def facts @{:janet-is-cool true})
  (ev/do-thread
    # thread #2
    (put facts :ak-is-cool true)
    (printf "From thread #2: %n" facts))
  (printf "From thread #1: %n" facts))

And, spoiler alert, the function’s output:

From thread #2: @{:ak-is-cool true :janet-is-cool true}
From thread #1: @{:janet-is-cool true}

The code inside ev/do-thread (thread #2) sees :janet-is-cool, but the code outside (thread #1) can’t see :ak-is-cool.

Did thread #2 violate the “no access to other threads’ stuff” rule? Not really. The fact is, thread #2 was just modifying a copy of the original variable, and that’s why thread #1 can’t see the modification. ev/do-thread and friends carry out a complex ritual when spawning a new thread, called “marshalling”, to transparently copy data between threads, so that we can use closures seamlessly across thread boundaries.

The Ritual To Spin Up a Thread

To create a new thread, Janet does roughly these things:

Marshal (pack up) the actual code to be run in the new thread, together with its environment, including any variables captured in its closure.
Create a new system thread, and send it a buffer where the marshaled data resides.
Initialize a new VM instance in the new thread.
In the new thread, unmarshal (unpack) everything it received, and start running the freshly unmarshaled code.

After marshalling, the data looks like a network packet. The “packet” is independent of any thread, so it can be moved around safely. When the receiving thread unmarshals the data, it takes the unmarshaled objects under its own management.

This whole process essentially copies Janet objects (both code and data) between threads. An object before marshalling is distinct from its counterpart after unmarshalling. We can simulate this process in the REPL:

repl:28:> (def a @[1 2])
@[1 2]
repl:29:> (def buf (marshal a))
@"\xD1\x02\x01\x02"
repl:30:> (def aa (unmarshal buf))
@[1 2]
repl:31:> (= a aa)
false
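The same copying applies to closures and the variables they capture, not just to plain data. Here is a minimal sketch of that, run in a single thread. The names make-counter, counter and copy are made up for illustration, and passing make-image-dict / load-image-dict is only a precaution so that any core functions are referenced by name instead of being marshalled; it is not necessarily what Janet does internally when spawning a thread.

```janet
# A closure that captures a mutable local variable.
(defn make-counter []
  (var n 0)
  (fn [] (++ n)))

(def counter (make-counter))
(counter)
(counter)   # the captured n is now 2

# Round-trip the closure through marshal/unmarshal, as a stand-in for
# sending it to another thread.
(def copy (unmarshal (marshal counter make-image-dict) load-image-dict))

# The copy starts from the captured state (n = 2) but owns its own n,
# so bumping one counter does not affect the other.
(printf "copy:     %d" (copy))
(printf "original: %d" (counter))
```

Both calls should print 3: the copy inherited n = 2 at marshalling time, and from then on the two closures count independently.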

(Ab)normal Cross-Thread Communication

After the spin-up ritual, normal Janet threads usually use (ev/thread-chan) (threaded channel) objects to communicate. You put stuff in the channel from one thread, and then take stuff out of the channel from another thread. The channel does the (un)marshalling transparently. This is the happy path and you can go through the official docs for more info, so I won’t elaborate here.
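For reference, here is a minimal sketch of that happy path, reusing the facts example from earlier. The names chan, original and received are made up for illustration, and the channel capacity of 10 is arbitrary. It assumes the code runs under Janet’s event loop, as a normal script does.

```janet
# A threaded channel is shared between threads; values sent through it
# are marshalled, i.e. copied.
(def chan (ev/thread-chan 10))

(def original @{:janet-is-cool true})

# The body runs in a new thread with its own VM. `chan` and `original`
# reach it through the closure, via marshalling.
(ev/spawn-thread
  (put original :ak-is-cool true)   # mutates the worker's private copy
  (ev/give chan original))          # sends another copy back

# Suspends the current fiber until the worker has sent something.
(def received (ev/take chan))

(printf "received: %n" received)
(printf "original: %n" original)
```

The table taken from the channel should contain both keys, while original in the spawning thread still only has :janet-is-cool.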

I took the not-so-happy path because I soon hit a blocker when trying to use channels in my application: one of my threads is not a normal Janet thread. The original idea was to isolate the two event loops, so there’s no event loop for Janet in my window-message-processing thread. It can’t run channel-related code, since that code depends on Janet’s event loop.

Then after some research on Janet’s internal rituals, I came up with this… abomination called alloc-and-marshal, and its counterpart unmarshal-and-free.

Here’s how they work together:

In one thread, alloc-and-marshal allocates a buffer that’s not managed by the garbage collector, and saves the marshaled data in it.
The raw pointer pointing to that buffer gets sent to the other thread.
Then in the other thread, unmarshal-and-free unmarshals the data, and frees the buffer.

So the marshaled data goes rogue with the unmanaged buffer for a little while, until it gets handled by the receiving thread. If the receiving end crashes while the data is still in flight, the data will be lost in the void. But then I’ll have problems more serious than memory leaks, so this works quite well in practice.

A Surprising Behavior of Unmarshalling

The marshaled data looks like a network packet, so I (incorrectly) assumed that marshalling works just like packet serialization. Jwno uses Win32 UI Automation event handlers, and they run in a system-controlled thread pool. A handler function can get called in different worker threads, so at one point I did things like this:

Save my marshaled handler function to a buffer.
When a worker thread needs to call the handler function, unmarshal it from the buffer.
When another worker thread needs to call the same handler function, unmarshal it from the same buffer again.

Unfortunately this caused random crashes.

After some hair-pulling investigation, it turned out the culprit was the act of unmarshalling a marshaled threaded abstract object multiple times.

Janet has so-called threaded abstract types. Objects of these types are reference-counted, and different threads can hold references to the same in-memory threaded abstract object. To keep a threaded abstract object alive in-flight, the marshalling code increments its ref-count, then the unmarshalling code may accordingly decrement the ref-count.

So unlike deserializing a network packet, which is usually free of side effects, unmarshalling the same marshaled data multiple times may destroy objects that are still in use, causing crashes.
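For contrast, here is a small sketch of the harmless case that led me to the wrong assumption. It only uses plain data and the core marshal / unmarshal functions, so it does not reproduce the crash; the names buf, first-copy and second-copy are made up for illustration.

```janet
# Plain data: no threaded abstract objects, no ref-counts involved.
(def buf (marshal @{:count 1}))

# Unmarshalling the same buffer twice simply yields two independent
# copies, much like deserializing a network packet twice.
(def first-copy (unmarshal buf))
(def second-copy (unmarshal buf))

(put first-copy :count 2)

(printf "first: %n, second: %n" first-copy second-copy)
```

Only first-copy changes here. The trouble described above starts when the buffer carries a threaded abstract object, because then each unmarshal is no longer a side-effect-free read.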

Customizing How Something Gets Marshaled

Jwno uses Windows’ low-level keyboard hook to intercept global key bindings, so it has a rather convoluted keymap system, to adapt to the requirements of using that hook.

Here’s roughly how Jwno handles keyboard events:

The main thread sends the whole keymap (which contains bound commands), by marshalling it, to the thread responsible for keymap handling.
When keyboard events arrive at the keymap thread, it tries to match a key binding in the keymap.
If there’s a match, the keymap thread sends the corresponding command back to the main thread, by marshalling it again.
The main thread receives the command and runs it.

Note that commands get marshaled twice in this process, and they may contain user-defined functions to run custom code. After marshalling (and unmarshalling), the command coming back to the main thread is a distinct copy of the original command, and so are all the variables that have been captured by the function closures it may contain. This imposes a limitation on these user-defined functions: they cannot access mutable state outside their own scopes.

For example, suppose we have this code:

(var my-flag false)
(defn my-custom-action [&]
  (if my-flag
    :do-this
    # else
    :do-that))
(:define-key root-keymap "Win + S" [:split-frame :horizontal nil nil my-custom-action])
(:set-keymap (in jwno/context :key-manager) root-keymap)

And later, in the main thread, we try to alter the behavior of my-custom-action:

(set my-flag true)

This gives a surprising result: If triggered by the key binding, my-custom-action in the :split-frame command will always run the :do-that branch. But if we call my-custom-action directly in the main thread, it switches to the :do-this branch correctly.
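The effect can be reproduced without Jwno or any extra thread. In this sketch a plain marshal / unmarshal round trip stands in for the trip to the keymap thread and back; round-tripped is a made-up name, and the make-image-dict / load-image-dict arguments are just a precaution so that core functions would be referenced by name rather than marshalled.

```janet
(var my-flag false)

(defn my-custom-action [&]
  (if my-flag :do-this :do-that))

# Stand-in for sending the command to another thread and getting it back:
# the function is copied, together with the storage behind my-flag.
(def round-tripped
  (unmarshal (marshal my-custom-action make-image-dict) load-image-dict))

# Only the original my-flag is changed here.
(set my-flag true)

(printf "original: %n" (my-custom-action))
(printf "copy:     %n" (round-tripped))
```

The original function should report :do-this, while the round-tripped copy still reports :do-that, because it reads its own copy of my-flag.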

I did find a solution to this recently, and it turned out Janet already has good support for it: Exclude functions altogether when marshalling a keymap. All we need to do is to pass a “reverse-lookup table” to janet_marshal, telling the marshalling code to replace function objects with our placeholders.

We can simulate this “customized” marshalling in the REPL too:

repl:1:> (var my-flag false)
false
repl:2:> (defn my-fn [] (if my-flag :do-this :do-that))
<function my-fn>
repl:3:> (def rlookup @{my-fn 'my-placeholder})
@{<function my-fn> my-placeholder}
repl:4:> (def lookup (invert rlookup))
@{my-placeholder <function my-fn>}
repl:5:> (def dummy-lookup @{'my-placeholder 'my-placeholder})
@{my-placeholder my-placeholder}

We have my-fn depending on my-flag to do its work, along with some lookup tables. Now we want to send an array containing my-fn to another thread, without my-fn tagging along, so we do the marshalling like this:

repl:6:> (def buf (marshal @[my-fn :other-info] rlookup))
@"\xD1\x02\xD8\x0Emy-placeholder\xD0\nother-info"

Then we send both buf and dummy-lookup to the other thread. If the other thread needs to access the array, it can still unmarshal the data using dummy-lookup:

repl:7:> (def arr (unmarshal buf dummy-lookup))
@[my-placeholder :other-info]

Notice how my-fn turned into my-placeholder. The other thread can also use dummy-lookup to marshal and send a block of data back to the main thread:

repl:8:> (array/push arr :more-info)
@[my-placeholder :other-info :more-info]
repl:9:> (def buf2 (marshal arr dummy-lookup))
@"\xD1\x03\xD8\x0Emy-placeholder\xD0\nother-info\xD0\tmore-info"

And when the main thread uses lookup (instead of dummy-lookup) to unmarshal the data that came back, it can “restore” my-fn:

repl:10:> (def arr2 (unmarshal buf2 lookup))
@[<function my-fn> :other-info :more-info]

Now we can verify that it’s indeed the original function, not a copy:

repl:11:> (= my-fn (first arr2))
true
repl:12:> (set my-flag true)
true
repl:13:> (apply (first arr2))
:do-this

Some Conclusions

When building Jwno, I learned to be careful about these gotchas:

Spawning a new thread is quite a heavy operation, due to all the data copying. Instead of spawning threads ad-hoc, using fibers or a thread pool is usually better.
Sending mutable data structures across thread boundaries often leads to surprising behavior.
It’s dangerous to unmarshal the same buffer more than once (when using JANET_MARSHAL_UNSAFE).
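As an illustration of the first point, here is a minimal sketch of a single long-lived worker thread fed through threaded channels, so the spawning cost is paid once instead of per task. This is not how Jwno is structured; the names jobs, results and squares, the :quit sentinel and the channel capacity of 16 are all made up for the example.

```janet
(def jobs (ev/thread-chan 16))
(def results (ev/thread-chan 16))

# Spawn the worker once. It keeps taking jobs until it sees :quit.
(ev/spawn-thread
  (while true
    (def job (ev/take jobs))
    (if (= job :quit) (break))
    (ev/give results (* job job))))

# Hand out several small jobs without spawning any more threads.
(each n [1 2 3] (ev/give jobs n))

# Collect one result per job. With a single worker they arrive in order.
(def squares (seq [_ :range [0 3]] (ev/take results)))

# Tell the worker to exit.
(ev/give jobs :quit)

(printf "squares: %n" squares)
```

Each job and result is still marshalled when it crosses the channel, but that is a few bytes per message rather than a whole closure and a fresh VM per task.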

But I think Janet’s threads are generally quite nice to work with. The high-level APIs are concise, and the low-level C APIs have enough “escape hatches”, that I can use to realize my crazy ideas. The Janet people really did a great job landing an elegant design.

And thanks for reading through this long post, you’re really tolerant of my mumbling 😄.