
Python Enhancement Proposals

PEP 776 – Emscripten Support

Author:
Hood Chatham <roberthoodchatham at gmail.com>
Sponsor:
Łukasz Langa <lukasz at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Informational
Created:
18-Mar-2025
Python-Version:
3.14
Post-History:
18-Mar-2025, 28-Mar-2025

Abstract

Emscripten is a complete open source compiler toolchain. It compiles C/C++ code into WebAssembly/JavaScript executables, for use in JavaScript runtimes, including browsers and Node.js. The Rust language also maintains an Emscripten target.

This PEP formalizes the addition of Tier 3 support for Emscripten in Python 3.14, which was approved by the Steering Council on October 25, 2024. The goals are:

  1. To describe the current state of the CPython Emscripten runtime
  2. To describe the current state of the Pyodide runtime
  3. To identify minor features to be upstreamed from the Pyodide runtime into the CPython Emscripten runtime

The minor features identified here are all features that could be implemented without a PEP. We discuss more significant runtime features that we would like to implement, but we defer decisions on those features to subsequent PEPs.

Motivation

A web browser is a universal computing platform, available on Windows, macOS, Linux, and every smartphone.

The Pyodide project has supported Emscripten Python since 2018. Hundreds of thousands of students have learned Python through Pyodide via projects like Capytale and PyodideU. Pyodide is also increasingly being used by Python packages to provide interactive documentation. This demonstrates both the importance and the maturity of the Emscripten platform.

Emscripten and WASI are also the only supported platforms that offer any meaningful sandboxing.

Emscripten Platform Information

“Pyodide” vs “Emscripten Python”

For the sake of this document, we use the term “Emscripten Python” to refer to the Emscripten Python maintained in the python/cpython repository, without any downstream additions. We contrast the features present in Emscripten Python to the features present in Pyodide.

Pyodide is maintained on GitHub and distributed via jsDelivr, npm, and GitHub releases.

Emscripten Python is not distributed, but it is possible to build it by following the instructions in the devguide.

Background on Emscripten

Emscripten consists of a C and C++ compiler and linker based on LLVM, together with a runtime based on a mildly patched musl libc.

Emscripten is a POSIX-based platform. It uses the WebAssembly binary format, and the WebAssembly dynamic linking section.

The emcc compiler is a wrapper around clang. The emcc linker is a wrapper around wasm-ld (also part of the LLVM toolchain).

Emscripten support for portable C/C++ code source compatibility with Linux is fairly comprehensive, with certain expected exceptions to be spelled out. CPython already supports compilation to Emscripten, and it only requires a very modest number of modifications to the normal Linux target.

POSIX Compliance

Emscripten is a POSIX platform. However, there are POSIX APIs that exist but always fail when called and POSIX APIs that don’t exist at all. In particular, there are problems with networking APIs and blocking I/O, and there is no support for fork(). See Emscripten Portability Guidelines.

Emscripten executables can be linked with threading support, but it comes with several limitations:

  • Enabling threading requires websites to be served with special security headers that indicate acceptance of the possibility of Spectre-style information leakage. These headers are a usability hazard for users who are not intimately familiar with the web platform.
  • If an executable is linked with both threading and a dynamic loader, Emscripten prints a warning that using dynamic loading and pthreads together is experimental. It may cause performance problems or crashes. These problems may require WebAssembly standards work to resolve.

Because of these limitations, Pyodide standardizes a no-pthreads build of Python. If there is sufficient demand, a pthreads build with no dynamic loader could be added later.

Development Tools

Emscripten development tools are equally well supported on Linux, Windows, and macOS. The upstream tools include:

  • The Emscripten Software Developer Kit (emsdk) which can be used to install the Emscripten compiler toolchain (emcc).
  • emcc is a C and C++ compiler, linker, and a sysroot with headers for the system libraries. The system libraries themselves are generated on the fly based on the ABI requested.
  • Node.js can be used as an “emulator” to run Emscripten programs from the command line. This emulation behaves best on Linux with macOS as a runner up. Node.js is the most convenient way to test Emscripten programs.
  • It is possible to run Emscripten programs inside of any web browser. Browser automation tools like Selenium, Playwright, or Puppeteer can be used to test features that are browser-only.

Pyodide’s tools:

  • pyodide build can be used to cross compile Python packages to run on Emscripten. Cross compilation works best on Linux, there is experimental support on macOS, and it is entirely unsupported on Windows.
  • pyodide venv can make a virtual environment that runs in Pyodide.
  • pytest-pyodide can test Python code against various JavaScript runtimes.

cibuildwheel supports building wheels to target Emscripten using pyodide build.

In the short term, Pyodide’s packaging tooling will stay in the Pyodide repository. It is an open question where Pyodide’s packaging tooling should live in the long term. Two sensible options would be for it to remain under the pyodide organization or be moved into the pypa organization on GitHub.

Emscripten Application Lifecycle

An Emscripten “binary” consists of a pair of files, an .mjs file and a .wasm file. The .wasm file contains all of the compiled C/C++/Rust code. The .mjs file contains the lifecycle code to set up the runtime, locate the .wasm file, compile it, instantiate it, call the main() function, and shut down the runtime on exit. It also includes an implementation for all of the system calls, including the file system, the dynamic loader, and any logic to expose additional functionality from the JavaScript runtime to C code.

The .mjs file exports a single bootstrapEmscriptenExecutable() JavaScript function that bootstraps the runtime, calls the main() function, and returns an API object that can be used to call C functions. Each call produces a complete and independent copy of the runtime with its own separate address space.

The bootstrapEmscriptenExecutable() function takes a large number of runtime settings. The full list is described in the Emscripten documentation. The most important of these are as follows:

  • thisProgram: The value of argv[0]. In Python, this makes its way into sys.executable.
  • arguments: The list of string arguments to be passed to main().
  • preRun: A list of callbacks which are invoked after the JavaScript runtime and file system have been bootstrapped but before calling main(). Useful to set up the file system, environment variables, and standard streams.
  • print / printErr: Initial handlers for stdout and stderr. They are line buffered and performing a flush() of a partial line forces an extra new line. If tty-like behavior is desired, the standard stream devices should be replaced in a preRun() hook.
  • onExit: A handler that is called when the runtime exits.
  • instantiateWasm: A callback that is called to instantiate the WebAssembly module. Overriding the WebAssembly instantiation procedure via this function is useful when you have other custom asynchronous startup actions or downloads that can be performed in parallel to WebAssembly compilation. Implementing this callback allows performing all of these in parallel.

File System Setup

The Standard Library

In order for Python to run, it needs access to the standard library in the Emscripten file system. There are several possible approaches to this:

  • The Emscripten linker has a --preload-file flag that will automatically handle loading files. Information about how it works is available here. This is the simplest approach, but Pyodide has moved away from it because it embeds the files into a custom archive format that cannot be processed with standard tooling.
  • For Node.js, use the NODEFS to mount a native directory with the files into the Emscripten file system. This is the most efficient option but is Node only. It is closely analogous to what WASI does.
  • Put the standard library into a zip archive and use ZipImporter. Using an uncompressed zip file allows the web server and client to apply better compression to the standard library itself. It also uses the more efficient native decompression algorithms of the browser rather than less efficient WebAssembly decompression. The disadvantages of this are a higher memory footprint and breaking inspect and various tests that do not expect the standard library to be packaged in this way.
  • Put the standard library into an uncompressed tar archive and mount it into a TARFS read only file system backed by the tar file. This has the best memory usage, runtime performance, and transfer size of the options that can be used in the browser. The disadvantage is that Emscripten does not itself include a TARFS so it requires a downstream implementation.

Pyodide uses the ZipImporter approach in every runtime. Python uses the NODEFS approach when run with node and the ZipImporter approach for the web example. We will continue with this approach.

The ZipImporter provides a clean resolution for a bootstrapping problem: the Python runtime is capable of unpacking a wide variety of archive formats, but the Python runtime is not ready to use until the standard library is already available. Since zipimport.py is a frozen module, it avoids these problems. All of the other approaches solve the bootstrapping problem by setting up the standard library using JavaScript.
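As a rough illustration of the ZipImporter approach, the sketch below packs a tiny stand-in module into an uncompressed (stored) zip archive and imports it directly from the archive. The module name hello_mod and the archive name stdlib_demo.zip are hypothetical; a real build would pack the actual standard library, and this is not the build script CPython or Pyodide uses.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_stored_zip(archive_path, modules):
    """Write modules (name -> source text) into an uncompressed zip archive."""
    # ZIP_STORED leaves members uncompressed, so the web server and browser
    # can apply their own transfer compression to the archive as a whole.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, source in modules.items():
            zf.writestr(f"{name}.py", source)
    return archive_path


workdir = tempfile.mkdtemp()
archive = build_stored_zip(
    os.path.join(workdir, "stdlib_demo.zip"),  # hypothetical archive name
    {"hello_mod": "GREETING = 'hello from a zip archive'\n"},
)

# Placing the archive on sys.path lets the zipimport machinery resolve
# imports from it without unpacking anything to the file system.
sys.path.insert(0, archive)
importlib.invalidate_caches()
hello_mod = importlib.import_module("hello_mod")
print(hello_mod.GREETING)
```

The imported module's loader is a zipimporter instance, which is the same mechanism that lets the interpreter find the standard library before any pure-Python archive code is available.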

Third-party packages

It is also necessary to make any needed packages available in the Emscripten file system. Currently Emscripten CPython has no support for packages. Pyodide uses two different approaches for packages:

  • In the browser, Pyodide downloads and unpacks wheels into the MEMFS site-packages directory. It then preloads all dynamic libraries in the wheel. The work of downloading and installing all the packages is redone every time the runtime starts.
  • The Pyodide python CLI entrypoint mounts all of the host file system as NODEFS directories before it bootstraps Python. This allows the normal virtual environment mechanism to work. Pyodide virtual environments contain a patched copy of pip and a custom pip.conf so that pip will install Pyodide wheels. On startup the Pyodide python CLI will preload all Emscripten dynamic libraries that are in the site-packages directory.

Console and Interactive Usage

stdin defaults to always returning EOF, while stdout and stderr default to calling console.log and console.error respectively. It is possible to pass handlers to bootstrapEmscriptenExecutable() to configure the standard streams, but regardless of the handlers, the I/O devices have undesirable line buffering behavior that forces a new line when flushed. To implement a well behaved TTY in-browser, it is necessary to remove the default I/O devices and replace them in a preRun hook.
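The line buffering problem can be modelled in a few lines. The class below is a hypothetical Python stand-in for a line-oriented device such as one backed by console.log; it is not Emscripten's implementation, only a sketch of why flushing a partial line forces a line break.

```python
class LineBufferedDevice:
    """Toy model of an output device that can only emit whole lines."""

    def __init__(self, emit):
        self.emit = emit      # called once per line, like console.log
        self.partial = ""     # text written since the last newline

    def write(self, text):
        self.partial += text
        # Everything before the last newline is complete and can be emitted;
        # the remainder stays buffered.
        *lines, self.partial = self.partial.split("\n")
        for line in lines:
            self.emit(line)

    def flush(self):
        # The sink only accepts whole lines, so flushing a partial line
        # has the side effect of ending that line early.
        if self.partial:
            self.emit(self.partial)
            self.partial = ""


emitted = []
device = LineBufferedDevice(emitted.append)
device.write("Enter your name: ")
device.flush()            # a prompt is flushed before reading input
device.write("done\n")
print(emitted)
```

On a real TTY the prompt and the following text would share one line; in this model they arrive as two separate lines, which is the behavior a replacement device installed in a preRun hook is meant to avoid.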

Making stdin work correctly in the browser poses an additional challenge because it is not allowed to block for user input in the main thread of the browser. If Emscripten is run in a web worker and served with the shared memory headers, it is possible to receive input using shared memory and atomics. It is also possible for a stdin device to block in a simpler and more efficient manner via stack switching, using the experimental JavaScript Promise Integration API.

Pyodide replaces the standard I/O devices in order to fix the line buffering behavior. When Pyodide is run in Node.js, stdin, stdout, and stderr are by default connected to process.stdin, process.stdout, and process.stderr, and so the standard streams work as a tty out of the box. Pyodide also ensures that shutil.get_terminal_size returns results consistent with process.stdout.rows and process.stdout.columns. Pyodide currently has no support for stack switching stdin.

Currently, the Emscripten Python Node.js runner uses the default I/O that Emscripten provides. The web example uses Atomics for stdin and has custom stdout and stderr handlers, but they exhibit the undesirable line buffering behavior. We will upstream the standard streams behaviors from Pyodide.

In the long term, we hope to implement stack switching stdin devices, but that is out of scope for this PEP.

Traps and Uncaught Exceptions

We consider the C runtime state to be corrupted if there is a WebAssembly trap, an unhandled JavaScript exception, or an uncaught WebAssembly throw instruction.

Unlike in other platforms, there is no operating system to shut down the executable when there is a trap or other unrecoverable corruption of the libc runtime. We need to provide our own code to print tracebacks, dump the memory, or do whatever else is helpful for debugging a crash. If we expose a JavaScript API, we also must ensure that it is disabled after an unrecoverable crash to prevent downstream users from observing the Python runtime in an inconsistent state.

In order to detect fatal errors, Pyodide uses the following approach: all fallible calls from WebAssembly into JavaScript are wrapped with a JavaScript try/catch block. Any caught JavaScript exceptions are translated into Python exceptions. This ensures that any recoverable JavaScript error is caught before it unwinds through any WebAssembly frames. All entrypoints to WebAssembly are also wrapped with JavaScript try/catch blocks. Any exceptions caught there have unwound WebAssembly frames and are thus considered to be fatal errors (though there is a special case to handle exit()). This requires foundational integration with the Python/JavaScript foreign function interface.

When the Pyodide runtime catches a fatal exception, it introspects the error to determine whether it came from a trap, a logic error in a system call, a setjmp() without a longjmp(), or a libcxxabi call to __cxa_throw() (an uncaught C++ exception or Rust panic). We render as informative an error message as we can. We also call _Py_DumpTraceback() so we can display a Python traceback in addition to the JS/WebAssembly traceback. The runtime also disables the JavaScript API so that further attempts to call into Python result in an error saying that the runtime has fatally failed.
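The two layers of wrapping can be modelled as follows. This is a toy sketch in Python, not Pyodide's code (which lives in JavaScript); RuntimeGuard, call_out, and FatalRuntimeError are hypothetical names chosen for the illustration.

```python
class FatalRuntimeError(Exception):
    """Raised when the modelled runtime has been marked as unusable."""


def call_out(func, *args):
    """Inner layer: wrap an outbound call so errors are caught at the boundary."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        # Caught before unwinding any runtime frames, so it is recoverable
        # and can be translated into an ordinary error value.
        return ("error", exc)


class RuntimeGuard:
    """Outer layer: an API object that disables itself after a fatal error."""

    def __init__(self):
        self.failed = False

    def call_entrypoint(self, func, *args):
        # Refuse to run anything once the runtime state may be corrupted.
        if self.failed:
            raise FatalRuntimeError("the runtime has fatally failed")
        try:
            return func(*args)
        except Exception as exc:
            # An exception that reaches the entrypoint wrapper has unwound
            # through runtime frames, so it is treated as unrecoverable.
            self.failed = True
            raise FatalRuntimeError(f"fatal error: {exc!r}") from exc


guard = RuntimeGuard()
# A recoverable error: caught by the inner wrapper, the guard stays usable.
status, error = guard.call_entrypoint(lambda: call_out(int, "not a number"))
print(status, type(error).__name__, guard.failed)
```

The design point is that the same exception is harmless or fatal depending on where it is caught: at the outbound boundary it becomes a normal error, while at the entrypoint it poisons every later call.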

Normally, WebAssembly symbols are stripped so the WebAssembly frames are not very useful. Compiling and linking with -g2 (or a higher debug setting) ensures that WebAssembly symbols are included and they will appear in the traceback.

Because Emscripten Python currently has no JavaScript API and no foreign function interface, the situation is much simpler. The Python Node.js runner wraps the call to bootstrapEmscriptenExecutable() in a try/catch block. If an exception is caught, it displays the JavaScript exception and calls _Py_DumpTraceback(). It then exits with code 1. We will stick with this approach until we add either a JavaScript API or foreign function interface, which is out of scope for this PEP.

Specification

Scope of Work

Adding Emscripten as a Tier 3 platform only requires adding support for compiling an Emscripten-compatible build from the unpatched CPython source code. It does not necessarily require there to be any officially distributed Emscripten artifacts on python.org, although these could be added in the future. In the short term, they will continue to be distributed downstream with Pyodide.

Emscripten will be built using the same configure and Makefile system as other POSIX platforms, and must therefore be built on a POSIX platform. Both Linux and macOS will be supported.

A Python CLI entrypoint will be provided, which among other things can be used to execute the test suite.

Linkage

Only static linking of the Python interpreter is supported. We use EM_JS functions in the interpreter for various purposes. It is possible to dynamically link object files that include EM_JS functions, but their behavior deviates significantly from their behavior in static builds. For this reason, it would require special work to support. If a use case for dynamically linking the interpreter in Emscripten emerges, we can evaluate how much effort would be required to support it.

Standard Library

Unsupported Modules

See https://pyodide.org/en/stable/usage/wasm-constraints.html#removed-modules.

Removed Modules

The following modules are removed from the standard library to reduce download size and since they currently wouldn’t work in the WebAssembly VM.

  • curses
  • dbm
  • ensurepip
  • fcntl
  • grp
  • idlelib
  • msvcrt
  • pwd
  • resource
  • syslog
  • termios
  • tkinter
  • turtle
  • turtledemo
  • venv
  • winreg
  • winsound

Included but not Working Modules

The following modules can be imported, but are not functional:

  • multiprocessing
  • threading
  • socket

as well as any functionality that requires these.

The following are present but cannot be imported due to a dependency on the termios module which has been removed:

  • pty
  • tty
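Portable code can probe for these gaps instead of assuming a full standard library. The helpers below are a minimal sketch; the function names are hypothetical, and the thread probe assumes that a runtime without thread support raises RuntimeError from Thread.start(), as CPython does when it cannot start a new thread.

```python
import importlib.util
import threading


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def can_start_threads():
    """Probe whether a thread can actually be started on this runtime."""
    # threading may import successfully even where threads cannot run,
    # so the only reliable check is to try starting one.
    probe = threading.Thread(target=lambda: None)
    try:
        probe.start()
    except RuntimeError:
        return False
    probe.join()
    return True


for name in ("termios", "curses", "json"):
    print(name, module_available(name))
print("threads usable:", can_start_threads())
```

Checking for the capability rather than for sys.platform keeps the same code working if a future build adds or removes a module.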

Platform Identification

sys.platform will be "emscripten". Although Emscripten attempts to be compatible with Linux, the differences are significant enough that a distinct name is justified. This is consistent with the return value from os.uname().

There is also sys._emscripten_info which includes the Emscripten version and the runtime (either navigator.userAgent in a browser or "Nodejs" + process.version in Node.js).
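A small sketch of how code might branch on these values. The helper name describe_platform is hypothetical; the emscripten_version and runtime fields are read defensively with getattr so the snippet also runs on native platforms, where sys._emscripten_info does not exist.

```python
import sys


def describe_platform():
    """Summarize the platform, using Emscripten details when available."""
    info = getattr(sys, "_emscripten_info", None)
    if sys.platform == "emscripten" and info is not None:
        # emscripten_version is a tuple of integers such as (major, minor, micro).
        version = ".".join(str(part) for part in info.emscripten_version)
        return f"emscripten {version} on {info.runtime}"
    return f"native ({sys.platform})"


print(describe_platform())
```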

Signals Support

WebAssembly does not have native support for signals. Furthermore, on a non-pthreads build, the address space of the WebAssembly module is not shared, so it is impossible for any thread capable of seeing an interrupt to write to the eval breaker while the Python interpreter is running code. To work around this, there are two possible solutions:

  • If Emscripten is run in a web worker and served with the shared memory headers, it is possible to use shared memory outside of the WebAssembly address space as a signal buffer. A signal handling UI thread can write the desired signal into the signal buffer. The interpreter can periodically check the state of this signal buffer in the eval breaker code. Checking the signal buffer is slow compared to checking the eval breaker in native platforms, so we only do it once every 50 times through the eval breaker. See Python/emscripten_signal.c.
  • Using stack switching, we can occasionally switch the stack and allow the JavaScript event loop to go around, then check the state of a signal buffer. This requires the experimental JavaScript Promise Integration API, and would be best used with the techniques for optimizing long tasks described in this article.
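The throttled polling in the first approach can be modelled in plain Python. This is an illustrative sketch only: the real logic is C code in Python/emscripten_signal.c, and SignalBuffer and EvalBreaker are hypothetical names standing in for the shared memory buffer and the interpreter's eval breaker.

```python
CHECK_INTERVAL = 50  # the buffer is read once per this many eval breaker passes


class SignalBuffer:
    """Stand-in for shared memory that a UI thread could write a signal into."""

    def __init__(self):
        self.pending = 0   # 0 means "no signal pending"
        self.reads = 0     # counts the (comparatively slow) buffer reads

    def take(self):
        self.reads += 1
        value, self.pending = self.pending, 0
        return value


class EvalBreaker:
    """Model of the interpreter side: a cheap counter guards the slow read."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.counter = 0
        self.delivered = []

    def tick(self):
        """Called on every pass through the modelled eval breaker."""
        self.counter += 1
        if self.counter < CHECK_INTERVAL:
            return
        self.counter = 0
        signum = self.buffer.take()
        if signum:
            self.delivered.append(signum)


buffer = SignalBuffer()
breaker = EvalBreaker(buffer)
buffer.pending = 2  # pretend another thread requested signal number 2
for _ in range(CHECK_INTERVAL):
    breaker.tick()
print(breaker.delivered, buffer.reads)
```

The trade-off is latency for throughput: a signal may wait up to 49 extra passes before it is noticed, in exchange for touching the slow buffer far less often.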

Emscripten Python has already implemented the solution based on shared memory, and it is in use in Pyodide.

Eventually, we hope to implement stack-switching-based signals so that it is possible to use signals in the main thread of node and the browser, as well as in web pages that are not served with the shared memory headers. We will need to keep the shared memory based approach as well, both for backwards compatibility and because it is more efficient when it is possible. However, this is out of scope for this PEP.

Function Pointer Casts

Section 6.3.2.3, paragraph 8 of the C standard reads:

A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.

However, most platforms have the same behavior: if a function is called with too many arguments, the extra arguments are ignored; if a function is called with too few arguments, the missing arguments are filled in with garbage.

On the other hand, the WebAssembly spec defines calling a function with the wrong signature to trap (see step 18 in the execution of call_indirect).

It is common for Python extension modules to cast a function to a different signature and call it with the different signature. For instance, many C extensions define a METH_NOARGS function to take 0 or 1 argument. The interpreter calls it with two arguments, the first of which is the Python module object and the second of which is always NULL. In order to make these extension modules work without changing their source code, we need special handling.

Initially, we resolved this problem by calling out to JavaScript and having JavaScript call the function pointer. When calling a WebAssembly function from JavaScript, missing arguments are treated as zero and extra arguments are ignored (see step 7 here). This works, but has the disadvantage of being slow and breaking stack switching; it is not possible to stack switch through JavaScript frames.

Using the wasm-gc ref.test instruction, we can query the type of the function pointer and manually fix up the argument list.
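As a loose analogy only, the idea of "query the callee's type, then fix up the argument list" can be shown in Python, with inspect.signature standing in for the ref.test type query. The real trampoline is implemented in C and WebAssembly; call_with_fixed_args and the sample functions below are hypothetical.

```python
import inspect


def call_with_fixed_args(func, *args):
    """Call func with only as many positional arguments as it declares."""
    params = inspect.signature(func).parameters.values()
    positional = [
        p for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    # Drop the extra trailing arguments instead of failing on a mismatch,
    # mirroring what native calling conventions do implicitly.
    return func(*args[:len(positional)])


def noargs_zero():
    """A METH_NOARGS-style function declared with no parameters."""
    return "zero"


def noargs_one(module):
    """A METH_NOARGS-style function declared with one parameter."""
    return ("one", module)


# The caller always passes two arguments: the module and a placeholder.
print(call_with_fixed_args(noargs_zero, "module", None))
print(call_with_fixed_args(noargs_one, "module", None))
```

Without the fix-up step, both calls would raise TypeError in Python, which plays the role of the WebAssembly trap in this analogy.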

wasm-gc is a relatively new feature for WebAssembly runtimes, so we attempt to use a wasm-gc based function pointer cast trampoline if possible and fall back to a JS trampoline if not. Every JavaScript runtime that supports stack switching also supports wasm-gc, so this ensures that stack switching works on every runtime that supports it. The one wrinkle is that iOS 18 ships a broken implementation of wasm-gc so we have to special case it.

See here for the full implementation details.

The function pointer cast handling is fully implemented in CPython. Pyodide uses exactly the same code as upstream.

CI Resources

Pyodide can be built and tested on any Linux with a reasonably recent version of Node.js. Anaconda has offered to provide physical hardware to run Emscripten buildbots, maintained by Russell Keith-Magee.

CPython does not currently test Tier 3 platforms on GitHub Actions, but if this ever changes, their Linux runners are able to build and test Emscripten Python.

PEP 11

PEP 11 will be updated to indicate that Emscripten is supported, specifically the triples wasm32-unknown-emscripten_xx_xx_xx.

Russell Keith-Magee will serve as the initial core team contact for these ABIs.

Future Work

Improving Cross Builds in the Packaging Ecosystem

Python now supports four non-self-hosting platforms: iOS, Android, WASI, and Emscripten. All of them will need to build packages via cross builds. Currently, pyodide-build allows building a very large number of Python packages for Emscripten, but it is very complicated. Ideally, the Python packaging ecosystem would have standards for cross builds. This is a difficult long term project, particularly because the packaging system is complex and was designed from the ground up with the assumption that cross compilation would not happen.

Pyodide Runtime Features to be Upstreamed

This is a collection of Pyodide runtime features that are out of scope for this PEP and for the Python 3.14 development cycle but we would like to upstream in the future.

JavaScript API for Bootstrapping

Currently we offer no stable API for bootstrapping Python. Instead, we use one collection of settings for the Node.js CLI entrypoint and a separate collection of settings for the browser demo.

The Emscripten executable startup API is complicated and there are many possible configurations that are broken. Pyodide offers a simpler set of options than Emscripten. This gives downstream users a lot of flexibility while allowing us to maintain a small number of tested configurations. It also reduces downstream code duplication.

Eventually, we would like to upstream Pyodide’s bootstrapping API. In the short term, to keep things simple we will support no JavaScript API.

JavaScript foreign function interface (FFI)

Because Emscripten supports POSIX, a significant number of tasks can be achieved using the os module. However, many fundamental operations in JavaScript runtimes are not possible via POSIX APIs. Pyodide’s approach is to specify a mapping between the JavaScript object model and the Python object model and a calling convention that allows high level bidirectional integration. See the Pyodide documentation.

Asyncio

Most JavaScript primitives are asynchronous. The JavaScript thread that Python runs in already has an event loop. It is not too difficult to implement a Python event loop that defers all actual work to the JavaScript event loop, as implemented in Pyodide.

This is logically dependent on having at least some limited JavaScript FFI because the only way to schedule tasks on the JavaScript event loop is via a call out to JavaScript.

One cause of incompatibility is that it is not possible to control the lifecycle of the event loop from within a JavaScript isolate. This prevents asyncio.run() and similar functions from working.
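Code that must run both natively and under a host-owned event loop often avoids calling asyncio.run() unconditionally. The helper below is a sketch of that general pattern using only standard asyncio calls; the name start is hypothetical, and it assumes the environment reports its loop as already running, which this PEP does not specify.

```python
import asyncio


def start(coro):
    """Run coro to completion, or schedule it if a loop is already running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: the usual native situation,
        # where this code is free to create and drive its own loop.
        return asyncio.run(coro)
    # A loop is already running and cannot be restarted or blocked on,
    # so hand back a task for the caller to await.
    return loop.create_task(coro)


async def add(a, b):
    await asyncio.sleep(0)  # yield to the event loop once
    return a + b


print(start(add(1, 2)))
```

Called from synchronous code the helper returns the result directly; called from inside a running loop it returns a task, so the caller never needs to own the loop's lifecycle.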

Using stack switching it is also possible to make a coroutine out of “synchronous” Python frames. These stack switching coroutines are scheduled on the same event loop as ordinary Python coroutines and are fully reentrant. This is fully implemented in Pyodide.

Backwards Compatibility

Adding a new platform does not introduce any backwards compatibility concerns to CPython itself. However, there may be some backwards compatibility implications for Pyodide users. There are a large number of existing users of Pyodide, so it is important when upstreaming features from Pyodide into Python that we take care to minimize backwards incompatibility. We will also need a way to disable partially-upstreamed features so that Pyodide can replace them with more complete versions downstream.

Security Implications

Adding a new platform does not add any new security implications.

Emscripten and WASI are also the only supported platforms that offer sandboxing. If users wish to execute untrusted Python code or untrusted Python extension modules, Emscripten provides a secure way for them to do that.

How to Teach This

The education needs related to this PEP relate to two groups of developers.

First, web developers will need to know how to build Python and use it in a website, along with their own Python code and any supporting packages, and how to use them all at runtime. The documentation will cover this in a similar form to the existing Windows embeddable package. In the short term, we will encourage developers to use Pyodide if at all possible.

Reference Implementation

Pyodide.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0776.rst

Last modified: 2025-05-16 14:48:26 GMT

