facebookincubator/cinder

Cinder Logo



Welcome to Cinder!

Cinder is Meta's internal performance-oriented production version of CPython 3.10. It contains a number of performance optimizations, including bytecode inline caching, eager evaluation of coroutines, a method-at-a-time JIT, and an experimental bytecode compiler that uses type annotations to emit type-specialized bytecode that performs better in the JIT.

Cinder powers Instagram, where it started, and is increasingly used in more and more Python applications at Meta.

For more information on CPython, see README.cpython.rst.

Is this supported?

Short answer: no.

We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

Cinder is not polished or documented for anyone else's use. We don't have the desire for it to become an alternative to CPython. Our goal in making this code available is a unified faster CPython. So while we do run Cinder in production, if you choose to do so you are on your own. We can't commit to fixing external bug reports or reviewing pull requests. We make sure Cinder is sufficiently stable and fast for our production workloads, but we make no assurances about its stability or correctness or performance for any external workloads or use-cases.

That said, if you have experience in dynamic language runtimes and have ideas to make Cinder faster, or if you work on CPython and want to use Cinder as inspiration for improvements in CPython (or help upstream parts of Cinder to CPython), please reach out; we'd love to chat!

How do I build it?

Cinder should build just like CPython: configure and make -j. However, as most development and usage of Cinder occurs in the highly specific context of Meta, we do not exercise it much in other environments. As such, the most reliable way to build and run Cinder is to reuse the Docker-based setup from our GitHub CI workflow.

If you just want to get a working Cinder without building it yourself, our Runtime Docker Image is going to be the easiest option (no repo clone needed!):

  1. Install and set up Docker.
  2. Fetch and run our cinder-runtime image:
    docker run -it --rm ghcr.io/facebookincubator/cinder-runtime:cinder-3.10

If you want to build it yourself:

  1. Install and set up Docker.
  2. Clone the Cinder repo:
    git clone https://github.com/facebookincubator/cinder
  3. Run a shell in the Docker environment used by the CI:
    docker run -v "$PWD/cinder:/vol" -w /vol -it --rm ghcr.io/facebookincubator/cinder/python-build-env:latest bash
    The above command does the following:
    • Downloads (if not already cached) a pre-built Docker image used by the CI from https://ghcr.io/facebookincubator/cinder/python-build-env.
    • Makes the Cinder checkout above ($PWD/cinder) available to the Docker environment at the mount point /vol.
    • Interactively (-it) runs bash in the /vol directory.
    • Removes the container when it exits (--rm) to avoid disk bloat.
  4. Build Cinder from the shell started in the Docker environment:
    ./configure && make

Please be aware that Cinder is only built or tested on Linux x64; anything else (including macOS) probably won't work. The Docker image above is Fedora Linux-based and built from a Docker spec file in the Cinder repo: .github/workflows/python-build-env/Dockerfile.

There are some new test targets that might be interesting. make testcinder is pretty much the same as make test except that it skips a few tests that are problematic in our dev environment. make testcinder_jit runs the test suite with the JIT fully enabled, so all functions are JIT'ed. make testruntime runs a suite of C++ gtest unit tests for the JIT. And make test_strict_module runs a test suite for strict modules (see below).

Note that these steps produce a Cinder Python binary without PGO/LTO optimizations enabled, so don't expect to use these instructions to get any speedup on any Python workload.

How do I explore it?

Cinder Explorer is a live playground where you can see how Cinder compiles Python code from source to assembly; you're welcome to try it out! Feel free to file feature requests and bug reports. Keep in mind that the Cinder Explorer, like the rest of this, is "supported" on a best-effort basis.

What's here?

Immortal Instances

Instagram uses a multi-process webserver architecture; the parent process starts, performs initialization work (e.g. loading code), and forks tens of worker processes to handle client requests. Worker processes are restarted periodically for a number of reasons (e.g. memory leaks, code deployments) and have a relatively short lifetime. In this model, the OS must copy the entire page containing an object that was allocated in the parent process when the object's reference count is modified. In practice, the objects allocated in the parent process outlive workers; all the work related to reference counting them is unnecessary.

Instagram has a very large Python codebase, and the overhead due to copy-on-write from reference counting long-lived objects turned out to be significant. We developed a solution called "immortal instances" to provide a way to opt objects out of reference counting. See Include/object.h for details. This feature is controlled by defining Py_IMMORTAL_INSTANCES and is enabled by default in Cinder. This was a large win for us in production (~5%), but it makes straight-line code slower: reference counting operations occur frequently and must check whether or not an object participates in reference counting when this feature is enabled.
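To make the trade-off above concrete, here is a toy Python model, not Cinder's C implementation in Include/object.h. An object marked immortal (via a hypothetical IMMORTAL_REFCNT sentinel) is never written to by incref/decref, so in a forked worker its page would stay shared; the price is one extra comparison on every refcount operation.

```python
# Toy model of "immortal instances". This is NOT Cinder's implementation;
# IMMORTAL_REFCNT is a hypothetical sentinel chosen for illustration.
IMMORTAL_REFCNT = -1


class ToyObject:
    def __init__(self, immortal: bool = False) -> None:
        self.refcnt = IMMORTAL_REFCNT if immortal else 1
        # Counts writes caused by refcounting; in a forked worker each such
        # write to a parent-allocated object would dirty a shared page.
        self.writes = 0


def incref(obj: ToyObject) -> None:
    # The extra branch every refcount operation pays when the feature is on.
    if obj.refcnt == IMMORTAL_REFCNT:
        return
    obj.refcnt += 1
    obj.writes += 1


def decref(obj: ToyObject) -> None:
    if obj.refcnt == IMMORTAL_REFCNT:
        return
    obj.refcnt -= 1
    obj.writes += 1


mortal, immortal = ToyObject(), ToyObject(immortal=True)
for _ in range(1000):
    for o in (mortal, immortal):
        incref(o)
        decref(o)

print(mortal.writes, immortal.writes)  # 2000 0
```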

Shadowcode

"Shadowcode" or "shadow bytecode" is our implementation of a specializing interpreter. It observes particular optimizable cases in the execution of generic Python opcodes and (for hot functions) dynamically replaces those opcodes with specialized versions. The core of shadowcode lives in Shadowcode/shadowcode.c, though the implementations for the specialized bytecodes are in Python/ceval.c with the rest of the eval loop. Shadowcode-specific tests are in Lib/test/test_shadowcode.py.

It is similar in spirit to the specializing adaptive interpreter (PEP 659) that will be built into CPython 3.11.
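As a rough sketch of the specializing idea (a toy, not shadowcode's actual design), the interpreter below rewrites a generic ADD into a hypothetical ADD_INT once it has seen int operands WARMUP times, and falls back to the generic form when the specialized guard fails. The opcode names, the threshold, and the data layout are all assumptions made for illustration.

```python
from typing import Any

WARMUP = 3  # hypothetical threshold; shadowcode's real heuristics differ


class Function:
    def __init__(self, code: list[str]) -> None:
        self.code = list(code)        # "shadow" copy we are free to rewrite
        self.hits = [0] * len(code)   # per-instruction observation counters


def run(fn: Function, a: Any, b: Any) -> Any:
    """Execute a one-instruction toy function computing a + b."""
    pc = 0
    op = fn.code[pc]
    if op == "ADD_INT":
        # Specialized form: a cheap type guard, then the fast path.
        if type(a) is int and type(b) is int:
            return int.__add__(a, b)
        fn.code[pc] = "ADD"           # guard failed: de-specialize
        fn.hits[pc] = 0
        op = "ADD"
    if op == "ADD":
        # Generic form: observe operand types and specialize once hot.
        if type(a) is int and type(b) is int:
            fn.hits[pc] += 1
            if fn.hits[pc] >= WARMUP:
                fn.code[pc] = "ADD_INT"
        return a + b
    raise ValueError(f"unknown opcode {op}")


fn = Function(["ADD"])
for i in range(WARMUP):
    run(fn, i, 1)
print(fn.code)  # ['ADD_INT']
```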

Await-aware function calls

The Instagram Server is an async-heavy workload, where each web request may trigger hundreds of thousands of async tasks, many of which can be completed without suspension (e.g. thanks to memoized values).

We extended the vectorcall protocol to pass a new flag, Ci_Py_AWAITED_CALL_MARKER, indicating the caller is immediately awaiting this call.

When used with async function calls that are immediately awaited, we can immediately (eagerly) evaluate the called function, up to completion or up to its first suspension. If the function completes without suspending, we are able to return the value immediately, with no extra heap allocations.
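Here is a pure-Python approximation of eager evaluation. It assumes nothing about Cinder internals; Cinder does this inside the eval loop, not with a helper like this. The idea: send None into the coroutine once, and if it raises StopIteration it finished without suspending, so the result is already available. try_eager and get_user are hypothetical names.

```python
import asyncio
from typing import Any, Coroutine


def try_eager(coro: Coroutine[Any, Any, Any]) -> tuple[bool, Any]:
    """Run `coro` until it completes or first suspends.

    Returns (True, result) if it finished without suspending, or
    (False, yielded) if it suspended. In the latter case the caller owns
    the partially-run coroutine and must resume or close it.
    """
    try:
        yielded = coro.send(None)
    except StopIteration as stop:
        return True, stop.value
    return False, yielded


_cache = {"user:1": "alice"}


async def get_user(key: str) -> str:
    if key in _cache:          # memoized: completes with no suspension
        return _cache[key]
    await asyncio.sleep(0)     # stands in for real I/O: suspends once
    return "unknown"


print(try_eager(get_user("user:1")))  # (True, 'alice')
```

Upstream CPython 3.12 later gained a related opt-in mechanism, asyncio.eager_task_factory, which starts running a task's coroutine synchronously when the task is created.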

When used with async gather, we can immediately (eagerly) evaluate the set of passed awaitables, potentially avoiding the cost of creating and scheduling multiple tasks for coroutines that could be completed synchronously, completed futures, memoized values, etc.

These optimizations resulted in a significant (~5%) CPU efficiency improvement.

This is mostly implemented in Python/ceval.c; look for uses of the Ci_Py_AWAITED_CALL_MARKER vectorcall flag and the IS_AWAITED() macro.

The Cinder JIT

The Cinder JIT is a method-at-a-time custom JIT implemented in C++. It is enabled via the -X jit flag or the PYTHONJIT=1 environment variable. It supports almost all Python opcodes, and can achieve 1.5-4x speed improvements on many Python performance benchmarks.
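If a script wants to know whether the JIT was requested, a minimal check can look at the two switches named above. This only reports what was asked for on the command line or in the environment, not whether any function was actually compiled. jit_requested is a hypothetical helper, and sys._xoptions (where CPython records -X options) is a CPython implementation detail.

```python
import os
import sys
from typing import Mapping


def jit_requested(xoptions: Mapping[str, object] = sys._xoptions,
                  environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if -X jit or PYTHONJIT=1 was given (hypothetical helper)."""
    # `python -X jit` records {"jit": True} in sys._xoptions.
    return "jit" in xoptions or environ.get("PYTHONJIT") == "1"


print(jit_requested())
```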

By default, when enabled, it will JIT-compile every function that is ever called, which may well make your program slower, not faster, due to the overhead of JIT-compiling rarely-called functions. The option -X jit-list-file=/path/to/jitlist.txt or PYTHONJITLISTFILE=/path/to/jitlist.txt can point it to a text file containing fully qualified function names (in the form path.to.module:funcname or path.to.module:ClassName.method_name), one per line, which should be JIT-compiled. We use this option to compile only a set of hot functions derived from production profiling data. (A more typical approach for a JIT would be to dynamically compile functions as they are observed to be called frequently. It hasn't yet been worth it for us to implement this, since our production architecture is a pre-fork webserver, and for memory sharing reasons we wish to do all of our JIT compiling up front in the initial process before workers are forked, which means we can't observe the workload in-process before deciding which functions to JIT-compile.)
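The jit-list format described above lines up with Python's __module__ and __qualname__ attributes, so a list of hot functions (however you obtained it from profiling) can be turned into file contents as in this sketch. The helper names are hypothetical, and the parser only accepts the module:qualname form described here; Cinder's own reader may be stricter or more lenient.

```python
from typing import Callable, Iterable


def jitlist_entry(func: Callable) -> str:
    """Format a function as 'path.to.module:Qualified.name'."""
    # Caveat: nested functions have "<locals>" in __qualname__, which does
    # not fit the documented format.
    return f"{func.__module__}:{func.__qualname__}"


def write_jitlist(funcs: Iterable[Callable]) -> str:
    """Return jit-list file contents, one entry per line."""
    return "".join(jitlist_entry(f) + "\n" for f in funcs)


def parse_jitlist(text: str) -> list[tuple[str, str]]:
    """Split each non-blank line into (module, qualname)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        module, sep, qualname = line.partition(":")
        if not sep or not module or not qualname:
            raise ValueError(f"malformed jit-list line: {line!r}")
        entries.append((module, qualname))
    return entries
```

For example, write_jitlist([render, Handler.handle]) for a module-level function render and method Handler.handle in module app.views would produce the two lines app.views:render and app.views:Handler.handle.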

The JIT lives in the Jit/ directory, and its C++ tests live in RuntimeTests/ (run these with make testruntime). There are also some Python tests for it in Lib/test/test_cinderjit.py; these aren't meant to be exhaustive, since we run the entire CPython test suite under the JIT via make testcinder_jit; they cover JIT edge cases not otherwise found in the CPython test suite.

See Jit/pyjit.cpp for some other -X options and environment variables that influence the behavior of the JIT. There is also a cinderjit module defined in that file which exposes some JIT utilities to Python code (e.g. forcing a specific function to compile, checking if a function is compiled, disabling the JIT). Note that cinderjit.disable() only disables future compilation; it immediately compiles all known functions and keeps existing JIT-compiled functions.

The JIT first lowers Python bytecode to a high-level intermediate representation (HIR); this is implemented in Jit/hir/. HIR maps reasonably closely to Python bytecode, though it is a register machine instead of a stack machine, it is a bit lower level, it is typed, and some details that are obscured by Python bytecode but important for performance (notably reference counting) are exposed explicitly in HIR. HIR is transformed into SSA form, some optimization passes are performed on it, and then reference counting operations are automatically inserted into it according to metadata about the refcount and memory effects of HIR opcodes.
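To illustrate just the stack-machine to register-machine step (no types, refcounts, or control flow), the sketch below walks a made-up mini bytecode with an abstract stack of register names. Because every result gets a fresh register, straight-line code comes out in SSA form for free; branches would additionally need phi nodes. The opcode names mirror CPython 3.10 bytecode, but the printed output format is invented and is not Cinder's HIR syntax.

```python
from typing import Optional


def lower_to_registers(bytecode: list[tuple[str, Optional[str]]]) -> list[str]:
    """Translate straight-line stack ops into three-address register code."""
    stack: list[str] = []   # abstract stack holding register names, not values
    out: list[str] = []
    counter = 0

    def fresh() -> str:
        nonlocal counter
        counter += 1
        return f"v{counter}"

    for op, arg in bytecode:
        if op == "LOAD_FAST":
            reg = fresh()
            out.append(f"{reg} = LoadLocal<{arg}>")
            stack.append(reg)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            reg = fresh()
            out.append(f"{reg} = BinaryOp<Add> {left} {right}")
            stack.append(reg)
        elif op == "RETURN_VALUE":
            out.append(f"Return {stack.pop()}")
        else:
            raise NotImplementedError(op)
    return out


# Bytecode for `def f(a, b): return a + b` in this toy encoding.
code = [("LOAD_FAST", "a"), ("LOAD_FAST", "b"),
        ("BINARY_ADD", None), ("RETURN_VALUE", None)]
for line in lower_to_registers(code):
    print(line)
```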

HIR is then lowered to a low-level intermediate representation (LIR), which is an abstraction over assembly, implemented in Jit/lir/. In LIR we do register allocation, some additional optimization passes, and then finally LIR is lowered to assembly (in Jit/codegen/) using the excellent asmjit library.
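The README doesn't say which register allocator Jit/lir/ uses, so the following is only a generic illustration of the problem: textbook linear-scan allocation over live intervals, with the simplification that the newest interval is spilled when no register is free (practical allocators usually spill whichever interval ends last). The register names are x64 names chosen for flavor.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (virtual register, first use, last use)


def linear_scan(intervals: List[Interval], registers: List[str]) -> Dict[str, str]:
    """Assign each virtual register a physical register or 'spill'."""
    assignment: Dict[str, str] = {}
    free = list(registers)
    active: List[Tuple[int, str]] = []  # (end, vreg) currently holding a register

    for vreg, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals whose last use is before this one starts.
        still_active = []
        for a_end, a_vreg in active:
            if a_end < start:
                free.append(assignment[a_vreg])
            else:
                still_active.append((a_end, a_vreg))
        active = still_active

        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
        else:
            assignment[vreg] = "spill"  # simplification: spill the newcomer
    return assignment


print(linear_scan([("v1", 0, 3), ("v2", 1, 2), ("v3", 2, 5), ("v4", 4, 6)],
                  ["rax", "rcx"]))
# {'v1': 'rax', 'v2': 'rcx', 'v3': 'spill', 'v4': 'rax'}
```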

The JIT is in its early stages. While it can already eliminate interpreter loop overhead and offers significant performance improvements for many functions, we've only begun to scratch the surface of possible optimizations. Many common compiler optimizations are not yet implemented. Our prioritization of optimizations is largely driven by the characteristics of the Instagram production workload.

Strict Modules

Strict modules is a few things rolled into one:

1. A static analyzer capable of validating that executing a module's top-level code will not have side effects visible outside that module.

2. An immutable StrictModule type usable in place of Python's default module type.

3. A Python module loader capable of recognizing modules opted in to strict mode (via an import __strict__ at the top of the module), analyzing them to validate no import side effects, and populating them in sys.modules as a StrictModule object.
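Cinder's analyzer is far more thorough than anything that fits here, but the flavor of item 1 can be sketched with the standard ast module: scan a module's top-level statements and conservatively flag ones that might reach outside the module. The two rules below (bare top-level calls, and assignments into attributes or items) are hypothetical and produce false positives, for example on item assignment into a dict the module itself created.

```python
import ast


def find_import_side_effects(source: str) -> list[str]:
    """Flag top-level statements that might have effects outside the module.

    Hypothetical, deliberately tiny rule set; not Cinder's analyzer.
    """
    problems = []
    for node in ast.parse(source).body:
        # Rule 1: a bare call at top level, e.g. print(...) or register(...).
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            problems.append(f"line {node.lineno}: top-level call")
        # Rule 2: writing into an attribute or item, e.g. os.environ["X"] = "1".
        elif isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            if any(isinstance(t, (ast.Attribute, ast.Subscript)) for t in targets):
                problems.append(f"line {node.lineno}: assignment into attribute or item")
    return problems


src = 'import __strict__\nimport os\nx = 1\nprint("hi")\nos.environ["X"] = "1"\n'
for problem in find_import_side_effects(src):
    print(problem)
```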

Static Python

Static Python is a bytecode compiler that makes use of type annotations to emit type-specialized and type-checked Python bytecode. Used along with the Cinder JIT, it can deliver performance similar to MyPyC or Cython in many cases, while offering a pure-Python developer experience (normal Python syntax, no extra compilation step). Static Python plus Cinder JIT achieves 18x the performance of stock CPython on a typed version of the Richards benchmark. At Instagram we have successfully used Static Python in production to replace all Cython modules in our primary webserver codebase, with no performance regression.

The Static Python compiler is built on top of the Python compiler module that was removed from the standard library in Python 3 and has since been maintained and updated externally; this compiler is incorporated into Cinder in Lib/compiler. The Static Python compiler is implemented in Lib/compiler/static/, and its tests are in Lib/test/test_compiler/test_static.py.

Classes defined in Static Python modules are automatically given typed slots (based on inspection of their typed class attributes and annotated assignments in __init__), and attribute loads and stores against instances of these types use new STORE_FIELD and LOAD_FIELD opcodes, which in the JIT become direct loads/stores from/to a fixed memory offset in the object, with none of the indirection of a LOAD_ATTR or STORE_ATTR. Classes also gain vtables of their methods, for use by the INVOKE_* opcodes mentioned below. The runtime support for these features is located in StaticPython/classloader.h and StaticPython/classloader.c.

A static Python function begins with a hidden prologue which checks that the supplied arguments' types match the type annotations, and raises TypeError if not. Calls from a static Python function to another static Python function will skip this prologue (since the types are already validated by the compiler). Static-to-static calls can also avoid much of the overhead of a typical Python function call. We emit an INVOKE_FUNCTION or INVOKE_METHOD opcode which carries with it metadata about the called function or method; this plus optionally immutable modules (via StrictModule) and types (via cinder.freeze_type(), which we currently apply to all types in strict and static modules in our import loader, but in future may become an inherent part of Static Python) and compile-time knowledge of the callee signature allow us to (in the JIT) turn many Python function calls into direct calls to a fixed memory address using the x64 calling convention, with little more overhead than a C function call.
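The prologue check can be approximated in pure Python with a decorator, shown here as a hypothetical checked_args helper that handles only plain-class annotations (string annotations, generics, unannotated parameters, and *args/**kwargs are skipped). Note the key difference from Static Python: this check runs on every call, whereas the compiler-emitted prologue is skipped for static-to-static calls.

```python
import functools
import inspect
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

_CHECKED_KINDS = (
    inspect.Parameter.POSITIONAL_ONLY,
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def checked_args(func: F) -> F:
    """Raise TypeError if an argument doesn't match its class annotation."""
    sig = inspect.signature(func)
    checks = {
        name: p.annotation
        for name, p in sig.parameters.items()
        if p.kind in _CHECKED_KINDS
        # Parameter.empty is itself a class, so exclude it explicitly.
        and p.annotation is not inspect.Parameter.empty
        and isinstance(p.annotation, type)
    }

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = checks.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}() argument {name!r} expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


@checked_args
def scale(x: int, factor: float = 2.0) -> float:
    return x * factor


print(scale(3))  # 6.0
```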

Static Python is still gradually typed, and supports code that is only partially annotated or uses unknown types by falling back to normal Python dynamic behavior. In some cases (e.g. when a value of statically-unknown type is returned from a function with a return annotation), a runtime CAST opcode is inserted which will raise TypeError if the runtime type does not match the expected type.

Static Python also supports new types for machine integers, bools, doubles, and vectors/arrays. In the JIT these are handled as unboxed values, and e.g. primitive integer arithmetic avoids all Python overhead. Some operations on builtin types (e.g. list or dictionary subscript or len()) are also optimized.

Cinder supports gradual adoption of static modules via a strict/static module loader that can automatically detect static modules and load them as static with cross-module compilation. The loader will look for import __static__ and import __strict__ annotations at the top of a file, and compile modules appropriately. To enable the loader, you have one of three options:

1. Explicitly install the loader at the top level of your application via from cinderx.compiler.strict.loader import install; install().

2. Set PYTHONINSTALLSTRICTLOADER=1 in your env.

3. Run ./python -X install-strict-loader application.py.

Alternatively, you can compile all code statically by using ./python -m compiler --static some_module.py, which will compile the module as static Python and execute it.
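Exactly how the loader spots opted-in modules isn't specified in detail here. As an assumption, the sketch below treats a module as opted in if import __static__ or import __strict__ appears among its leading statements, after an optional docstring. static_or_strict is a hypothetical helper built on the standard ast module, not the loader's API.

```python
import ast

MARKERS = {"__static__", "__strict__"}


def static_or_strict(source: str) -> set[str]:
    """Return the opt-in markers found at the top of `source` (hypothetical)."""
    found: set[str] = set()
    for index, node in enumerate(ast.parse(source).body):
        is_docstring = (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        if is_docstring:
            continue
        if isinstance(node, ast.Import):
            names = {alias.name for alias in node.names} & MARKERS
            if names:
                found |= names
                continue
        break  # the first ordinary statement ends the header we scan
    return found


print(static_or_strict('"""Docs."""\nimport __static__\nx: int = 1\n'))  # {'__static__'}
```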

See CinderDoc/static_python.rst for more detailed documentation.

