Mypy Daemon

Jukka Lehtosalo edited this page Jun 30, 2022 · 6 revisions

If you run mypy using the mypy tool, it performs incremental checking by default. This tries to reuse type checking results from the previous run via files in the .mypy_cache directory.

This coarse-grained (module-level) incremental checking is often just fine, but iteration can be slow, especially in large code bases with hundreds of thousands of lines or more.

The mypy daemon (dmypy) helps with this use case. It makes incremental runs very fast (often a few hundred ms) by having a long-lived process that stores program state in memory between runs.

The daemon also maintains fine-grained dependencies. These dependencies are between definitions, not modules. In contrast, the coarse-grained incremental mode only tracks dependencies at module level.

The implementation is fairly complex and it's usually not necessary to understand the details of how it works to work on mypy, but some mypy changes (e.g. many AST changes) require at least small updates in the mypy daemon.

Example

Assume that we have a file a.py that contains dozens of functions, only one of which uses module b:

    # a.py
    import b
    ...  # lots of stuff that doesn't use b
    def f() -> None:
        b.g(1)

The mypy daemon will process the entire program when starting up, and record that a.f depends on b.g, among other things.

Now we edit b.py and change the signature of b.g:

    # b.py
    def g(x: str) -> None: ...

The mypy daemon will first notice that the file b.py has changed (by checking the modification time). It will process b.py again and notice that the only externally visible change in b is the modified signature of b.g.

Using fine-grained dependencies, the mypy daemon will figure out that in module a, only the function a.f needs to be reprocessed. This is much faster than processing the entire module a if the module is big.

After reprocessing a.f, the mypy daemon will report an incompatible argument error.

How does it work

The mypy daemon has several things going on, summarized below.

Fine-grained dependencies

Mypy daemon keeps track of fine-grained dependencies between functions, attributes, classes, modules, etc.

The smallest unit of (re)processing (i.e. a target) is either a module top level or a top-level function/method. Class bodies are considered to be included in the surrounding module top level or function, but methods are separate targets. We only track dependencies at this level of granularity, though targets can depend on arbitrary attributes (that is, triggers or sources are more fine-grained than targets).

For example, if function mod1.f has a reference to class attribute mod2.C.x, we'll record that mod1.f depends on mod2.C.x. However, if class attribute C.x is initialized to the value CONST, we'll record that the top level of mod2 depends on CONST, instead of recording a dependency for the class variable mod2.C.x.
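The granularity rule in this example can be illustrated with a toy dependency map. This is only a sketch, not mypy's actual data structure: the record_dependency helper and the toy_deps layout are assumptions made for illustration, while the names mod1.f, mod2.C.x, mod2 and CONST come from the example above.

```python
from collections import defaultdict

# Toy model (an assumption, not mypy's real representation): map each
# trigger, which can be as fine-grained as an attribute, to the set of
# targets (module top levels or functions/methods) that depend on it.
toy_deps = defaultdict(set)


def record_dependency(trigger, target):
    """Record that `target` must be reprocessed when `trigger` changes."""
    toy_deps[trigger].add(target)


# mod1.f refers to the class attribute mod2.C.x, so the function is the
# dependent target and the attribute is the trigger.
record_dependency("mod2.C.x", "mod1.f")

# C.x is initialized to CONST inside the class body. A class body is not
# a separate target, so the dependent target is the top level of mod2.
record_dependency("CONST", "mod2")
```

Note that targets only ever appear on the right-hand side of the map, while triggers can name anything, including attributes.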

Processing modified files

Each file that is modified since the previous mypy daemon run will be processed in full, since mypy doesn't have an incremental parser.

After processing a file, the daemon takes a diff of the old and new symbol tables to find changed definitions. It will then "trigger" these definitions (i.e. reprocess everything that depends on them).
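The diffing idea can be sketched roughly as follows. This is a simplified illustration, not the logic in mypy/server/astdiff.py: the changed_definitions function, the use of plain signature strings as snapshots, and the old signature of b.g are all assumptions.

```python
_MISSING = object()  # marks a name that is absent from one of the tables


def changed_definitions(old, new):
    """Return the names whose snapshot differs between two symbol tables.

    Each table maps a fully qualified name to a comparable "snapshot" of
    the externally visible part of its definition (here, a string).
    """
    changed = set()
    for name in old.keys() | new.keys():
        # A name counts as changed if it was added, removed, or modified.
        if old.get(name, _MISSING) != new.get(name, _MISSING):
            changed.add(name)
    return changed


# Hypothetical snapshots of module b before and after the edit.
old_table = {"b.g": "def (x: int) -> None", "b.h": "def () -> None"}
new_table = {"b.g": "def (x: str) -> None", "b.h": "def () -> None"}
triggered = changed_definitions(old_table, new_table)
```

Here only b.g ends up in triggered, because the snapshot of b.h is identical in both tables.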

Now comes a tricky bit: we merge the new AST into the AST corresponding to the previous revision of the file by copying the contents of new AST nodes over to the corresponding old nodes (when they exist). This way references to the AST nodes in other modules will continue to point to the correct things.
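The reason for copying contents rather than swapping nodes can be seen in a small sketch. The FuncNode class and merge_node function below are hypothetical stand-ins, not mypy's AST classes or the code in mypy/server/astmerge.py.

```python
class FuncNode:
    """Hypothetical stand-in for an AST node; not a mypy class."""

    def __init__(self, name, signature):
        self.name = name
        self.signature = signature


def merge_node(old, new):
    """Copy the state of `new` into `old`, keeping the identity of `old`."""
    old.__dict__.clear()
    old.__dict__.update(new.__dict__)
    return old


# The old node for b.g, plus a reference to it held elsewhere
# (for example, from the AST of module a).
old_g = FuncNode("g", "def (x: int) -> None")
reference_from_other_module = old_g

# Reparsing b.py produces a brand new node object.
new_g = FuncNode("g", "def (x: str) -> None")
merged = merge_node(old_g, new_g)
```

After the merge, reference_from_other_module is still the same object as before, but it now carries the new signature, so no references in other modules need to be patched.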

Following triggered fine-grained dependencies

Mypy daemon uses fine-grained dependencies to find other parts of the codebase that need to be reprocessed in response to the triggered definitions.

To reprocess a triggered definition, the daemon first "strips" (or resets) the relevant AST nodes to match a fresh AST we get from the parser. We use this hack, since we don't have an incremental parser and want to avoid processing the entire file containing a triggered definition.
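As a rough illustration of stripping, assume a node type that keeps parser output and analysis results side by side. ToyNameExpr and strip_node are hypothetical names for this sketch and do not reflect the actual code in mypy/server/aststrip.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNameExpr:
    """Hypothetical expression node; not mypy's own node class."""

    name: str                            # set by the parser
    resolved_to: Optional[str] = None    # filled in by semantic analysis
    inferred_type: Optional[str] = None  # filled in by type checking


def strip_node(node):
    """Reset analysis results so the node looks freshly parsed."""
    node.resolved_to = None
    node.inferred_type = None


expr = ToyNameExpr("g", resolved_to="b.g", inferred_type="def (x: int) -> None")
strip_node(expr)
```

After stripping, expr compares equal to ToyNameExpr("g"), i.e. what this toy parser would have produced, so it can be analyzed again without reparsing the file.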

Reprocessing a definition implies performing semantic analysis and type checking on a stripped subset of some module AST.

After reprocessing, we again check if something externally visible has changed, and we may need to also trigger the dependencies of definitions we just reprocessed.

Completing an incremental step

Eventually we'll reach a fixed point where there are no additional triggered dependencies and we are done.
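The overall loop can be sketched as a simple worklist algorithm. This is a simplified model under stated assumptions, not the logic in mypy/server/update.py: the propagate function, the toy_reprocess callback, and the target c.h are hypothetical, and each trigger fires at most once here, which guarantees termination.

```python
from collections import deque


def propagate(initial_triggers, deps, reprocess):
    """Reprocess targets until no new triggers fire (a fixed point).

    deps maps a trigger name to the set of targets that depend on it.
    reprocess(target) returns the set of triggers whose externally
    visible definition changed as a result of reprocessing that target.
    """
    pending = deque(initial_triggers)
    fired = set(initial_triggers)
    processed = []
    while pending:
        trigger = pending.popleft()
        for target in sorted(deps.get(trigger, ())):
            processed.append(target)
            for new_trigger in reprocess(target):
                if new_trigger not in fired:
                    fired.add(new_trigger)
                    pending.append(new_trigger)
    return processed


# Hypothetical dependencies: a.f depends on b.g, and c.h depends on a.f.
example_deps = {"b.g": {"a.f"}, "a.f": {"c.h"}}


def toy_reprocess(target):
    # Pretend that reprocessing a.f changes something externally visible
    # about a.f, while reprocessing anything else changes nothing.
    return {"a.f"} if target == "a.f" else set()


order = propagate(["b.g"], example_deps, toy_reprocess)
```

In this run, order is ["a.f", "c.h"]: the change to b.g triggers a.f, whose own change then triggers c.h, after which nothing new fires and the step is complete.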

Relevant code

mypy/dmypy_server.py: Implementation of the daemon process
mypy/server/deps.py: Generate fine-grained dependencies for AST nodes; also documents how triggers and dependencies work
mypy/server/update.py: Fine-grained incremental processing logic
mypy/server/astdiff.py: Compare two versions of a module symbol table and find changed definitions
mypy/server/astmerge.py: Merge new version of an AST to the previous version to preserve object identities of nodes
mypy/server/aststrip.py: Strip an AST to make it "fresh", i.e. similar to what is produced by parser, with all changes performed by semantic analysis or type checking reverted

Test cases

Many daemon test cases perform multiple incremental steps to validate that propagating changes works as expected.

The primary test files are here: test-data/unit/fine-grained*.test

There are also more unit test style test cases for some operations:

Generating deps: test-data/unit/deps.test
Performing AST diffs: test-data/unit/diff.test
Merging ASTs: test-data/unit/merge.test
