More efficient (fixed-format) serialization#19668


Merged: ilevkivskyi merged 14 commits into python:master from ilevkivskyi:ff-cache on Aug 19, 2025

Conversation

@ilevkivskyi (Member) commented Aug 15, 2025 · edited by JukkaL

This makes deserialization ~2.5x faster compared to orjson. This is fully functional, but still a PoC in terms of distribution logic. Some comments:

  • If you want to try this in compiled mode, simply install mypy with MYPY_USE_MYPYC=1; this will install the extension automatically (for now).
  • If you want to just play with the extension, or use it in interpreted mode, use pip install mypyc/lib-rt.
  • I translated the (de-)serialization logic from the JSON methods almost verbatim (including comments).
  • This may still not be the most efficient way to do this, but I wanted to write something simple that probably still gets us 90% of the way there in terms of performance. I am still open to suggestions, however.
  • Please forgive me if the PR looks not very polished; I feel tired, but needed some kind of closure on this :-)

Some technical notes:

  • The huge try/except import blob in mypy/cache.py is temporary; it is needed for now to be able to run tests without installing mypy itself (only with test-requirements.txt).
  • There is a certain asymmetry between read and write for literals. This is intentional, because we allow complex and/or None in some cases but not in others.
  • The general convention is that during deserialization the type/symbol marker is consumed by the caller (except for MypyFile, which is special). There is no convention for the few classes that are not types/symbols.
  • I added a new primitive type for native_internal.Buffer (and possibly more types from native in the future) for better/automatic method call specializations. If this feels wrong/risky, I can convert this to more ad-hoc logic in transform_call_expr().

Related issue: #3456
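To illustrate the idea of a fixed-format binary cache, here is a minimal pure-Python sketch of a buffer with fixed-width write/read primitives. This is a hypothetical illustration only: the real implementation is a C extension (native_internal.Buffer), and the class and method names below are assumptions, not the PR's actual API.

```python
# Hypothetical sketch of a fixed-format serialization buffer.
# Fields are written in a fixed order with fixed-width encodings,
# so no field names or delimiters are stored (unlike JSON).
import struct

class Buffer:
    def __init__(self, data: bytes = b"") -> None:
        self._data = bytearray(data)
        self._pos = 0

    def getvalue(self) -> bytes:
        return bytes(self._data)

    def write_int(self, value: int) -> None:
        # 8-byte little-endian signed integer, fixed width.
        self._data += struct.pack("<q", value)

    def read_int(self) -> int:
        (value,) = struct.unpack_from("<q", self._data, self._pos)
        self._pos += 8
        return value

    def write_str(self, value: str) -> None:
        # Length-prefixed UTF-8; the length uses the fixed int encoding.
        encoded = value.encode("utf-8")
        self.write_int(len(encoded))
        self._data += encoded

    def read_str(self) -> str:
        length = self.read_int()
        raw = self._data[self._pos : self._pos + length]
        self._pos += length
        return raw.decode("utf-8")

buf = Buffer()
buf.write_int(42)
buf.write_str("builtins.str")
out = Buffer(buf.getvalue())
assert out.read_int() == 42
assert out.read_str() == "builtins.str"
```

Because reads happen in the same fixed order as writes, deserialization is a straight sequence of pointer-bumping decodes with no key lookups or generic parsing, which is where the speedup over a JSON format comes from.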


@ilevkivskyi (Member, Author)

Oh, hm, I completely forgot that 32-bit architectures are still a thing. It's probably an easy fix; I'm not sure I will have time to take a look at it today, but it should not stop any reviewers from reviewing.


@JukkaL (Collaborator)

(Not a full review.)

Nice, this is a major improvement! I measured a ~35% performance improvement in import torch when cached, compared to using the old cache format. Once deserialization is faster, we can hopefully further improve warm runtimes by improving the performance of the remaining bottlenecks.


@JukkaL (Collaborator)

I did a quick review, and this seems good enough to merge (or at least close enough). Let's leave this open for another day in case somebody still wants to have a look. Before making this an official feature, there are some changes that I think would be nice to have (beyond a way of distributing this), but these are better done as follow-up PRs and I won't discuss them here. I also have some smaller comments that are easier for me to address in a follow-up PR than writing detailed review comments here.

I also want to remove the JSON based format eventually, maybe a few releases after we've made this the default. This may require a way of converting the binary cache to JSON, since at least at Dropbox we have some tools that process mypy cache files, and processing a binary format in tools sounds pretty painful. I have some ideas about how to achieve this without making the serialization code significantly harder to maintain or less efficient. I'll write about this in a follow-up issue.

@JukkaL (Collaborator)

Added link to related issue #3456 in summary.

@JukkaL (Collaborator) left a review comment:

Waiting another day doesn't bring any value, so let's merge this soon. Left one comment, otherwise looks good. We can start iterating on this once this is merged.

mypy/main.py (outdated):

    incremental_group.add_argument(
        "--fixed-format-cache",
        action="store_true",
        help="Use experimental binary fixed format cache",
    )

@JukkaL (Collaborator):

Hide the flag from --help output (help=argparse.SUPPRESS) until we've figured out how to distribute this properly?
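The suggestion above relies on a standard argparse feature: passing argparse.SUPPRESS as the help text keeps an argument fully functional but omits it from the --help listing. A minimal sketch (the parser setup here is simplified; mypy's actual option wiring in mypy/main.py is more involved):

```python
# Sketch: an experimental flag that works but is hidden from --help.
import argparse

parser = argparse.ArgumentParser(prog="mypy")
parser.add_argument(
    "--fixed-format-cache",
    action="store_true",
    help=argparse.SUPPRESS,  # parsed normally, but not shown in --help
)

args = parser.parse_args(["--fixed-format-cache"])
assert args.fixed_format_cache is True
# The suppressed flag does not appear in the generated help text.
assert "--fixed-format-cache" not in parser.format_help()
```

This makes it easy to flip the flag to public later: just replace argparse.SUPPRESS with a real help string.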

@ilevkivskyi (Member, Author):

I hoped we would do this before the next mypy release, but sure, we can easily expose it when it's ready.

@github-actions (Contributor)

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@ilevkivskyi (Member, Author)

@JukkaL wrote:

> Added link to related issue #3456 in summary.

Oh, I completely forgot about that issue, and I see you even posted a proposal there. Interestingly, I ended up writing something very similar to what you propose. A few comments:

  • I actually had write_short_int() and read_short_int() for type/symbol tags (one byte per tag) at some point. I didn't include them in the PR because they had only a minor effect on performance, but they would definitely let us cut a big chunk of the cache size (IIRC they reduced cache size by at least 20% in my experiments).
  • As you mentioned, attribute name tags are technically redundant, and debugging was actually not that hard without them, so I think we don't really need them.
  • I think we can add a few dedicated tags for the most common instances, like builtins.str, and return the same object for them (essentially what we do in the type checker with named_type()).

> Before making this an official feature, there are some changes that I think would be nice to have (beyond a way of distributing this), but these are better done as follow-up PRs and I won't discuss them here. I also have some smaller comments that are easier for me to address in a follow-up PR than writing detailed review comments here.

OK, yeah, this will definitely speed things up.
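The one-byte tag idea discussed above can be sketched in a few lines. This is a hypothetical illustration: the function names write_short_int/read_short_int come from the comment, but the tag values and calling convention here are invented for the example.

```python
# Sketch: one-byte type/symbol tags instead of full-width integers.
# Each serialized type starts with a single tag byte identifying its
# class; the deserializer dispatches on it.
import struct

TAG_INSTANCE = 1  # hypothetical tag values for illustration
TAG_UNION = 2

def write_short_int(data: bytearray, value: int) -> None:
    assert 0 <= value <= 255, "short ints must fit in one byte"
    data += struct.pack("<B", value)

def read_short_int(data: bytes, pos: int) -> tuple[int, int]:
    # Returns (value, new_position).
    (value,) = struct.unpack_from("<B", data, pos)
    return value, pos + 1

data = bytearray()
write_short_int(data, TAG_INSTANCE)
write_short_int(data, TAG_UNION)
tag1, pos = read_short_int(bytes(data), 0)
tag2, pos = read_short_int(bytes(data), pos)
assert (tag1, tag2) == (TAG_INSTANCE, TAG_UNION)
assert len(data) == 2  # one byte per tag, vs. 8 for a full-width int
```

Since every serialized type and symbol carries a tag, shrinking each tag from a full-width integer to one byte compounds across the whole cache, which is consistent with the ~20% size reduction mentioned above.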

ilevkivskyi merged commit 657bdd8 into python:master on Aug 19, 2025. 20 checks passed.
ilevkivskyi deleted the ff-cache branch on August 19, 2025 at 19:33.

Reviewers: @JukkaL (approved these changes)

2 participants: @ilevkivskyi, @JukkaL
