NotificationsYou must be signed in to change notification settings
Fork32.4k
Star67.9k

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

kulikjak wants to merge5 commits intopython:main

base:main

Choose a base branch

fromkulikjak:compileall-invalidation

Open

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

kulikjak wants to merge5 commits intopython:mainfromkulikjak:compileall-invalidation

Conversation

Copy link

Contributor

kulikjak commentedJul 18, 2024•
edited by bedevere-appbot
Loading

Issue:compileall shouldn't recompile up-to-date files with hash-based invalidation mode #121577

kulikjak added2 commits

July 18, 2024 13:43

compileall now verifies whether recompilation is necessary for all in…

35b9311

…validation modes

add NEWS entry

9d47dbc

bedevere-appbot mentioned this pull request

Jul 18, 2024

compileall shouldn't recompile up-to-date files with hash-based invalidation mode#121577

Open

bedevere-appbot added the awaiting review label

Jul 18, 2024

kulikjak added3 commits

July 18, 2024 14:21

fix NEWS entry syntax

f14eae5

fix the verification logic

9e3b22d

fix tests

64d7f35

edmorley reviewed

Aug 24, 2024

View reviewed changes

Copy link

edmorley left a comment•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for working on this!

There is also another related bug with compileall that I think might be worth trying to fix at the same time, since it's directly related:#117204

That is, I think the logic should be:

If header of file doesn't match requested invalidation mode, then skip checking the timestamps or checksum, and instead always recompile
If the header does match the requested invalidation mode, then check the file is up to date (via either the timestamp or hash, as appropriate), and recompile if needed

In addition, I think the current version of this PR worsens performance in the "compiling multiple optimisation levels" scenario, since the original source timestamp and/or expected hash is calculated multiple times (since those steps were moved inside the loop) even though those values are constant regardless of optimisation level of the output bytecode. As such, it might be worth moving those back outside the loop.

edmorley mentioned this pull request

Aug 24, 2024

compileall doesn't recompile when invalidation mode changes even if timestamps match#117204

Open

Copy link

ContributorAuthor

kulikjak commentedAug 30, 2024

There is also another related bug with compileall that I think might be worth trying to fix at the same time, since it's directly related:#117204
That is, I think the logic should be:
If header of file doesn't match requested invalidation mode, then skip checking the timestamps or checksum, and instead always recompile
If the header does match the requested invalidation mode, then check the file is up to date (via either the timestamp or hash, as appropriate), and recompile if needed

I am aware of#117204, but I don't think it's as simple as that.

I do agree that if you specify the hash-based invalidation and the target uses the timestamp and is up-to-date,compileall shouldn't silently do nothing (which is the current behavior). However, when you don't specify any mode, it's not that simple anymore - you can either use the default one (timestamp), or you can use the one existing.pyc file uses.

I can imagine a use case for both of these, which means that there should probably also be a command line option that switches between those two, or a new default invalidation mode that would mean "use what the existing.pyc file uses, or default to timestamp if nopyc exists" and that means changes to the command line tool and API...

I don't think I am the one who should decide how exactlycompileall will behave. That's why I didn't look into that one more.

That said, it's something that should probably be looked into (and I am happy to help with that once we know how it should behave ;)).

In addition, I think the current version of this PR worsens performance in the "compiling multiple optimisation levels" scenario, since the original source timestamp and/or expected hash is calculated multiple times (since those steps were moved inside the loop) even though those values are constant regardless of optimisation level of the output bytecode. As such, it might be worth moving those back outside the loop.

Good catch - I will do so.

Copy link

ContributorAuthor

kulikjak commentedAug 30, 2024

Oh, and there's also a question about what to do when different optimization levels use different invalidation modes....

Copy link

edmorley commentedAug 30, 2024

However, when you don't specify any mode, it's not that simple anymore - you can either use the default one (timestamp), or you can use the one existing.pyc file uses.
I can imagine a use case for both of these, which means that there should probably also be a command line option that switches between those two, or a new default invalidation mode that would mean "use what the existing .pyc file uses, or default to timestamp if no pyc exists" and that means changes to the command line tool and API...

So if this were about normal Python operation (where it creates pyc files only as needed), I would agree that perhaps a pyc file existing but not being in the expected mode is good enough, and recompilation should be skipped.

However, IMO the whole point of thecompileall command is to explicitly compile all specified files. To me, the fact that some files aren't recompiled every time (ie: are skipped if the timestamp is up to date andforce hasn't been used) is more of an optimisation implementation detail.

As such, my inclination would be that ifcompileall has been requested to compile using a particular invalidation mode, then it should recompile any files that aren't using that mode. My gut feeling is that it would be very rare for anyone usingcompileall to actually want a mixture of invalidation modes to be used based on whether files had previously been compiled, and therefore that we don't need to add an extra option for it - but curious what others think :-)

Copy link

ContributorAuthor

kulikjak commentedAug 30, 2024•
edited
Loading

That makes sense; maybe I am overthinking it.

In the end, people either care aboutpyc not changing when the timestamp changes (our case) and then use hashes for everything, or they don't and stick to timestamps. A combination of different invalidation modes for different files (or even optimization levels) seems unlikely and unnecessary...

Maybe@benjaminp would have some thoughts (as an author of invalidation modes :))?

serhiy-storchaka reviewed

Jul 13, 2025

View reviewed changes

Lib/compileall.py

		else:
		# timestamp-based invalidation
		mtime = int(os.stat(fullname).st_mtime)
		actual = header[:12]

Copy link

Member

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why not check also the file size?

Lib/compileall.py

		actual = chandle.read(12)
		header = chandle.read(16)

		if header[4]:

Copy link

Member

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You can check the magic number without reading the source file.

Also, header[4] can raise IndexError. And you need to check other bytes and bits.

Suggested change

	ifheader[4]:
	iflen(header)<16orheader[:4]!=importlib.util.MAGIC_NUMBERorstruct.unpack('<L',header[4:8])[0]&~0b11:
	break
	ifheader[4]&0b1:

Lib/compileall.py

Comment on lines +239 to +241

		actual = header
		expect = struct.pack('<4sL8s', importlib.util.MAGIC_NUMBER,
		header[4], source_hash)

Copy link

Member

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You can simply check the source hash:

Suggested change

	actual=header
	expect=struct.pack('<4sL8s',importlib.util.MAGIC_NUMBER,
	header[4],source_hash)
	ifheader[8:]!=source_hash:
	break

Labels

awaiting review

3 participants

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

Are you sure you want to change the base?

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

Uh oh!

Conversation

kulikjak commentedJul 18, 2024•
edited by bedevere-appbot
Loading

Uh oh!

Uh oh!

edmorley left a comment•
edited
Loading

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kulikjak commentedAug 30, 2024

Uh oh!

kulikjak commentedAug 30, 2024

Uh oh!

edmorley commentedAug 30, 2024

Uh oh!

kulikjak commentedAug 30, 2024•
edited
Loading

Uh oh!

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Movatterモバイル変換

Uh oh!

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

Are you sure you want to change the base?

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

Uh oh!

Conversation

kulikjak commentedJul 18, 2024• edited by bedevere-appbotLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

edmorley left a comment• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kulikjak commentedAug 30, 2024

Uh oh!

kulikjak commentedAug 30, 2024

Uh oh!

edmorley commentedAug 30, 2024

Uh oh!

kulikjak commentedAug 30, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

serhiy-storchakaJul 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kulikjak commentedJul 18, 2024•
edited by bedevere-appbot
Loading

edmorley left a comment•
edited
Loading

kulikjak commentedAug 30, 2024•
edited
Loading