Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.4k
gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
edmorley left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank you for working on this!
There is also another related bug with compileall that I think might be worth trying to fix at the same time, since it's directly related:#117204
That is, I think the logic should be:
- If header of file doesn't match requested invalidation mode, then skip checking the timestamps or checksum, and instead always recompile
- If the header does match the requested invalidation mode, then check the file is up to date (via either the timestamp or hash, as appropriate), and recompile if needed
In addition, I think the current version of this PR worsens performance in the "compiling multiple optimisation levels" scenario, since the original source timestamp and/or expected hash is calculated multiple times (since those steps were moved inside the loop) even though those values are constant regardless of optimisation level of the output bytecode. As such, it might be worth moving those back outside the loop.
I am aware of#117204, but I don't think it's as simple as that. I do agree that if you specify the hash-based invalidation and the target uses the timestamp and is up-to-date, I can imagine a use case for both of these, which means that there should probably also be a command line option that switches between those two, or a new default invalidation mode that would mean "use what the existing I don't think I am the one who should decide how exactly That said, it's something that should probably be looked into (and I am happy to help with that once we know how it should behave ;)).
Good catch - I will do so. |
Oh, and there's also a question about what to do when different optimization levels use different invalidation modes.... |
edmorley commentedAug 30, 2024
So if this were about normal Python operation (where it creates pyc files only as needed), I would agree that perhaps a pyc file existing but not being in the expected mode is good enough, and recompilation should be skipped. However, IMO the whole point of the As such, my inclination would be that if |
kulikjak commentedAug 30, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
That makes sense; maybe I am overthinking it. In the end, people either care about Maybe@benjaminp would have some thoughts (as an author of invalidation modes :))? |
else: | ||
# timestamp-based invalidation | ||
mtime = int(os.stat(fullname).st_mtime) | ||
actual = header[:12] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not check also the file size?
actual = chandle.read(12) | ||
header = chandle.read(16) | ||
if header[4]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You can check the magic number without reading the source file.
Also, header[4] can raise IndexError. And you need to check other bytes and bits.
ifheader[4]: | |
iflen(header)<16orheader[:4]!=importlib.util.MAGIC_NUMBERorstruct.unpack('<L',header[4:8])[0]&~0b11: | |
break | |
ifheader[4]&0b1: |
actual = header | ||
expect = struct.pack('<4sL8s', importlib.util.MAGIC_NUMBER, | ||
header[4], source_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You can simply check the source hash:
actual=header | |
expect=struct.pack('<4sL8s',importlib.util.MAGIC_NUMBER, | |
header[4],source_hash) | |
ifheader[8:]!=source_hash: | |
break |
Uh oh!
There was an error while loading.Please reload this page.