Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-121577: fix compileall always recompiling pyc files with hash based invalidation#121960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
kulikjak wants to merge5 commits intopython:main
base:main
Choose a base branch
Loading
fromkulikjak:compileall-invalidation

Conversation

kulikjak
Copy link
Contributor

@kulikjakkulikjak commentedJul 18, 2024
edited by bedevere-appbot
Loading

edmorley reacted with thumbs up emoji
Copy link

@edmorleyedmorley left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for working on this!

There is also another related bug with compileall that I think might be worth trying to fix at the same time, since it's directly related:#117204

That is, I think the logic should be:

  • If header of file doesn't match requested invalidation mode, then skip checking the timestamps or checksum, and instead always recompile
  • If the header does match the requested invalidation mode, then check the file is up to date (via either the timestamp or hash, as appropriate), and recompile if needed

In addition, I think the current version of this PR worsens performance in the "compiling multiple optimisation levels" scenario, since the original source timestamp and/or expected hash is calculated multiple times (since those steps were moved inside the loop) even though those values are constant regardless of optimisation level of the output bytecode. As such, it might be worth moving those back outside the loop.

wimglenn reacted with thumbs up emoji
@kulikjak
Copy link
ContributorAuthor

There is also another related bug with compileall that I think might be worth trying to fix at the same time, since it's directly related:#117204

That is, I think the logic should be:

  • If header of file doesn't match requested invalidation mode, then skip checking the timestamps or checksum, and instead always recompile
  • If the header does match the requested invalidation mode, then check the file is up to date (via either the timestamp or hash, as appropriate), and recompile if needed

I am aware of#117204, but I don't think it's as simple as that.

I do agree that if you specify the hash-based invalidation and the target uses the timestamp and is up-to-date,compileall shouldn't silently do nothing (which is the current behavior). However, when you don't specify any mode, it's not that simple anymore - you can either use the default one (timestamp), or you can use the one existing.pyc file uses.

I can imagine a use case for both of these, which means that there should probably also be a command line option that switches between those two, or a new default invalidation mode that would mean "use what the existing.pyc file uses, or default to timestamp if nopyc exists" and that means changes to the command line tool and API...

I don't think I am the one who should decide how exactlycompileall will behave. That's why I didn't look into that one more.

That said, it's something that should probably be looked into (and I am happy to help with that once we know how it should behave ;)).

In addition, I think the current version of this PR worsens performance in the "compiling multiple optimisation levels" scenario, since the original source timestamp and/or expected hash is calculated multiple times (since those steps were moved inside the loop) even though those values are constant regardless of optimisation level of the output bytecode. As such, it might be worth moving those back outside the loop.

Good catch - I will do so.

@kulikjak
Copy link
ContributorAuthor

Oh, and there's also a question about what to do when different optimization levels use different invalidation modes....

@edmorley
Copy link

However, when you don't specify any mode, it's not that simple anymore - you can either use the default one (timestamp), or you can use the one existing.pyc file uses.

I can imagine a use case for both of these, which means that there should probably also be a command line option that switches between those two, or a new default invalidation mode that would mean "use what the existing .pyc file uses, or default to timestamp if no pyc exists" and that means changes to the command line tool and API...

So if this were about normal Python operation (where it creates pyc files only as needed), I would agree that perhaps a pyc file existing but not being in the expected mode is good enough, and recompilation should be skipped.

However, IMO the whole point of thecompileall command is to explicitly compile all specified files. To me, the fact that some files aren't recompiled every time (ie: are skipped if the timestamp is up to date andforce hasn't been used) is more of an optimisation implementation detail.

As such, my inclination would be that ifcompileall has been requested to compile using a particular invalidation mode, then it should recompile any files that aren't using that mode. My gut feeling is that it would be very rare for anyone usingcompileall to actually want a mixture of invalidation modes to be used based on whether files had previously been compiled, and therefore that we don't need to add an extra option for it - but curious what others think :-)

@kulikjak
Copy link
ContributorAuthor

kulikjak commentedAug 30, 2024
edited
Loading

That makes sense; maybe I am overthinking it.

In the end, people either care aboutpyc not changing when the timestamp changes (our case) and then use hashes for everything, or they don't and stick to timestamps. A combination of different invalidation modes for different files (or even optimization levels) seems unlikely and unnecessary...

Maybe@benjaminp would have some thoughts (as an author of invalidation modes :))?

else:
# timestamp-based invalidation
mtime = int(os.stat(fullname).st_mtime)
actual = header[:12]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why not check also the file size?

actual = chandle.read(12)
header = chandle.read(16)

if header[4]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You can check the magic number without reading the source file.

Also, header[4] can raise IndexError. And you need to check other bytes and bits.

Suggested change
ifheader[4]:
iflen(header)<16orheader[:4]!=importlib.util.MAGIC_NUMBERorstruct.unpack('<L',header[4:8])[0]&~0b11:
break
ifheader[4]&0b1:

Comment on lines +239 to +241
actual = header
expect = struct.pack('<4sL8s', importlib.util.MAGIC_NUMBER,
header[4], source_hash)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You can simply check the source hash:

Suggested change
actual=header
expect=struct.pack('<4sL8s',importlib.util.MAGIC_NUMBER,
header[4],source_hash)
ifheader[8:]!=source_hash:
break

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@edmorleyedmorleyedmorley left review comments

@serhiy-storchakaserhiy-storchakaserhiy-storchaka left review comments

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@kulikjak@edmorley@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp