Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.3k
GH-99554:marshal bytecode more efficiently#99555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Hm, that's annoying. It appears that I might need to find a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just one drive-by comment ;)
Python/marshal.c Outdated
| assert(0x00 <=oparg&&oparg<0x100); | ||
| buffer[i++]=_Py_MAKECODEUNIT(opcode,oparg); | ||
| for (intj=0;j<_PyOpcode_Caches[opcode];j++) { | ||
| buffer[i++]=_Py_MAKECODEUNIT(CACHE,oparg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe assert thati < size here, since there is potential for a buffer overrun here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
TBH, I'm a bit torn. This looks brilliant, but there's one thing that worries me -- unless you have the correct table of inline cache entries and know the value ofHAVE_ARGUMENT you can't evenskip the bytecode, which means you can't do anything else with the code object (e.g. looking atco_consts orco_names or evenco_firstlineno).
That would be easier if the size you wrote was the number of bytes written instead of the number of code units including cache; but thenr_bytecode() would have to make a guess at the size of thebytes object it is allocating and perhaps reallocate it later, which isn't ideal either.
How would you feel about writingboth numbers to the file? That wastes 4 bytes per code object, but makes decoding more robust.
FWIW: Here's a wild scheme for avoiding the extra copy of the bytecode (which would presumably save a little time and memory on all imports). Once you have read the size of the bytecode you could just allocate a blankPyCodeObject object of the right size, and then you could deposit the expanded bytecode directly into that. Then thestruct _PyCodeConstructor-based API would have to allow a field where you pass in that object and it would just initialize the rest based on the various given fields and return it (or dealloc it if there's another error). A little fiddly and not something for this PR, but given how heroic this all already sounds, perhaps worth the effort in the future.
Tools/build/umarshal.py Outdated
| bytecode=bytearray() | ||
| whilelen(bytecode)<nbytes: | ||
| opcode_byte=self.r_byte() | ||
| ifopcode.HAVE_ARGUMENT<=opcode_byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This looks weird. I'm so used toif x >= N that seeingif N <= x just feels wrong.
| ifopcode.HAVE_ARGUMENT<=opcode_byte: | |
| ifopcode_byte>=opcode.HAVE_ARGUMENT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
const == variable is a C-ism to prevent accidentally writing= instead of==. No need in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I have a weird habit where I always use< and<= for comparisons, regardless of the placement of constants, etc. I guess I like the parallel with chained comparisons likelo <= x < hi, which almost always use< and<=.
I'll change it, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good.
Any performance numbers? (not that it matters, the 25% space saving is worth it).
Tools/build/deepfreeze.py Outdated
| fromtypingimportDict,FrozenSet,TextIO,Tuple | ||
| importumarshal | ||
| importopcode_for_buildasopcode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This isn't a normal import so it shouldn't look like one. Maybe something likeopcode = opcode_finder.get_opcodes()
Tools/build/umarshal.py Outdated
| bytecode=bytearray() | ||
| whilelen(bytecode)<nbytes: | ||
| opcode_byte=self.r_byte() | ||
| ifopcode.HAVE_ARGUMENT<=opcode_byte: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
const == variable is a C-ism to prevent accidentally writing= instead of==. No need in Python.
| w= (int)s; | ||
| if (_Py_hashtable_set(p->hashtable,Py_NewRef(v), | ||
| (void*)(uintptr_t)w)<0) { | ||
| Py_DECREF(v); | ||
| gotoerr; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Honestly I'd rather see thei++ on separate lines instead of this hybrid approach.
We can do that. The only awkward part is that this requires walking over the bytecode twice when marshalling: once to count the number of bytes to be written (for the header), and another to actually write them. |
brandtbucher commentedDec 22, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
(Another option could just be to terminate the serialized bytecode with two zero bytes. That can't appear anywhere in the compressed string, but is a tad bit hacky.) |
1.00x faster: |
| // Terminate with two zero bytes, so that programs scanning .pyc files can | ||
| // skip over the bytecode (even if they don't know the compression scheme). | ||
| // This is simpler than writing the compressed size in the header, which | ||
| // requires two loops (one to count the bytes, then one to write them): | ||
| w_short(0,p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ugh, sorry, I can't stomach this. It feels certain that at some point in the future we'll have a corner case where we encounter \0\0 in the middle. If an extra pass is too expensive let's just go with what you had before. Or let's not do this -- I am feeling pretty lukewarm about this scheme, especially since the register VM will require a whole new approach anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Personally my vote is to abandon this.
Hm, that's a shame. I'm still not personally sold on the motivation for including this extra metadata at all. Are there really that many packages reading raw If so, these tools already need to understand how the other members of the code object are laid out, and in what formats (something which changes between Python versions). I don't think that the overhead of those tools needing
I wouldn't call it a "whole new approach". The cache compression will still be very valuable, I think. (I'm also not willing to die on this hill though. 25% savings on disk is certainly nice, but it's probably not a game-changer for most users.) |
To be clear, it's not the choice between adding the byte count at the cost of an extra pass vs. requiring people to parse the bytecode in order to implement their own marshal. Most tools doing that would be able to just use the marshal module (the code object format there changes in every release so they'd have to anyways) or copy umarshal.py and friends from our Tools directory. It's more that this adds a fair amount of code complexity and what that buys us is merely slightly smaller PYC files, which I'm not sure anybody cares for. Plus (as I mentioned) I worry that we'll have to redo this for the register machine, which completely changes the opcode format. |
Okay, makes sense. I'll drop this for now, and we can maybe revisit after the opcode format has stabilized. |
Uh oh!
There was an error while loading.Please reload this page.
This shrinks the size of
.pycfiles by about 25%. It also removes an intermediate copy of the bytecode created during marshalling.Next steps will be removing the intermediate
bytesobject created during unmarshalling and performing the work of_PyCode_Quickenas part of this same move..pycfiles are larger than they need to be #99554