Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-101178: Add Ascii85, base85, and Z85 support to binascii#102753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
ghost commentedMar 16, 2023 • edited by ghost
Loading Uh oh!
There was an error while loading.Please reload this page.
edited by ghost
Uh oh!
There was an error while loading.Please reload this page.
kangtastic commentedMar 19, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
It's a year later, and Z85 support has been added to For reference, this is the benchmark run that led me to do so. # After merging main but before adding Z85 support to this PR(cpython-b85) $ python bench_b85.py 64b64encode: 67108864 bin 0.121 s (527.435 MB/s) using 42.667 MBb64decode: 89478488 bin 0.309 s (276.188 MB/s) using 56.889 MBa85encode: 67108864 bin 0.297 s (215.150 MB/s) using 0.000 MBa85decode: 83886080 bin 0.205 s (390.751 MB/s) using 0.000 MBb85encode: 67108864 bin 0.106 s (604.359 MB/s) using 0.000 MBb85decode: 83886080 bin 0.204 s (393.040 MB/s) using 0.000 MBz85encode: 67108864 bin 0.204 s (313.610 MB/s) using 80.000 MBz85decode: 83886080 bin 0.300 s (266.670 MB/s) using 100.000 MB The existing Z85 implementation translates from the standard base85 alphabet to Z85 after the fact and within Python, so it was already benefiting from this PR but with substantial performance and memory usage overhead. That overhead is now gone. |
71f1955
to7b4aba1
Comparepython-cla-botbot commentedApr 18, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Add Ascii85, base85, and Z85 encoders and decoders to `binascii`,replacing the existing pure Python implementations in `base64`.No API or documentation changes are necessary with respect to`base64.a85encode()`, `b85encode()`, etc., and all existing unittests for those functions continue to pass without modification.Note that attempting to decode Ascii85 or base85 data of length 1 mod 5(after accounting for Ascii85 quirks) now produces an error, as noencoder would emit such data. This should be the only significantexternally visible difference compared to the old implementation.Resolves:pythongh-101178
7b4aba1
to05ae5ad
ComparePR has been rebased onto main at78cfee6 with squashing. |
I believe you have to document this change. |
Fair point, I could do that. In case anyone argues for keeping the old behavior (silently ignoring length 1 mod 5), I won't do it just yet. |
Lib/base64.py Outdated
_A85START = b"<~" | ||
_A85END = b"~>" | ||
def _85encode(b, chars, chars2, pad=False, foldnuls=False, foldspaces=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
PerPEP-0399, the Python implementation must be kept, with the C accelerator and Python implementation tested to ensure they produce identical output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
FromPEP-0399:
If an acceleration module is provided it is to be named the same as the module it is accelerating with an underscore attached as a prefix, e.g., _warnings for warnings. The common pattern to access the accelerated code from the pure Python implementation is to import it with an import *, e.g., from _warnings import *. This is typically done at the end of the module to allow it to overwrite specific Python objects with their accelerated equivalents.
Although the effect is the same, there is a subtle difference in that strictly speaking, this PR isn't providing alternative C implementations for the base 85-related pure-Python functions inbase64
. It's adding new functions into the existingbinascii
C module and turning said Python functions into wrappers for them, which is in keeping with howbinascii
andbase64
have historically been interrelated.
That difference means the guidelines inPEP-0399 don't apply cleanly. So e.g. creating a new_base64
C module doesn't make sense. Neither does trying to use the accelerated routines only if available, asbinascii
will always be available.
Do you have any thoughts on how to keep the Python implementation in a way that works with Python's import system? I'm not familiar with an analogous situation in the rest of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In#91610 a C version is added for deepcopy and unit tests are created for both the c and python implementation. If you search a bit in the codebase you can find some more examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Never mind, all. I assumedbinascii
, being part of the stdlib, would/should always be available.
I've since found thatquopri doesn't make that assumption. I'll do what it does.
If we were strictly following PEP-0399, _base64 would be a Cmodule for accelerated functions in base64. Due to historicalreasons, those should actually go in binascii instead.We still want to preserve the existing Python code in base64.Parting out facilities for accessing the C functions into amodule named _base64 shouldn't risk a naming conflict andwill simplify testing.
This is done differently to PEP-0399 to minimize the number ofchanged lines.
As we're now keeping the existing Python base 85 functions, the Cimplementations should behave exactly the same, down to exceptiontype and wording. It is also no longer an error to try to decodedata of length 1 mod 5.
The PR has been updated to preserve the existing base 85 Python functions in |
Lib/_base64.py Outdated
"""C accelerator wrappers for originally pure-Python parts of base64.""" | ||
from binascii import Error, a2b_ascii85, a2b_base85, b2a_ascii85, b2a_base85 | ||
from base64 import _bytes_from_decode_data, bytes_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We should avoid import cycles like this, it can make refactoring in the future harder.
Lib/base64.py Outdated
try: | ||
from _base64 import (_a85encode, _a85decode, _b85encode, | ||
_b85decode, _z85encode, _z85decode) | ||
from functools import update_wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Functools is an expensive import, I would copy the relative parts ofupdate_wrapper()
locally.
Lib/test/test_base64.py Outdated
c_base64 = import_fresh_module("base64", fresh=["_base64"]) | ||
def with_c_implementation(test_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Instead of a decorator, perhaps use the mixin approach from other modules.
Importing update_wrapper() from functools to copy attributesis expensive. Do it ourselves for only the most relevant ones.
This requires some code duplication, but oh well.
Using a decorator complicates function signature introspection.
Do we really need to test the legacy API twice?
Lib/base64.py Outdated
from _base64 import (_a85encode, _a85decode, _b85encode, | ||
_b85decode, _z85encode, _z85decode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Given these are already in a private module, you can remove the prefix. That means the_copy_attributes
function only needs to copy__doc__
, and__module__
can be set to the static'base64'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Done.
PR was accidentally closed due to misclicking on mobile. There should be a confirmation dialog or something 😅 |
Uh oh!
There was an error while loading.Please reload this page.
Synopsis
Add Ascii85 and base85 encoder and decoder functions implemented in C to
binascii
and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, base85, and Z85 codec functions inbase64
.No API or documentation changes are necessary with respect to any functions in
base64
, and all existing unit tests for those functions continue to pass without modification.Resolves:gh-101178
Discussion
The base85-related functions in
base64
are now wrappers for the new functions inbinascii
, as envisioned in thedocs:Parting out Ascii85 from base85 and Z85 was warranted in my testing despite the code duplication due to the various performance-murdering special cases in Ascii85.
Comments and questions are welcome.
Benchmarks
Updated April 20, 2025.
The old pure-Python implementation is two orders of magnitude slower and uses over O(40n) temporary memory.