Add Ascii85 and base85 encoder and decoder functions implemented in C to `binascii` and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, base85, and Z85 codec functions in `base64`.

No API or documentation changes are necessary with respect to any functions in `base64`, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-101178

Discussion

The base85-related functions in `base64` are now wrappers for the new functions in `binascii`, as envisioned in the docs:

> The `binascii` module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like `uu` or `base64` instead. The `binascii` module contains low-level functions written in C for greater speed that are used by the higher-level modules.
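The wrapper relationship described in that quote can already be observed for Base64. The following is a minimal sketch using only long-standing stdlib calls; the payload is an arbitrary assumption chosen for illustration.

```python
import base64
import binascii

payload = b"binary \x00\x01\x02 data"

# High-level wrapper in base64.
high_level = base64.b64encode(payload)

# Low-level C function in binascii. newline=False suppresses the trailing
# newline that b2a_base64() appends by default.
low_level = binascii.b2a_base64(payload, newline=False)

print(high_level == low_level)
print(binascii.a2b_base64(low_level) == payload)
```

Both comparisons should print `True`, since the high-level function produces the same bytes as the low-level one.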

Parting out Ascii85 from base85 and Z85 was warranted in my testing despite the code duplication due to the various performance-murdering special cases in Ascii85.
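To make the "special cases" concrete, here is a small sketch of the Ascii85 shortcuts that plain base85 does not have, using the public `base64` functions only. The inputs are arbitrary and chosen to trigger each case.

```python
import base64

# Four zero bytes collapse to a single "z" in Ascii85.
zeros = base64.a85encode(b"\x00" * 4)

# With foldspaces=True, four spaces collapse to a single "y".
spaces = base64.a85encode(b" " * 4, foldspaces=True)

# adobe=True frames the output with "<~" and "~>".
framed = base64.a85encode(b"\x00" * 4, adobe=True)

# Plain base85 has none of these shortcuts: every 4-byte group
# becomes exactly 5 characters.
plain = base64.b85encode(b"\x00" * 4)

print(zeros, spaces, framed, plain)
```

Each of these cases adds a branch to the inner encode and decode loops, which is the cost the paragraph above refers to.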

Comments and questions are welcome.

Benchmarks

Updated April 20, 2025.

		# bench_b85.py
		# Note: EXTREMELY SLOW on unmodified mainline CPython
		#       when tracing malloc on the base-85 functions.
		import base64
		import sys
		import timeit
		import tracemalloc

		funcs = [(base64.b64encode, base64.b64decode),  # sanity check/comparison
		         (base64.a85encode, base64.a85decode),
		         (base64.b85encode, base64.b85decode),
		         (base64.z85encode, base64.z85decode)]

		def mb(n):
		    return f"{n / 1024 / 1024:.3f} MB"

		def stats(func, data, t, m):
		    name, n, bps = func.__qualname__, len(data), len(data) / t
		    print(f"{name} : {n} b in {t:.3f} s ({mb(bps)}/s) using {mb(m)}")

		if __name__ == "__main__":
		    data = b"a" * int(sys.argv[1]) * 1024 * 1024
		    for fenc, fdec in funcs:
		        # Peak traced memory beyond the size of the encoded result.
		        tracemalloc.start()
		        enc = fenc(data)
		        menc = tracemalloc.get_traced_memory()[1] - len(enc)
		        tracemalloc.stop()
		        tenc = timeit.timeit("fenc(data)", number=1, globals=globals())
		        stats(fenc, data, tenc, menc)

		        # Peak traced memory beyond the size of the decoded result.
		        tracemalloc.start()
		        dec = fdec(enc)
		        mdec = tracemalloc.get_traced_memory()[1] - len(dec)
		        tracemalloc.stop()
		        tdec = timeit.timeit("fdec(enc)", number=1, globals=globals())
		        stats(fdec, enc, tdec, mdec)

		# Python 3.14.0a7+ commit 78cfee6f09
		# ./configure --enable-optimizations --with-lto

		# With this PR
		$ time ./python bench_b85.py 64
		b64encode : 67108864 b in 0.084 s (763.340 MB/s) using 42.667 MB
		b64decode : 89478488 b in 0.230 s (371.074 MB/s) using 56.889 MB
		a85encode : 67108864 b in 0.190 s (336.115 MB/s) using 0.000 MB
		a85decode : 83886080 b in 0.216 s (370.605 MB/s) using 0.000 MB
		b85encode : 67108864 b in 0.072 s (887.955 MB/s) using 0.000 MB
		b85decode : 83886080 b in 0.175 s (457.224 MB/s) using 0.000 MB
		z85encode : 67108864 b in 0.072 s (891.721 MB/s) using 0.000 MB
		z85decode : 83886080 b in 0.174 s (460.582 MB/s) using 0.000 MB

		real    0m2.231s
		user    0m2.064s
		sys     0m0.156s

		# Unmodified
		$ time ./python bench_b85.py 64
		b64encode : 67108864 b in 0.082 s (781.718 MB/s) using 42.667 MB
		b64decode : 89478488 b in 0.237 s (360.686 MB/s) using 56.889 MB
		a85encode : 67108864 b in 7.492 s (8.543 MB/s) using 2664.406 MB
		a85decode : 83886080 b in 14.264 s (5.609 MB/s) using 3332.254 MB
		b85encode : 67108864 b in 7.181 s (8.912 MB/s) using 2664.404 MB
		b85decode : 83886080 b in 8.486 s (9.427 MB/s) using 3332.254 MB
		z85encode : 67108864 b in 7.343 s (8.715 MB/s) using 2664.102 MB
		z85decode : 83886080 b in 8.778 s (9.113 MB/s) using 3332.254 MB

		real    9m2.346s
		user    8m47.248s
		sys     0m12.460s

In these runs, the old pure-Python implementation is roughly 40 to 100 times slower and uses temporary memory of more than 40 times the size of the data being processed.
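The ratios can be checked directly from the figures reported above. This sketch only redoes the arithmetic on those numbers; the dictionary layout is an illustrative assumption, not part of the benchmark script.

```python
# (seconds with this PR, seconds unmodified), taken from the 64 MiB run above.
timings = {
    "a85encode": (0.190, 7.492),
    "a85decode": (0.216, 14.264),
    "b85encode": (0.072, 7.181),
    "b85decode": (0.175, 8.486),
    "z85encode": (0.072, 7.343),
    "z85decode": (0.174, 8.778),
}

# (peak temporary memory in MB for the unmodified run, input size in MB).
# Encoders processed 67108864 bytes (64 MB), decoders 83886080 bytes (80 MB).
memory = {
    "a85encode": (2664.406, 64),
    "a85decode": (3332.254, 80),
}

speedups = {name: old / new for name, (new, old) in timings.items()}
overheads = {name: peak / size for name, (peak, size) in memory.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.0f}x faster with this PR")
for name, ratio in overheads.items():
    print(f"{name}: temporary memory is {ratio:.1f}x the input size")
```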

bedevere-bot added the awaiting review label

Mar 16, 2023

ghost commented Mar 16, 2023 (edited by ghost)

All commit authors signed the Contributor License Agreement.

kangtastic changed the title from ~~Add Ascii85 and base85 support to binascii~~ to gh-101178: Add Ascii85 and base85 support to binascii

Mar 16, 2023

bedevere-bot mentioned this pull request

Mar 16, 2023

base64.b85encode uses significant amount of RAM #101178

Open

arhadthedev added the stdlib (Python modules in the Lib dir) label

Mar 23, 2023

Author

kangtastic commented Mar 19, 2024 (edited)

It's a year later, and Z85 support has been added to `base64` in the meantime. So while bringing this PR up to date with main, I added Z85 support to it as well.

For reference, this is the benchmark run that led me to do so.

		# After merging main but before adding Z85 support to this PR
		(cpython-b85) $ python bench_b85.py 64
		b64encode : 67108864 b in 0.121 s (527.435 MB/s) using 42.667 MB
		b64decode : 89478488 b in 0.309 s (276.188 MB/s) using 56.889 MB
		a85encode : 67108864 b in 0.297 s (215.150 MB/s) using 0.000 MB
		a85decode : 83886080 b in 0.205 s (390.751 MB/s) using 0.000 MB
		b85encode : 67108864 b in 0.106 s (604.359 MB/s) using 0.000 MB
		b85decode : 83886080 b in 0.204 s (393.040 MB/s) using 0.000 MB
		z85encode : 67108864 b in 0.204 s (313.610 MB/s) using 80.000 MB
		z85decode : 83886080 b in 0.300 s (266.670 MB/s) using 100.000 MB

The existing Z85 implementation translates from the standard base85 alphabet to Z85 after the fact and within Python, so it was already benefiting from this PR but with substantial performance and memory usage overhead. That overhead is now gone.
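The "translate after the fact" approach can be sketched as follows. This is an illustration of the technique rather than the stdlib's exact code: the helper names are hypothetical, and the two alphabets are written out from the base85 and Z85 definitions. The sample bytes are the well-known Z85 test frame that encodes to `HelloWorld`.

```python
import base64

# Standard base85 alphabet as used by base64.b85encode().
B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
# Z85 alphabet.
Z85_ALPHABET = (b"0123456789abcdefghijklmnopqrstuvwxyz"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")

TO_Z85 = bytes.maketrans(B85_ALPHABET, Z85_ALPHABET)
FROM_Z85 = bytes.maketrans(Z85_ALPHABET, B85_ALPHABET)


def z85encode_via_translate(data):
    """Encode as base85, then remap every output byte to the Z85 alphabet."""
    return base64.b85encode(data).translate(TO_Z85)


def z85decode_via_translate(text):
    """Remap Z85 characters back to base85, then decode."""
    return base64.b85decode(text.translate(FROM_Z85))


frame = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
encoded = z85encode_via_translate(frame)
print(encoded)
```

The extra `translate()` call makes a second full copy of the encoded data, which is consistent with the additional memory shown in the Z85 rows of the benchmark above.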

kangtastic force-pushed the gh-101178-rework-base85 branch from 71f1955 to 7b4aba1

March 19, 2024 09:27

python-cla-bot (bot) commented Apr 18, 2025 (edited)

All commit authors signed the Contributor License Agreement.

Add Ascii85, base85, and Z85 support to binascii

05ae5ad

Add Ascii85, base85, and Z85 encoders and decoders to `binascii`, replacing the existing pure Python implementations in `base64`.

No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification.

Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.

Resolves: pythongh-101178

kangtastic force-pushed the gh-101178-rework-base85 branch from 7b4aba1 to 05ae5ad

April 21, 2025 05:16

Author

kangtastic commented Apr 21, 2025

PR has been rebased onto main at 78cfee6 with squashing.

kangtastic changed the title from ~~gh-101178: Add Ascii85 and base85 support to binascii~~ to gh-101178: Add Ascii85. base85, and Z85 support to binascii

Apr 21, 2025

kangtastic changed the title from ~~gh-101178: Add Ascii85. base85, and Z85 support to binascii~~ to gh-101178: Add Ascii85, base85, and Z85 support to binascii

Apr 21, 2025

Contributor

sergey-miryanov commented Apr 21, 2025

> Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.

I believe you have to document this change.

Author

kangtastic commented Apr 21, 2025

> > Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.
>
> I believe you have to document this change.

Fair point, I could do that.

In case anyone argues for keeping the old behavior (silently ignoring length 1 mod 5), I won't do it just yet.
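The claim that no encoder emits data of length 1 mod 5 follows from how base85 handles a trailing partial group: 1, 2, or 3 leftover input bytes become 2, 3, or 4 output characters. A quick sketch that checks this against the existing `base64.b85encode()` (plain base85 only, leaving the Ascii85 `z`/`y` shortcuts aside):

```python
import base64

remainders = set()
for n in range(41):
    encoded = base64.b85encode(bytes(n))  # n zero bytes
    remainders.add(len(encoded) % 5)

# Lengths of 1 mod 5 never appear in encoder output.
print(sorted(remainders))
```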

AA-Turner reviewed

Apr 24, 2025

Lib/base64.py Outdated

		_A85START = b"<~"
		_A85END = b"~>"

		def _85encode(b, chars, chars2, pad=False, foldnuls=False, foldspaces=False):

Member

AA-Turner commented Apr 24, 2025

Per PEP-0399, the Python implementation must be kept, with the C accelerator and Python implementation tested to ensure they produce identical output.

Author

kangtastic commented Apr 25, 2025

From PEP-0399:

> If an acceleration module is provided it is to be named the same as the module it is accelerating with an underscore attached as a prefix, e.g., `_warnings` for `warnings`. The common pattern to access the accelerated code from the pure Python implementation is to import it with an `import *`, e.g., `from _warnings import *`. This is typically done at the end of the module to allow it to overwrite specific Python objects with their accelerated equivalents.

Although the effect is the same, there is a subtle difference in that strictly speaking, this PR isn't providing alternative C implementations for the base 85-related pure-Python functions in `base64`. It's adding new functions into the existing `binascii` C module and turning said Python functions into wrappers for them, which is in keeping with how `binascii` and `base64` have historically been interrelated.

That difference means the guidelines in PEP-0399 don't apply cleanly. So e.g. creating a new `_base64` C module doesn't make sense. Neither does trying to use the accelerated routines only if available, as `binascii` will always be available.

Do you have any thoughts on how to keep the Python implementation in a way that works with Python's import system? I'm not familiar with an analogous situation in the rest of the codebase.

Contributor

eendebakpt commented Apr 25, 2025

In #91610 a C version is added for deepcopy, and unit tests are created for both the C and Python implementations. If you search a bit in the codebase you can find some more examples.

Author

kangtastic commented Apr 25, 2025

Never mind, all. I assumed `binascii`, being part of the stdlib, would/should always be available.

I've since found that `quopri` doesn't make that assumption. I'll do what it does.
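The general shape of that approach, importing the accelerated function if it is available and otherwise falling back to pure Python, might look like the sketch below. The helper `load_accelerated` and the module name `_hypothetical_b85_accel` are made up for illustration; they are not part of this PR or of `quopri`.

```python
import base64
import binascii
import importlib


def load_accelerated(module_name, func_name, fallback):
    """Return module_name.func_name if it can be imported, else fallback."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, func_name)
    except (ImportError, AttributeError):
        return fallback


# The module below does not exist, so the pure-Python fallback is selected.
b85encode = load_accelerated("_hypothetical_b85_accel", "b85encode",
                             base64.b85encode)

# An accelerated function that does exist is picked up as-is.
hexlify = load_accelerated("binascii", "hexlify", None)

print(b85encode is base64.b85encode, hexlify is binascii.hexlify)
```

Callers use `b85encode` without knowing which implementation was chosen, which is what lets tests exercise both paths.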

kangtastic added 5 commits

April 26, 2025 06:37

Restore base64.py

aa06c5d

Create _base64 module with wrappers for accelerated functions

6377440

If we were strictly following PEP-0399, _base64 would be a C module for accelerated functions in base64. Due to historical reasons, those should actually go in binascii instead.

We still want to preserve the existing Python code in base64. Parting out facilities for accessing the C functions into a module named _base64 shouldn't risk a naming conflict and will simplify testing.

Test both Python and C codepaths in base64

6c0e4a3

This is done differently to PEP-0399 to minimize the number of changed lines.

Match behavior between Python and C base 85 functions

ce4773c

As we're now keeping the existing Python base 85 functions, the C implementations should behave exactly the same, down to exception type and wording. It is also no longer an error to try to decode data of length 1 mod 5.

Add Z85 tests to binascii

4072e3b

Copy link

Author

kangtastic commentedApr 27, 2025

The PR has been updated to preserve the existing base 85 Python functions in `base64` and modify the new base 85 C functions in `binascii` to closely match their behavior. Notably, trying to decode data of length 1 mod 5 is no longer an error.

Update generated files

bc9217f

AA-Turner reviewed

Apr 27, 2025

Lib/_base64.py Outdated

		"""C accelerator wrappers for originally pure-Python parts of base64."""

		from binascii import Error, a2b_ascii85, a2b_base85, b2a_ascii85, b2a_base85
		from base64 import _bytes_from_decode_data, bytes_types

Member

AA-Turner commented Apr 27, 2025

We should avoid import cycles like this; they can make future refactoring harder.

AA-Turner reviewed

Apr 27, 2025

Lib/base64.py Outdated

		try:
		    from _base64 import (_a85encode, _a85decode, _b85encode,
		                         _b85decode, _z85encode, _z85decode)
		    from functools import update_wrapper

Member

AA-Turner commented Apr 27, 2025

`functools` is an expensive import; I would copy the relevant parts of `update_wrapper()` locally.
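A local stand-in for the relevant parts of `update_wrapper()` could be as small as the following sketch. The helper name `copy_attributes`, the attribute list, and both sample functions are assumptions made for illustration; the PR's actual helper may differ.

```python
def copy_attributes(wrapper, wrapped,
                    names=("__module__", "__name__", "__qualname__", "__doc__")):
    """Copy selected metadata from wrapped onto wrapper and return wrapper."""
    for name in names:
        try:
            value = getattr(wrapped, name)
        except AttributeError:
            continue  # skip attributes the wrapped object does not have
        setattr(wrapper, name, value)
    return wrapper


def b85encode(b, pad=False):
    """Encode bytes-like object b in base85 format and return a bytes object."""
    raise NotImplementedError  # stand-in for the pure-Python implementation


def fast_b85encode(b, pad=False):
    return b"accelerated"  # stand-in for a wrapper around the C function


copy_attributes(fast_b85encode, b85encode)
print(fast_b85encode.__name__, "-", fast_b85encode.__doc__)
```

Unlike `functools.update_wrapper()`, this sketch does not update `__dict__` or set `__wrapped__`, which keeps it free of any imports.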

AA-Turner reviewed

Apr 27, 2025

Lib/test/test_base64.py Outdated

		c_base64 = import_fresh_module("base64", fresh=["_base64"])


		def with_c_implementation(test_func):

Member

AA-Turner commented Apr 27, 2025

Instead of a decorator, perhaps use the mixin approach from other modules.
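The mixin approach means writing each test once in a plain class and binding it to each implementation through small `TestCase` subclasses. The sketch below is an assumption about the general pattern, not the PR's test code: `py_b85encode` is a deliberately simple reference encoder written for this example, standing in for "the other implementation".

```python
import base64
import struct
import unittest

B85 = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")


def py_b85encode(data):
    """Slow reference base85 encoder used as a second implementation."""
    padding = -len(data) % 4
    data = data + b"\0" * padding
    out = bytearray()
    for (word,) in struct.iter_unpack(">I", data):
        chunk = bytearray(5)
        for i in range(4, -1, -1):
            word, rem = divmod(word, 85)
            chunk[i] = B85[rem]
        out += chunk
    # Drop one output character per byte of padding that was added.
    return bytes(out[:len(out) - padding])


class EncoderTestsMixin:
    """Tests shared by every implementation; subclasses set `encode`."""
    encode = None

    def test_round_trip(self):
        for payload in (b"", b"a", b"ab", b"abc", b"abcd", bytes(range(256))):
            with self.subTest(payload=payload):
                self.assertEqual(base64.b85decode(self.encode(payload)), payload)


class StdlibEncoderTests(EncoderTestsMixin, unittest.TestCase):
    encode = staticmethod(base64.b85encode)


class ReferenceEncoderTests(EncoderTestsMixin, unittest.TestCase):
    encode = staticmethod(py_b85encode)
```

Because the mixin itself does not inherit from `unittest.TestCase`, test discovery (for example via `python -m unittest`) runs only the two concrete subclasses.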

kangtastic added 4 commits

April 27, 2025 19:55

Avoid importing functools

2c40ba0

Importing update_wrapper() from functools to copy attributes is expensive. Do it ourselves for only the most relevant ones.

Avoid circular import in _base64

fd9eaf7

This requires some code duplication, but oh well.

Do not use a decorator for changing exception type

4746d18

Using a decorator complicates function signature introspection.
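The introspection problem can be reproduced with any plain wrapper-style decorator. In this sketch, `translate_errors` and `lookup` are hypothetical names; the decorator converts one exception type into another, in the spirit of what the commit message describes.

```python
import inspect


def translate_errors(func):
    """Re-raise KeyError from func as ValueError."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyError as exc:
            raise ValueError(f"bad input: {exc}") from None
    return wrapper


def lookup(table, key, default=None):
    return table[key]


decorated = translate_errors(lookup)

print(inspect.signature(lookup))
print(inspect.signature(decorated))
```

The decorated function reports the wrapper's generic `(*args, **kwargs)` signature instead of the real parameters. `functools.wraps()` would restore it, but that requires the `functools` import the review asked to avoid.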

Test Python and C codepaths in base64 using mixins

d075593

Do we really need to test the legacy API twice?

kangtastic closed this

Apr 29, 2025

kangtastic reopened this

Apr 29, 2025

AA-Turner reviewed

Apr 29, 2025

Lib/base64.py Outdated

Comment on lines 581 to 582

		from _base64 import (_a85encode, _a85decode, _b85encode,
		                     _b85decode, _z85encode, _z85decode)

Member

AA-Turner commented Apr 29, 2025

Given these are already in a private module, you can remove the prefix. That means the `_copy_attributes` function only needs to copy `__doc__`, and `__module__` can be set to the static `'base64'`.

Author

kangtastic commented Apr 29, 2025

Done.

Author

kangtastic commented Apr 29, 2025

PR was accidentally closed due to misclicking on mobile. There should be a confirmation dialog or something 😅

Remove leading underscore from functions in private module

6d65fec

Labels

awaiting review, stdlib (Python modules in the Lib dir)

6 participants

gh-101178: Add Ascii85, base85, and Z85 support to binascii #102753
