Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-114953: Support for multithreaded XZ compression in lzma.py#114954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
ghost commentedFeb 3, 2024 • edited by ghost
Loading Uh oh!
There was an error while loading.Please reload this page.
edited by ghost
Uh oh!
There was an error while loading.Please reload this page.
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
760d2c3
to3848356
CompareCould you create some benchmarks for this? Also, please align the code to followPEP-7. |
mkomet commentedFeb 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Aligned to PEP7. Regarding the benchmarks, ran benchmark on Ubuntu 16 image (similar to Phoronixhttps://openbenchmarking.org/test/pts/compress-xz). Attached results running on 12 core i7-10750H (2.60GHz), with 16 GiB RAM xz_bench.py"""./python xz_bench.py"""importjsonimportlzmaimporttime# Taken from:# https://old-releases.ubuntu.com/releases/16.04.3/ubuntu-16.04.3-server-i386.imgwithopen("ubuntu-16.04.3-server-i386.img","rb")asreader:data=reader.read()# First 20 MiBdata=data[: (20* (2<<20))]threads_list= [1,0,2,4]compression_results= {thread: {"times": [],"ratios": []}forthreadinthreads_list}forthreadinthreads_list:forpresetinrange(10):start_time=time.time()compressed_data=lzma.compress(data,preset=preset,threads=thread)end_time=time.time()compression_results[thread]["times"].append(end_time-start_time)ratio=len(data)/len(compressed_data)compression_results[thread]["ratios"].append(ratio)withopen("compression_results.json","w")asresult_file:json.dump(compression_results,result_file) plot.py"""python3 -m pip install matplotlibpython3 plot.py"""importjsonimportmatplotlib.pyplotaspltwithopen("compression_results.json","r")asresult_file:loaded_results=json.load(result_file)presets=list(range(10))# Create line graph for compression timesplt.figure(figsize=(12,5))plt.subplot(1,2,1)forthread,datainloaded_results.items():times=data["times"]plt.plot(presets,times,label=f"Threads={thread}")plt.xlabel("Preset")plt.ylabel("Time (s)")plt.title("XZ Compression Benchmark - Compression Times (20 MiB)")plt.legend()# Create line graph for compression ratiosplt.subplot(1,2,2)forthread,datainloaded_results.items():ratios=data["ratios"]plt.plot(presets,ratios,label=f"Threads={thread}")plt.xlabel("Preset")plt.ylabel("Compression Ratio")plt.title("XZ Compression Benchmark - Compression Ratios (20 MiB)")plt.legend()plt.tight_layout()plt.show() Are there any other benchmarks you would like to see run? |
Signed-off-by: Meir Komet <mskomet1@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Could you wire this up for the decompressor too?
@thesamesam
(A possible implementation option would be to support multi-threaded decompression in the cpython |
thesamesam commentedFeb 11, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Your xz is old, I think. It does support it now. What version do you have? EDIT: Specifically, this was introduced in xz-utils-5.4.0, released on 2022-12-13. The release notes say, among other things:
|
Please update the Autoconf script to detect the required functions/symbols, add C pre-processor defines for mt encoding/decoding, and use these to guard the newly added functionality. That way we won't break users with older XZ libs. |
threads: int(c_default="1") = 1 | ||
Number of threads to use for compression (only relevant with FORMAT_XZ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it be better to keep the Python API unchanged, and implicitly use the cores available?
Artoria2e5Feb 13, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hmm. This would usually be an improvement. There is a little bit of risk in the way of someone running multiple instances in parallel.
(Also, is it possible to do a chunked compress (which allows parallel decomp) with just one thread? What happens when you tellencoder_mt
to do 1 thread? Do we want to expose that potential difference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
IIRC, yes, it's possible to use the chunked/parallel compressor with one thread. xz 5.6.0 will default to this for its command line interface at least, because it enablesdecompression in parallel for the archives.
Artoria2e5Feb 13, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In that case it may be beneficial to default to 1 thread with chunks. If you still want to have a "use the non-chunked thing" option, maybe we can pass some silly value like -1 threads toCompressor_init_xz
.
Something along the lines ofArtoria2e5@40029fb. (Treat as a long comment, not as a serious commit.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
FTR: xz.git git:(master)$ git tag --contains 4cce3e27f529af33e0e7749a8cbcec59954946b5 | head -n 1v5.3.3alpha |
Yeah, they do odd/even for development (not ABI stable) and stable/prod releases, so 5.4.0 was the first stable version it was available in. |
Right; I forgot that was a thing. GNU LilyPond used to do that as well (and they probably still do; haven't checked in years.) |
thesamesam commentedApr 27, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@mkomet Are you interested in finishing this off (at least for threaded compression, we can revisit for parallel decompression)? If not, I can put it on my list to try get to, but I can't promise that will be immediately. |
jave27 commentedMay 7, 2025
Note that XZ Utils < 5.8.1 has a CVE specifically about this function:https://nvd.nist.gov/vuln/detail/CVE-2025-31115. So if this goes in, that will need to be considered for public releases. |
Uh oh!
There was an error while loading.Please reload this page.