NotificationsYou must be signed in to change notification settings
Fork32k
Star67.3k

gh-114953: Support for multithreaded XZ compression in lzma.py#114954

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

mkomet wants to merge1 commit intopython:main

base:main

Choose a base branch

frommkomet:komet/lzma-mt

Open

gh-114953: Support for multithreaded XZ compression in lzma.py#114954

mkomet wants to merge1 commit intopython:mainfrommkomet:komet/lzma-mt

+100 −17

Conversation

Copy link

mkomet commentedFeb 3, 2024•
edited by bedevere-appbot
Loading

Issue:LZMA MultiThreading XZ compression support #114953

Copy link

ghost commentedFeb 3, 2024•
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.

bedevere-appbot added the awaiting review label

Feb 3, 2024

bedevere-appbot mentioned this pull request

Feb 3, 2024

LZMA MultiThreading XZ compression support#114953

Open

Copy link

bedevere-appbot commentedFeb 3, 2024

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

mkomet force-pushed thekomet/lzma-mt branch 3 times, most recently from760d2c3 to3848356Compare

February 3, 2024 11:27

Copy link

Contributor

erlend-aasland commentedFeb 3, 2024

Could you create some benchmarks for this? Also, please align the code to followPEP-7.

mkomet force-pushed thekomet/lzma-mt branch from3848356 todc55f0fCompare

February 3, 2024 13:11

Copy link

Author

mkomet commentedFeb 3, 2024•
edited
Loading

Could you create some benchmarks for this? Also, please align the code to followPEP-7.

@erlend-aasland

Aligned to PEP7.

Regarding the benchmarks, ran benchmark on Ubuntu 16 image (similar to Phoronixhttps://openbenchmarking.org/test/pts/compress-xz).

Attached results running on 12 core i7-10750H (2.60GHz), with 16 GiB RAM

xz_bench.py

"""./python xz_bench.py"""importjsonimportlzmaimporttime# Taken from:# https://old-releases.ubuntu.com/releases/16.04.3/ubuntu-16.04.3-server-i386.imgwithopen("ubuntu-16.04.3-server-i386.img","rb")asreader:data=reader.read()# First 20 MiBdata=data[: (20* (2<<20))]threads_list= [1,0,2,4]compression_results= {thread: {"times": [],"ratios": []}forthreadinthreads_list}forthreadinthreads_list:forpresetinrange(10):start_time=time.time()compressed_data=lzma.compress(data,preset=preset,threads=thread)end_time=time.time()compression_results[thread]["times"].append(end_time-start_time)ratio=len(data)/len(compressed_data)compression_results[thread]["ratios"].append(ratio)withopen("compression_results.json","w")asresult_file:json.dump(compression_results,result_file)

plot.py

"""python3 -m pip install matplotlibpython3 plot.py"""importjsonimportmatplotlib.pyplotaspltwithopen("compression_results.json","r")asresult_file:loaded_results=json.load(result_file)presets=list(range(10))# Create line graph for compression timesplt.figure(figsize=(12,5))plt.subplot(1,2,1)forthread,datainloaded_results.items():times=data["times"]plt.plot(presets,times,label=f"Threads={thread}")plt.xlabel("Preset")plt.ylabel("Time (s)")plt.title("XZ Compression Benchmark - Compression Times (20 MiB)")plt.legend()# Create line graph for compression ratiosplt.subplot(1,2,2)forthread,datainloaded_results.items():ratios=data["ratios"]plt.plot(presets,ratios,label=f"Threads={thread}")plt.xlabel("Preset")plt.ylabel("Compression Ratio")plt.title("XZ Compression Benchmark - Compression Ratios (20 MiB)")plt.legend()plt.tight_layout()plt.show()

Are there any other benchmarks you would like to see run?
Or alternatively, is there a way to integrate the benchmarks into the tests themselves?

pythongh-114953: Support for multithreaded XZ compression in lzma.py

b796221

Signed-off-by: Meir Komet <mskomet1@gmail.com>

mkomet force-pushed thekomet/lzma-mt branch frome41182e tob796221Compare

February 3, 2024 15:54

thesamesam reviewed

Feb 5, 2024

View reviewed changes

Copy link

Contributor

thesamesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Could you wire this up for the decompressor too?

Copy link

Author

mkomet commentedFeb 10, 2024

@thesamesam
Multi-threaded decompression is not implemented in the liblzma lib as of yet.
To quote thexz(1) man page:

Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.

(A possible implementation option would be to support multi-threaded decompression in the cpython_lzmamodule.c liblzma wrapper functions, but I don't think that's a wise rabbit-hole to venture down :) )

Copy link

Contributor

thesamesam commentedFeb 11, 2024•
edited
Loading

Your xz is old, I think. It does support it now. What version do you have?

EDIT: Specifically, this was introduced in xz-utils-5.4.0, released on 2022-12-13. The release notes say, among other things:

Added threaded .xz decompressor lzma_stream_decoder_mt().
It can use multiple threads with .xz files that have multiple
Blocks with size information in Block Headers. The threaded
encoder in xz has always created such files.

Copy link

Contributor

erlend-aasland commentedFeb 12, 2024

Please update the Autoconf script to detect the required functions/symbols, add C pre-processor defines for mt encoding/decoding, and use these to guard the newly added functionality. That way we won't break users with older XZ libs.

erlend-aasland reviewed

Feb 12, 2024

View reviewed changes

Modules/_lzmamodule.c

Comment on lines +801 to +802

		threads: int(c_default="1") = 1
		Number of threads to use for compression (only relevant with FORMAT_XZ).

Copy link

Contributor

erlend-aaslandFeb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it be better to keep the Python API unchanged, and implicitly use the cores available?

Copy link

Artoria2e5Feb 13, 2024•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hmm. This would usually be an improvement. There is a little bit of risk in the way of someone running multiple instances in parallel.

(Also, is it possible to do a chunked compress (which allows parallel decomp) with just one thread? What happens when you tellencoder_mt to do 1 thread? Do we want to expose that potential difference?)

Copy link

Contributor

thesamesamFeb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

IIRC, yes, it's possible to use the chunked/parallel compressor with one thread. xz 5.6.0 will default to this for its command line interface at least, because it enablesdecompression in parallel for the archives.

Copy link

Artoria2e5Feb 13, 2024•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In that case it may be beneficial to default to 1 thread with chunks. If you still want to have a "use the non-chunked thing" option, maybe we can pass some silly value like -1 threads toCompressor_init_xz.

Something along the lines ofArtoria2e5@40029fb. (Treat as a long comment, not as a serious commit.)

Copy link

Contributor

erlend-aaslandFeb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

erlend-aasland commentedFeb 12, 2024

Your xz is old, I think. It does support it now. What version do you have?
EDIT: Specifically, this was introduced in xz-utils-5.4.0, released on 2022-12-13.

FTR:

xz.git git:(master)$ git tag --contains 4cce3e27f529af33e0e7749a8cbcec59954946b5 | head -n 1v5.3.3alpha

Copy link

Contributor

thesamesam commentedFeb 12, 2024

Yeah, they do odd/even for development (not ABI stable) and stable/prod releases, so 5.4.0 was the first stable version it was available in.

Copy link

Contributor

erlend-aasland commentedFeb 12, 2024

Yeah, they do odd/even for development (not ABI stable) and stable/prod releases, so 5.4.0 was the first stable version it was available in.

Right; I forgot that was a thing. GNU LilyPond used to do that as well (and they probably still do; haven't checked in years.)

Copy link

Contributor

thesamesam commentedApr 27, 2025•
edited
Loading

@mkomet Are you interested in finishing this off (at least for threaded compression, we can revisit for parallel decompression)? If not, I can put it on my list to try get to, but I can't promise that will be immediately.

Copy link

jave27 commentedMay 7, 2025

Note that XZ Utils < 5.8.1 has a CVE specifically about this function:https://nvd.nist.gov/vuln/detail/CVE-2025-31115. So if this goes in, that will need to be considered for public releases.

Labels

awaiting review

5 participants

Movatterモバイル変換

Uh oh!

gh-114953: Support for multithreaded XZ compression in lzma.py#114954

Are you sure you want to change the base?

gh-114953: Support for multithreaded XZ compression in lzma.py#114954

Conversation

mkomet commentedFeb 3, 2024• edited by bedevere-appbotLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

ghost commentedFeb 3, 2024• edited by ghostLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

bedevere-appbot commentedFeb 3, 2024

Uh oh!

erlend-aasland commentedFeb 3, 2024

Uh oh!

mkomet commentedFeb 3, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

thesamesam left a comment

Choose a reason for hiding this comment

Uh oh!

mkomet commentedFeb 10, 2024

Uh oh!

thesamesam commentedFeb 11, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

erlend-aasland commentedFeb 12, 2024

Uh oh!

erlend-aaslandFeb 12, 2024

Choose a reason for hiding this comment

Uh oh!

Artoria2e5Feb 13, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thesamesamFeb 13, 2024

Choose a reason for hiding this comment

Uh oh!

Artoria2e5Feb 13, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erlend-aaslandFeb 27, 2024

Choose a reason for hiding this comment

Uh oh!

erlend-aasland commentedFeb 12, 2024

Uh oh!

thesamesam commentedFeb 12, 2024

Uh oh!

erlend-aasland commentedFeb 12, 2024

Uh oh!

thesamesam commentedApr 27, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

jave27 commentedMay 7, 2025

Uh oh!

Uh oh!

mkomet commentedFeb 3, 2024•
edited by bedevere-appbot
Loading

ghost commentedFeb 3, 2024•
edited by ghost
Loading

mkomet commentedFeb 3, 2024•
edited
Loading

thesamesam commentedFeb 11, 2024•
edited
Loading

Artoria2e5Feb 13, 2024•
edited
Loading

Artoria2e5Feb 13, 2024•
edited
Loading

thesamesam commentedApr 27, 2025•
edited
Loading