
Downloading a specific version such as 3.11.1 from https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz and installing it with `./configure --enable-optimizations && make install` is slow (30 to 40 minutes) on GitHub Actions.

Dockerfile:

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive
COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt
RUN find /tmp -name \*.deb -exec rm {} +
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common ca-certificates &&\
    update-ca-certificates
RUN apt-get update &&\
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
    rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get upgrade
RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
    libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
RUN mkdir /python && cd /python
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
RUN tar -zxvf Python-3.11.1.tgz
RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install
```

What is the optimal way to install a specific Python version, and how can I reduce the build time?

asked Aug 17, 2023 at 13:45 by Python coder

  • Do you absolutely need to build it yourself? There are utilities like pyenv that do a better job of managing multiple versions and include the download and build. (commented Aug 17, 2023 at 14:49)
  • @Reinderien Not really, as long as I can get a specific Python version. I need it because our security scanning tool (WhiteSource/Mend) is reporting some critical vulnerabilities in the latest version of Python. (commented Aug 17, 2023 at 15:28)
  • Use `make -j` with your number of cores for parallelism. (commented Aug 18, 2023 at 0:57)
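
Following the last comment, a sketch of the question's build step with one compile job per CPU. `nproc` (from coreutils, present in the Ubuntu image) reports the CPUs available to the build, so the speed-up is bounded by how many cores the runner provides. Separately, `--enable-optimizations` turns on profile-guided optimization, which compiles the interpreter, runs a training workload, then compiles again; that is likely a large share of the 30 to 40 minutes, so dropping the flag would shorten the build at the cost of a somewhat slower interpreter.

```dockerfile
# Same build as in the question, but compile with one job per available CPU.
# ($(nproc) is expanded by the /bin/sh that runs this RUN instruction.)
RUN cd Python-3.11.1 && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && \
    make install
```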

2 Answers


This is a bad layer:

```dockerfile
RUN apt-get update && apt-get upgrade
```

That locks the package lists into a layer permanently. Worse, it means that a change to the next layer may reuse a cached version of this layer instead of fetching fresh lists. The `apt-get install` could then try to fetch packages that are no longer in the archive.

Combine it with the install command and post-install size reduction, as you did for the earlier package installs.

Each of those ought to `apt-get clean`, too, to remove the contents of `/var/cache/apt/archives`, which tends to be large.
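
A minimal sketch of that combined layer, reusing the package list from the question's last install: refreshing the index, installing and cleaning up all happen in one `RUN`, so neither the package lists nor the downloaded archives end up in an image layer.

```dockerfile
# One layer: refresh the index, install, and clean up together, so the
# package lists and downloaded .deb files never persist in the image.
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev \
        libssl-dev libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
```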


Similarly, this one leaves the large archive lying around:

```dockerfile
wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
```

We should unpack it and then remove it immediately, all in a single layer, to keep the container image reasonably small. Better still, unpack, build and clean up in that same layer, keeping only the installed result.
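
One way to do that, sketched here under the assumption that `curl` and the build dependencies from the question are already installed: stream the tarball straight into `tar` so the archive is never written to disk, build in a scratch directory, and delete that directory in the same `RUN`. The directory name `/tmp/python-src` is an arbitrary choice. The `SHELL` line makes a failed download fail the build instead of being masked by the pipe.

```dockerfile
# Use bash with pipefail so an error from curl is not hidden by the pipe.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Download, unpack, build, install and clean up in a single layer.
# The tarball is piped into tar, so it never lands on disk.
RUN mkdir -p /tmp/python-src && \
    curl -fsSL https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz \
        | tar -xz -C /tmp/python-src --strip-components=1 && \
    cd /tmp/python-src && \
    ./configure --enable-optimizations && \
    make install && \
    cd / && \
    rm -rf /tmp/python-src
```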

answered Aug 17, 2023 at 14:40 by Toby Speight

  • For my understanding: isn't "changes to the next layer may use a cached version of this layer instead of getting an update" a feature rather than a bug? If a layer is not meant for caching, what is it for? (commented Aug 17, 2023 at 15:14)
  • Thanks for the review; I learned some new things. Any suggestions on improving the build time? (commented Aug 17, 2023 at 15:32)
  • I think you're unlikely to improve the build time, but reducing the container size may give you speed and cost improvements by transferring less data. The trouble with using cached package indexes is that when the package repositories are updated, we'll be using old lists of what's available (i.e. the cache is stale). So we always want `apt-get update` in the same layer as `apt-get install`. (commented Aug 17, 2023 at 15:51)
  • I think you might be able to declare `/var/cache` as a volume, so that it's never included in any layers. I haven't tried that, though; perhaps someone else can verify whether that's good practice? (commented Aug 17, 2023 at 15:52)
  • An update: with these changes the image size dropped by 100 MB, but the build time did not change. (commented Aug 18, 2023 at 14:13)
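
An alternative to a volume, assuming the image is built with BuildKit (the default builder in recent Docker releases), is a cache mount: the directory is mounted only while that `RUN` executes, so its contents never become part of a layer. This sketch follows the pattern shown in Docker's own documentation; the official Ubuntu image ships an apt hook (`/etc/apt/apt.conf.d/docker-clean`) that deletes downloaded packages, so it is removed first. On a fresh GitHub Actions runner the cache starts empty unless the builder cache is persisted between runs, so this mostly helps repeated local builds.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
# Let apt keep downloaded .deb files so the cache mount can reuse them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
        > /etc/apt/apt.conf.d/keep-cache
# /var/cache/apt and /var/lib/apt are mounted only for this RUN,
# so neither the archives nor the package lists end up in a layer.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends build-essential libssl-dev
```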

There are a couple of optimisations you can make here.

First and foremost, you should reorder your commands so that the earliest layers are the least likely to change and the later layers are the ones most likely to change. In general, install your dependencies before COPY-ing the files belonging to the application that will run on them. This way, you take better advantage of caching. `COPY .` makes a very volatile layer, as a change to any file invalidates all of the subsequent layers. If you really have to do that, do it as close as possible to the final step.

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive

# is this step really necessary? there shouldn't be anything in /tmp
# RUN find /tmp -name \*.deb -exec rm {} +

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y software-properties-common ca-certificates &&\
    update-ca-certificates
RUN apt-get update &&\
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
    rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get upgrade
RUN apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev \
    libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev
RUN mkdir /python && cd /python
RUN wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz
RUN tar -zxvf Python-3.11.1.tgz
RUN cd Python-3.11.1 && ls -lhR && ./configure --enable-optimizations && make install

COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /usr/local/share/ca-certificates/ZscalerCertificate.crt
# The copied certificate is only trusted once the CA bundle is regenerated.
RUN update-ca-certificates
```

That way, merely changing the Dockerfile, the application files or the certificate won't force a complete reinstallation and recompilation of Python.
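
The same principle applies to the application's own Python dependencies. This sketch assumes a hypothetical `requirements.txt` at the root of the build context and the `python3.11` (with pip) that `make install` places in `/usr/local/bin` by default. Copying the small manifest on its own means the pip layer is rebuilt only when that file changes, not on every source edit. On GitHub Actions this only pays off if the layer cache survives between runs; a fresh runner starts with an empty cache unless the builder cache is exported and restored.

```dockerfile
# Continues after the Python build steps above.

# Copy only the dependency manifest first (requirements.txt is hypothetical).
COPY requirements.txt /app/requirements.txt
RUN python3.11 -m pip install --no-cache-dir -r /app/requirements.txt

# Application sources change most often, so they come last.
COPY . /app
WORKDIR /app
```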

Second, to minimize image sizes, avoid creating layers with unnecessary cached files:

```dockerfile
FROM --platform=linux/amd64 ubuntu:22.04 as base
USER root
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND noninteractive

# is this step really necessary? there shouldn't be anything in /tmp
# RUN find /tmp -name \*.deb -exec rm {} +

RUN apt-get update &&\
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends curl gcc g++ gnupg unixodbc-dev openssl git &&\
    apt-get install -y software-properties-common ca-certificates &&\
    apt-get install -y build-essential zlib1g-dev libncurses5-dev libgdbm-dev libssl-dev libreadline-dev libffi-dev wget libbz2-dev libsqlite3-dev && \
    update-ca-certificates && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir /python && cd /python && \
    wget https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -zxvf Python-3.11.1.tgz && \
    cd Python-3.11.1 && \
    ls -lhR && \
    ./configure --enable-optimizations && \
    make install && \
    rm -rf /python

COPY . /app
WORKDIR /app
COPY ZscalerCertificate.crt /
```

Third, you can go further and slim down the image by removing unnecessary apt-get dependencies as well. There are two approaches to this:

  1. Use a multi-stage build, and use `COPY --from` to copy just the Python files you need into a fresh image.
  2. Alternatively, do what the official Debian slim Python image does, which is to use `apt-mark` to uninstall unnecessary packages. This needs to happen in the same layer as the entire Python compile-and-install step to avoid creating bloated intermediate images.
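
A minimal sketch of option 1, which also bears on the question in the comments about paths. It assumes Python is configured with `--prefix=/opt/python` (an arbitrary choice) so that everything `make install` produces lands in one directory that can be copied wholesale; with the default prefix the files are spread across `/usr/local/bin`, `/usr/local/lib` and `/usr/local/include` instead. The runtime library package names are my assumption for Ubuntu 22.04; running `ldd` on the extension modules in `/opt/python/lib/python3.11/lib-dynload` is one way to check what is actually needed.

```dockerfile
# Stage 1: build CPython with the full toolchain.
FROM ubuntu:22.04 AS builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y ca-certificates build-essential zlib1g-dev libncurses5-dev \
        libgdbm-dev libssl-dev libreadline-dev libffi-dev wget libbz2-dev \
        libsqlite3-dev && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /tmp/python-src
RUN wget -q https://www.python.org/ftp/python/3.11.1/Python-3.11.1.tgz && \
    tar -xzf Python-3.11.1.tgz --strip-components=1 && \
    ./configure --prefix=/opt/python --enable-optimizations && \
    make -j"$(nproc)" && \
    make install

# Stage 2: runtime image with only the shared libraries Python links against
# (package names assumed for Ubuntu 22.04). No compilers, headers or sources.
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates libssl3 libffi8 \
        libsqlite3-0 libreadline8 libbz2-1.0 zlib1g libncursesw6 libgdbm6 && \
    rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/python /opt/python
ENV PATH=/opt/python/bin:$PATH
```

Because the build stage is discarded, its tarball, sources and `-dev` packages never reach the final image, so no in-layer cleanup is needed there.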
answered Aug 18, 2023 at 4:47 by Lie Ryan

  • Thanks for the details. Can you add more detail on the multi-stage build? Which paths should the Python executables be copied from, and where should they be stored in the final stage? (commented Aug 18, 2023 at 5:55)
  • @Pythoncoder you can use `docker diff` to find the list of files that changed since the container was created from the image. You can run `make` as a Dockerfile step, then run `make install`, take the `docker diff`, and use that list of files. Alternatively, make a deb package so that all the files you need to COPY are in a single self-contained file (though this doubles the image diff size). (commented Aug 19, 2023 at 0:07)
