[ZEPPELIN-6367] Improve testing performance using Docker images #5125

Open
kmularise wants to merge 23 commits into apache:master from kmularise:ZEPPELIN-6367

kmularise (Contributor) commented Dec 4, 2025 (edited)

What is this PR for?

Optimize the GitHub Actions CI pipeline by using pre-built Docker images for test environments. This reduces conda environment setup time from 20+ minutes to under 1 minute by caching the fully configured Python/R environment as an image in GitHub Container Registry (GHCR).
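
For illustration, one way to key such a cached image is a tag derived from the content of the files that define the environment, which is the approach the workflow excerpt quoted later in this thread takes. The sketch below wraps that idea in a function. The name `image_tag` is hypothetical and not part of the PR, bash and GNU coreutils (`sha256sum`) are assumed, and lowercasing the owner is an addition here because Docker image references must be lowercase.

```shell
# image_tag OWNER FILE...
# Prints ghcr.io/<owner>/zeppelin-test-env:py39-r-<hash>, where <hash> is the
# first 12 hex characters of the SHA-256 of the concatenated input files.
# (Hypothetical helper; the PR computes the hash inline in a workflow step.)
image_tag() {
  local owner="$1"
  shift
  local f hash
  # Fail early if any input file is missing, instead of hashing partial input.
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "image_tag: cannot read $f" >&2
      return 1
    fi
  done
  # Docker image references must be lowercase.
  owner=$(printf '%s' "$owner" | tr '[:upper:]' '[:lower:]')
  hash=$(cat -- "$@" | sha256sum | cut -d' ' -f1 | cut -c1-12)
  printf 'ghcr.io/%s/zeppelin-test-env:py39-r-%s\n' "$owner" "$hash"
}
```

Called as `image_tag "$GITHUB_REPOSITORY_OWNER" testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile`, the tag changes whenever either file changes, so an unchanged environment maps to an image that already exists in the registry.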

What type of PR is it?

Improvement

Todos

  • Create Dockerfile for Python/R test environment
  • Add prepare-python-r-env job with GHCR integration
  • Convert core-modules to container job
  • Fix hashFiles() issue for container jobs
  • Upgrade to Debian 12 for MongoDB 8.0 compatibility
  • Install Temurin JDK 11 from Adoptium repository
  • Verify core-modules tests pass in container environment
  • Verify zeppelin-integration-test passes in container environment
  • Consider extending to other jobs (spark-integration-test, interpreter-test, etc.)

What is the Jira issue?

ZEPPELIN-6367

How should this be tested?

  • Automated: All existing tests in the core-modules job should pass
  • Manual verification:
    1. Check that the prepare-python-r-env job completes and pushes the image to GHCR
    2. Verify image is reused on subsequent runs (should show "exists=true" in logs)
    3. Compare total workflow time before and after this change
    4. Confirm MongoDB tests pass in Debian 12 container environment
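
Step 2 of the manual verification relies on the job logging whether the image was found. A minimal sketch of that check follows, assuming bash and a `docker` CLI that provides `manifest inspect`. The function names are hypothetical. Any failure of `docker manifest inspect`, including an authentication or network error, is reported as `exists=false` here, which only triggers a rebuild.

```shell
# image_exists IMAGE: succeeds when the registry returns a manifest for IMAGE.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# report_image_status IMAGE [OUTPUT_FILE]
# Appends exists=true or exists=false to OUTPUT_FILE (default: $GITHUB_OUTPUT,
# or /dev/null outside GitHub Actions) and echoes the same line to the log.
report_image_status() {
  local image="$1"
  local out="${2:-${GITHUB_OUTPUT:-/dev/null}}"
  local status=false
  if image_exists "$image"; then
    status=true
  fi
  echo "exists=${status}" | tee -a "$out"
}
```

In a workflow step this would be called as `report_image_status "$IMAGE_NAME"`, and later steps could branch on the `exists` output.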

Screenshots (if appropriate)

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

hashFiles() fails in container jobs when evaluated before checkout. Pre-calculate the hash in the prepare-python-r-env job and pass it as an output to the core-modules job for the Maven cache key.
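
The PR keeps `hashFiles()` and evaluates it in a job that has already checked out the repository. As an alternative sketch only, the same kind of cache key could be computed in plain shell after checkout. Note that this does not reproduce the digest that `hashFiles()` returns, so the two must not be mixed for one cache. The function name `pom_hash` is hypothetical, and bash with GNU findutils and coreutils is assumed.

```shell
# pom_hash [ROOT]
# Prints one SHA-256 computed over the sorted per-file digests of every
# pom.xml under ROOT (default: current directory). Paths are relative to
# ROOT, so the result does not depend on where the repository is checked out.
# NOTE: this is NOT the same value that hashFiles('**/pom.xml') produces.
pom_hash() {
  local root="${1:-.}"
  (
    cd "$root" &&
      find . -name pom.xml -type f -print0 |
      LC_ALL=C sort -z |
      xargs -0 -r sha256sum |
      sha256sum |
      cut -d' ' -f1
  )
}
```

A step in the preparing job could then run `echo "pom=$(pom_hash .)" >> "$GITHUB_OUTPUT"` and expose the value through the job's `outputs`, in the same way the PR passes `pom-hash` to the core-modules job.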
kmularise changed the title from "Zeppelin 6367" to "[ZEPPELIN-6367] Improve testing performance using Docker images" on Dec 4, 2025
Comment on lines 58 to 74
- name: Build and push Docker image
  if: steps.check.outputs.exists != 'true'
  uses: docker/build-push-action@v5
  with:
    context: .
    file: .github/docker/python-r-env.Dockerfile
    push: true
    tags: |
      ${{ steps.hash.outputs.image-name }}
      ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
    labels: |
      org.opencontainers.image.source=${{ github.event.repository.html_url }}
      org.opencontainers.image.revision=${{ github.sha }}
    build-args: |
      ENV_FILE=testing/env_python_3.9_with_R.yml
tbonelee (Contributor) commented:

https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#workflows-in-forked-repositories

PRs from forked repositories are limited to read-only permissions, which prevents workflows triggered by them from pushing images to the registry.

I suggest separating the build and push steps. Assuming the image is not already in the registry, we should first build the image for local testing, and then run the push step only if the workflow has write access to GHCR.
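
The suggested split can be sketched in shell as follows. This is an illustration under stated assumptions, not the change that was eventually committed: the function names are hypothetical, and `can_push` is only a heuristic (pull requests whose head repository differs from the base repository are treated as read-only). Repository settings or explicit `permissions:` blocks can also make the token read-only in other cases. The head repository name would come from the `github.event.pull_request.head.repo.full_name` context value.

```shell
# can_push EVENT_NAME HEAD_REPO BASE_REPO
# Succeeds when the run is expected to have a write-capable token: any event
# other than pull_request, or a pull_request whose head branch lives in the
# base repository (that is, not a fork). Heuristic only.
can_push() {
  local event="$1" head_repo="$2" base_repo="$3"
  if [ "$event" != "pull_request" ]; then
    return 0
  fi
  [ -n "$head_repo" ] && [ "$head_repo" = "$base_repo" ]
}

# build_then_maybe_push IMAGE DOCKERFILE EVENT_NAME HEAD_REPO BASE_REPO
# Always builds the image locally; pushes only when can_push succeeds.
build_then_maybe_push() {
  local image="$1" dockerfile="$2"
  docker build -t "$image" -f "$dockerfile" . || return 1
  if can_push "$3" "$4" "$5"; then
    docker push "$image"
  else
    echo "Skipping push: read-only token expected for forked pull requests"
  fi
}
```

With this shape, a forked pull request still builds and tests against the locally built image, while runs in the upstream repository also publish it for reuse.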

kmularise reacted with thumbs up emoji
kmularise (Contributor, Author) replied:

Thanks for the suggestion! I've updated the workflow to separate the build and push steps and check write access to GHCR.

Comment on lines 36 to 98
  # ============================================
  # Job 1: Prepare Docker image
  # ============================================
  prepare-python-r-env:
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      packages: write
    outputs:
      image-name: ${{ steps.image.outputs.name }}
      pom-hash: ${{ steps.hash.outputs.pom }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Calculate pom.xml hash
        id: hash
        run: |
          echo "pom=${{ hashFiles('**/pom.xml') }}" >> $GITHUB_OUTPUT

      - name: Generate image name with hash
        id: image
        run: |
          # Include both environment file AND Dockerfile in hash calculation
          COMBINED_HASH=$(cat testing/env_python_3.9_with_R.yml .github/docker/python-r-env.Dockerfile | sha256sum | cut -d' ' -f1 | cut -c1-12)
          IMAGE_NAME="ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-${COMBINED_HASH}"
          echo "name=${IMAGE_NAME}" >> $GITHUB_OUTPUT

      - name: Check if image exists
        id: check
        run: |
          if docker manifest inspect ${{ steps.image.outputs.name }} >/dev/null 2>&1; then
            echo "exists=true" >> $GITHUB_OUTPUT
          else
            echo "exists=false" >> $GITHUB_OUTPUT
          fi

      - name: Set up Docker Buildx
        if: steps.check.outputs.exists != 'true'
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        if: steps.check.outputs.exists != 'true'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push if needed
        if: steps.check.outputs.exists != 'true'
        uses: docker/build-push-action@v5
        with:
          context: .
          file: .github/docker/python-r-env.Dockerfile
          push: true
          tags: |
            ${{ steps.image.outputs.name }}
            ghcr.io/${{ github.repository_owner }}/zeppelin-test-env:py39-r-latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
tbonelee (Contributor) commented:

Is the workflow defined here the same as the one in build-docker-image.yml?

If so, I'd suggest converting build-docker-image.yml into a reusable workflow (using on: workflow_call).

That way, you can keep the current separation while allowing other workflows, such as prepare-python-r-env, to call it via the uses keyword.

For reference, here are the cases I mentioned:

kmularise reacted with thumbs up emoji
tbonelee (Contributor) commented:

We have some user permission issues with the npm cache directories in the core-modules job.

kmularise reacted with eyes emoji


Reviewers

tbonelee left review comments

At least 1 approving review is required to merge this pull request.


2 participants: @kmularise, @tbonelee
