
sbakiu/kubeflow-spark

Orchestrate Spark Jobs from Kubeflow Pipelines and poll for their status.

Orchestrate Spark Jobs using Kubeflow, a modern Machine Learning orchestration framework. Read the related blog post.

Requirements

  1. Kubernetes cluster (1.17+)
  2. Kubeflow pipelines (1.7.0+)
  3. Spark Operator (1.1.0+)
  4. Python (3.6+)
  5. kubectl
  6. helm3
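
The Spark Operator listed above runs Spark jobs declared as SparkApplication custom resources. As a rough sketch of what such a resource looks like (the image, jar path, and resource figures below are illustrative values from the operator's examples, not taken from this repository):

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark-sa   # the service account created in the steps below
  executor:
    cores: 1
    instances: 1
    memory: "512m"
```

The operator reports the job's progress in the resource's .status.applicationState.state field, which is what a pipeline can poll.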

Getting started

Run make all to start everything and skip to step 6, or follow the steps individually:

  1. Start your local cluster
./scripts/start-minikube.sh
  2. Install Kubeflow Pipelines
./scripts/install-kubeflow.sh
  3. Install Spark Operator
./scripts/install-spark-operator.sh
  4. Create the Spark Service Account and add permissions
./scripts/add-spark-rbac.sh
  5. Make the Kubeflow UI reachable
  • a. (Optional) Add a Kubeflow UI Ingress
./scripts/add-kubeflow-ui-ingress.sh
  • b. (Optional) Forward the service port, e.g.:
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8005:80
  6. Create the Kubeflow pipeline definition file
python kubeflow_pipeline.py
  7. Navigate to the Pipelines UI and upload the newly created pipeline from the file spark_job_pipeline.yaml

  8. Trigger a pipeline run. Make sure to set spark-sa as the Service Account for the execution.

  9. Enjoy your orchestrated Spark job execution!
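
The status polling that the pipeline performs can be sketched in plain Python. The helper below is a hypothetical illustration (the function name and the stubbed status source are assumptions, not this repository's code): it repeatedly calls a state-fetching callable until the SparkApplication reaches a terminal state, which mirrors how a pipeline step can poll the Spark Operator.

```python
import time

# Terminal states the Spark Operator reports in .status.applicationState.state
TERMINAL_STATES = {"COMPLETED", "FAILED"}

def wait_for_spark_app(get_state, timeout_s=600, poll_interval_s=2):
    """Poll get_state() until the application reaches a terminal state.

    get_state is any zero-argument callable returning the current state
    string; in a real pipeline it would query the Kubernetes API for the
    SparkApplication resource.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval_s)
    raise TimeoutError("SparkApplication did not reach a terminal state")

if __name__ == "__main__":
    # Simulated status source: the job runs, then completes.
    states = iter(["SUBMITTED", "RUNNING", "COMPLETED"])
    print(wait_for_spark_app(lambda: next(states), poll_interval_s=0))
```

The same loop shape works whether the state comes from the Kubernetes API client or from shelling out to kubectl; only the get_state callable changes.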

