# kubeflow-spark

Orchestrate Spark Jobs using Kubeflow, a modern Machine Learning orchestration framework, and poll for their status. Read the related blog post.
## Prerequisites

- Kubernetes cluster (1.17+)
- Kubeflow pipelines (1.7.0+)
- Spark Operator (1.1.0+)
- Python (3.6+)
- kubectl
- helm3
## Getting started

Run `make all` to start everything and skip to step 6, or:
1. Start your local cluster: `./scripts/start-minikube.sh`
2. Install Kubeflow Pipelines: `./scripts/install-kubeflow.sh`
3. Install the Spark Operator: `./scripts/install-spark-operator.sh`
4. Create the Spark service account and add permissions: `./scripts/add-spark-rbac.sh`
5. Make the Kubeflow UI reachable:
   - a. (Optional) Add a Kubeflow UI Ingress: `./scripts/add-kubeflow-ui-ingress.sh`
   - b. (Optional) Forward the service port, e.g.: `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8005:80`
6. Create the Kubeflow pipeline definition file: `python kubeflow_pipeline.py` (a sketch of what this script might contain follows this list)
7. Navigate to the Pipelines UI and upload the newly created pipeline from the file `spark_job_pipeline.yaml`.
8. Trigger a pipeline run, making sure to set `spark-sa` as the Service Account for the execution. Enjoy your orchestrated Spark job execution!
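For orientation, here is a minimal sketch of what a pipeline definition like `kubeflow_pipeline.py` could look like. It assumes the KFP v1 SDK and the Spark Operator's `SparkApplication` CRD; the SparkPi job spec, image, and resource sizes are illustrative placeholders, not necessarily this repository's actual pipeline. The key piece is `dsl.ResourceOp`, which submits the custom resource and then polls it until a success or failure condition matches.

```python
# Hypothetical sketch, not the repository's exact pipeline. Assumes the KFP v1
# SDK (`pip install kfp`) and a cluster running the Spark Operator.
import kfp
import kfp.dsl as dsl

# Illustrative SparkApplication manifest; note the `spark-sa` service account
# created in step 4.
SPARK_APP = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "spark-pi", "namespace": "default"},
    "spec": {
        "type": "Scala",
        "mode": "cluster",
        "image": "gcr.io/spark-operator/spark:v3.1.1",
        "mainClass": "org.apache.spark.examples.SparkPi",
        "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar",
        "sparkVersion": "3.1.1",
        "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark-sa"},
        "executor": {"cores": 1, "instances": 1, "memory": "512m"},
    },
}

@dsl.pipeline(
    name="spark-job-pipeline",
    description="Submit a Spark job and wait for it to complete.",
)
def spark_job_pipeline():
    # ResourceOp creates the SparkApplication and keeps polling the resource
    # until one of the conditions below matches.
    dsl.ResourceOp(
        name="submit-spark-job",
        k8s_resource=SPARK_APP,
        action="create",
        success_condition="status.applicationState.state == COMPLETED",
        failure_condition="status.applicationState.state == FAILED",
    )

if __name__ == "__main__":
    # Produces the spark_job_pipeline.yaml uploaded in step 7.
    kfp.compiler.Compiler().compile(spark_job_pipeline, "spark_job_pipeline.yaml")
```

With `success_condition`/`failure_condition`, the status polling is delegated to the pipeline backend, so the pipeline code itself needs no custom polling loop.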