Orchestrate Spark jobs from Kubeflow Pipelines, a modern machine learning orchestration framework, and poll for their status. Read the related blog post.
Prerequisites:

- Kubernetes cluster (1.17+)
- Kubeflow pipelines (1.7.0+)
- Spark Operator (1.1.0+)
- Python (3.6+)
- kubectl
- helm3
Run `make all` to start everything and skip to step 6, or follow the steps below:
1. Start your local cluster:

   `./scripts/start-minikube.sh`
2. Install Kubeflow Pipelines:

   `./scripts/install-kubeflow.sh`
3. Install the Spark Operator:

   `./scripts/install-spark-operator.sh`
4. Create the Spark service account and add the required permissions:

   `./scripts/add-spark-rbac.sh`
5. Make the Kubeflow UI reachable, in one of two ways:

   a. Add a Kubeflow UI Ingress:

      `./scripts/add-kubeflow-ui-ingress.sh`

   b. Forward the service port, e.g.:

      `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8005:80`
6. Create the Kubeflow pipeline definition file (a sketch of what such a definition can look like follows this list):

   `python kubeflow_pipeline.py`
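The repository's `kubeflow_pipeline.py` is not reproduced here, but as a rough sketch: a KFP v1 pipeline can submit a `SparkApplication` custom resource through `dsl.ResourceOp` and poll it until completion, which matches the "poll for the status" behavior described above. The image, main class, jar path, and resource sizes below are illustrative placeholders, not values taken from this repository.

```python
import kfp
from kfp import dsl


def spark_application_manifest():
    # SparkApplication custom resource handled by the Spark Operator.
    # All spec values below are assumed placeholders.
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": "spark-pi", "namespace": "default"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": "gcr.io/spark-operator/spark:v3.1.1",
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar",
            "sparkVersion": "3.1.1",
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark-sa"},
            "executor": {"cores": 1, "instances": 1, "memory": "512m"},
        },
    }


@dsl.pipeline(
    name="spark-job-pipeline",
    description="Submit a Spark job and wait for it to finish.",
)
def spark_job_pipeline():
    # ResourceOp creates the resource and then polls it until one of the
    # conditions below is met, which is how the job status is tracked.
    dsl.ResourceOp(
        name="submit-spark-job",
        k8s_resource=spark_application_manifest(),
        action="create",
        success_condition="status.applicationState.state == COMPLETED",
        failure_condition="status.applicationState.state == FAILED",
    )


if __name__ == "__main__":
    kfp.compiler.Compiler().compile(spark_job_pipeline, "spark_job_pipeline.yaml")
```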
Navigate to the Pipelines UI and upload the newly created pipeline from the file `spark_job_pipeline.yaml`.
Trigger a pipeline run. Make sure to set `spark-sa` as the service account for the execution. Enjoy your orchestrated Spark job execution!