bbenzikry/spark-eks

Examples and custom spark images for working with the spark-on-k8s operator on AWS.
It allows using Spark 2 with IRSA, and Spark 3 with IRSA and AWS Glue as a metastore.
Note: Spark 3 images also include the relevant jars for working with the S3A committers.
If you're looking for the Spark 3 custom distributions, you can find them here.
Note: Spark 2 images will not be updated; please see the FAQ.
- Deploy the spark-on-k8s operator using the helm chart and the patched operator image
bbenzikry/spark-eks-operator:latest
Suggested values for the helm chart can be found in the flux example.
Note: Do not create the spark service account automatically as part of the chart installation.
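A minimal values sketch for the chart might look like the following; the key names are based on the upstream spark-operator helm chart and may differ between chart versions, and the namespace is a placeholder, so prefer the values in the flux example:
```yaml
# hypothetical values.yaml sketch -- key names follow the upstream
# spark-operator chart and may differ between chart versions
image:
  repository: bbenzikry/spark-eks-operator
  tag: latest
# the mutating webhook is what applies service accounts, security contexts and config maps to pods
webhook:
  enable: true
# namespace where SparkApplication objects will be submitted (placeholder)
sparkJobNamespace: SPARK_JOB_NAMESPACE
# per the note above, the chart should not create the spark service account
serviceAccounts:
  spark:
    create: false
```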
- Create an AWS role for the driver
- Create an AWS role for the executors (a sketch of one way to create both roles follows below)
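One possible way to create both roles is eksctl's IRSA support. The sketch below is purely illustrative and not part of this repo; the cluster name, region, namespace and policy ARNs are placeholders:
```yaml
# rough eksctl sketch; everything here is illustrative
# note: eksctl also creates and annotates the Kubernetes service accounts,
# which overlaps with the annotation steps described below
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: spark                      # driver service account
        namespace: SPARK_JOB_NAMESPACE
      attachPolicyARNs:
        - "arn:aws:iam::ACCOUNT_ID:policy/spark-driver-policy"
    - metadata:
        name: default                    # executor service account
        namespace: SPARK_JOB_NAMESPACE
      attachPolicyARNs:
        - "arn:aws:iam::ACCOUNT_ID:policy/spark-executor-policy"
```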
- Add an EKS role annotation to the default service account used by executors in your spark job namespace (optional)
```yaml
# NOTE: Only required when not building spark from source or using a version of spark < 3.1.
# In 3.1, executor roles will rely on the driver definition. At the moment they execute with the default service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: SPARK_JOB_NAMESPACE
  annotations:
    # can also be the driver role
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/executor-role"
```
- Make sure the spark service account (used by driver pods) is configured with an EKS role as well
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: SPARK_JOB_NAMESPACE
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/driver-role"
```
For spark < 3.0.0, see spark2.Dockerfile
For spark 3.0.0+, see spark3.Dockerfile
For pyspark, see pyspark.Dockerfile
Below are examples for the latest versions.
If you want to use pinned versions, all images are tagged by the commit SHA.
You can find a full list of tags here.
```dockerfile
# spark2
FROM bbenzikry/spark-eks:spark2-latest
# spark3
FROM bbenzikry/spark-eks:spark3-latest
# pyspark2
FROM bbenzikry/spark-eks:pyspark2-latest
# pyspark3
FROM bbenzikry/spark-eks:pyspark3-latest
```
```yaml
hadoopConf:
  # IRSA configuration
  "fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
driver:
  .....
  labels:
    .....
  serviceAccount: SERVICE_ACCOUNT_NAME
  # See: https://github.com/kubernetes/kubernetes/issues/82573
  # Note: securityContext has changed in recent versions of the operator to podSecurityContext
  podSecurityContext:
    fsGroup: 65534
```
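For context, a fuller sketch of a SparkApplication that uses one of the prebuilt Spark 3 images, the IRSA credentials provider and the bundled S3A committers might look like the following. The application name, main class/file and committer settings are illustrative; the committer keys come from the generic Spark/Hadoop cloud committer docs, not from this repo:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app                     # placeholder
  namespace: SPARK_JOB_NAMESPACE
spec:
  type: Scala
  mode: cluster
  sparkVersion: "3.0.0"                  # match the image you use
  image: bbenzikry/spark-eks:spark3-latest
  mainClass: com.example.MyApp           # placeholder
  mainApplicationFile: "s3a://my-bucket/my-app.jar"   # placeholder
  hadoopConf:
    # IRSA configuration
    "fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
    # optional: use the S3A magic committer jars shipped in the spark3 images
    "fs.s3a.committer.name": "magic"
    "fs.s3a.committer.magic.enabled": "true"
  sparkConf:
    # standard cloud-committer bindings from the Spark/Hadoop documentation
    "spark.sql.sources.commitProtocolClass": "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
    "spark.sql.parquet.output.committer.class": "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
  driver:
    serviceAccount: spark
    podSecurityContext:
      fsGroup: 65534
  executor:
    instances: 2
```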
- Make sure your driver and executor roles have the relevant glue permissions
```
{
  /* Example below depicts the IAM policy for accessing db1/table1.
     Modify this as you deem worthy for spark application access. */
  Effect: "Allow",
  Action: [
    "glue:*Database*",
    "glue:*Table*",
    "glue:*Partition*"
  ],
  Resource: [
    "arn:aws:glue:us-west-2:123456789012:catalog",
    "arn:aws:glue:us-west-2:123456789012:database/db1",
    "arn:aws:glue:us-west-2:123456789012:table/db1/table1",
    "arn:aws:glue:eu-west-1:123456789012:database/default",
    "arn:aws:glue:eu-west-1:123456789012:database/global_temp",
    "arn:aws:glue:eu-west-1:123456789012:database/parquet",
  ],
}
```
- Make sure you are using the patched operator image
- Add a config map to your spark job namespace as defined here
```yaml
apiVersion: v1
data:
  hive-site.xml: |-
    <configuration>
      <property>
        <name>hive.imetastoreclient.factory.class</name>
        <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
      </property>
    </configuration>
kind: ConfigMap
metadata:
  namespace: SPARK_JOB_NAMESPACE
  name: spark-custom-config-map
```
In order to submit an application with glue support, you need to add a reference to the config map in your SparkApplication spec.
```yaml
kind: SparkApplication
metadata:
  name: "my-spark-app"
  namespace: SPARK_JOB_NAMESPACE
spec:
  sparkConfigMap: spark-custom-config-map
```
Where can I find a Spark 2 build with Glue support?
As Spark 2 becomes less and less relevant, I opted not to add glue support for it. You can take a look here for a reference build script which you can use to build a Spark 2 distribution to use with the Spark 2 dockerfile.
Why a patched operator image?
The patched image is a simple implementation for properly working with custom configuration files with the spark operator. It may be added as a PR in the future, or another implementation will take its place. For more information, see the related issue kubeflow/spark-operator#216.