akoshel


Optimize Spark on Kubernetes

This is my second post about Spark on Kubernetes. Here I share my experience of reducing the cost of Spark computation in the cloud: it can be expensive, but it can be cut by 60-70%. I am using Spark 3.3.1.

1. If you run your research in client mode from an IPython notebook, use dynamic allocation. With this configuration, executor pods are created only while there is work to compute, and the executors are stopped afterwards.

spark.dynamicAllocation.enabled                     true
spark.dynamicAllocation.shuffleTracking.enabled     true
spark.dynamicAllocation.shuffleTracking.timeout     120
spark.dynamicAllocation.minExecutors                0
spark.dynamicAllocation.maxExecutors                10
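In a notebook it is often easier to keep these properties in code than in spark-defaults.conf. The sketch below is only an illustration: the helper name `as_submit_args` is hypothetical, and it does nothing more than format the settings above as `--conf key=value` arguments. With PySpark installed, the same dictionary could instead be passed entry by entry to `SparkSession.builder.config(key, value)`.

```python
# Dynamic allocation settings from the listing above, kept as plain strings.
DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.timeout": "120",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "10",
}


def as_submit_args(conf):
    """Format a dict of Spark properties as spark-submit style arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = as_submit_args(DYNAMIC_ALLOCATION)
print(" ".join(submit_args))
```

Keeping the settings in one dictionary makes it easy to change `maxExecutors` per notebook without touching cluster-wide defaults.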

2. Using spot nodes for executors significantly reduces costs (they are 60-90% cheaper than on-demand nodes). Create a spot node group and give it a label, for example spark: spot. The driver, however, should still run on on-demand nodes.

If you are running in client mode, set the following configuration:

# the property name ends with the node label key, the value is the label value; in my case k=spark, v=spot
spark.kubernetes.executor.node.selector.spark      spot
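The property name embeds the label key, which is easy to get wrong by hand. As a minimal sketch, the hypothetical helper below builds the `spark.kubernetes.<role>.node.selector.<labelKey>` properties from a dictionary of labels. The `spark=spot` label comes from this post; the `spark=on-demand` label for the driver is an assumption made up for the example, and a driver selector only matters when the driver itself runs as a pod (cluster mode).

```python
def node_selector_conf(role, labels):
    """Build spark.kubernetes.<role>.node.selector.<labelKey> properties.

    role is "driver" or "executor"; labels maps node label keys to values.
    """
    if role not in ("driver", "executor"):
        raise ValueError(f"unknown role: {role!r}")
    prefix = f"spark.kubernetes.{role}.node.selector."
    return {prefix + key: value for key, value in labels.items()}


# Executors go to the spot node group labelled spark=spot.
executor_conf = node_selector_conf("executor", {"spark": "spot"})

# Hypothetical label for an on-demand node group that hosts the driver.
driver_conf = node_selector_conf("driver", {"spark": "on-demand"})

for key, value in {**executor_conf, **driver_conf}.items():
    print(key, value)
```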

If you are using Spark Operator, use the following configuration settings:

# nodeSelector is a map of label key to label value, not a list
spec:
  driver:
    nodeSelector:
      key1: value1
      key2: value2
  executor:
    nodeSelector:
      key1: value1
      key2: value2

P.S. Use the volume mount from the next point to keep executors' temporary results in case of a spot node interruption.

3. Mount an SSD volume on the executors. As mentioned above, this keeps executor temporary results in case of a spot node interruption. An SSD volume is the best choice here because it speeds up writing and reading the temporary files that Spark saves to disk. You can use the following configuration settings:

spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName       OnDemand
# storageClass: use your cloud's SSD storage class
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass    gp
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit       100Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path              /data
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly          false
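The five properties share a prefix that contains the volume name (`data` above). A small helper, sketched here with hypothetical names, can generate them for any volume name. The Spark on Kubernetes documentation states that a volume is used as local scratch storage when its name starts with `spark-local-dir-`, so the usage example passes such a name; treat that name and the `gp` storage class as assumptions to adapt to your cluster.

```python
def pvc_volume_conf(name, storage_class, size_limit, mount_path, read_only=False):
    """Build executor properties for an on-demand PersistentVolumeClaim."""
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{name}."
    return {
        # "OnDemand" asks Spark to create a separate claim for each executor.
        prefix + "options.claimName": "OnDemand",
        prefix + "options.storageClass": storage_class,
        prefix + "options.sizeLimit": size_limit,
        prefix + "mount.path": mount_path,
        prefix + "mount.readOnly": "true" if read_only else "false",
    }


# Assumed volume name with the spark-local-dir- prefix; sizes as in this post.
volume_conf = pvc_volume_conf("spark-local-dir-1", "gp", "100Gi", "/data")

for key, value in volume_conf.items():
    print(key, value)
```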

4. These are the values recommended in "Learning Spark" in place of Spark's own defaults:

spark.shuffle.file.buffer                           1m
spark.file.transferTo                               false
spark.shuffle.unsafe.file.output.buffer             1m
spark.io.compression.lz4.blockSize                  512k
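To get a feel for how far these values move from a stock installation, the sketch below converts the size strings to bytes and compares them with what I understand to be the stock defaults (32k for all three buffer settings). Those defaults are an assumption to verify against the configuration page for your Spark version, and `size_to_bytes` is a hypothetical helper, not a Spark API. The boolean `spark.file.transferTo` is left out because it has no size to compare.

```python
# Spark size suffixes are binary: 1k = 1024 bytes, 1m = 1024 * 1024 bytes.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def size_to_bytes(text):
    """Convert a Spark-style size string such as "512k" or "1m" to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


# Assumed stock defaults; check the docs for your Spark version.
ASSUMED_DEFAULTS = {
    "spark.shuffle.file.buffer": "32k",
    "spark.shuffle.unsafe.file.output.buffer": "32k",
    "spark.io.compression.lz4.blockSize": "32k",
}

# Values from the listing above.
TUNED = {
    "spark.shuffle.file.buffer": "1m",
    "spark.shuffle.unsafe.file.output.buffer": "1m",
    "spark.io.compression.lz4.blockSize": "512k",
}

# How many times larger each tuned buffer is than the assumed default.
growth = {
    key: size_to_bytes(TUNED[key]) // size_to_bytes(ASSUMED_DEFAULTS[key])
    for key in TUNED
}

for key, factor in growth.items():
    print(f"{key}: {ASSUMED_DEFAULTS[key]} -> {TUNED[key]} ({factor}x)")
```

Larger buffers cost memory in every task that writes shuffle data, so it is worth checking the factors against executor memory before applying them.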

In conclusion, the steps above can significantly reduce the cost of running Spark computations in the cloud. Spot nodes for executors are 60-90% cheaper than on-demand nodes, and combined with dynamic allocation and SSD volume mounts they can bring the overall cost down by 60-70%. The values recommended in "Learning Spark" can additionally help optimize performance. Test every configuration thoroughly before rolling it out to your users.



Resources:
https://spot.io/blog/how-to-run-spark-on-kubernetes-reliably-on-spot-instances/
https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/
https://spark.apache.org/docs/latest/running-on-kubernetes.html
https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/



P.S. My first post about Spark on Kubernetes:
How to run Spark on kubernetes in jupyterhub
https://dev.to/akoshel/spark-on-k8s-in-jupyterhub-1da2

