A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer


openucx/sparkucx


SparkUCX is a high-performance ShuffleManager plugin for Apache Spark that uses RDMA and other high-performance transports supported by UCX to perform shuffle data transfers in Spark jobs.

This open-source project is developed, maintained, and supported by the UCF consortium.

Runtime requirements

Installation

Obtain SparkUCX

Download the SparkUCX jar file for your Spark version from the "Releases" page (e.g. spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar). Put the SparkUCX jar file in $SPARK_UCX_HOME on all the nodes in your cluster.
If you would like to build the project yourself, refer to the "Build" section below.

UCX binaries must be in the Spark classpath on every Spark Master and Worker. They can be obtained by installing the latest version from the UCX release page.

Configuration

Give Spark the location of the SparkUCX plugin jar and the UCX shared binaries by using the extraClassPath options.

spark.driver.extraClassPath     $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib
spark.executor.extraClassPath   $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib

To enable the SparkUCX Shuffle Manager plugin, add the following configuration property to Spark (e.g. in $SPARK_HOME/conf/spark-defaults.conf):

spark.shuffle.manager   org.apache.spark.shuffle.UcxShuffleManager

For Spark 3.0, add the SparkUCX ShuffleIO plugin:

spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO
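The Spark 2.4 settings above can be collected into a configuration file with a short script. This is a minimal sketch, not part of the SparkUCX distribution: the default paths (/opt/sparkucx, /usr) and the CONF_FILE variable are placeholders assumed for illustration, and the jar name is the Spark 2.4.0 example used in this README.

```shell
#!/bin/sh
# Sketch: append the SparkUCX settings for Spark 2.4 to a Spark defaults file.
# ASSUMPTIONS (not from the README): the fallback paths below and the
# CONF_FILE variable are placeholders; adjust them for your cluster.
SPARK_UCX_HOME="${SPARK_UCX_HOME:-/opt/sparkucx}"
UCX_PREFIX="${UCX_PREFIX:-/usr}"
CONF_FILE="${CONF_FILE:-./spark-defaults.conf}"

# Jar name taken from the README example for Spark 2.4.0.
JAR="$SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar"

# Append (rather than overwrite) so existing settings in the file are kept.
cat >> "$CONF_FILE" <<EOF
spark.driver.extraClassPath     $JAR:$UCX_PREFIX/lib
spark.executor.extraClassPath   $JAR:$UCX_PREFIX/lib
spark.shuffle.manager           org.apache.spark.shuffle.UcxShuffleManager
EOF
```

Pointing CONF_FILE at $SPARK_HOME/conf/spark-defaults.conf would apply the settings to every job on that node. For Spark 3.0, the spark.shuffle.sort.io.plugin.class line shown above would need to be added, and the jar name would have to match your Spark version.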

Build

Building the SparkUCX plugin requires Apache Maven and a Java 8+ JDK.

Build instructions:

% git clone https://github.com/openucx/sparkucx
% cd sparkucx
% mvn -DskipTests clean package -Pspark-2.4

Performance

The SparkUCX plugin is built to provide the best performance out of the box, and it provides multiple configuration options to further tune SparkUCX per job. For more information on how to set up the HiBench benchmark and reproduce results, refer to "Accelerated Apache SparkUCX 2.4/3.0 cluster deployment".

Performance results
