
LeStarch/hyper-spark


This is Spark running at 10Gb/s!

Note: The research that created this work concluded with presentations at ApacheCon Europe 2015.

Project Goal

The goal of this project is to get a single stream in Apache Spark Streaming processing to a throughput of ~10Gb/s. To be clear, this is a single "processor" or "thread" (one map-reduce pipeline). It is independent of the inherent parallelism of the Map-Reduce style processing of Apache Spark.

Why? Given that Apache Spark is map-reduce style programming, why set that aside to focus on a single pipeline? To target the next generation of distributed computing, parallelism alone is not enough: each individual pipeline must first be optimized for maximum throughput and then parallelized. To support a project running on 40Gb/s networks and producing 13+ Gb/s of data, individual streams must process at ~1Gb/s.

Implementation Quirks

The JVM network stack provides an inefficient interface. The interface allows the reading of individual bytes from a Socket, and thus requires many function calls and a lot of copying of single bytes.

Thus a JNI solution that can read blocks of data off a Berkeley socket was used to ensure that the networking layer runs at peak performance.
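The block-read idea can be sketched in C as below. This is a minimal sketch, assuming a connected POSIX stream socket; `read_block` is a hypothetical name and is not taken from the repository, and the JNI glue that would hand the filled buffer back to the JVM is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the repository): fill `buf` with up to
 * `len` bytes from the connected socket `fd`, letting each recv() call
 * return as many bytes as the kernel has available.
 * Returns the number of bytes read, which is smaller than `len` only if
 * the peer closed the connection, or -1 on error. */
ssize_t read_block(int fd, unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t total = 0;
    while (total < len) {
        ssize_t n = recv(fd, buf + total, len - total, 0);
        if (n == 0)
            break;              /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error; errno is set by recv() */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

One call of this kind moves a whole buffer across the native boundary, rather than paying a function call and a copy for every byte.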

Originally, the implementation was intended to run Fourier transforms at high speed. However, in order to achieve bare-bones efficiency this code has been disabled.

Current State

At the conclusion of the research, an individual Spark pipeline was recorded processing around 500MB/s. Much of the Fourier processing code is commented out in order to get bare-bones efficiency.
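As a rough unit check, assuming "MB/s" means 10^6 bytes per second and "Gb/s" means 10^9 bits per second (the README defines neither), the recorded rate compares with the ~10Gb/s goal as follows:

```latex
500\ \text{MB/s} \times 8\ \text{bits/byte} = 4000\ \text{Mb/s} = 4\ \text{Gb/s} \approx 0.4 \times 10\ \text{Gb/s}
```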

Supported Research
