Apache Spark™ Documentation
Setup instructions, programming guides, and other documentation are available for each stable version of Spark below:
Documentation for preview releases:
The documentation linked to above covers getting started with Spark, as well as the built-in components MLlib, Spark Streaming, and GraphX.
In addition, this page lists other resources for learning Spark.
Videos
See the Apache Spark YouTube Channel for videos from Spark events. There are separate playlists for videos of different topics. Besides browsing through playlists, you can also find direct links to videos below.
Screencast Tutorial Videos
Spark Summit Videos
- Videos from Spark Summit 2014, San Francisco, June 30 - July 2, 2014
- Videos from Spark Summit 2013, San Francisco, Dec 2-3, 2013
Meetup Talk Videos
In addition to the videos listed below, you can also view all slides from Bay Area meetups here.
- Spark 1.0 and Beyond (slides) by Patrick Wendell, at Cisco in San Jose, 2014-04-23
- Adding Native SQL Support to Spark with Catalyst (slides) by Michael Armbrust, at Tagged in SF, 2014-04-08
- SparkR and GraphX (slides: SparkR, GraphX) by Shivaram Venkataraman & Dan Crankshaw, at SkyDeck in Berkeley, 2014-03-25
- Simple deployment w/ SIMR & Advanced Shark Analytics w/ TGFs (slides) by Ali Ghodsi, at Huawei in Santa Clara, 2014-02-05
- Stores, Monoids & Dependency Injection - Abstractions for Spark (slides) by Ryan Weald, at Sharethrough in SF, 2014-01-17
- Distributed Machine Learning using MLbase (slides) by Evan Sparks & Ameet Talwalkar, at Twitter in SF, 2013-08-06
- GraphX Preview: Graph Analysis on Spark by Reynold Xin & Joseph Gonzalez, at Flurry in SF, 2013-07-02
- Deep Dive with Spark Streaming (slides) by Tathagata Das, at Plug and Play in Sunnyvale, 2013-06-17
- Tachyon and Shark update (slides: Shark, Tachyon) by Ali Ghodsi, Haoyuan Li, Reynold Xin, at Google Ventures, 2013-05-09
- Spark 0.7: Overview, pySpark, & Streaming by Matei Zaharia, Josh Rosen, Tathagata Das, at Conviva, 2013-02-21
- Introduction to Spark Internals (slides) by Matei Zaharia, at Yahoo in Sunnyvale, 2012-12-18
Training Materials
Hands-On Exercises
External Tutorials, Blog Posts, and Talks
Books
- Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia (O'Reilly Media)
- Spark in Action, by Marko Bonaci and Petar Zecevic (Manning)
- Advanced Analytics with Spark, by Juliet Hougland, Uri Laserson, Sean Owen, Sandy Ryza and Josh Wills (O'Reilly Media)
- Spark GraphX in Action, by Michael Malak (Manning)
- Fast Data Processing with Spark, by Krishna Sankar and Holden Karau (Packt Publishing)
- Machine Learning with Spark, by Nick Pentreath (Packt Publishing)
- Spark Cookbook, by Rishi Yadav (Packt Publishing)
- Apache Spark Graph Processing, by Rindra Ramamonjison (Packt Publishing)
- Mastering Apache Spark, by Mike Frampton (Packt Publishing)
- Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, by Mohammed Guller (Apress)
- Large Scale Machine Learning with Spark, by Md. Rezaul Karim and Md. Mahedi Kaysar (Packt Publishing)
- Big Data Analytics with Spark and Hadoop, by Venkat Ankam (Packt Publishing)
Examples
Research Papers
Spark was initially developed as a UC Berkeley research project, and much of the design is documented in papers. The research page lists some of the original motivation and direction.