AgilData/apache-spark-examples
Apache Spark Examples
These examples were put together for some talks on Apache Spark by AgilData [http://www.agildata.com/]
[8/5/16] This repo has been updated to use Apache Spark 2.0. The original code for Apache Spark 1.6.x is available on this branch: https://github.com/AgilData/apache-spark-examples/tree/spark_1.6.x
Note that some of the Java examples do not currently work. This is intentional: they highlight some of the issues with using Java with the DataFrame API, which we cover in our talk (we will update this README with a link to the slides soon).
There are various code samples in this repo, in both Java and Scala, for performing some trivial analytics on US census data.
To download the full US census data for Colorado:
http://www2.census.gov/census_2010/04-Summary_File_1/Colorado/
Download the zip file and unzip it into a testdata directory within this project.
usgeo2010.txt contains geographic information in fixed-width format. For the examples in this repo we are only interested in the following fields:

    s.substring(18,25),   // Logical Record No
    s.substring(226,316), // Name
    s.substring(8,11)     // Summary Level (050 is county)

For full documentation on the file formats, download http://www.census.gov/prod/cen2010/doc/sf1.pdf
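
As a rough illustration of how the three offsets above could be applied to one line of usgeo2010.txt, here is a minimal plain-Java sketch. It is not the code from this repo: the class names `GeoHeaderParser` and `GeoRecord`, the trimming of padded fields, and the choice to skip lines shorter than 316 characters are all assumptions made for the example.

```java
import java.util.Optional;

// Hypothetical helper, not part of this repo: extracts the three
// fixed-width fields listed above from a single line of the geo file.
final class GeoHeaderParser {

    // Hypothetical value holder for the three fields of interest.
    static final class GeoRecord {
        final String logicalRecordNo;
        final String name;
        final String summaryLevel;

        GeoRecord(String logicalRecordNo, String name, String summaryLevel) {
            this.logicalRecordNo = logicalRecordNo;
            this.name = name;
            this.summaryLevel = summaryLevel;
        }

        // Summary level 050 identifies a county record.
        boolean isCounty() {
            return "050".equals(summaryLevel);
        }
    }

    // The Name field ends at offset 316, so shorter lines cannot be parsed.
    static final int MIN_LENGTH = 316;

    // Returns an empty Optional for null or short lines (an assumption:
    // the real examples may handle malformed input differently).
    static Optional<GeoRecord> parse(String s) {
        if (s == null || s.length() < MIN_LENGTH) {
            return Optional.empty();
        }
        return Optional.of(new GeoRecord(
                s.substring(18, 25).trim(),   // Logical Record No
                s.substring(226, 316).trim(), // Name (space-padded, so trimmed here)
                s.substring(8, 11)));         // Summary Level
    }
}
```

In a Spark job, a function like `parse` would typically be applied to each line of the file and the results filtered with `isCounty()` to keep only county-level records.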
This repo also contains the classic word count examples, in Java and Scala, with some minor modifications.
You can use any text file as input. In our talk we used the complete works of Shakespeare in text format. The download is available here:
http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
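
For readers who want to see the counting logic on its own, the following is a minimal sketch using plain `java.util.stream` rather than Spark, so it runs without any dependencies. It is not the word count code from this repo. The class name `WordCountSketch` and the tokenization rule (lowercase the line, split on anything that is not a letter or apostrophe) are assumptions chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java sketch of word counting; the repo's own
// examples use Spark instead of java.util.stream.
final class WordCountSketch {

    // Splits each line into lowercase words and counts occurrences.
    // A TreeMap keeps the result sorted alphabetically by word.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-z']+")))
                .filter(word -> !word.isEmpty()) // split can yield a leading empty token
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: WordCountSketch <text-file>");
            return;
        }
        // Files.lines reads lazily and must be closed, hence try-with-resources.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            countWords(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
        }
    }
}
```

Passing the path of the downloaded t8.shakespeare.txt as the first argument prints one word and its count per line. The flatMap, filter, and group-then-count steps have the same overall shape as a Spark word count, where the grouping step is usually expressed as a reduce by key.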