Spark data source example: connecting to an open-data API

hchauvin/opendata-example

CircleCI | scala: 2.12 | spark: 3.0 | License: MIT

This repo shows how Spark (3.0) can be leveraged to read open data accessible from remote APIs.

The death registry published by the French government is taken as an example. It contains more than 30 million death events in total since 1970.

[Figure: age_dist]

The retrieval is performed using the new data source SPI introduced in Spark 3.0. Using the data source SPI to extract data from remote APIs can give cleaner, more reusable code than ad hoc processing and is not necessarily more difficult to master.
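To give a feel for the moving parts of the Spark 3.0 data source SPI, here is a minimal read-only sketch. It is not the code of this repo: the class names (`ExampleSource`, `ExampleTable`, `ExampleScan`, `YearPartition`, `ExampleReader`), the two-column schema and the placeholder rows are all hypothetical, and a real reader would call the remote API where the placeholder iterator is built. It assumes Spark 3.0 (spark-sql) on the classpath.

```scala
import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.{Batch, InputPartition, PartitionReader, PartitionReaderFactory, Scan, ScanBuilder}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical schema, for illustration only.
object ExampleSchema {
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("year", IntegerType, nullable = false)))
}

// Entry point: Spark instantiates this class when it is named in spark.read.format(...).
class ExampleSource extends TableProvider {
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    ExampleSchema.schema

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = new ExampleTable
}

// A table that only supports batch reads.
class ExampleTable extends Table with SupportsRead {
  override def name(): String = "example"
  override def schema(): StructType = ExampleSchema.schema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new ExampleScan
}

// One class plays ScanBuilder, Scan and Batch to keep the sketch short.
class ExampleScan extends ScanBuilder with Scan with Batch {
  override def build(): Scan = this
  override def readSchema(): StructType = ExampleSchema.schema
  override def toBatch(): Batch = this

  // Each partition is read by one task; here, one hypothetical partition per year.
  override def planInputPartitions(): Array[InputPartition] =
    Array(YearPartition(1970), YearPartition(1971))

  override def createReaderFactory(): PartitionReaderFactory =
    new ExampleReaderFactory
}

// Serialized and shipped to executors, so it only carries plain data.
case class YearPartition(year: Int) extends InputPartition

class ExampleReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
    new ExampleReader(partition.asInstanceOf[YearPartition])
}

class ExampleReader(partition: YearPartition) extends PartitionReader[InternalRow] {
  // Placeholder rows; a real reader would fetch them from the remote API here.
  private val rows: Iterator[InternalRow] =
    Iterator("a", "b").map(n => InternalRow(UTF8String.fromString(n), partition.year))
  private var current: InternalRow = _

  override def next(): Boolean =
    if (rows.hasNext) { current = rows.next(); true } else false
  override def get(): InternalRow = current
  override def close(): Unit = ()
}
```

With such classes on the classpath, a session could load the data with `spark.read.format(classOf[ExampleSource].getName).load()`. The partitioning step is where the SPI tends to pay off over ad hoc code: each `InputPartition` can map to an independent slice of the remote data set, so Spark parallelizes the API calls across executors.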

Usage in a notebook or in a script

./tests/cluster-test.sc gives an example of how to use the data source. This example requires sbt, ammonite and docker to be installed locally.

The following instructions create a fat jar with all the code for the Spark data source, spin up a Spark cluster using docker-compose, and run a Spark session in ammonite, a Scala REPL:

    sbt assembly
    ./tests/cluster-test.sh

There is also an example polynote notebook, ./tests/SparkTest.ipynb.

Development

Unit and integration tests:

    sbt test

End-to-end tests:

    sbt assembly
    ./tests/cluster-test.sh

Code formatting:

    sbt scalafmtAll

License

opendata-example is licensed under The MIT License.

