# sparkbq

Sparklyr extension package to connect to Google BigQuery
`sparkbq` is a `sparklyr` extension package providing an integration with Google BigQuery. It builds on top of `spark-bigquery`, which provides a Google BigQuery data source to Apache Spark.
You can install the released version of `sparkbq` from CRAN via

``` r
install.packages("sparkbq")
```

or the latest development version through

``` r
devtools::install_github("miraisolutions/sparkbq", ref = "develop")
```
The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:
| sparkbq | spark-bigquery | Apache Spark | Scala | Google Dataproc |
|---|---|---|---|---|
| 0.1.x | 0.1.0 | 2.2.x and 2.3.x | 2.11 | 1.2.x and 1.3.x |
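To put the compatibility table into practice, the local Spark installation used by `sparklyr` can be pinned to a release supported by `sparkbq` 0.1.x. A minimal sketch; the specific version `2.3.0` below is an assumption matching the 2.3.x row above:

``` r
library(sparklyr)

# Install and connect to a Spark release compatible with sparkbq 0.1.x
# (2.2.x or 2.3.x according to the table above)
spark_install(version = "2.3.0")
sc <- spark_connect(master = "local[*]", version = "2.3.0")
```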
`sparkbq` is based on the Spark package `spark-bigquery`, which is available in a separate GitHub repository.
``` r
library(sparklyr)
library(sparkbq)
library(dplyr)

config <- spark_config()

sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Reading the public shakespeare data table
# https://cloud.google.com/bigquery/public-data/
# https://cloud.google.com/bigquery/sample-tables
hamlet <- spark_read_bigquery(
  sc,
  name = "hamlet",
  projectId = "bigquery-public-data",
  datasetId = "samples",
  tableId = "shakespeare"
) %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!

# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite"
)
```
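Once the table is registered, further dplyr verbs are translated by sparklyr and executed on Spark rather than locally. A sketch building on the example above; it assumes the connection `sc` and the `word`/`word_count` columns of the public shakespeare sample table:

``` r
library(dplyr)

# Count total occurrences of each word in Hamlet; the pipeline is
# translated to a Spark query and only evaluated on collect()
word_counts <- hamlet %>%
  group_by(word) %>%
  summarise(total = sum(word_count, na.rm = TRUE)) %>%
  arrange(desc(total))

# Retrieve the ten most frequent words into a local tibble
word_counts %>% head(10) %>% collect()
```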
When running outside of Google Cloud it is necessary to specify a service account JSON key file. The service account key file can be passed as parameter `serviceAccountKeyFile` to `bigquery_defaults` or directly to `spark_read_bigquery` and `spark_write_bigquery`.
Alternatively, an environment variable

``` sh
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json
```

can be set (see https://cloud.google.com/docs/authentication/getting-started for more information). Make sure the variable is set before starting the R session.
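When exporting the variable in the shell is inconvenient, it can also be set from within R, as long as this happens before the Spark connection is established. A sketch; the key file path is a placeholder:

``` r
# Set the Google application credentials from R, before spark_connect()
Sys.setenv(
  GOOGLE_APPLICATION_CREDENTIALS = "/path/to/your/service_account_keyfile.json"
)
```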
When running on Google Cloud, e.g. Google Cloud Dataproc, application default credentials (ADC) may be used in which case it is not necessary to specify a service account key file.
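In that case the defaults can be set without a key file. A minimal sketch, assuming ADC is available on the cluster and that `serviceAccountKeyFile` may simply be omitted from `bigquery_defaults`:

``` r
# On Google Cloud (e.g. Dataproc), application default credentials
# are picked up automatically; no serviceAccountKeyFile is passed
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  type = "direct"
)
```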