
sparkbq: Google BigQuery Support for sparklyr


sparkbq is a sparklyr extension package providing an integration with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source to Apache Spark.

Version Information

You can install the released version of sparkbq from CRAN via

```r
install.packages("sparkbq")
```

or the latest development version through

```r
devtools::install_github("miraisolutions/sparkbq", ref = "develop")
```

The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:

| sparkbq | spark-bigquery | Apache Spark    | Scala | Google Dataproc |
|---------|----------------|-----------------|-------|-----------------|
| 0.1.x   | 0.1.0          | 2.2.x and 2.3.x | 2.11  | 1.2.x and 1.3.x |

sparkbq is based on the Spark package spark-bigquery, which is available in a separate GitHub repository.

Example Usage

```r
library(sparklyr)
library(sparkbq)
library(dplyr)

config <- spark_config()
sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Reading the public shakespeare data table
# https://cloud.google.com/bigquery/public-data/
# https://cloud.google.com/bigquery/sample-tables
hamlet <-
  spark_read_bigquery(
    sc,
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare"
  ) %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!

# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite"
)
```

Authentication

When running outside of Google Cloud it is necessary to specify a service account JSON key file. The service account key file can be passed as parameter serviceAccountKeyFile to bigquery_defaults or directly to spark_read_bigquery and spark_write_bigquery.
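For example, the key file can be supplied per call instead of via bigquery_defaults (a sketch only; the file path is a placeholder, and the connection and default settings are assumed to be set up as in the example above):

```r
library(sparklyr)
library(sparkbq)

sc <- spark_connect(master = "local[*]")

# Passing the service account key file directly to the read call;
# "/path/to/your/service_account_keyfile.json" is a placeholder.
shakespeare <- spark_read_bigquery(
  sc,
  name = "shakespeare",
  projectId = "bigquery-public-data",
  datasetId = "samples",
  tableId = "shakespeare",
  serviceAccountKeyFile = "/path/to/your/service_account_keyfile.json"
)
```

The same parameter can be passed to spark_write_bigquery in the same way.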

Alternatively, the environment variable GOOGLE_APPLICATION_CREDENTIALS can be set, e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json (see https://cloud.google.com/docs/authentication/getting-started for more information). Make sure the variable is set before starting the R session.
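The safest route is the shell export above, since it is guaranteed to be visible to every process sparklyr spawns. Setting the variable from within R before connecting may also work, because Spark is launched as a child process of the R session (a minimal sketch; the path is a placeholder):

```r
# Set the ADC environment variable for the current R session.
# This must happen before spark_connect() is called; the path is a placeholder.
Sys.setenv(GOOGLE_APPLICATION_CREDENTIALS = "/path/to/your/service_account_keyfile.json")

# Verify the variable is visible to the session
Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS")
```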

When running on Google Cloud, e.g. on Google Cloud Dataproc, application default credentials (ADC) may be used, in which case it is not necessary to specify a service account key file.

