This repository was archived by the owner on Feb 3, 2021. It is now read-only.
SparklyR on Azure with AZTK

Jacob Freck edited this page Jan 25, 2018 · 2 revisions

How to use SparklyR on Azure with AZTK

This document is a guide to creating an Apache Spark cluster for R users. In this tutorial, we install and set up RStudio Server.

This guide does not require any knowledge of Azure or cloud infrastructure.

The goals of this tutorial are to:

  • Provision a Spark cluster to use with R
  • Interact with that Spark cluster through RStudio Server
  • Achieve both of the above quickly, cheaply, and easily

For this tutorial, we assume that you have the following requirements:

Setup AZTK

In your working directory, run aztk spark init to initialize AZTK. This command creates a .aztk/ folder in your working directory. Fill out the .aztk/secrets.yaml file with your Azure Batch and Azure Storage account secrets. We also recommend setting your ssh-key here.

For more details, see this section.

For this tutorial, please copy the folder aztk/custom-scripts into your working directory. It is located in the AZTK repo that you cloned when installing AZTK. (If you are working directly in the cloned repo, you can skip this step, as the /custom-scripts folder is already there.)

Provision your cluster

Set up your Spark cluster with the aztk/r-base Docker image and a custom script

To provision your Spark cluster for R users, you will need to edit .aztk/cluster.yaml. Set the parameters docker_repo and custom_scripts as follows:

    # .aztk/cluster.yaml
    ...
    docker_repo: aztk/r-base:latest
    custom_scripts:
      - script: custom-scripts/rstudio_server.sh
        runOn: master
    ...

This tells AZTK to use the aztk/r-base image to build your cluster and to configure RStudio Server to run seamlessly after the cluster is set up. This requires custom-scripts/rstudio_server.sh to be a valid path from your working directory.

Feel free to modify the other parameters as needed.
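Since the custom script path must resolve from your working directory, it can help to check the expected files before creating the cluster. The following is a minimal sketch, not part of AZTK: the function name check_aztk_workdir is hypothetical, and the file list is simply the set of paths mentioned in this guide.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of AZTK): verify that the files this
# tutorial relies on exist relative to the given working directory.
check_aztk_workdir() {
  workdir="${1:-.}"
  missing=0
  for path in .aztk/secrets.yaml .aztk/cluster.yaml custom-scripts/rstudio_server.sh; do
    if [ ! -f "${workdir}/${path}" ]; then
      echo "missing: ${path}"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok: all expected files found"
  fi
  # Exit status is 0 when everything is present, 1 otherwise.
  return "$missing"
}

# Usage: run from the directory where you ran `aztk spark init`, for example:
# check_aztk_workdir .
```

The check only confirms that the files exist; it does not validate the secrets or the YAML contents.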

Create cluster through command-line (recommended)

You can now run the aztk spark cluster create command:

    aztk spark cluster create --id <my_spark_cluster> --size 10

This command automatically uses the settings that we modified in .aztk/cluster.yaml (and .aztk/spark-defaults.conf).

Interact with Sparklyr

After you've run your aztk spark cluster create command, you will need to wait a few minutes for your cluster to be ready.

Once it is ready, you can use the following command to ssh into your cluster's master node. This command will also port forward the necessary ports to start using RStudio Server and the standard Spark UIs:

    aztk spark cluster ssh --id <my_spark_cluster>
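The create and ssh commands must use the same cluster id, so one option is to keep the id in a single variable. The sketch below is an illustration only: CLUSTER_ID, CLUSTER_SIZE, DRY_RUN, and the function names are hypothetical names chosen for this example, while the aztk subcommands and flags are the ones used in this guide. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: CLUSTER_ID, CLUSTER_SIZE and DRY_RUN are hypothetical variable
# names for this example; the aztk subcommands and flags come from this guide.
CLUSTER_ID="${CLUSTER_ID:-my_spark_cluster}"
CLUSTER_SIZE="${CLUSTER_SIZE:-10}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = execute it

# Print the command in dry-run mode, otherwise run it as given.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_cluster() {
  run aztk spark cluster create --id "$CLUSTER_ID" --size "$CLUSTER_SIZE"
}

ssh_cluster() {
  run aztk spark cluster ssh --id "$CLUSTER_ID"
}

# Usage:
# create_cluster   # then wait a few minutes for the cluster to be ready
# ssh_cluster
```

Setting DRY_RUN=0 in the environment makes the same functions execute the commands instead of printing them.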

Once you've ssh'ed in, you'll be on the master node of your cluster. Because the ports are forwarded, you can open a browser on your local machine and go to localhost:8787 to use RStudio Server. We have created a default user 'rstudio' with 'rstudio' as the password.

Connecting to your cluster

    library(sparklyr)

    # Get the IP address of the master node
    cluster_url <- paste0("spark://", system("hostname -i", intern = TRUE), ":7077")
    sc <- spark_connect(master = cluster_url)

[Image: RStudio Server setup]

Once you've connected to the cluster with sparklyr, you can visit localhost:8080 to see the Spark UI and monitor the state of your cluster.

For more information about using sparklyr, here's the link.
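If a page does not load, it can be useful to confirm that the forwarded ports are answering at all. The following is a rough sketch, assuming curl is installed on your local machine; wait_for_port is a hypothetical helper, not an AZTK command, and the ports are the ones named in this guide (8787 for RStudio Server, 8080 for the Spark UI).

```shell
#!/bin/sh
# Hypothetical helper (not part of AZTK): poll a locally forwarded port until
# something answers over HTTP, or give up after a number of attempts.
wait_for_port() {
  port="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --noproxy '*' keeps the request on localhost even if a proxy is configured.
    if curl --silent --output /dev/null --noproxy '*' --max-time 5 "http://localhost:${port}/"; then
      echo "port ${port} is responding"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "port ${port} did not respond after ${tries} attempts"
  return 1
}

# Usage, from a second local terminal while the ssh session is open:
# wait_for_port 8787   # RStudio Server
# wait_for_port 8080   # Spark UI
```

Any HTTP response counts as success here, including redirects or error pages, so this only shows that the port forward is live, not that the service behind it is healthy.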