Hi folks.
I'm trying to connect to a remote Kubernetes-based Spark cluster (AWS EMR on EKS) from R using sparklyr, but I keep running into connection errors. The equivalent PySpark code works as expected. I'm not sure whether this is a limitation of sparklyr with Kubernetes deployments or whether I'm missing something in my configuration.
## What I'm trying to do

### Python Code (Works)

This PySpark code successfully connects to my remote Kubernetes cluster:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://<myurl>:443")
    .appName("TestConnection")
    .config("spark.kubernetes.container.image", "<my-spark-image>")
    .config("spark.executor.instances", "2")
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    .getOrCreate()
)
sc = spark.sparkContext
```
This works perfectly. It connects to the Kubernetes API server, spawns executor pods, and I can run queries.
### R Code (Fails)

Unfortunately, I come unstuck with the following R equivalent:

```r
library(sparklyr)

sc <- spark_connect(
  master     = "k8s://https://<myurl>:443",
  spark_home = "/usr/lib/spark",
  config     = list(
    spark.kubernetes.container.image = "<my-spark-image>",
    spark.executor.instances         = 2,
    spark.executor.memory            = "4g",
    spark.executor.cores             = 2
  )
)
```
Error:

```
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, ...) :
  Gateway in localhost:8880 did not respond.
Try running `options(sparklyr.log.console = TRUE)` followed by
`sc <- spark_connect(...)` for more debugging info.
```

Side note: running `options(sparklyr.log.console = TRUE)` didn't actually give me any more info.
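One way to narrow this down (a suggestion I'd add here, not part of the original report) is to check whether plain `spark-submit` from the same machine and `SPARK_HOME` can reach the cluster at all, using the stock SparkPi example from the Spark-on-Kubernetes docs. The master URL and image name are the same placeholders as above; the example-jar path inside the container is an assumption that depends on your Spark image:

```shell
# Smoke test: submit the bundled SparkPi example to the same k8s master.
# If this also fails, the problem is in the Spark/Kubernetes setup
# (credentials, API server URL, container image) rather than in sparklyr.
/usr/lib/spark/bin/spark-submit \
  --master k8s://https://<myurl>:443 \
  --deploy-mode cluster \
  --name sparkpi-smoke-test \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  --conf spark.executor.instances=2 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar 100
```

If SparkPi runs, the cluster side is healthy and the failure is specific to how sparklyr launches its driver.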
## My understanding (with help from Claude)

I must admit that I'm a bit out of my comfort zone here, so I tried to troubleshoot further with Claude 4.5 Sonnet. See the summary below, although I can't speak to its full accuracy.
Claude's Technical Summary:

The issue appears to be an architectural difference between sparklyr and PySpark:

- **sparklyr's gateway architecture:** sparklyr tries to start a local gateway process on `localhost:8880` that acts as a bridge between R and Spark. This architecture works for local, YARN, and Mesos deployments.
- **Kubernetes client mode:** when connecting to a Kubernetes cluster with a `k8s://` master URL, the driver needs to run locally and communicate directly with the Kubernetes API server to spawn executor pods. There's no intermediate gateway in this model.
- **The mismatch:** sparklyr attempts to start its gateway and wait for a response, but the gateway can't establish a connection to a remote Kubernetes cluster. The connection fails before any Spark application is created.

PySpark works because the Python process becomes the Spark driver directly and communicates with Kubernetes natively, without needing a gateway.
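As a small illustrative check of the gateway story above (my addition, so treat it as a sketch): base R can probe whether anything is actually listening on sparklyr's default gateway port while a connection attempt is in flight. Port 8880 is sparklyr's documented default (`sparklyr.gateway.port`); adjust if you've overridden it.

```r
# Probe sparklyr's default gateway port on localhost.
# TRUE  -> something is listening on 8880
# FALSE -> the gateway process never came up (or exited early)
gateway_up <- tryCatch({
  con <- socketConnection(host = "localhost", port = 8880,
                          blocking = TRUE, timeout = 2)
  close(con)
  TRUE
}, error = function(e) FALSE)
print(gateway_up)
```

If this returns `FALSE` even during `spark_connect()`, the gateway JVM is likely failing to start at all, which would point at the launch step rather than at Kubernetes networking.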
## Questions

- Is this a known limitation? Does sparklyr currently support connecting to remote Kubernetes Spark clusters?
- Am I doing something wrong? Is there a different way I should be configuring the connection?
- Are there workarounds?
- Should I rather be using pysparklyr / reticulate?
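On the pysparklyr question, here's a hedged sketch of what that route looks like. Note that pysparklyr connects over Spark Connect (`sc://` masters, Spark 3.4+) rather than the `k8s://` protocol, so it only applies if the cluster exposes a Spark Connect endpoint; the host below is a placeholder I've made up, and 15002 is Spark Connect's default port:

```r
# Sketch, assuming pysparklyr is installed and the cluster runs a
# Spark Connect server. "<connect-endpoint>" is a placeholder, not
# something from my actual setup.
library(sparklyr)
library(pysparklyr)

sc <- spark_connect(
  master  = "sc://<connect-endpoint>:15002",
  method  = "spark_connect",
  version = "3.5"
)
```

Whether EMR on EKS can expose a Spark Connect endpoint at all is part of what I'm asking.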
Any guidance would be much appreciated. Happy to provide more info regarding my setup as needed. Thanks in advance!