Recreate and update a Dataproc on GKE virtual cluster

You can copy an existing Dataproc on GKE virtual cluster's configuration, update the copied configuration, and then create a new Dataproc on GKE cluster using the updated configuration.

Recreate and update a Dataproc on GKE cluster

gcloud

  1. Set environment variables:

    CLUSTER="existing Dataproc on GKE cluster name"
    REGION="region"

  2. Export the existing Dataproc on GKE cluster configuration to a YAML file.

    gcloud dataproc clusters export $CLUSTER \
        --region=$REGION > "${CLUSTER}-config.yaml"

  3. Update the configuration.

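    1. Remove the kubernetesNamespace field. Removing this field is necessary to avoid a namespace conflict when you create the updated cluster.

      Sample sed command to remove the kubernetesNamespace field. As written, the command prints the edited configuration to standard output and does not modify the file: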

      sed -E "s/kubernetesNamespace: .+$//g" ${CLUSTER}-config.yaml

    2. Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.

  4. Delete the existing Dataproc on GKE virtual cluster if the new cluster will have the same name as the original (that is, if you are replacing the original cluster).

  5. Wait for the previous delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated configuration settings.

    gcloud dataproc clusters import $CLUSTER \
        --region=$REGION \
        --source="${CLUSTER}-config.yaml"
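The sample sed command in step 3 prints its result rather than changing the file, so the edit still has to be saved before the import in step 5. The following is a minimal sketch, not part of the official procedure, of one way to do that: it wraps the same sed expression in a hypothetical helper, strip_namespace, that writes to a temporary file and then replaces the original. The sample YAML, the cluster name, and all field values are made up for illustration; a real exported configuration contains more fields.

```shell
# Hypothetical helper: remove the kubernetesNamespace value from a config
# file and save the result back to the same path.
# $1: path to the exported YAML configuration.
strip_namespace() {
  local config_file
  config_file="$1"
  # Same expression as the sample sed command. The matched text is replaced
  # with nothing, so the line's indentation remains as a blank line.
  sed -E 's/kubernetesNamespace: .+$//g' "${config_file}" > "${config_file}.tmp" &&
    mv "${config_file}.tmp" "${config_file}"
}

# Demonstration on a made-up configuration (illustrative values only).
CLUSTER="example-cluster"
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/${CLUSTER}-config.yaml" <<'EOF'
virtualClusterConfig:
  kubernetesClusterConfig:
    kubernetesNamespace: example-namespace
    kubernetesSoftwareConfig:
      componentVersion:
        SPARK: example-version
EOF

strip_namespace "${demo_dir}/${CLUSTER}-config.yaml"
cat "${demo_dir}/${CLUSTER}-config.yaml"
```

Writing to a separate temporary file avoids relying on the in-place option of sed, whose syntax differs between GNU and BSD implementations.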

API

  1. Set environment variables:

    CLUSTER="existing Dataproc on GKE cluster name"
    REGION="region"
    PROJECT="project ID"

  2. Export the existing Dataproc on GKE cluster configuration to a JSON file.

    curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}?alt=json" > "${CLUSTER}-config.json"

  3. Update the configuration.

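    1. Remove the kubernetesNamespace field. Removing this field is necessary to avoid a namespace conflict when you create the updated cluster.

      Sample jq command to remove the kubernetesNamespace field. As written, the command reads the configuration from standard input and prints the result to standard output: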

      jq 'del(.virtualClusterConfig.kubernetesClusterConfig.kubernetesNamespace)'

    2. Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.

  4. Delete the existing Dataproc on GKE virtual cluster if the new cluster will have the same name as the original (that is, if you are replacing the original cluster).

    curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"

  5. Wait for the previous delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated settings.

    curl -i -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d "@${CLUSTER}-config.json" \
        "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters?alt=json"
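Step 5 says to wait for the delete operation to finish, but the commands above do not show how to wait. The following is a minimal sketch of one approach, under the assumption that a GET request for a cluster that no longer exists returns HTTP 404: poll the cluster URL until that status appears. The helper names wait_until_gone and cluster_http_status, the attempt count, and the delay are hypothetical and chosen only for illustration. Polling the long-running operation returned by the DELETE request is another option.

```shell
# Hypothetical helper: run a status command repeatedly until it prints 404.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: a command that prints an HTTP status code
# Returns 0 once 404 is seen, or 1 if the attempts run out.
wait_until_gone() {
  local max_attempts delay attempt status
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    # Treat a failed status command as "not gone yet" and keep polling.
    status="$("$@")" || status=""
    if [ "${status}" = "404" ]; then
      return 0
    fi
    sleep "${delay}"
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical helper: print only the HTTP status code of a GET on the cluster.
# Assumes PROJECT, REGION, and CLUSTER are set as in step 1.
cluster_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"
}

# Example use (requires real credentials, so it is left commented out):
# wait_until_gone 60 10 cluster_http_status && echo "cluster deleted"
```

Passing the status command as an argument keeps the polling loop independent of curl, which makes it easy to try the loop with a stand-in command before pointing it at a real cluster.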

Console

The Google Cloud console does not support recreating a Dataproc on GKE virtual cluster by importing an existing cluster's configuration.


Last updated 2025-12-15 UTC.