Build a production-grade, GPU-ready RAG platform that turns your documents into answers — end to end, in minutes.
- Spin up multi-cloud GPU infrastructure (GKE/EKS/AKS) with sane defaults and HPA profiles
- Ingest, chunk, and embed with NVIDIA NIMs; store vectors in Astra DB/HCD; search at scale
- Serve retrieval APIs with cross-encoder reranking and observability built in
- CLI-first operations: deploy, validate, monitor, and optimize in a few commands
```shell
# 1) Bootstrap env
cp env.template .env
# set NGC_API_KEY, GCP_PROJECT_ID, GCP_ZONE, GKE_CLUSTER_NAME

# 2) Provision (example: GKE)
terraform -chdir=deployment apply \
  -var gcp_project_id="$GCP_PROJECT_ID" \
  -var gcp_zone="$GCP_ZONE" \
  -var gke_name_prefix="$GKE_CLUSTER_NAME"

# 3) One-command dev deploy (DNS-free)
./scripts/cli.sh deploy --profile dev
./scripts/cli.sh status --extended
```
What you get:
- NVIDIA NIMs (embedder + reranker) deployed and wired to ingress
- Retrieval stack with clean APIs, reranking, and performance timing
- Terraform-managed infrastructure and day-2 scripts (monitoring, tuning)
- Clear docs and examples to move from POC → production
Quick links: Docs · GKE Deploy · Scripts
This project uses a two-phase deployment approach:
- 🏗️ Infrastructure First: Use Terraform to create the Kubernetes cluster and supporting infrastructure
- 🚀 Applications Second: Use the CLI to deploy applications onto the existing infrastructure
Key Point: You must run Terraform commands BEFORE running CLI deployment commands. The CLI deploys applications onto infrastructure that Terraform creates.
Creates the cloud infrastructure (Kubernetes cluster, networking, bastion host, etc.)
Deploys applications and services onto the existing infrastructure
- gcloud, terraform, kubectl, helm, jq installed and authenticated
- Copy `env.template` to `.env` and fill in any required values (minimal for dev)
- Required: Run `source scripts/setup_environment.sh` to load environment variables before infrastructure deployment
Domain configuration for Run:AI:
- Production: set RUNAI_DOMAIN to a DNS name you control and create a DNS record pointing to your ingress LoadBalancer.
- Development (no DNS): use an IP-based hostname like `runai.<LOAD_BALANCER_IP>.sslip.io` (or nip.io). The installer uses this domain for Run:AI ingress automatically if none is provided.
- Precedence: CLI/env values override `.env`.
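The DNS-free pattern above can be sketched in a few lines of shell. This is an illustrative sketch, not part of the repo's scripts: the `kubectl` service name and namespace in the comment are assumptions (check your ingress installation), and the IP is a placeholder.

```shell
# Sketch: derive a DNS-free Run:AI domain from the ingress LoadBalancer IP.
# In a live cluster the IP would typically come from something like:
#   LB_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# (service name and namespace are assumptions; verify against your ingress.)
LB_IP="203.0.113.10"                    # example IP for illustration only
RUNAI_DOMAIN="runai.${LB_IP}.sslip.io"  # sslip.io resolves this name back to the IP
echo "$RUNAI_DOMAIN"
```

Because sslip.io (and nip.io) embed the IP in the hostname, no DNS record needs to be created for development clusters.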
Purpose: Create the Kubernetes cluster and supporting infrastructure
```shell
# 1. Set up environment
cp env.template .env
# Edit .env with your values - Required variables:
# - NGC_API_KEY=nvapi-...              # NVIDIA API key for NeMo services
# - GCP_PROJECT_ID=your-gcp-project    # Your Google Cloud project ID
# - GCP_ZONE=us-central1-c             # GCP zone for resources
# - GKE_CLUSTER_NAME=your-cluster-name # Name for your GKE cluster
#
# Optional but recommended for databases:
# - ASTRA_DB_ENDPOINT=                 # DataStax Astra DB endpoint
# - ASTRA_DB_TOKEN=                    # DataStax Astra DB token
# - HCD_DB_ENDPOINT=                   # HyperConverged Database endpoint
# - HCD_DB_TOKEN=                      # HyperConverged Database token
#
# For troubleshooting terminal crashes:
# - TERRAFORM_DEBUG=true               # Enable verbose terraform logging

source scripts/setup_environment.sh    # Load environment variables and setup

# 2. Provision infrastructure with Terraform
cd deployment
terraform init
terraform apply
# Note: If you experience terminal crashes during terraform operations,
# add TERRAFORM_DEBUG=true to your .env file and re-run source scripts/setup_environment.sh

# 3. Verify infrastructure and test cluster access
cd ..
bastion kubectl get nodes              # Test cluster access
```
What this creates:
- GKE cluster with GPU support
- Bastion host for secure cluster access
- VPC, subnets, and security groups
- IAM roles and service accounts
- Load balancer infrastructure
The `scripts/setup_environment.sh` script provides several benefits:
- Environment Loading: Automatically loads variables from your `.env` file
- Bastion Function: Creates a convenient `bastion` command for cluster access
- Authentication Check: Verifies your gcloud authentication status
- Connectivity Test: Tests bastion connectivity (if infrastructure is deployed)
- Helpful Tips: Provides usage examples and next steps
Usage: `source scripts/setup_environment.sh` (run once per terminal session)
The setup script creates a convenient `bastion` function for executing commands on your cluster. Here are the different ways to access your cluster:
```shell
# Setup environment (run once per session)
source scripts/setup_environment.sh

# Execute commands on the cluster
bastion kubectl get nodes
bastion kubectl get pods --all-namespaces
bastion "kubectl describe nodes | grep nvidia"
```
```shell
# Get bastion details
BASTION_NAME=$(terraform -chdir=deployment output -raw gke_bastion_name)
PROJECT_ID=$(terraform -chdir=deployment output -raw gcp_project_id)
ZONE=$(terraform -chdir=deployment output -raw gcp_zone)

# SSH to bastion
gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME"

# Then run kubectl commands inside the SSH session
kubectl get nodes
```
```shell
# Execute single commands
gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME" --command="kubectl get nodes"
```
```shell
# Replace with your actual values
gcloud compute ssh --project gcp-lcm-project --zone us-central1-c vz-mike-obrien-bastion --command="kubectl get nodes"
```
💡 Pro Tips:
- Use Method 1 (bastion function) for the best experience
- The setup script automatically detects your infrastructure configuration
- Use quotes for complex commands: `bastion "kubectl get pods | grep nemo"`
- Run `source scripts/setup_environment.sh` in each new terminal session
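To make the `bastion` function less magical, here is a minimal sketch of the kind of wrapper `setup_environment.sh` provides. The exact implementation lives in `scripts/setup_environment.sh`; the variable names and defaults below are assumptions for illustration.

```shell
# Illustrative sketch of a bastion wrapper (not the repo's actual code).
# In the real script these values are resolved from Terraform outputs.
BASTION_NAME="${BASTION_NAME:-my-bastion}"   # assumed default for the sketch
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"

bastion() {
  # Forward the whole argument list as a single remote command string,
  # so both `bastion kubectl get nodes` and quoted pipelines work.
  gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME" \
    --command="$*"
}
```

Joining the arguments with `"$*"` is what makes the quoted form (`bastion "kubectl get pods | grep nemo"`) necessary for pipelines: without quotes, your local shell would interpret the `|` before `gcloud` ever sees it.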
Purpose: Deploy applications onto the existing infrastructure
```shell
./scripts/cli.sh deploy --profile dev

# If the CLI seems to hang after "Loading environment...",
# skip local platform detection (e.g., when local kubectl is stale):
./scripts/cli.sh deploy --profile dev --platform gke

# Or explicitly disable detection:
./scripts/cli.sh deploy --profile dev --no-detect
```
```shell
./scripts/cli.sh deploy --profile prod --domain your-domain.com --email admin@your-domain.com
```
What this deploys:
- GPU Operator for NVIDIA GPU management
- NVIDIA NIMs (embedder + reranker) for AI services
- NV-Ingest for document processing
- NGINX ingress controller
- TLS certificates (production only)
```shell
# Check deployment status
./scripts/cli.sh status --extended
./scripts/cli.sh nims status
./scripts/cli.sh ingress status

# Validate everything is working
./scripts/cli.sh validate

# Access services (development)
./scripts/cli.sh port-forward        # dev/no DNS access
./scripts/cli.sh monitor lb --watch  # load balancing/HPA view
```
Access your services:
- Reranker: `http://reranker.<LOAD_BALANCER_IP>.nip.io` (dev)
- NV-Ingest: `http://nv-ingest.<LOAD_BALANCER_IP>.nip.io` (dev)
- Production: Use your configured domain with TLS
If you're familiar with the process, here's the essential sequence:
```shell
# 1. Infrastructure
cp env.template .env  # edit .env (see Environment Configuration section for required values)
source scripts/setup_environment.sh && cd deployment && terraform init && terraform apply

# 2. Applications
cd .. && ./scripts/cli.sh deploy --profile dev

# 3. Validate
./scripts/cli.sh validate && ./scripts/cli.sh status
```
For a more detailed walkthrough with troubleshooting tips, see the comprehensive setup guide below.
```shell
# macOS
brew install --cask google-cloud-sdk || true
brew install terraform || true
gcloud auth login
gcloud auth application-default login

# Install kubectl and helm (see platform-specific instructions below)
```
```shell
cp env.template .env
# Edit .env and set required values:

# --- Required for Infrastructure ---
NGC_API_KEY=nvapi-...                 # NVIDIA API key from NGC
GCP_PROJECT_ID=your-gcp-project       # Your Google Cloud project ID
GCP_ZONE=us-central1-c                # GCP zone (or us-east1-b, etc.)
GKE_CLUSTER_NAME=your-cluster-name    # Name for your GKE cluster

# --- Required for Database (choose one) ---
# For DataStax Astra:
ASTRA_DB_ENDPOINT=https://your-db-id-region.apps.astra.datastax.com
ASTRA_DB_TOKEN=AstraCS:...
# OR for HyperConverged Database:
HCD_DB_ENDPOINT=https://your-hcd-endpoint
HCD_DB_TOKEN=your-hcd-token

# --- Optional Troubleshooting ---
TERRAFORM_DEBUG=true                  # Enable if experiencing terminal crashes
```
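A common failure mode is a `.env` that is missing one of the required values. The helper below is a hedged sketch (it is not part of the repo's scripts) showing how you might fail fast before running Terraform; the variable list mirrors the required ones above.

```shell
# Sketch: fail fast if required variables are missing after sourcing .env.
# This helper is illustrative only; the repo's setup script may do more.
check_env() {
  local missing=0
  for var in NGC_API_KEY GCP_PROJECT_ID GCP_ZONE GKE_CLUSTER_NAME; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "Missing required variable: $var" >&2
      missing=1
    fi
  done
  return $missing
}

# Example: with everything set, check_env succeeds (placeholder values).
NGC_API_KEY="nvapi-example" GCP_PROJECT_ID="demo" \
GCP_ZONE="us-central1-c" GKE_CLUSTER_NAME="demo-cluster" check_env
```

Running a check like this right after `source .env` turns a confusing mid-apply Terraform error into an immediate, named failure.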
```shell
# Instead of command-line variables, use a tfvars file:
terraform -chdir=deployment apply -var-file=../configs/gke/gke.tfvars
```

For advanced users or troubleshooting, you can use the underlying scripts directly:
```shell
# Development profile (GPU Operator + NeMo; DNS-free)
scripts/platform/gke/deploy_to_bastion.sh --development

# NV-Ingest only (through bastion)
scripts/platform/gke/deploy_to_bastion.sh --deploy-nv-ingest

# Run:AI guided setup (optional)
scripts/cli.sh runai setup
```
Tips:
- CLI flags override environment variables and `.env` values
- Keep `.env`'s `GKE_CLUSTER_NAME` consistent with Terraform's `gke_name_prefix`
- Prefer running Kubernetes commands via the bastion resolved from Terraform outputs
- You can override the Terraform directory with `TF_ROOT` if needed
| Profile | GPUs (A100 equiv) | Nodes | Expected Throughput | Notes |
|---|---|---|---|---|
| dev | 0-2 | 1-2 | low | DNS-free, quick validation |
| prod-small | 4-8 | 3-6 | medium | HA, controller+ingress tuned |
| prod-large | 8-16+ | 6-12 | high | HPA aggressive, multi-NIM scaling |
- Actual throughput varies by document mix and GPU types. See `deploy_nv_ingest.sh` for optimized presets and `optimize_nv_ingest_performance.sh` for day-2 tuning.
📚 Complete Documentation - Full project documentation with navigation
Choose your component to get started:
- V2 Pipeline - Enterprise document processing with NV-Ingest
- Retrieval System - Intelligent Q&A with Astra RAG Azure
- Infrastructure - Terraform and Kubernetes automation
- NeMo Services - NVIDIA NeMo microservices deployment
Quick links:
- Scripts Quickstart: `scripts/README.md`
- GKE Deployment Guide: `docs/deployment/gke-deployment.md`
The ingestion pipeline writes runtime data to several directories. You can colocate them under a single base directory via `--data-dir` (CLI) or `DATA_DIR` (env). CLI values override env, which override defaults.
Defaults (when no base is provided) follow OS conventions via platformdirs:
- processed/error: user data dir
- checkpoints: user state dir
- temp/downloads: user cache dir
Optional env overrides for individual paths (CLI still wins): `OUTPUT_DIR`, `ERROR_DIR`, `TEMP_DIR`, `CHECKPOINT_DIR`, `DOWNLOAD_DIR`.
Suggested recipes:
- Development: `DATA_DIR=.data` to keep the repo clean
- Production (Linux): `DATA_DIR=/var/lib/datastax-ingestion`
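The precedence rule (CLI flag > env var > OS default) can be sketched with plain shell parameter expansion. This is an illustrative sketch, not the pipeline's actual code; the function name and fallback path are assumptions.

```shell
# Sketch of the precedence described above: CLI flag > env var > default.
# $1 is the (possibly empty) CLI-provided value; the default path here is
# a stand-in for what platformdirs would return on Linux.
resolve_data_dir() {
  local cli_value="$1"
  # ${x:-y} falls back to y when x is unset or empty, so the chain below
  # tries the CLI value first, then DATA_DIR, then the default.
  echo "${cli_value:-${DATA_DIR:-$HOME/.local/share/datastax-ingestion}}"
}

resolve_data_dir ""                                      # env var or default
DATA_DIR=/var/lib/datastax-ingestion resolve_data_dir "" # env var wins over default
resolve_data_dir /tmp/override                           # CLI value wins over everything
```

The same chaining applies per-path: an explicit `OUTPUT_DIR` would slot in between the CLI flag and the `DATA_DIR`-derived location.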
```
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   V2 Pipeline   │  │  Retrieval RAG  │  │ Infrastructure  │
│                 │  │                 │  │                 │
│ • Document Proc │  │ • Vector Search │  │ • Terraform     │
│ • NV-Ingest     │  │ • Azure OpenAI  │  │ • Kubernetes    │
│ • Vector DB     │  │ • Astra DB      │  │ • Monitoring    │
│ • Azure Blob    │  │ • Reranking     │  │ • Scripts       │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
                    ┌─────────────────┐
                    │  NeMo Services  │
                    │                 │
                    │ • Microservices │
                    │ • GPU Accel     │
                    │ • Embeddings    │
                    │ • Reranking     │
                    └─────────────────┘
```

- Large-scale ingestion from multiple sources (local, Azure Blob, etc.)
- Intelligent chunking and embedding generation using NVIDIA models
- Vector database storage with Astra DB or HCD for semantic search
- Enterprise security with comprehensive SSL/TLS support
- Semantic document search with vector similarity and reranking
- AI-powered answers using Azure OpenAI GPT-4o with streaming responses
- Production monitoring with Arize integration and detailed metrics
- Multi-database support for both cloud and on-premises deployments
- Multi-cloud deployment across AWS, GCP, and Azure platforms
- Kubernetes orchestration with GPU support and auto-scaling
- Terraform modules for reproducible infrastructure provisioning
- Comprehensive monitoring with health checks and diagnostics
- Python: 3.8 or higher
- Kubernetes: 1.20+ with GPU support
- Terraform: 1.0+ for infrastructure automation
- Docker: For containerized deployments
- Vector Database: DataStax Astra DB or HCD
- GPU Compute: NVIDIA GPU-enabled clusters
- Object Storage: Azure Blob Storage, AWS S3, or Google Cloud Storage
- AI Services: NVIDIA API keys, Azure OpenAI deployments
```shell
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install required tools
brew install --cask google-cloud-sdk
brew install hashicorp/tap/terraform  # Use HashiCorp tap for latest version
brew install kubectl helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```shell
# Update package list
sudo apt-get update

# Install required tools
sudo apt-get install -y apt-transport-https ca-certificates gnupg curl wget jq

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```shell
# Install required tools
sudo dnf install -y curl wget jq   # or yum for older versions

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```powershell
# Install Chocolatey (if not already installed)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install required tools
choco install -y gcloudsdk terraform kubernetes-cli kubernetes-helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
Note: After installation, verify all tools are available and meet version requirements:
```shell
gcloud --version          # No specific version requirement (latest recommended)
terraform --version       # >= 1.9.0, < 2.0.0 (see deployment/modules/kubernetes/versions.tf)
kubectl version --client  # Compatible with Kubernetes 1.20+ (see System Requirements)
helm version              # ~> 2.17.0 (see deployment/modules/kubernetes/versions.tf)
jq --version              # No specific version requirement (latest recommended)
```
Supported Version Ranges:
- Terraform: 1.9.0+ (required by infrastructure modules)
- Kubernetes: 1.20+ with GPU support
- Python: 3.11+ (required by ingestion and retrieval packages)
- Helm: 2.17.0+ (required by Terraform providers)
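Checking a version against these minimums can be scripted rather than eyeballed. Below is a hedged sketch using `sort -V` (GNU version sort); the `version_ge` helper is illustrative and not part of the repo's tooling.

```shell
# Sketch: compare dotted versions using sort -V (version sort).
# version_ge VERSION MINIMUM -> success when VERSION >= MINIMUM.
version_ge() {
  # If the minimum sorts first (or they are equal), the check passes.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with the Terraform minimum from the list above; in practice the
# left-hand value would be parsed from `terraform --version` output.
version_ge "1.9.5" "1.9.0" && echo "terraform version OK"
```

The same helper works for the Kubernetes and Helm minimums; note that `sort -V` handles multi-digit components (e.g. 1.10.0 > 1.9.0) correctly where plain lexical comparison would not.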
```shell
# Controller patch (idempotent)
./scripts/cli.sh nginx controller --yes

# Ingress high throughput profile for all
./scripts/cli.sh nginx ingress --target all --profile high_throughput --yes

# Apply whitelist (CIDR CSV)
./scripts/cli.sh ingress whitelist --allowed-ips "1.2.3.4/32,5.6.7.0/24" --yes

# Validate deployment
./scripts/cli.sh validate
```
Note: the default ingress upload limit is 3g. Override via the `INGRESS_MAX_BODY_SIZE` env var or the corresponding CLI flags.
The Infrastructure Administrator is an IT person responsible for the installation, setup and IT maintenance of the Run:ai product.
As part of the Infrastructure Administrator documentation you will find:
- Install Run:ai
- Understand the Run:ai installation
- Set up a Run:ai Cluster
- Set up Researchers to work with Run:ai
- IT Configuration of the Run:ai system
- Connect Run:ai to an identity provider
- Maintenance & monitoring of the Run:ai system
- Troubleshooting
For comprehensive Run:ai administration documentation, visit: NVIDIA Run:ai Infrastructure Administrator Guide
Note: The NVIDIA Run:ai docs are moving! For versions 2.20 and above, visit the new NVIDIA Run:ai documentation site. Documentation for versions 2.19 and below remains on the original site.
This project includes integrated Run:ai deployment capabilities:
```shell
# Run:AI guided setup (optional)
./scripts/cli.sh runai setup
```

The setup process will guide you through configuring Run:ai for your specific infrastructure and requirements.
```
docs/
├── README.md            # Master documentation hub
├── components/          # Component-specific guides
│   ├── v2-pipeline/     # Document processing pipeline
│   ├── retrieval-rag/   # Intelligent Q&A system
│   ├── infrastructure/  # Terraform & Kubernetes
│   └── nemo/            # NVIDIA NeMo services
├── deployment/          # Platform deployment guides
├── troubleshooting/     # Common issues & solutions
└── archive/             # Historical documentation
```

- SSL/TLS encryption for all service communications
- Certificate management with custom CA support
- Network isolation using VPCs and security groups
- Access control with RBAC and service accounts
- Audit logging for compliance and monitoring
- Data residency controls for sensitive information
- Performance metrics with millisecond-precision timing
- Health checks for all system components
- Resource utilization tracking and optimization
- Arize AI integration for production monitoring
- Custom dashboards for specific use cases
- Error reporting and automated alerting
We welcome contributions! Please see our component-specific documentation for detailed guidelines:
- Documentation: Follow the component-based organization in `docs/`
- Code: See individual component READMEs for specific guidelines
- Testing: Comprehensive test suites available for all components
- Issues: Use GitHub issues for bug reports and feature requests
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- DataStax: Astra DB vector database platform and enterprise support
- NVIDIA: NeMo microservices, GPU acceleration, and AI model APIs
- Microsoft: Azure OpenAI services and cloud infrastructure
- Community: Open source contributors and enterprise partners
- Complete Documentation Hub - Comprehensive project documentation
- Deployment Guides - Platform-specific setup instructions
- Troubleshooting - Common issues and solutions
- Architecture Details - System design and components
- Historical Documentation - Consolidated history, previous versions, and development notes
Ready to get started? Visit our Documentation Hub for complete setup guides and examples.