Build a production-grade, GPU-ready RAG platform that turns your documents into answers — end to end, in minutes.
- Spin up multi-cloud GPU infrastructure (GKE/EKS/AKS) with sane defaults and HPA profiles
- Ingest, chunk, and embed with NVIDIA NIMs; store vectors in Astra DB/HCD; search at scale
- Serve retrieval APIs with cross-encoder reranking and observability built in
- CLI-first operations: deploy, validate, monitor, and optimize in a few commands
```shell
# 1) Bootstrap env
cp env.template .env
# set NGC_API_KEY, GCP_PROJECT_ID, GCP_ZONE, GKE_CLUSTER_NAME

# 2) Provision (example: GKE)
terraform -chdir=deployment apply \
  -var gcp_project_id="$GCP_PROJECT_ID" \
  -var gcp_zone="$GCP_ZONE" \
  -var gke_name_prefix="$GKE_CLUSTER_NAME"

# 3) One-command dev deploy (DNS-free)
./scripts/cli.sh deploy --profile dev
./scripts/cli.sh status --extended
```
What you get:
- NVIDIA NIMs (embedder + reranker) deployed and wired to ingress
- Retrieval stack with clean APIs, reranking, and performance timing
- Terraform-managed infrastructure and day-2 scripts (monitoring, tuning)
- Clear docs and examples to move from POC → production
Quick links: Docs · GKE Deploy · Scripts
This project uses a two-phase deployment approach:
- 🏗️ Infrastructure First: Use Terraform to create the Kubernetes cluster and supporting infrastructure
- 🚀 Applications Second: Use the CLI to deploy applications onto the existing infrastructure
Key Point: You must run Terraform commands BEFORE running CLI deployment commands. The CLI deploys applications onto infrastructure that Terraform creates.
Creates the cloud infrastructure (Kubernetes cluster, networking, bastion host, etc.)
Deploys applications and services onto the existing infrastructure
- gcloud, terraform, kubectl, helm, jq installed and authenticated
- Copy `env.template` to `.env` and fill in any required values (minimal for dev)
- Required: Run `source scripts/setup_environment.sh` to load environment variables before infrastructure deployment
Domain configuration for Run:AI:
- Production: set RUNAI_DOMAIN to a DNS name you control and create a DNS record pointing to your ingress LoadBalancer.
- Development (no DNS): use an IP-based hostname like `runai.<LOAD_BALANCER_IP>.sslip.io` (or nip.io). The installer uses this domain for Run:AI ingress automatically if none is provided.
- Precedence: CLI/env values override `.env`.
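The DNS-free pattern above can be sketched in a few lines of shell. This is an illustrative sketch, not part of the repo's scripts: the `kubectl` service name and namespace in the comment are assumptions (check your ingress installation), and the IP is a placeholder.

```shell
# Sketch: derive a DNS-free Run:AI domain from the ingress LoadBalancer IP.
# In a live cluster the IP would typically come from something like:
#   LB_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# (service name and namespace are assumptions; verify against your ingress.)
LB_IP="203.0.113.10"                    # example IP for illustration only
RUNAI_DOMAIN="runai.${LB_IP}.sslip.io"  # sslip.io resolves this name back to the IP
echo "$RUNAI_DOMAIN"
```

Because sslip.io (and nip.io) embed the IP in the hostname, no DNS record needs to be created for development clusters.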
Purpose: Create the Kubernetes cluster and supporting infrastructure
```shell
# 1. Set up environment
cp env.template .env
# Edit .env with your values - Required variables:
# - NGC_API_KEY=nvapi-...              # NVIDIA API key for NeMo services
# - GCP_PROJECT_ID=your-gcp-project    # Your Google Cloud project ID
# - GCP_ZONE=us-central1-c             # GCP zone for resources
# - GKE_CLUSTER_NAME=your-cluster-name # Name for your GKE cluster
#
# Optional but recommended for databases:
# - ASTRA_DB_ENDPOINT=                 # DataStax Astra DB endpoint
# - ASTRA_DB_TOKEN=                    # DataStax Astra DB token
# - HCD_DB_ENDPOINT=                   # HyperConverged Database endpoint
# - HCD_DB_TOKEN=                      # HyperConverged Database token
#
# For troubleshooting terminal crashes:
# - TERRAFORM_DEBUG=true               # Enable verbose terraform logging

source scripts/setup_environment.sh    # Load environment variables and setup

# 2. Provision infrastructure with Terraform
cd deployment
terraform init
terraform apply
# Note: If you experience terminal crashes during terraform operations,
# add TERRAFORM_DEBUG=true to your .env file and re-run source scripts/setup_environment.sh

# 3. Verify infrastructure and test cluster access
cd ..
bastion kubectl get nodes              # Test cluster access
```
What this creates:
- GKE cluster with GPU support
- Bastion host for secure cluster access
- VPC, subnets, and security groups
- IAM roles and service accounts
- Load balancer infrastructure
The `scripts/setup_environment.sh` script provides several benefits:
- Environment Loading: Automatically loads variables from your `.env` file
- Bastion Function: Creates a convenient `bastion` command for cluster access
- Authentication Check: Verifies your gcloud authentication status
- Connectivity Test: Tests bastion connectivity (if infrastructure is deployed)
- Helpful Tips: Provides usage examples and next steps
Usage: `source scripts/setup_environment.sh` (run once per terminal session)
The setup script creates a convenient `bastion` function for executing commands on your cluster. Here are the different ways to access your cluster:
```shell
# Setup environment (run once per session)
source scripts/setup_environment.sh

# Execute commands on the cluster
bastion kubectl get nodes
bastion kubectl get pods --all-namespaces
bastion "kubectl describe nodes | grep nvidia"
```
```shell
# Get bastion details
BASTION_NAME=$(terraform -chdir=deployment output -raw gke_bastion_name)
PROJECT_ID=$(terraform -chdir=deployment output -raw gcp_project_id)
ZONE=$(terraform -chdir=deployment output -raw gcp_zone)

# SSH to bastion
gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME"

# Then run kubectl commands inside the SSH session
kubectl get nodes
```
```shell
# Execute single commands
gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME" --command="kubectl get nodes"
```
```shell
# Replace with your actual values
gcloud compute ssh --project gcp-lcm-project --zone us-central1-c vz-mike-obrien-bastion --command="kubectl get nodes"
```
💡 Pro Tips:
- Use Method 1 (bastion function) for the best experience
- The setup script automatically detects your infrastructure configuration
- Use quotes for complex commands: `bastion "kubectl get pods | grep nemo"`
- Run `source scripts/setup_environment.sh` in each new terminal session
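To make the `bastion` function less magical, here is a minimal sketch of the kind of wrapper `setup_environment.sh` provides. The exact implementation lives in `scripts/setup_environment.sh`; the variable names and defaults below are assumptions for illustration.

```shell
# Illustrative sketch of a bastion wrapper (not the repo's actual code).
# In the real script these values are resolved from Terraform outputs.
BASTION_NAME="${BASTION_NAME:-my-bastion}"   # assumed default for the sketch
PROJECT_ID="${PROJECT_ID:-my-project}"
ZONE="${ZONE:-us-central1-c}"

bastion() {
  # Forward the whole argument list as a single remote command string,
  # so both `bastion kubectl get nodes` and quoted pipelines work.
  gcloud compute ssh --project "$PROJECT_ID" --zone "$ZONE" "$BASTION_NAME" \
    --command="$*"
}
```

Joining the arguments with `"$*"` is what makes the quoted form (`bastion "kubectl get pods | grep nemo"`) necessary for pipelines: without quotes, your local shell would interpret the `|` before `gcloud` ever sees it.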
Purpose: Deploy applications onto the existing infrastructure
```shell
./scripts/cli.sh deploy --profile dev

# If the CLI seems to hang after "Loading environment...",
# skip local platform detection (e.g., when local kubectl is stale):
./scripts/cli.sh deploy --profile dev --platform gke

# Or explicitly disable detection:
./scripts/cli.sh deploy --profile dev --no-detect
```
```shell
./scripts/cli.sh deploy --profile prod --domain your-domain.com --email admin@your-domain.com
```
What this deploys:
- GPU Operator for NVIDIA GPU management
- NVIDIA NIMs (embedder + reranker) for AI services
- NV-Ingest for document processing
- NGINX ingress controller
- TLS certificates (production only)
```shell
# Check deployment status
./scripts/cli.sh status --extended
./scripts/cli.sh nims status
./scripts/cli.sh ingress status

# Validate everything is working
./scripts/cli.sh validate

# Access services (development)
./scripts/cli.sh port-forward        # dev/no DNS access
./scripts/cli.sh monitor lb --watch  # load balancing/HPA view
```
Access your services:
- Reranker: `http://reranker.<LOAD_BALANCER_IP>.nip.io` (dev)
- NV-Ingest: `http://nv-ingest.<LOAD_BALANCER_IP>.nip.io` (dev)
- Production: Use your configured domain with TLS
If you're familiar with the process, here's the essential sequence:
```shell
# 1. Infrastructure
cp env.template .env  # edit .env (see Environment Configuration section for required values)
source scripts/setup_environment.sh && cd deployment && terraform init && terraform apply

# 2. Applications
cd .. && ./scripts/cli.sh deploy --profile dev

# 3. Validate
./scripts/cli.sh validate && ./scripts/cli.sh status
```
For a more detailed walkthrough with troubleshooting tips, see the comprehensive setup guide below.
```shell
# macOS
brew install --cask google-cloud-sdk || true
brew install terraform || true
gcloud auth login
gcloud auth application-default login

# Install kubectl and helm (see platform-specific instructions below)
```
```shell
cp env.template .env
# Edit .env and set required values:

# --- Required for Infrastructure ---
NGC_API_KEY=nvapi-...                 # NVIDIA API key from NGC
GCP_PROJECT_ID=your-gcp-project       # Your Google Cloud project ID
GCP_ZONE=us-central1-c                # GCP zone (or us-east1-b, etc.)
GKE_CLUSTER_NAME=your-cluster-name    # Name for your GKE cluster

# --- Required for Database (choose one) ---
# For DataStax Astra:
ASTRA_DB_ENDPOINT=https://your-db-id-region.apps.astra.datastax.com
ASTRA_DB_TOKEN=AstraCS:...
# OR for HyperConverged Database:
HCD_DB_ENDPOINT=https://your-hcd-endpoint
HCD_DB_TOKEN=your-hcd-token

# --- Optional Troubleshooting ---
TERRAFORM_DEBUG=true                  # Enable if experiencing terminal crashes
```
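A common failure mode is a `.env` that is missing one of the required values. The helper below is a hedged sketch (it is not part of the repo's scripts) showing how you might fail fast before running Terraform; the variable list mirrors the required ones above.

```shell
# Sketch: fail fast if required variables are missing after sourcing .env.
# This helper is illustrative only; the repo's setup script may do more.
check_env() {
  local missing=0
  for var in NGC_API_KEY GCP_PROJECT_ID GCP_ZONE GKE_CLUSTER_NAME; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "Missing required variable: $var" >&2
      missing=1
    fi
  done
  return $missing
}

# Example: with everything set, check_env succeeds (placeholder values).
NGC_API_KEY="nvapi-example" GCP_PROJECT_ID="demo" \
GCP_ZONE="us-central1-c" GKE_CLUSTER_NAME="demo-cluster" check_env
```

Running a check like this right after `source .env` turns a confusing mid-apply Terraform error into an immediate, named failure.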
```shell
# Instead of command-line variables, use a tfvars file:
terraform -chdir=deployment apply -var-file=../configs/gke/gke.tfvars
```

For advanced users or troubleshooting, you can use the underlying scripts directly:
```shell
# Development profile (GPU Operator + NeMo; DNS-free)
scripts/platform/gke/deploy_to_bastion.sh --development

# NV-Ingest only (through bastion)
scripts/platform/gke/deploy_to_bastion.sh --deploy-nv-ingest

# Run:AI guided setup (optional)
scripts/cli.sh runai setup
```
Tips:
- CLI flags override environment variables and `.env` values
- Keep `.env`'s `GKE_CLUSTER_NAME` consistent with Terraform's `gke_name_prefix`
- Prefer running Kubernetes commands via the bastion resolved from Terraform outputs
- You can override the Terraform directory with `TF_ROOT` if needed
| Profile | GPUs (A100 equiv) | Nodes | Expected Throughput | Notes |
|---|---|---|---|---|
| dev | 0-2 | 1-2 | low | DNS-free, quick validation |
| prod-small | 4-8 | 3-6 | medium | HA, controller+ingress tuned |
| prod-large | 8-16+ | 6-12 | high | HPA aggressive, multi-NIM scaling |
- Actual throughput varies by document mix and GPU types. See `deploy_nv_ingest.sh` for optimized presets and `optimize_nv_ingest_performance.sh` for day-2 tuning.
📚 Complete Documentation - Full project documentation with navigation
Choose your component to get started:
- V2 Pipeline - Enterprise document processing with NV-Ingest
- Retrieval System - Intelligent Q&A with Astra RAG Azure
- Infrastructure - Terraform and Kubernetes automation
- NeMo Services - NVIDIA NeMo microservices deployment
Quick links:
- Scripts Quickstart: `scripts/README.md`
- GKE Deployment Guide: `docs/deployment/gke-deployment.md`
The ingestion pipeline writes runtime data to several directories. You can colocate them under a single base directory via `--data-dir` (CLI) or `DATA_DIR` (env). CLI values override env, which override defaults.
Defaults (when no base is provided) follow OS conventions via platformdirs:
- processed/error: user data dir
- checkpoints: user state dir
- temp/downloads: user cache dir
Optional env overrides for individual paths (CLI still wins): `OUTPUT_DIR`, `ERROR_DIR`, `TEMP_DIR`, `CHECKPOINT_DIR`, `DOWNLOAD_DIR`.
Suggested recipes:
- Development: `DATA_DIR=.data` to keep the repo clean
- Production (Linux): `DATA_DIR=/var/lib/datastax-ingestion`
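The precedence rule (CLI flag > env var > OS default) can be sketched with plain shell parameter expansion. This is an illustrative sketch, not the pipeline's actual code; the function name and fallback path are assumptions.

```shell
# Sketch of the precedence described above: CLI flag > env var > default.
# $1 is the (possibly empty) CLI-provided value; the default path here is
# a stand-in for what platformdirs would return on Linux.
resolve_data_dir() {
  local cli_value="$1"
  # ${x:-y} falls back to y when x is unset or empty, so the chain below
  # tries the CLI value first, then DATA_DIR, then the default.
  echo "${cli_value:-${DATA_DIR:-$HOME/.local/share/datastax-ingestion}}"
}

resolve_data_dir ""                                      # env var or default
DATA_DIR=/var/lib/datastax-ingestion resolve_data_dir "" # env var wins over default
resolve_data_dir /tmp/override                           # CLI value wins over everything
```

The same chaining applies per-path: an explicit `OUTPUT_DIR` would slot in between the CLI flag and the `DATA_DIR`-derived location.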
```
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   V2 Pipeline   │  │  Retrieval RAG  │  │ Infrastructure  │
│                 │  │                 │  │                 │
│ • Document Proc │  │ • Vector Search │  │ • Terraform     │
│ • NV-Ingest     │  │ • Azure OpenAI  │  │ • Kubernetes    │
│ • Vector DB     │  │ • Astra DB      │  │ • Monitoring    │
│ • Azure Blob    │  │ • Reranking     │  │ • Scripts       │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
                    ┌─────────────────┐
                    │  NeMo Services  │
                    │                 │
                    │ • Microservices │
                    │ • GPU Accel     │
                    │ • Embeddings    │
                    │ • Reranking     │
                    └─────────────────┘
```

- Large-scale ingestion from multiple sources (local, Azure Blob, etc.)
- Intelligent chunking and embedding generation using NVIDIA models
- Vector database storage with Astra DB or HCD for semantic search
- Enterprise security with comprehensive SSL/TLS support
- Semantic document search with vector similarity and reranking
- AI-powered answers using Azure OpenAI GPT-4o with streaming responses
- Production monitoring with Arize integration and detailed metrics
- Multi-database support for both cloud and on-premises deployments
- Multi-cloud deployment across AWS, GCP, and Azure platforms
- Kubernetes orchestration with GPU support and auto-scaling
- Terraform modules for reproducible infrastructure provisioning
- Comprehensive monitoring with health checks and diagnostics
- Python: 3.8 or higher
- Kubernetes: 1.20+ with GPU support
- Terraform: 1.0+ for infrastructure automation
- Docker: For containerized deployments
- Vector Database: DataStax Astra DB or HCD
- GPU Compute: NVIDIA GPU-enabled clusters
- Object Storage: Azure Blob Storage, AWS S3, or Google Cloud Storage
- AI Services: NVIDIA API keys, Azure OpenAI deployments
```shell
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install required tools
brew install --cask google-cloud-sdk
brew install hashicorp/tap/terraform  # Use HashiCorp tap for latest version
brew install kubectl helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```shell
# Update package list
sudo apt-get update

# Install required tools
sudo apt-get install -y apt-transport-https ca-certificates gnupg curl wget jq

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```shell
# Install required tools
sudo dnf install -y curl wget jq   # or yum for older versions

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Install Terraform
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y terraform

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
```powershell
# Install Chocolatey (if not already installed)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install required tools
choco install -y gcloudsdk terraform kubernetes-cli kubernetes-helm jq

# Authenticate with Google Cloud
gcloud auth login
gcloud auth application-default login
```
Note: After installation, verify all tools are available and meet version requirements:
```shell
gcloud --version          # No specific version requirement (latest recommended)
terraform --version       # >= 1.9.0, < 2.0.0 (see deployment/modules/kubernetes/versions.tf)
kubectl version --client  # Compatible with Kubernetes 1.20+ (see System Requirements)
helm version              # ~> 2.17.0 (see deployment/modules/kubernetes/versions.tf)
jq --version              # No specific version requirement (latest recommended)
```
Supported Version Ranges:
- Terraform: 1.9.0+ (required by infrastructure modules)
- Kubernetes: 1.20+ with GPU support
- Python: 3.11+ (required by ingestion and retrieval packages)
- Helm: 2.17.0+ (required by Terraform providers)
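Checking a version against these minimums can be scripted rather than eyeballed. Below is a hedged sketch using `sort -V` (GNU version sort); the `version_ge` helper is illustrative and not part of the repo's tooling.

```shell
# Sketch: compare dotted versions using sort -V (version sort).
# version_ge VERSION MINIMUM -> success when VERSION >= MINIMUM.
version_ge() {
  # If the minimum sorts first (or they are equal), the check passes.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example with the Terraform minimum from the list above; in practice the
# left-hand value would be parsed from `terraform --version` output.
version_ge "1.9.5" "1.9.0" && echo "terraform version OK"
```

The same helper works for the Kubernetes and Helm minimums; note that `sort -V` handles multi-digit components (e.g. 1.10.0 > 1.9.0) correctly where plain lexical comparison would not.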
```shell
# Controller patch (idempotent)
./scripts/cli.sh nginx controller --yes

# Ingress high throughput profile for all
./scripts/cli.sh nginx ingress --target all --profile high_throughput --yes

# Apply whitelist (CIDR CSV)
./scripts/cli.sh ingress whitelist --allowed-ips "1.2.3.4/32,5.6.7.0/24" --yes

# Validate deployment
./scripts/cli.sh validate
```
Note: the default ingress upload limit is 3g. Override via the `INGRESS_MAX_BODY_SIZE` env var or the corresponding CLI flags.
The Infrastructure Administrator is an IT person responsible for the installation, setup and IT maintenance of the Run:ai product.
As part of the Infrastructure Administrator documentation you will find:
- Install Run:ai
- Understand the Run:ai installation
- Set up a Run:ai Cluster
- Set up Researchers to work with Run:ai
- IT Configuration of the Run:ai system
- Connect Run:ai to an identity provider
- Maintenance & monitoring of the Run:ai system
- Troubleshooting
For comprehensive Run:ai administration documentation, visit: NVIDIA Run:ai Infrastructure Administrator Guide
Note: The NVIDIA Run:ai docs are moving! For versions 2.20 and above, visit the new NVIDIA Run:ai documentation site. Documentation for versions 2.19 and below remains on the original site.
This project includes integrated Run:ai deployment capabilities:
```shell
# Run:AI guided setup (optional)
./scripts/cli.sh runai setup
```

The setup process will guide you through configuring Run:ai for your specific infrastructure and requirements.
```
docs/
├── README.md            # Master documentation hub
├── components/          # Component-specific guides
│   ├── v2-pipeline/     # Document processing pipeline
│   ├── retrieval-rag/   # Intelligent Q&A system
│   ├── infrastructure/  # Terraform & Kubernetes
│   └── nemo/            # NVIDIA NeMo services
├── deployment/          # Platform deployment guides
├── troubleshooting/     # Common issues & solutions
└── archive/             # Historical documentation
```

- SSL/TLS encryption for all service communications
- Certificate management with custom CA support
- Network isolation using VPCs and security groups
- Access control with RBAC and service accounts
- Audit logging for compliance and monitoring
- Data residency controls for sensitive information
- Performance metrics with millisecond-precision timing
- Health checks for all system components
- Resource utilization tracking and optimization
- Arize AI integration for production monitoring
- Custom dashboards for specific use cases
- Error reporting and automated alerting
We welcome contributions! Please see our component-specific documentation for detailed guidelines:
- Documentation: Follow the component-based organization in `docs/`
- Code: See individual component READMEs for specific guidelines
- Testing: Comprehensive test suites available for all components
- Issues: Use GitHub issues for bug reports and feature requests
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- DataStax: Astra DB vector database platform and enterprise support
- NVIDIA: NeMo microservices, GPU acceleration, and AI model APIs
- Microsoft: Azure OpenAI services and cloud infrastructure
- Community: Open source contributors and enterprise partners
- Complete Documentation Hub - Comprehensive project documentation
- Deployment Guides - Platform-specific setup instructions
- Troubleshooting - Common issues and solutions
- Architecture Details - System design and components
- Historical Documentation - Consolidated history, previous versions, and development notes
Ready to get started? Visit our Documentation Hub for complete setup guides and examples.