Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

RajiRai/GenerativeAIExamples

Introduction

State-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high-performance NVIDIA CUDA-X software stack and NVIDIA GPUs.

NVIDIA NGC

Generative AI Examples uses resources from the NVIDIA NGC AI Development Catalog.

Sign up for a free NGC developer account to access:

  • GPU-optimized containers used in these examples
  • Release notes and developer documentation

Retrieval Augmented Generation (RAG)

A RAG pipeline embeds multimodal data, such as documents, images, and video, into a database connected to an LLM. RAG lets users chat with their data!
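In outline, the pipeline works like this. The sketch below is a minimal, self-contained illustration, not code from these examples: the toy bag-of-words "embedding" and the `VectorStore` class are stand-ins for a real embedding model and a real vector database such as Milvus or FAISS.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. A real pipeline calls an
    # embedding model (e.g. e5-large-v2) and gets a dense vector back.
    return Counter(re.findall(r"[\w-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    # Stand-in for a real vector database such as Milvus or FAISS.
    def __init__(self):
        self.items = []  # list of (embedding, original text) pairs

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def query(self, question: str, k: int = 2) -> list:
        # Rank stored chunks by similarity to the question.
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, store: VectorStore) -> str:
    # Retrieval-augmented prompt: retrieved chunks become LLM context.
    context = "\n".join(store.query(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("Milvus is a vector database for similarity search.")
store.add("TensorRT-LLM accelerates LLM inference on NVIDIA GPUs.")
store.add("Helm deploys applications to Kubernetes clusters.")
prompt = build_prompt("What accelerates LLM inference?", store)
```

An actual deployment replaces `embed` with a GPU-accelerated embedding model and `VectorStore` with Milvus, PGVector, or FAISS, then sends the assembled prompt to the LLM.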

Developer RAG Examples

The developer RAG examples run on a single VM. They demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors. The examples are easy to deploy via Docker Compose.

Examples support local and remote inference endpoints. If you have a GPU, you can run inference locally via TensorRT-LLM. If you don't have a GPU, you can run inference and embedding remotely via NVIDIA AI Foundation endpoints.

| Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
|---|---|---|---|---|---|---|---|---|
| llama-2 | e5-large-v2 | LlamaIndex | Canonical QA Chatbot | YES | YES | NO | YES | Milvus/PGVector |
| mixtral_8x7b | nvolveqa_40k | LangChain | NVIDIA AI Foundation based QA Chatbot | NO | NO | YES | YES | FAISS |
| llama-2 | all-MiniLM-L6-v2 | LlamaIndex | QA Chatbot, GeForce, Windows | NO | YES | NO | NO | FAISS |
| llama-2 | nvolveqa_40k | LangChain | QA Chatbot, Task Decomposition Agent | NO | NO | YES | YES | FAISS |
| mixtral_8x7b | nvolveqa_40k | LangChain | Minimalistic example showcasing RAG using NVIDIA AI Foundation models | NO | NO | YES | YES | FAISS |
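The local-versus-remote choice above comes down to selecting an inference backend at configuration time. A minimal sketch of that selection logic follows; the URLs and dictionary keys are hypothetical placeholders, not this repo's actual configuration.

```python
# Hypothetical endpoint URLs, for illustration only.
LOCAL_TRITON_URL = "http://localhost:8001"
REMOTE_FOUNDATION_URL = "https://example.invalid/ai-foundation/v1"


def select_endpoint(has_gpu: bool) -> dict:
    """Pick a local TensorRT-LLM/Triton backend when a GPU is present,
    otherwise fall back to a hosted NVIDIA AI Foundation endpoint."""
    if has_gpu:
        return {"backend": "tensorrt-llm", "url": LOCAL_TRITON_URL, "remote": False}
    return {"backend": "ai-foundation", "url": REMOTE_FOUNDATION_URL, "remote": True}
```

In the real examples this choice is wired through LangChain or LlamaIndex connectors rather than a bare dictionary.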

Enterprise RAG Examples

The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs. They show how RAG pipelines can be orchestrated with Kubernetes and deployed with Helm.

Enterprise RAG examples include a Kubernetes operator for LLM lifecycle management. It is compatible with the NVIDIA GPU operator, which automates GPU discovery and lifecycle management in a Kubernetes cluster.
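In general terms, an operator's job is the standard Kubernetes reconcile loop: compare desired state to actual state and compute the actions that close the gap. The following is a generic toy sketch of that pattern, not this repo's operator code; the resource names are hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return (action, name, ...) tuples that drive `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources: scale up an LLM server, remove a stale job.
desired = {"llm-server": {"replicas": 2, "model": "llama-2"}}
actual = {"llm-server": {"replicas": 1, "model": "llama-2"}, "stale-job": {}}
plan = reconcile(desired, actual)
```

A real operator runs this loop continuously against the cluster API, so LLM deployments converge to the declared spec without manual intervention.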

Enterprise RAG examples also support local and remote inference via TensorRT-LLM and NVIDIA AI Foundation endpoints.

| Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
|---|---|---|---|---|---|---|---|---|---|
| llama-2 | NV-Embed-QA-003 | LlamaIndex | QA Chatbot, Helm, k8s | NO | NO | YES | NO | YES | Milvus |

Tools

Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.

| Name | Description | Deployment | Tutorial |
|---|---|---|---|
| Evaluation | Example open source RAG evaluation tool that uses synthetic data generation and LLM-as-a-judge | Docker Compose file | README |
| Observability | Monitoring and debugging mechanism for RAG pipelines | Docker Compose file | README |
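The evaluation tool combines two ideas: synthetic test data generation and LLM-as-a-judge scoring. The toy sketch below illustrates both; the templated questions and the word-overlap "judge" are illustrative stand-ins, since the real tool prompts an LLM for each step.

```python
def synthesize_qa(passages: list) -> list:
    # Toy synthetic data generation: template a question per passage.
    # The real tool prompts an LLM to write questions from source documents.
    return [(f"What does the passage mention about {p.split()[0]}?", p)
            for p in passages]


def judge(reference: str, answer: str) -> float:
    # Stand-in "judge": word-overlap ratio in [0, 1]. The real tool asks
    # an LLM to grade an answer's faithfulness against the reference.
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0


passages = ["Milvus stores embeddings.", "Helm packages Kubernetes apps."]
dataset = synthesize_qa(passages)
# A pipeline that returns the reference verbatim scores a perfect 1.0.
scores = [judge(reference, reference) for _, reference in dataset]
```

Averaging such per-question scores over a synthetic dataset gives a single quality metric to track as you tune a RAG pipeline.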

Open Source Integrations

These open source connectors for NVIDIA-hosted and self-hosted API endpoints are maintained and tested by NVIDIA engineers.

| Name | Framework | Chat | Text Embedding | Python | Description |
|---|---|---|---|---|---|
| NVIDIA AI Foundation Endpoints | LangChain | YES | YES | YES | Easy access to NVIDIA-hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG. |
| NVIDIA Triton + TensorRT-LLM | LangChain | YES | YES | YES | Allows LangChain to interact remotely with a Triton Inference Server over gRPC or HTTP for optimized LLM inference. |
| NVIDIA Triton Inference Server | LlamaIndex | YES | YES | NO | Triton Inference Server provides API access to hosted LLM models over gRPC. |
| NVIDIA TensorRT-LLM | LlamaIndex | YES | YES | NO | TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |

NVIDIA support

In each example README we indicate the level of support provided.

Feedback / Contributions

We're posting these examples on GitHub to support the NVIDIA LLM community and to facilitate feedback. We invite contributions via GitHub Issues or pull requests!

Known issues

  • In each of the READMEs, we indicate any known issues and encourage the community to provide feedback.
  • The datasets provided as part of this project are under a different license for research and evaluation purposes.
  • This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
