Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

RajiRai/GenerativeAIExamples

Introduction

State-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high-performance NVIDIA CUDA-X software stack and NVIDIA GPUs.

NVIDIA NGC

Generative AI Examples uses resources from the NVIDIA NGC AI Development Catalog.

Sign up for a free NGC developer account to access:

  • GPU-optimized containers used in these examples
  • Release notes and developer documentation

Retrieval Augmented Generation (RAG)

A RAG pipeline embeds multimodal data, such as documents, images, and video, into a database connected to an LLM. RAG lets users chat with their data!
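In outline, the pipeline works like this. The sketch below is a minimal, self-contained illustration, not code from these examples: the toy bag-of-words "embedding" and the `VectorStore` class are stand-ins for a real embedding model and a real vector database such as Milvus or FAISS.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. A real pipeline calls an
    # embedding model (e.g. e5-large-v2) and gets a dense vector back.
    return Counter(re.findall(r"[\w-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    # Stand-in for a real vector database such as Milvus or FAISS.
    def __init__(self):
        self.items = []  # list of (embedding, original text) pairs

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def query(self, question: str, k: int = 2) -> list:
        # Rank stored chunks by similarity to the question.
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, store: VectorStore) -> str:
    # Retrieval-augmented prompt: retrieved chunks become LLM context.
    context = "\n".join(store.query(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("Milvus is a vector database for similarity search.")
store.add("TensorRT-LLM accelerates LLM inference on NVIDIA GPUs.")
store.add("Helm deploys applications to Kubernetes clusters.")
prompt = build_prompt("What accelerates LLM inference?", store)
```

An actual deployment replaces `embed` with a GPU-accelerated embedding model and `VectorStore` with Milvus, PGVector, or FAISS, then sends the assembled prompt to the LLM.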

Developer RAG Examples

The developer RAG examples run on a single VM. They demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors. The examples are easy to deploy via Docker Compose.

Examples support local and remote inference endpoints. If you have a GPU, you can run inference locally via TensorRT-LLM. If you don't have a GPU, you can run inference and embedding remotely via NVIDIA AI Foundation endpoints.

| Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
|---|---|---|---|---|---|---|---|---|
| llama-2 | e5-large-v2 | LlamaIndex | Canonical QA Chatbot | YES | YES | NO | YES | Milvus/PGVector |
| mixtral_8x7b | nvolveqa_40k | LangChain | NVIDIA AI Foundation based QA Chatbot | NO | NO | YES | YES | FAISS |
| llama-2 | all-MiniLM-L6-v2 | LlamaIndex | QA Chatbot, GeForce, Windows | NO | YES | NO | NO | FAISS |
| llama-2 | nvolveqa_40k | LangChain | QA Chatbot, Task Decomposition Agent | NO | NO | YES | YES | FAISS |
| mixtral_8x7b | nvolveqa_40k | LangChain | Minimalistic example showcasing RAG using NVIDIA AI Foundation models | NO | NO | YES | YES | FAISS |
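The local-versus-remote choice above comes down to selecting an inference backend at configuration time. A minimal sketch of that selection logic follows; the URLs and dictionary keys are hypothetical placeholders, not this repo's actual configuration.

```python
# Hypothetical endpoint URLs, for illustration only.
LOCAL_TRITON_URL = "http://localhost:8001"
REMOTE_FOUNDATION_URL = "https://example.invalid/ai-foundation/v1"


def select_endpoint(has_gpu: bool) -> dict:
    """Pick a local TensorRT-LLM/Triton backend when a GPU is present,
    otherwise fall back to a hosted NVIDIA AI Foundation endpoint."""
    if has_gpu:
        return {"backend": "tensorrt-llm", "url": LOCAL_TRITON_URL, "remote": False}
    return {"backend": "ai-foundation", "url": REMOTE_FOUNDATION_URL, "remote": True}
```

In the real examples this choice is wired through LangChain or LlamaIndex connectors rather than a bare dictionary.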

Enterprise RAG Examples

The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs. They show how RAG pipelines can be orchestrated with Kubernetes and deployed with Helm.

Enterprise RAG examples include a Kubernetes operator for LLM lifecycle management. It is compatible with the NVIDIA GPU operator, which automates GPU discovery and lifecycle management in a Kubernetes cluster.
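In general terms, an operator's job is the standard Kubernetes reconcile loop: compare desired state to actual state and compute the actions that close the gap. The following is a generic toy sketch of that pattern, not this repo's operator code; the resource names are hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return (action, name, ...) tuples that drive `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources: scale up an LLM server, remove a stale job.
desired = {"llm-server": {"replicas": 2, "model": "llama-2"}}
actual = {"llm-server": {"replicas": 1, "model": "llama-2"}, "stale-job": {}}
plan = reconcile(desired, actual)
```

A real operator runs this loop continuously against the cluster API, so LLM deployments converge to the declared spec without manual intervention.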

Enterprise RAG examples also support local and remote inference via TensorRT-LLM and NVIDIA AI Foundation endpoints.

| Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA AI Foundation | Triton | Vector Database |
|---|---|---|---|---|---|---|---|---|---|
| llama-2 | NV-Embed-QA-003 | LlamaIndex | QA Chatbot, Helm, k8s | NO | NO | YES | NO | YES | Milvus |

Tools

Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.

| Name | Description | Deployment | Tutorial |
|---|---|---|---|
| Evaluation | Example open source RAG evaluation tool that uses synthetic data generation and LLM-as-a-judge | Docker Compose file | README |
| Observability | Monitoring and debugging mechanism for RAG pipelines | Docker Compose file | README |
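The evaluation tool combines two ideas: synthetic test data generation and LLM-as-a-judge scoring. The toy sketch below illustrates both; the templated questions and the word-overlap "judge" are illustrative stand-ins, since the real tool prompts an LLM for each step.

```python
def synthesize_qa(passages: list) -> list:
    # Toy synthetic data generation: template a question per passage.
    # The real tool prompts an LLM to write questions from source documents.
    return [(f"What does the passage mention about {p.split()[0]}?", p)
            for p in passages]


def judge(reference: str, answer: str) -> float:
    # Stand-in "judge": word-overlap ratio in [0, 1]. The real tool asks
    # an LLM to grade an answer's faithfulness against the reference.
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0


passages = ["Milvus stores embeddings.", "Helm packages Kubernetes apps."]
dataset = synthesize_qa(passages)
# A pipeline that returns the reference verbatim scores a perfect 1.0.
scores = [judge(reference, reference) for _, reference in dataset]
```

Averaging such per-question scores over a synthetic dataset gives a single quality metric to track as you tune a RAG pipeline.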

Open Source Integrations

These open source connectors for NVIDIA-hosted and self-hosted API endpoints are maintained and tested by NVIDIA engineers.

| Name | Framework | Chat | Text Embedding | Python | Description |
|---|---|---|---|---|---|
| NVIDIA AI Foundation Endpoints | LangChain | YES | YES | YES | Easy access to NVIDIA-hosted models. Supports chat, embedding, code generation, SteerLM, multimodal, and RAG. |
| NVIDIA Triton + TensorRT-LLM | LangChain | YES | YES | YES | Allows LangChain to interact remotely with a Triton Inference Server over gRPC or HTTP for optimized LLM inference. |
| NVIDIA Triton Inference Server | LlamaIndex | YES | YES | NO | Triton Inference Server provides API access to hosted LLM models over gRPC. |
| NVIDIA TensorRT-LLM | LlamaIndex | YES | YES | NO | TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |

NVIDIA support

In each example README we indicate the level of support provided.

Feedback / Contributions

We're posting these examples on GitHub to support the NVIDIA LLM community and to facilitate feedback. We invite contributions via GitHub Issues or pull requests!

Known issues

  • In each of the READMEs, we indicate any known issues and encourage the community to provide feedback.
  • The datasets provided as part of this project are under a different license for research and evaluation purposes.
  • This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
