# serving-infrastructure
Here are 2 public repositories matching this topic...
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
text-generation batch-processing server-optimization model-serving model-acceleration inference-optimization optimization-techniques machine-learning-operations deep-learning-techniques model-inference-service performance-enhancement scalability-strategies serving-infrastructure large-scale-deployment
- Updated Apr 12, 2024 - Jupyter Notebook
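The first repository's description mentions KV caching as a serving optimization. The sketch below illustrates the idea in plain NumPy: during autoregressive decoding, keys and values for earlier tokens are cached so each step only projects the newest token instead of recomputing the whole prefix. All names here (`attend_one_token`, the cache dict, the random weight matrices) are illustrative assumptions for a single attention head, not LoRAX APIs or the course's actual code.

```python
# Minimal KV-caching sketch for one attention head (toy sizes, random weights).
import numpy as np

d_model = 8
rng = np.random.default_rng(0)

# Random stand-ins for trained query/key/value projection weights.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def attend_one_token(x_t, kv_cache):
    """Attend from one new token embedding, reusing cached keys/values.

    Without the cache, every decoding step would recompute K and V for the
    entire prefix; with it, each step projects only the newest token.
    """
    q = x_t @ W_q
    kv_cache["k"].append(x_t @ W_k)   # only the new token's key is computed
    kv_cache["v"].append(x_t @ W_v)   # only the new token's value is computed
    K = np.stack(kv_cache["k"])       # (seq_len, d_model)
    V = np.stack(kv_cache["v"])
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                # attention output for the current step

cache = {"k": [], "v": []}
for step in range(4):                 # simulate 4 autoregressive decoding steps
    token_embedding = rng.normal(size=d_model)
    out = attend_one_token(token_embedding, cache)

print("cached keys:", len(cache["k"]), "output shape:", out.shape)
```

In a real serving stack the cache lives on the accelerator and grows with sequence length, which is why cache memory management (paging, eviction, batching) is central to inference servers of this kind.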
🛠️ Explore large language models through hands-on projects and tutorials to enhance your understanding and practical skills in natural language processing.
agent text-generation multi-agent gpt model-acceleration optimization-techniques opentelemetry llm generative-ai scalability-strategies vulnerable-llm-application serving-infrastructure large-scale-deployment ai-security-testing prompt-injection-llm-security hans-on-llms prompt-injection-defense llm-red-teaming
- Updated Dec 17, 2025 - Jupyter Notebook
Add this topic to your repo
To associate your repository with the serving-infrastructure topic, visit your repo's landing page and select "manage topics."