Modular Documentation
The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.
Get started

```python
from openai import OpenAI

# Point the OpenAI client at the locally served, OpenAI-compatible endpoint.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
)
print(completion.choices[0].message.content)
```

Learning tools
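The client snippet above assumes an OpenAI-compatible server is already listening on port 8000. A minimal sketch of launching one with the Docker container described earlier follows; the image name, tag, and flags here are illustrative assumptions, so consult the official install docs for the exact, current command:

```shell
# Illustrative sketch only: the image name, tag, and flags below are
# assumptions, not the verified published command.
# Serve a Hugging Face model behind an OpenAI-compatible endpoint on port 8000.
docker run --gpus all -p 8000:8000 \
  modular/max-nvidia-full:latest \
  --model-path google/gemma-3-27b-it
```

Once the container is up, the Python client above (or any OpenAI-compatible client) can talk to `http://0.0.0.0:8000/v1`.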
500+ models supported
We're on a mission to make open source AI models as fast and easy to use as possible. Every model in our repo has been optimized using MAX Graph to ensure performance and portability across any architecture.
View Model Repo

Latest blog posts
Go to blog