⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.

TorchServe now ships with token authorization enabled and model API control disabled by default. These security features are intended to address the risk of unauthorized API calls and to prevent potentially malicious code from being introduced to the model server. Refer to the following documentation for more information: Token Authorization, Model API control.
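A hedged sketch of what these defaults mean in practice: on recent releases, TorchServe writes per-instance keys to a `key_file.json` in its working directory at startup, and clients must send the matching key as a bearer token. The file name, header format, and the `bert` model below are taken from or assumed per the Token Authorization docs; verify them for your release.

```bash
# Start TorchServe with the secure defaults (token auth on, model API control off).
# Token auth can be explicitly opted out of with --disable-token-auth.
torchserve --start --model-store model_store --models bert.mar

# Startup writes inference/management keys to key_file.json in the working
# directory; pass the inference key as a bearer token on each request.
curl -H "Authorization: Bearer <inference key from key_file.json>" \
  http://127.0.0.1:8080/predictions/bert -T input.txt
```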

TorchServe


TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires Python >= 3.8.

With the server running and a model registered under the name `bert` (see the quick start below), inference is a single HTTP call:

```bash
curl http://127.0.0.1:8080/predictions/bert -T input.txt
```

🚀 Quick start with TorchServe

```bash
# Install dependencies
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly
```
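Once the packages are installed, serving a model takes two steps: package it into a `.mar` archive with `torch-model-archiver`, then point `torchserve` at a model store containing the archive. A minimal sketch, where `bert.pt` and the `text_classifier` handler are placeholder choices for illustration; substitute your own serialized model and handler:

```bash
# Package a serialized model into a model archive (.mar).
# bert.pt is an assumed checkpoint; text_classifier is one of
# TorchServe's built-in handlers.
mkdir -p model_store
torch-model-archiver --model-name bert \
  --version 1.0 \
  --serialized-file bert.pt \
  --handler text_classifier \
  --export-path model_store

# Start the server and register the archive from the model store.
torchserve --start --model-store model_store --models bert=bert.mar
```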

🚀 Quick start with TorchServe (conda)

```bash
# Install dependencies
python ./ts_scripts/install_dependencies.py

# Include dependencies for accelerator support with the relevant optional flags
python ./ts_scripts/install_dependencies.py --rocm=rocm61
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
```

Getting started guide

🐳 Quick Start with Docker

```bash
# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly
```
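To run the pulled image, a sketch along these lines should work, assuming the official image's defaults: inference on port 8080, management on 8081, and a model store expected at `/home/model-server/model-store` inside the container:

```bash
# Expose the default inference (8080) and management (8081) ports and
# mount a local model store into the image's expected location.
docker run --rm -it \
  -p 8080:8080 -p 8081:8081 \
  -v $(pwd)/model_store:/home/model-server/model-store \
  pytorch/torchserve
```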

Refer to torchserve docker for details.

🤖 Quick Start LLM Deployment

VLLM Engine

```bash
# Make sure to install torchserve with pip or conda as described above
# and login with `huggingface-cli login`
python -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Llama-3.2-3B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' \
  --header "Content-Type: application/json" \
  "http://localhost:8080/predictions/model/1.0/v1/completions"
```

TRT-LLM Engine

```bash
# Make sure to install torchserve with python venv as described above
# and login with `huggingface-cli login`
# pip install -U --use-deprecated=legacy-resolver -r requirements/trt_llm.txt
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --engine trt_llm --disable_token_auth

# Try it out
curl -X POST -d '{"prompt":"count from 1 to 9 in french ", "max_tokens": 100}' \
  --header "Content-Type: application/json" \
  "http://localhost:8080/predictions/model"
```

🚢 Quick Start LLM Deployment with Docker

```bash
# export token=<HUGGINGFACE_HUB_TOKEN>
docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

docker run --rm -ti --shm-size 10g --gpus all \
  -e HUGGING_FACE_HUB_TOKEN=$token \
  -p 8080:8080 -v data:/data ts/vllm \
  --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth

# Try it out
curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' \
  --header "Content-Type: application/json" \
  "http://localhost:8080/predictions/model/1.0/v1/completions"
```

Refer to LLM deployment for details and other methods.

⚡ Why TorchServe

🤔 How does TorchServe work

🏆 Highlighted Examples

For more examples

🛡️ TorchServe Security Policy

SECURITY.md

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

📰 News

💖 All Contributors

Made with contrib.rocks.

⚖️ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta, and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project from which it was derived.

