Hugging Face Inference Endpoints now supports GGUF out of the box! #9669

Pinned

ngxson started this conversation in Show and tell

ngxson · Sep 27, 2024 · Collaborator

You can now deploy any GGUF model on your own endpoint, in just a few clicks!

Simply select GGUF, select the hardware configuration, and done! An endpoint powered by llama-server (built from the master branch) will be deployed automatically. It works with all llama.cpp-compatible models of all sizes, from 0.1B up to 405B parameters.

Try it now --> https://ui.endpoints.huggingface.co/
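Since the deployed endpoint is served by llama-server, it exposes llama.cpp's OpenAI-compatible chat completions route. Here's a minimal sketch of querying one from Python; the endpoint URL and token are placeholders (the real values come from your endpoint's dashboard and your HF account):

```python
import requests

# Placeholders: substitute your own endpoint URL and HF access token.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

resp = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the route follows the OpenAI schema, any OpenAI-compatible client should also work by pointing its base URL at the endpoint.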

And the best part is:

@ggerganov: ggml.ai will be receiving a revenue share from all llama.cpp-powered endpoints used on HF. So for anyone who wants to support us, make sure to give those endpoints a try ♥️

A huge thanks to @ggerganov, @slaren, and the @huggingface team for making this possible!

(Video attachment: llama.hfe.ok.mp4)

Replies: 1 comment

ngxson · Sep 27, 2024 · Collaborator, Author

The Hermes 405B model can be deployed on 2x A100. The generation speed is around 8 t/s, which is not bad!

(Screenshot: 2024-09-27 at 14:26:50)
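For reference, a rough way to reproduce a tokens-per-second number like this: time one request and divide the completion token count (reported in the standard OpenAI-style `usage` field) by the wall-clock time. Same placeholder URL and token as in the sketch above:

```python
import time
import requests

# Same placeholders as the earlier sketch.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

start = time.time()
resp = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Write a short story."}],
        "max_tokens": 256,
    },
    timeout=600,
)
resp.raise_for_status()
elapsed = time.time() - start

# Wall-clock t/s; this includes prompt processing, so it slightly
# understates pure generation speed.
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")
```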
