huggingface/text-generation-inference


Trtllm backend improvements #3231

Open

leejuyuu wants to merge 9 commits into huggingface:main from leejuyuu:trtllm


leejuyuu commented May 17, 2025

What does this PR do?

Trtllm backend improvements

feat: add new finish reasons
fix: fix prometheus_port CLI short arg conflict
fix: fix segfault when canceling request
feat: add stop sequence support
feat: catch broader exception
feat: check existence of config files

Fixes #3205

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@mfuntowicz

leejuyuu added 9 commits

May 18, 2025 03:30

feat(trtllm): add new finish reasons

c458d21

Add new finish reasons introduced in TensorRT-LLM v0.16.0.

fix: fix prometheus_port CLI short arg conflict

cc4b584

The short arg of `prometheus_port` conflicts with `port`. Remove the short arg variant.

Fixes huggingface#3205

fix(trtllm): fix segfault when canceling request

0858af2

When a request is cancelled, the `tensorrt_llm::executor::Result` contains `outputTokenIds` with size 1, but `outputTokenIds[0]` has size 0. This causes `as_generation_step` to segfault.

Check the size of `outputTokenIds` and `logProbs` before attempting to access the inner vector. The `finishReasons` check can be skipped because it has only one dimension and the minimum beam size is 1.

Because cxx has not added `Option` support yet, include two boolean flags to denote whether each value is valid.

Change the log level for cancelled requests to debug.
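The guard and the flag pair described in this commit can be sketched in plain Rust. This is an illustration only, not the PR's code: `StepView`, `as_step_view`, and `token_of` are hypothetical names, and taking the last token of the first beam is an assumption made for the example.

```rust
/// Flat, FFI-friendly view of one decoding step (hypothetical type).
/// An `Option<T>` cannot cross the bridge, so each value travels with
/// a validity flag instead.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StepView {
    pub has_token: bool,
    pub token_id: u32,
    pub has_log_prob: bool,
    pub log_prob: f32,
}

/// Build the flat view without ever indexing into an empty vector.
/// Both inputs are indexed by beam first, then by position.
/// Assumption: the step reports the last token of the first beam.
pub fn as_step_view(output_token_ids: &[Vec<u32>], log_probs: &[Vec<f32>]) -> StepView {
    // `first()` and `last()` return `None` on empty input instead of panicking.
    let token = output_token_ids.first().and_then(|beam| beam.last()).copied();
    let log_prob = log_probs.first().and_then(|beam| beam.last()).copied();
    StepView {
        has_token: token.is_some(),
        token_id: token.unwrap_or_default(),
        has_log_prob: log_prob.is_some(),
        log_prob: log_prob.unwrap_or_default(),
    }
}

/// On the receiving side, turn the flag/value pair back into an `Option`.
pub fn token_of(step: &StepView) -> Option<u32> {
    step.has_token.then_some(step.token_id)
}
```

A cancelled request, with one beam that holds no tokens, then yields `has_token == false` rather than an out-of-bounds read.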

feat(trtllm): add stop sequence support

27d0330

Support per request stop sequences.

feat(trtllm): catch broader exception

987337b

The try/catch only uses the `what()` method, which means we can catch the broader `std::exception` instead. This is beneficial because nlohmann/json also throws exceptions.

feat(trtllm): check existence of config files

56dd0a0

When the required config files are not present, nlohmann/json throws a parsing error, which does little to identify what went wrong. Check the existence of these files early and return specific error messages.
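An early existence check of this kind might look like the sketch below. The function name `check_required_files` and the file names passed to it are assumptions for illustration; the commit message does not say which files the backend requires.

```rust
use std::path::Path;

/// Return an error naming the first required file missing from `model_dir`.
/// Hypothetical helper: the caller supplies the list of required file names.
pub fn check_required_files(model_dir: &Path, required: &[&str]) -> Result<(), String> {
    for name in required {
        let path = model_dir.join(name);
        // `is_file` is false for missing paths and for directories.
        if !path.is_file() {
            return Err(format!("required file not found: {}", path.display()));
        }
    }
    Ok(())
}
```

Calling it before any JSON parsing, for example `check_required_files(dir, &["config.json"])`, turns an opaque parse failure into a message that names the missing path.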

fix(trtllm): fix do_sample being ignored

41819d7

Currently, the do_sample option is ignored and the executor will always sample. Set top_k to 1 if do_sample is false.
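With top_k set to 1, only the single most likely token remains a candidate, so decoding becomes greedy. A minimal sketch of the rule follows; `SamplingParams` and `effective_top_k` are hypothetical names, not the backend's types.

```rust
/// Sampling options as a request might carry them (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SamplingParams {
    pub do_sample: bool,
    pub top_k: u32,
}

/// Pick the top_k value to hand to the executor.
/// When sampling is disabled, force top_k to 1 so that only the most
/// likely token can be chosen.
pub fn effective_top_k(params: SamplingParams) -> u32 {
    if params.do_sample {
        params.top_k
    } else {
        1
    }
}
```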

feat(trtllm): get more accurate start time

f7bd82a

Get a more accurate inference start time from the trtllm response. Because `Instant` does not expose an absolute value, create reference points on both sides and return the duration relative to the reference point instead.
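The reference-point idea can be sketched as follows. `ClockAnchor` is a hypothetical type, and representing the other side's clock as nanoseconds in a `u64` is an assumption for the example: one reading is captured on each side at (nearly) the same moment, and later foreign timestamps are shifted by their offset from that reading.

```rust
use std::time::{Duration, Instant};

/// A pair of reference points taken at (nearly) the same moment:
/// one Rust `Instant` and one reading of the foreign clock in nanoseconds.
pub struct ClockAnchor {
    rust_ref: Instant,
    foreign_ref_ns: u64,
}

impl ClockAnchor {
    pub fn new(rust_ref: Instant, foreign_ref_ns: u64) -> Self {
        Self { rust_ref, foreign_ref_ns }
    }

    /// Convert a foreign timestamp into an `Instant` using its offset
    /// from the reference point.
    pub fn to_instant(&self, foreign_ns: u64) -> Instant {
        if foreign_ns >= self.foreign_ref_ns {
            self.rust_ref + Duration::from_nanos(foreign_ns - self.foreign_ref_ns)
        } else {
            // Timestamp precedes the reference point: step backwards, and
            // fall back to the reference if that is not representable.
            let back = Duration::from_nanos(self.foreign_ref_ns - foreign_ns);
            self.rust_ref.checked_sub(back).unwrap_or(self.rust_ref)
        }
    }
}
```

The accuracy of the result depends on how closely together the two reference readings are taken.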

perf(trtllm): reduce futile loop iterations

fab395b

The executor_status_looper runs a spin loop even when there are no active requests, so the service constantly wastes a CPU core. Make the loop block on receiving requests when none are running to reduce CPU usage when idle.
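The blocking behaviour can be sketched with a standard-library channel. This is an assumption-laden illustration: `next_requests` is a hypothetical helper, and `std::sync::mpsc` stands in for whatever channel the backend actually uses.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Fetch the requests to submit in one loop iteration.
/// Returns `None` when nothing is in flight and every sender is gone,
/// which signals the loop to shut down.
pub fn next_requests<T>(rx: &Receiver<T>, in_flight: usize) -> Option<Vec<T>> {
    let mut batch = Vec::new();
    if in_flight == 0 {
        // Idle: park the thread until a request arrives instead of spinning.
        match rx.recv() {
            Ok(req) => batch.push(req),
            Err(_) => return None,
        }
    }
    // Drain whatever else is already queued without blocking, so that
    // in-flight requests keep being polled.
    loop {
        match rx.try_recv() {
            Ok(req) => batch.push(req),
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

While requests are in flight the call never blocks, so response polling is unaffected; the thread only sleeps when there is nothing to poll.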
