Description
🚀 The feature, motivation and pitch
First, we would like to thank the TensorRT-LLM engineers, whose work has impressed the community and driven significant progress in both science and industry. However, AFD (Attention-FFN Disaggregation), which has recently attracted substantial attention in the community, is not yet supported by TensorRT-LLM.
I have carefully reviewed the TensorRT-LLM documentation and paid particular attention to relevant issues, arriving at the following conclusions:
TensorRT-LLM does not yet offer a mature, usable AFD feature. However, experimental implementations for specific models already exist and are under active development and iteration. Key issues still need to be addressed before the feature can be officially deployed.
We hope the community will extend warmth and encouragement to the TensorRT-LLM developers to help accelerate the realization of the AFD feature. It is my wish that this post can serve as a central hub for everything related to AFD, making it easier for everyone to find relevant resources. Finally, we wish the developers good health and smooth work.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.