The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
LightTransfer is a lightweight transformation framework for enhancing the efficiency of large transformer models, such as LLaMA and QwQ, in long-context understanding and long CoT generation. By identifying lazy layers, i.e., those primarily attending to initial or recent tokens, LightTransfer replaces their full attention with streaming attention, significantly reducing memory overhead.
- Improved efficiency with minimal performance loss: LightTransfer achieves up to 2.17× higher throughput while maintaining strong performance (<1.5% drop on LongBench).
- Flexible adaptation for long-context tasks: Works without retraining for long-context understanding and requires only minimal fine-tuning for advanced long CoT generation, such as mathematical reasoning in QwQ-STILL, achieving 53.3% on AIME24.
For more details, visit our project page.
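As a rough illustration of the lazy-layer criterion described above, the sketch below scores each layer by the attention mass it assigns to initial ("sink") and recent tokens; the function name, default sink/window sizes, and the scoring details are illustrative assumptions, not the repository's actual implementation.

```python
import torch

def lazy_layer_scores(attn_maps, n_sink=4, window=1024):
    """Illustrative 'lazy layer' scoring.

    attn_maps: list of per-layer attention tensors of shape
        (batch, heads, q_len, k_len), e.g. collected from a forward
        pass with output_attentions=True on a long prompt.
    A layer counts as lazy when most of its attention mass falls on
    the first `n_sink` (initial) tokens or the last `window` (recent)
    tokens; such layers are candidates for streaming attention.
    """
    scores = []
    for attn in attn_maps:
        k_len = attn.shape[-1]
        sink = attn[..., :n_sink].sum(dim=-1)            # mass on initial (sink) tokens
        recent_start = max(n_sink, k_len - window)       # avoid double-counting overlap
        recent = attn[..., recent_start:].sum(dim=-1)    # mass on recent tokens
        scores.append((sink + recent).mean().item())     # average over batch/heads/queries
    return scores
```

Under this sketch, one would rank layers by score on long inputs and convert the highest-scoring half to streaming attention, which keeps only the sink and recent KV entries.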
[2025.03.16] We release the checkpoint of QwQ-32B-LightTransfer. See the model card for details.
We release the checkpoint of QwQ-LightTransfer, a 32B-parameter model built on Qwen/Qwen2.5-32B-Instruct and fine-tuned via SFT on RUC-AIBOX/long_form_thought_data_5k.
- By replacing 50% of the model's full attention layers with streaming attention, specifically layers [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 37, 38, 43, 51], it substantially reduces memory costs (see the sketch after this list).
- QwQ-LightTransfer scores 53.3% on the advanced math benchmark AIME24, demonstrating its strong o1-like long reasoning capabilities.
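For intuition only, here is a minimal sketch of how such a layer layout could drive KV-cache trimming during decoding: streaming layers keep only the sink and most recent entries, while full-attention layers keep their whole cache. The set name, helper function, and the sink/window sizes are assumptions for illustration, not the released checkpoint's actual code.

```python
import torch

# Hypothetical layout: indices converted to streaming attention
# (the list above); all other layers keep full attention.
STREAMING_LAYERS = {5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                    21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 37, 38, 43, 51}

def trim_kv_cache(layer_idx, keys, values, n_sink=4, window=1024):
    """Keep only sink + recent KV entries for streaming layers.

    keys/values: tensors of shape (batch, heads, seq_len, head_dim).
    Full-attention layers return their cache unchanged, so memory
    grows with context length only for the layers that need it.
    """
    if layer_idx not in STREAMING_LAYERS:
        return keys, values
    seq_len = keys.shape[2]
    if seq_len <= n_sink + window:
        return keys, values
    keys = torch.cat([keys[:, :, :n_sink], keys[:, :, -window:]], dim=2)
    values = torch.cat([values[:, :, :n_sink], values[:, :, -window:]], dim=2)
    return keys, values
```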
We have evaluated QwQ-LightTransfer on several long reasoning generation benchmarks. Some of the evaluation results are shown in the table below.
| Method | Math-OAI | AIME24 | AIME25 | GSM8K |
|---|---|---|---|---|
| o1-preview | 85.5 | 44.6 | - | - |
| QwQ-STILL | 90.2 | 46.7 | 33.3 | 95.6 |
| LongGen | 78.2 | 16.7 | - | 95.4 |
| LightTransfer | 90.7 | 53.3 | 40.0 | 95.5 |
Import from Transformers
To load the QwQ-LightTransfer model using Transformers, use the following code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'QwQ-32B-LightTransfer'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto',
)

text = "Hi, I'm QwQ-32B-LightTransfer."
inputs = tokenizer(text, return_tensors='pt').to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_new_tokens=32000)
print(tokenizer.decode(outputs[0]))
```
Evaluation scripts
Code and model weights are licensed under Apache-2.0.
```bibtex
@misc{zhang2025lighttransferlongcontextllmsecretly,
  title={LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation},
  author={Xuan Zhang and Fengzhuo Zhang and Cunxiao Du and Chao Du and Tianyu Pang and Wei Gao and Min Lin},
  year={2025},
  eprint={2410.13846},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.13846},
}
```