
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
 

🤗 HuggingFace | 🆕 Update News | 🤔 Reporting Issues | 📜 Paper Link

Introduction

LightTransfer is a lightweight transformation framework for enhancing the efficiency of large transformer models, such as LLaMA and QwQ, in long-context understanding and long CoT generation. By identifying lazy layers (those that primarily attend to initial or recent tokens), LightTransfer replaces their full attention with streaming attention, significantly reducing memory overhead; a minimal sketch of this attention pattern follows the list below.

  • Improved efficiency with minimal performance loss:
    LightTransfer achieves up to 2.17× higher throughput while maintaining strong performance (<1.5% drop on LongBench).
  • Flexible adaptation for long-context tasks:
    It works without retraining for long-context understanding and requires only minimal fine-tuning for advanced long CoT generation, such as mathematical reasoning in QwQ-STILL, achieving 53.3% on AIME24.
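To make the mechanism concrete, here is a minimal sketch of the attention pattern a streaming layer uses: each query attends only to a few initial "sink" tokens plus a recent local window, rather than the full causal prefix. This is an illustration, not the repository's actual implementation; the helper name and the default sink/window sizes are assumptions.

```python
import torch

# Hypothetical illustration of a streaming-attention mask (True = attend).
# n_sink and n_recent are illustrative defaults, not the paper's settings.
def streaming_attention_mask(seq_len: int, n_sink: int = 4, n_recent: int = 1024) -> torch.Tensor:
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (column vector)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (row vector)
    causal = k <= q                         # never attend to future tokens
    sink = k < n_sink                       # initial "attention sink" tokens
    recent = (q - k) < n_recent             # sliding window of recent tokens
    return causal & (sink | recent)

# Each query sees only the first 2 keys and its 3 most recent keys.
print(streaming_attention_mask(8, n_sink=2, n_recent=3).int())
```

Because a streaming layer's key/value cache only needs to hold the sink tokens and the recent window, its memory footprint stays constant in sequence length, which is where the memory and throughput gains come from.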

For more details, visit our project page.

News

[2025.03.16] We release the checkpoint of QwQ-32B-LightTransfer. See the model card for details.

LightTransfer-Train

We release the checkpoint of QwQ-LightTransfer, a 32B-parameter model built on Qwen/Qwen2.5-32B-Instruct and fine-tuned via SFT on RUC-AIBOX/long_form_thought_data_5k.

  • By replacing 50% of the model's full attention layers with streaming attention (specifically layers [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 37, 38, 43, 51]), it substantially reduces memory costs; see the sketch after this list for how such layers might be scored.
  • QwQ-LightTransfer scores 53.3% on the advanced math benchmark AIME24, demonstrating its strong o1-like long-reasoning capabilities.
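As a rough illustration of how lazy layers could be identified, one can measure the fraction of each layer's attention mass that lands on the sink and recent-token regions, then convert the highest-scoring half of the layers. This is a hedged sketch under that assumption; the function and thresholds below are hypothetical, and the paper's exact selection criterion may differ.

```python
import torch

# Hypothetical "laziness" score: the share of (causal, softmax-normalized)
# attention mass a layer places on the first n_sink keys plus the most
# recent n_recent keys. Higher = lazier = better streaming candidate.
def laziness_score(attn: torch.Tensor, n_sink: int = 4, n_recent: int = 1024) -> float:
    # attn: (num_heads, q_len, k_len) attention weights for one layer
    _, q_len, k_len = attn.shape
    q = torch.arange(q_len).unsqueeze(1)
    k = torch.arange(k_len).unsqueeze(0)
    # Future keys (k > q) carry zero mass in causal attention, so they
    # contribute nothing even though the region test includes them.
    lazy_region = (k < n_sink) | ((q - k) < n_recent)
    return (attn * lazy_region).sum(dim=-1).mean().item()

# Usage sketch: rank layers by score and pick the top half to convert.
# scores = [laziness_score(layer_attn) for layer_attn in all_layer_attns]
# lazy_layers = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:len(scores) // 2]
```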

Performance Evaluation

We have evaluated QwQ-LightTransfer on several long-reasoning generation benchmarks. Selected results are shown in the table below.

| Method | Math-OAI | AIME24 | AIME25 | GSM8K |
|---|---|---|---|---|
| o1-preview | 85.5 | 44.6 | - | - |
| QwQ-STILL | 90.2 | 46.7 | 33.3 | 95.6 |
| LongGen | 78.2 | 16.7 | - | 95.4 |
| LightTransfer | 90.7 | 53.3 | 40.0 | 95.5 |

Usage

Import from Transformers

To load the QwQ-LightTransfer model using Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'QwQ-32B-LightTransfer'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto',
)

text = "Hi, I'm QwQ-32B-LightTransfer."
inputs = tokenizer(text, return_tensors='pt').to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_gen_len=32000)
print(tokenizer.decode(outputs[0]))
```
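Since the model is built on Qwen2.5-32B-Instruct, chat-style prompting through the tokenizer's chat template should also work. The snippet below is a sketch under that assumption (it presumes the shipped tokenizer includes a chat template, as the Qwen2.5-Instruct base does) and reuses the `tokenizer` and `model` from the snippet above:

```python
# Chat-style prompting sketch (assumes the tokenizer ships a chat template).
messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_gen_len=32000)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
```

Note that `max_gen_len` follows the repository's example above; with a stock Transformers model the equivalent argument is `max_new_tokens`.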

Evaluation scripts

License

Code and model weights are licensed under Apache-2.0.

Citation

```bibtex
@misc{zhang2025lighttransferlongcontextllmsecretly,
  title={LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation},
  author={Xuan Zhang and Fengzhuo Zhang and Cunxiao Du and Chao Du and Tianyu Pang and Wei Gao and Min Lin},
  year={2025},
  eprint={2410.13846},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.13846},
}
```
