# FocusLLM: Scaling LLM’s Context by Parallel Decoding
This repository contains the official implementation of [FocusLLM: Scaling LLM’s Context by Parallel Decoding](https://arxiv.org/abs/2408.11745).
## Environment

```bash
conda create -n focusllm python=3.10.14
conda activate focusllm
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers deepspeed accelerate datasets peft pandas seaborn rouge fuzzywuzzy jieba python-Levenshtein
pip install flash-attn --no-build-isolation
```
## Data

Download the data for training and evaluation on LongBench. For Infinite-Bench, you can download the data from InfiniteBench. We will also release the checkpoint of FocusLLM.

Training data: https://huggingface.co/datasets/zhangyik21/focusllm_train_data/tree/main
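For example, one way to fetch the training data is with the `huggingface_hub` library (the local directory below is just an illustration):

```python
from huggingface_hub import snapshot_download

# Download the FocusLLM training data from the Hugging Face Hub.
# "data/focusllm_train_data" is an arbitrary local target directory.
snapshot_download(
    repo_id="zhangyik21/focusllm_train_data",
    repo_type="dataset",
    local_dir="data/focusllm_train_data",
)
```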
## Training

```bash
bash train.sh
```
Training hyperparameters:

| Hyperparameter | Value |
| --- | --- |
| `memory_stride` | [64, 128, 256, 512, 1024, 2048] |
| `local_window` | 3072 |
| `add_params` | [q, k, v, o] |
## Evaluation

Evaluation hyperparameters:

| Hyperparameter | Value |
| --- | --- |
| `memory_stride` | 2048 |
| `local_window` | 2048 |
| `inference_batch_size` | [TODO in the next version] (parallel level for parallel decoding) |
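As a rough, repository-independent sketch of how these hyperparameters fit together (the function and variable names below are illustrative, not this repo's API): the tokens before the last `local_window` positions are split into memory chunks of `memory_stride` tokens, and the chunks are decoded in parallel in groups of `inference_batch_size`.

```python
# Illustrative sketch only -- not code from this repository.
# Shows how memory_stride, local_window, and inference_batch_size
# partition a long input for parallel decoding.

def split_for_parallel_decoding(input_ids, memory_stride=2048, local_window=2048):
    """Split a token sequence into memory chunks plus a local window."""
    memory_ids = input_ids[:-local_window]
    local_ids = input_ids[-local_window:]
    chunks = [memory_ids[i:i + memory_stride]
              for i in range(0, len(memory_ids), memory_stride)]
    return chunks, local_ids


def batch_chunks(chunks, inference_batch_size=8):
    """Group chunks into batches; each batch is one parallel forward pass."""
    return [chunks[i:i + inference_batch_size]
            for i in range(0, len(chunks), inference_batch_size)]


if __name__ == "__main__":
    fake_ids = list(range(100_000))  # stand-in for a 100k-token context
    chunks, local_ids = split_for_parallel_decoding(fake_ids)
    batches = batch_chunks(chunks)
    print(f"{len(chunks)} memory chunks, {len(batches)} parallel batches, "
          f"{len(local_ids)} local-window tokens")
```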
```bash
# Passkey retrieval
bash eval_passkey.sh
# LongBench
bash eval_longbench.sh
# InfiniteBench
bash eval_infbench.sh
```
## Acknowledgement

This project builds upon the codebase of Activation Beacon, and we sincerely thank the authors for their valuable contribution. Note, however, that the "beacon token" mentioned in our code actually refers to the "candidate token" described in our paper. Although we reuse the term "beacon token," its function is fundamentally different from that of the beacon token in the Activation Beacon paper. For details on how the candidate token works, please refer to our paper.
Due to memory constraints, during training we randomly select either the repetition loss or the continuation loss to optimize at each step. If sufficient memory is available, you can modify the forward function in `src/activation_beacon_llama/modeling_llama.py` to optimize both losses simultaneously.
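A minimal sketch of this idea, assuming the forward pass has already produced the two losses (the variable names are illustrative, not the repo's actual code):

```python
import random

import torch


def combine_losses(repetition_loss: torch.Tensor,
                   continuation_loss: torch.Tensor,
                   enough_memory: bool = False) -> torch.Tensor:
    """Return the loss to backpropagate for the current training step."""
    if enough_memory:
        # With sufficient memory, optimize both objectives in the same step.
        return repetition_loss + continuation_loss
    # Otherwise, randomly pick one of the two objectives per step.
    return repetition_loss if random.random() < 0.5 else continuation_loss
```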
If you find this repository useful, please give us a star ⭐.
## Citation

To cite our work:
```bibtex
@misc{li2024focusllmscalingllmscontext,
  title={FocusLLM: Scaling LLM's Context by Parallel Decoding},
  author={Zhenyu Li and Yike Zhang and Tengyu Pan and Yutao Sun and Zhichao Duan and Junjie Fang and Rong Han and Zixuan Wang and Jianyong Wang},
  year={2024},
  eprint={2408.11745},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.11745},
}
```