allenai/lumos

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
🖋 Authors: Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin
We introduce 🪄 Lumos, Language Agents with Unified Data Formats, Modular Design, and Open-Source LLMs. Lumos unifies a suite of complex interactive tasks and achieves competitive performance with GPT-4/3.5-based and larger open-source agents.
- 🧩 Modular Architecture:
  - 🧩 Lumos consists of planning, grounding, and execution modules built on LLAMA-2-7B/13B and off-the-shelf APIs.
  - 🤗 Lumos utilizes a unified data format that encompasses multiple task types, enabling the agent framework to conveniently support a range of interactive tasks.
- 🌍 Diverse Training Data:
  - 🌍 Lumos is trained with ~56K diverse, high-quality subgoal/action annotations derived from ground-truth reasoning steps in existing benchmarks with GPT-4.
  - ⚒️ Lumos data can be instrumental for future research on developing open-source agents for complex interactive tasks.
- 🚀 Competitive Performance:
  - 🚀 Lumos matches or even beats GPT-series agents on the web and complex QA tasks Mind2Web and HotpotQA, and larger open agents on math and multimodal tasks.
  - 🚀 Lumos exceeds contemporaneous agents that have been fine-tuned with in-domain HotpotQA, Mind2Web, and ScienceQA annotations, such as FiReAct, AgentLM, and AutoAct.
  - 🚀 Lumos performs better than open agent baseline formulations including chain-of-thoughts and integrated training.
  - 🚀 Lumos surpasses larger open LLM agents and domain-specific agents on the unseen tasks WebShop and InterCode_SQL.
If you find this work relevant to your research, please feel free to cite it!
@article{yin2023lumos,
  title={{Agent Lumos: Unified and Modular Training for Open-Source Language Agents}},
  author={Yin, Da and Brahman, Faeze and Ravichander, Abhilasha and Chandu, Khyathi and Chang, Kai-Wei and Choi, Yejin and Lin, Bill Yuchen},
  journal={arXiv preprint arXiv:2311.05657},
  year={2023}
}
- [2024, Mar 18] We release the latest Lumos version:
  - 📑 Lumos paper that covers new multimodal tasks and 13B-scale model experiments
  - 🤗 Lumos demo that illustrates the Lumos planning and grounding processes
- [2023, Nov 8] We release the important items for training and evaluating Lumos:
  - 💻 Lumos code for annotation generation, training, and evaluation
  - 🤗 Lumos checkpoints with 7B model size
  - 🤗 Lumos training annotations and their raw data
To set up the environment, run:

./setup.sh

Please make sure that the cudatoolkit version in `setup.sh` aligns with your local CUDA version.
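As a quick sanity check, you can compare your local CUDA toolkit version with the one PyTorch ends up built against after running `setup.sh`. This is a minimal sketch; it assumes `nvcc` is on your PATH and that `setup.sh` installs PyTorch.

```bash
# Local CUDA toolkit version (compare with the cudatoolkit version in setup.sh).
nvcc --version

# CUDA version the installed PyTorch build expects (assumes setup.sh installed torch).
python -c "import torch; print(torch.version.cuda)"
```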
We collect all the training annotations, raw data, and prompt-converted annotations in a single Google Drive folder. It can be downloaded with:
cd data
python -c "import gdown; gdown.download_folder('https://drive.google.com/drive/folders/1ASFhOkhezgewVxR01dQg-8KUVR8IdBlY?usp=sharing', quiet=True)"
We also provide generated annotations for planning and grounding modules in 🤗 Huggingface Datasets.
| Dataset Names | 🤗 Huggingface Links |
|---|---|
| lumos_complex_qa_iterative | Planning, Grounding |
| lumos_complex_qa_onetime | Planning, Grounding |
| lumos_web_agent_iterative | Planning, Grounding |
| lumos_multimodal_iterative | Planning, Grounding |
| lumos_maths_iterative | Planning, Grounding |
| lumos_maths_onetime | Planning, Grounding |
| lumos_unified_iterative | Planning, Grounding |
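These annotation sets can also be loaded directly with the 🤗 `datasets` library. The snippet below is a minimal sketch; the repository ID is an illustrative assumption, so substitute the exact ID linked in the table above.

```bash
# Load one of the planning annotation sets from the Hugging Face Hub and print its splits.
# 'ai2lumos/lumos_complex_qa_plan_iterative' is an assumed repository ID; replace it with
# the ID from the table above.
python -c "from datasets import load_dataset; print(load_dataset('ai2lumos/lumos_complex_qa_plan_iterative'))"
```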
To train a module, run:

./train.sh [MODULE] [FORMULATION]

`[MODULE]` can be either `plan` or `ground`. `[FORMULATION]` can be either `iterative` or `onetime`.
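For example, to fine-tune the planning module with the iterative formulation:

```bash
./train.sh plan iterative
```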
You can adjust the fine-tuning hyperparameters and the specific task you want to fine-tune on in the training scripts such as `finetune_llama2_plan_iterative.sh` under `scripts/train`.
We also provide the fine-tuned planning and grounding module checkpoints in 🤗 Huggingface.
| Model Names | 🤗 Huggingface Links |
|---|---|
| lumos_complex_qa_iterative | Planning, Grounding |
| lumos_complex_qa_iterative-13B | Planning, Grounding |
| lumos_complex_qa_onetime | Planning, Grounding |
| lumos_web_agent_iterative | Planning, Grounding |
| lumos_web_agent_iterative-13B | Planning, Grounding |
| lumos_maths_iterative | Planning, Grounding |
| lumos_maths_onetime | Planning, Grounding |
| lumos_maths_onetime-13B | Planning, Grounding |
| lumos_unified_iterative | Planning, Grounding |
| lumos_unified_iterative-13B | Planning, Grounding |
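Since the modules are fine-tuned from LLAMA-2, the checkpoints can be loaded as ordinary causal LMs with `transformers`. The snippet below is a minimal sketch rather than the repository's inference pipeline, and the model ID is an assumption; use the IDs linked in the table above.

```bash
# Load a fine-tuned planning module checkpoint as a causal LM.
# 'ai2lumos/lumos_complex_qa_plan_iterative' is an assumed model ID; replace it with
# an ID from the table above.
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = 'ai2lumos/lumos_complex_qa_plan_iterative'  # assumed ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
print(model.config.model_type)
"
```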
Evaluation scripts for different datasets are under `scripts/eval`. For example, you can evaluate Lumos on HotpotQA by running:
./scripts/eval/hotpotqa.sh
We provide the code for generating the training annotations from scratch, starting from the raw existing benchmarks.
Before generating annotations, we first need to download the existing benchmarks that provide ground-truth intermediate reasoning steps. The raw data can be downloaded via this Google Drive folder.
python -m data.prompt_convertion \
  --domain DOMAIN \
  --data_fn DATA_FN \
  --convert_all

`domain` covers maths, complex QA, web agent, and multimodal. `data_fn` is the path where the raw benchmarks are stored.
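For example, a concrete invocation could look like the sketch below; the `--domain` value and the `--data_fn` path are illustrative assumptions, so match them to the actual names expected by the conversion code and to where you placed the raw benchmarks.

```bash
# Convert the raw maths benchmarks into Lumos-style annotations.
# Both the domain name and the data path below are assumptions; adjust them to your setup.
python -m data.prompt_convertion \
  --domain maths \
  --data_fn data/train/maths/raw_data \
  --convert_all
```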
For multimodal task annotation generation, please download the COCO 2017 train images into `data/train/multimodal/raw_data` and unzip them.
We greatly thank the Tulu team for providing the awesome code to fine-tune LLAMA-2. We also sincerely appreciate the contributors of zeno-build, Mind2Web, and WebShop for providing fast GPT prompting, HTML preprocessing, and the evaluation Docker environment.