On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
VITA-Group/o1-planning
Kevin Wang* · Junbo Li* · Neel P. Bhatt* · Yihan Xi · Qiang Liu · Ufuk Topcu · Atlas Wang
We evaluate GPT-4 and the o1 models on planning tasks, highlighting their strengths in problem understanding and identifying challenges in spatial reasoning and generalization.
- [2025/1] We are developing the benchmark and plan to release the code and data within a month.
We will add more detailed information and share access to more files soon.
- Release detailed experiment evaluations
- Project page
- Release automated evaluation script (this will take a while)
OpenAI's o1 Models
- results
  - barman (the domains)
  - ...
  - tyreworld
    - p_.pddl.prompt (the prompt used in the experiments, including the domain and problem in natural language)
    - p_.pddl.gpt4 (GPT-4's response to the prompt)
    - p_.pddl.o1-mini (o1-mini's response to the prompt)
    - p_.pddl.o1-preivew (o1-preview's response to the prompt)
    - random.py (only in the randomized examples; encodes the problem with random symbols)
- visual (will include more visual examples and graphics)
- scripts (scripts used to generate the files; to be updated in the future)
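As a rough illustration of how the results layout described above could be read programmatically, the following sketch groups the files under `results` by domain, problem, and suffix. It is not part of the released scripts. The function name `collect_results` and its `results_dir` argument are hypothetical, and the sketch assumes only that each result file name contains `.pddl.` followed by a suffix such as `prompt` or `gpt4`.

```python
from pathlib import Path


def collect_results(results_dir):
    """Group result files as {domain: {problem: {suffix: path}}}.

    Hypothetical helper: assumes one sub-folder per domain and file names
    of the form "<problem>.pddl.<suffix>", as in the layout above.
    """
    grouped = {}
    for path in sorted(Path(results_dir).glob("*/*.pddl.*")):
        if not path.is_file():
            continue
        # Split "name.pddl.suffix" into the problem name and the suffix.
        problem, _, suffix = path.name.partition(".pddl.")
        domain = path.parent.name
        grouped.setdefault(domain, {}).setdefault(problem, {})[suffix] = path
    return grouped
```

Calling `collect_results("results")` from the repository root would then return a nested dictionary, so the prompt and each model's response for one problem can be compared side by side. Files without `.pddl.` in their name, such as `random.py`, are skipped.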
The detailed experiment results
If you find our paper useful or interesting, please consider giving a star ⭐ and citing the following paper 📝.
@misc{wang2024planningabilitiesopenaiso1,
  title={On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability},
  author={Kevin Wang and Junbo Li and Neel P. Bhatt and Yihan Xi and Qiang Liu and Ufuk Topcu and Zhangyang Wang},
  year={2024},
  eprint={2409.19924},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2409.19924},
}
The basic prompts are from llm+p, available at this GitHub repository. We thank all the authors for their great work and repos.
There are also some concurrent works that were released recently or will be released soon: