Computer Science > Computation and Language

arXiv:2404.14219 (cs)
[Submitted on 22 Apr 2024 (v1), last revised 30 Aug 2024 (this version, v4)]

Title: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang et al. (29 additional authors not shown)
Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (achieving 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench, respectively). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B mixture-of-experts model with 6.6 billion active parameters, achieves superior performance on language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and is on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels at reasoning tasks and is adept at handling prompts that combine text with a single image as well as prompts that combine text with multiple images.
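As an illustration of the on-device-scale, chat-aligned usage the abstract describes, the sketch below loads phi-3-mini for chat-format inference with the Hugging Face transformers library. This is a minimal sketch, not the authors' deployment pipeline; in particular, the checkpoint name "microsoft/Phi-3-mini-4k-instruct" is an assumption not stated on this page.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face checkpoint name; not given in the abstract.
    model_id = "microsoft/Phi-3-mini-4k-instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision keeps the 3.8B model's footprint small
        device_map="auto",
    )

    # The model is aligned for chat format, so build the prompt via the chat template.
    messages = [{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

For rough scale: at 2 bytes per parameter, the bfloat16 weights of a 3.8B-parameter model occupy about 7.6 GB, so phone deployment would typically rely on further quantization (e.g., 4-bit weights come to roughly 1.9 GB).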
Comments: 24 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2404.14219 [cs.CL]
 (or arXiv:2404.14219v4 [cs.CL] for this version)
 https://doi.org/10.48550/arXiv.2404.14219
arXiv-issued DOI via DataCite

Submission history

From: Sebastien Bubeck
[v1] Mon, 22 Apr 2024 14:32:33 UTC (3,072 KB)
[v2] Tue, 23 Apr 2024 14:49:38 UTC (3,072 KB)
[v3] Thu, 23 May 2024 22:42:40 UTC (12,248 KB)
[v4] Fri, 30 Aug 2024 21:17:17 UTC (12,361 KB)
