pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
ExecuTorch is PyTorch's unified solution for deploying AI models on-device—from smartphones to microcontrollers—built for privacy, performance, and portability. It powers Meta's on-device AI across Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses, and more.
Deploy LLMs, vision, speech, and multimodal models with the same PyTorch APIs you already know—accelerating research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in.
- 🔒 Native PyTorch Export — Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Preserve model semantics.
- ⚡ Production-Proven — Powers billions of users at Meta with real-time on-device inference.
- 💾 Tiny Runtime — 50KB base footprint. Runs on microcontrollers to high-end smartphones.
- 🚀 12+ Hardware Backends — Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more.
- 🎯 One Export, Multiple Backends — Switch hardware targets with a single line change. Deploy the same model everywhere.
ExecuTorch uses ahead-of-time (AOT) compilation to prepare PyTorch models for edge deployment:
- 🧩 Export — Capture your PyTorch model graph with torch.export()
- ⚙️ Compile — Quantize, optimize, and partition to hardware backends → .pte
- 🚀 Execute — Load .pte on-device via a lightweight C++ runtime
Models use a standardized Core ATen operator set. Partitioners delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback.
Learn more: How ExecuTorch Works • Architecture Guide
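As a rough illustration of that fallback behavior, to_edge_transform_and_lower accepts an ordered list of partitioners: subgraphs the first backend cannot take are offered to the next, and anything left runs on the portable CPU kernels. This is a minimal sketch only; it assumes an `exported_program` produced by torch.export.export() (as in the Quick Start below), and the CoreMLPartitioner import path shown here is an assumption that may differ between releases.

```python
# Sketch: partitioner order expresses delegation preference.
# CoreMLPartitioner's import path is an assumption; XnnpackPartitioner is as in the Quick Start.
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.apple.coreml.partition import CoreMLPartitioner
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

program = to_edge_transform_and_lower(
    exported_program,  # from torch.export.export(model, example_inputs)
    partitioner=[CoreMLPartitioner(), XnnpackPartitioner()],  # NPU/GPU first, CPU fallback
).to_executorch()
```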
```bash
pip install executorch
```
For platform-specific setup (Android, iOS, embedded systems), see the Quick Start documentation.
```python
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Export your PyTorch model
model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)

# 2. Optimize for target hardware (switch backends with one line)
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()]  # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm
).to_executorch()

# 3. Save for deployment
with open("model.pte", "wb") as f:
    f.write(program.buffer)

# Test locally via ExecuTorch runtime's pybind API (optional)
from executorch.runtime import Runtime

runtime = Runtime.get()
method = runtime.load_program("model.pte").load_method("forward")
outputs = method.execute([torch.randn(1, 3, 224, 224)])
```
```cpp
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

Module module("model.pte");
auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
auto outputs = module.forward(tensor);
```
```swift
import ExecuTorch

let module = Module(filePath: "model.pte")
let input = Tensor<Float>([1.0, 2.0, 3.0, 4.0], shape: [2, 2])
let outputs = try module.forward(input)
```
```kotlin
val module = Module.load("model.pte")
val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
val outputs = module.forward(EValue.from(inputTensor))
```
Export Llama models using the export_llm script or Optimum-ExecuTorch:
```bash
# Using export_llm
python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte

# Using Optimum-ExecuTorch
optimum-cli export executorch \
  --model meta-llama/Llama-3.2-1B \
  --task text-generation \
  --recipe xnnpack \
  --output_dir llama_model
```
Run on-device with the LLM runner API:
```cpp
#include <executorch/extension/llm/runner/text_llm_runner.h>

auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
executorch::extension::llm::GenerationConfig config{
    .seq_len = 128,
    .temperature = 0.8f
};
runner->generate("Hello, how are you?", config);
```
```swift
import ExecuTorchLLM

let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin")
try runner.generate("Hello, how are you?", Config {
  $0.sequenceLength = 128
}) { token in
  print(token, terminator: "")
}
```
Kotlin (Android) — API Docs • Demo App
```kotlin
val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f)
llmModule.load()
llmModule.generate("Hello, how are you?", 128, object : LlmCallback {
    override fun onResult(result: String) { print(result) }
    override fun onStats(stats: String) { }
})
```
For multimodal models (vision, audio), use the MultiModal runner API, which extends the LLM runner to handle image and audio inputs alongside text. See the Llava and Voxtral examples.
See examples/models/llama for the complete workflow, including quantization, mobile deployment, and advanced options.
Next Steps:
- 📖 Step-by-step tutorial — Complete walkthrough for your first model
- ⚡ Colab notebook — Try ExecuTorch instantly in your browser
- 🤖 Deploy Llama models — LLM workflow with quantization and mobile demos
| Platform | Supported Backends |
|---|---|
| Android | XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos |
| iOS | XNNPACK, MPS, CoreML (Neural Engine) |
| Linux / Windows | XNNPACK, OpenVINO, CUDA (experimental) |
| macOS | XNNPACK, MPS, Metal (experimental) |
| Embedded / MCU | XNNPACK, ARM Ethos-U, NXP, Cadence DSP |
See Backend Documentation for detailed hardware requirements and optimization guides.
ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. View success stories →
LLMs: Llama 3.2/3.1/3, Qwen 3, Phi-4-mini, LiquidAI LFM2
Multimodal: Llava (vision-language), Voxtral (audio-language), Gemma (vision-language)
Vision/Speech: MobileNetV2, DeepLabV3, Whisper
Resources: examples/ directory • executorch-examples out-of-tree demos • Optimum-ExecuTorch for HuggingFace models
ExecuTorch provides advanced capabilities for production deployment:
- Quantization — Built-in support via torchao for 8-bit, 4-bit, and dynamic quantization (see the sketch below)
- Memory Planning — Optimize memory usage with ahead-of-time allocation strategies
- Developer Tools — ETDump profiler, ETRecord inspector, and model debugger
- Selective Build — Strip unused operators to minimize binary size
- Custom Operators — Extend with domain-specific kernels
- Dynamic Shapes — Support variable input sizes with bounded ranges
See Advanced Topics for quantization techniques, custom backends, and compiler passes.
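As a concrete example of the quantization support mentioned above, here is a minimal post-training quantization sketch using the PT2E APIs with the XNNPACK quantizer. Exact import paths and the graph-capture call vary across PyTorch/ExecuTorch releases, so treat this as an illustration under those assumptions rather than the canonical recipe.

```python
# Hedged sketch of PT2E post-training quantization targeting the XNNPACK backend.
# Import locations and the capture step differ between releases.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = MyModel().eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture a quantization-friendly graph (newer PyTorch; older releases used a different capture API).
captured = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibrate with representative inputs
quantized = convert_pt2e(prepared)

# The quantized module then goes through the same export -> lower -> .pte flow
# shown in the Quick Start.
```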
- Documentation Home — Complete guides and tutorials
- API Reference — Python, C++, Java/Kotlin APIs
- Backend Integration — Build custom hardware backends
- Troubleshooting — Common issues and solutions
We welcome contributions from the community!
- 💬 GitHub Discussions — Ask questions and share ideas
- 🎮 Discord — Chat with the team and community
- 🐛 Issues — Report bugs or request features
- 🤝 Contributing Guide — Guidelines and codebase structure
ExecuTorch is BSD licensed, as found in the LICENSE file.
Part of the PyTorch ecosystem