tracel-ai/burn



Burn is a next generation Deep Learning Framework that doesn't compromise on
flexibility, efficiency and portability.


Performance

Because we believe the goal of a deep learning framework is to convert computation into useful intelligence, we have made performance a core pillar of Burn. We strive to achieve top efficiency by leveraging multiple optimization techniques described below.

Click on each section for more details 👇


Automatic kernel fusion 💥

Using Burn means having your models optimized on any backend. When possible, we provide a way to automatically and dynamically create custom kernels that minimize data relocation between different memory spaces, which is extremely useful when moving memory is the bottleneck.

As an example, you could write your own GELU activation function with the high-level tensor API (see the Rust code snippet below).

```rust
use core::f64::consts::SQRT_2;

use burn::tensor::{backend::Backend, Tensor};

fn gelu_custom<B: Backend, const D: usize>(x: Tensor<B, D>) -> Tensor<B, D> {
    let x = x.clone() * ((x / SQRT_2).erf() + 1);
    x / 2
}
```

Then, at runtime, a custom low-level kernel will be automatically created for your specific implementation and will rival a handcrafted GPU implementation. The kernel consists of about 60 lines of WGSL (WebGPU Shading Language), an extremely verbose lower-level shader language you probably don't want to program your deep learning models in!

Asynchronous execution ❤️‍🔥

For first-party backends, an asynchronous execution style is used, which allows us to perform various optimizations, such as the previously mentioned automatic kernel fusion.

Asynchronous execution also ensures that the normal execution of the framework does not block the model computations, which implies that the framework overhead won't impact the speed of execution significantly. Conversely, the intense computations in the model do not interfere with the responsiveness of the framework. For more information about our asynchronous backends, see this blog post.

Thread-safe building blocks 🦞

Burn emphasizes thread safety by leveraging the ownership system of Rust. With Burn, each module is the owner of its weights. It is therefore possible to send a module to another thread for computing the gradients, then send the gradients to the main thread that can aggregate them, and voilà, you get multi-device training.
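As a minimal sketch of that choreography (assuming the NdArray backend, a single Linear layer, and a sum-of-outputs stand-in loss; only the thread handling is the point here):

```rust
use std::thread;

use burn::backend::{Autodiff, NdArray};
use burn::nn::{Linear, LinearConfig};
use burn::optim::GradientsParams;
use burn::tensor::{Distribution, Tensor};

type Backend = Autodiff<NdArray>;

fn main() {
    let device = Default::default();
    let model: Linear<Backend> = LinearConfig::new(16, 16).init(&device);

    // The module owns its weights, so a copy can simply be moved into a worker thread.
    let worker_model = model.clone();
    let handle = thread::spawn(move || {
        let input =
            Tensor::<Backend, 2>::random([8, 16], Distribution::Default, &Default::default());
        let loss = worker_model.forward(input).sum();
        let grads = loss.backward();
        // Package the raw gradients per parameter so they can be sent back.
        GradientsParams::from_grads(grads, &worker_model)
    });

    // The main thread receives the gradients and could aggregate them across
    // several workers before applying an optimizer step.
    let _grads: GradientsParams = handle.join().unwrap();
}
```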

This is a very different approach from what PyTorch does, where backpropagation actually mutates the grad attribute of each tensor parameter. This is not a thread-safe operation and therefore requires lower-level synchronization primitives; see distributed training for reference. Note that this is still very fast, but not compatible across different backends and quite hard to implement.

Intelligent memory management 🦀

One of the main roles of a deep learning framework is to reduce the amount of memory necessary to run models. The naive way of handling memory is that each tensor has its own memory space, which is allocated when the tensor is created and then deallocated as the tensor goes out of scope. However, allocating and deallocating data is very costly, so a memory pool is often required to achieve good throughput. Burn offers an infrastructure that allows for easily creating and selecting memory management strategies for backends. For more details on memory management in Burn, see this blog post.

Another very important memory optimization of Burn is that we keep track of when a tensor can be mutated in-place just by using the ownership system well. Even though it is a rather small memory optimization on its own, it adds up considerably when training or running inference with larger models and contributes to reducing memory usage even more. For more information, see this blog post about tensor handling.
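As a toy illustration of the idea (backend and values are arbitrary), moving a tensor into an operation leaves the backend free to reuse its buffer, while cloning would force a separate allocation:

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let x = Tensor::<NdArray, 1>::from_floats([1.0, 2.0, 3.0], &device);

    // `x` is moved into the operation: no other handle to its buffer exists,
    // so the backend may perform the multiplication in place.
    let y = x * 2.0;

    // By contrast, `x.clone() * 2.0` would keep the original tensor alive and
    // require the result to be written to a fresh allocation.
    println!("{y}");
}
```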

Automatic kernel selection 🎯

A good deep learning framework should ensure that models run smoothly on all hardware. However, not all hardware shares the same behavior in terms of execution speed. For instance, a matrix multiplication kernel can be launched with many different parameters, which are highly sensitive to the size of the matrices and the hardware. Using the wrong configuration could reduce the speed of execution by a large factor (10 times or even more in extreme cases), so choosing the right kernels becomes a priority.

With our home-made backends, we run benchmarks automatically and choose the best configuration for the current hardware and matrix sizes with a reasonable caching strategy.

This adds a small overhead by increasing the warmup execution time, but stabilizes quickly after a few forward and backward passes, saving lots of time in the long run. Note that this feature isn't mandatory, and can be disabled when cold starts are a priority over optimized throughput.

Hardware specific features 🔥

It is no secret that deep learning mostly relies on matrix multiplication as its core operation, since this is how fully-connected neural networks are modeled.

More and more, hardware manufacturers optimize their chips specifically for matrix multiplication workloads. For instance, Nvidia has its Tensor Cores and today most cellphones have AI-specialized chips. As of this moment, we support Tensor Cores with our LibTorch, Candle, CUDA, Metal and WGPU/SPIR-V backends, but not other accelerators yet. We hope this issue gets resolved at some point to bring support to our WGPU backend.

Custom Backend Extension 🎒

Burn aims to be the most flexible deep learning framework. While it's crucial to maintain compatibility with a wide variety of backends, Burn also provides the ability to extend the functionalities of a backend implementation to suit your personal modeling requirements.

This versatility is advantageous in numerous ways, such as supporting custom operations like flash attention or manually writing your own kernel for a specific backend to enhance performance. See this section in the Burn Book 🔥 for more details.


Backend

Burn strives to be as fast as possible on as many hardware platforms as possible, with robust implementations. We believe this flexibility is crucial for modern needs where you may train your models in the cloud, then deploy on customer hardware, which varies from user to user.


Supported Backends

| Backend  | Devices                      | Class       |
| -------- | ---------------------------- | ----------- |
| CUDA     | NVIDIA GPUs                  | First-Party |
| ROCm     | AMD GPUs                     | First-Party |
| Metal    | Apple GPUs                   | First-Party |
| Vulkan   | Most GPUs on Linux & Windows | First-Party |
| Wgpu     | Most GPUs                    | First-Party |
| NdArray  | Most CPUs                    | Third-Party |
| LibTorch | Most GPUs & CPUs             | Third-Party |
| Candle   | Nvidia, Apple GPUs & CPUs    | Third-Party |

Compared to other frameworks, Burn has a very different approach to supporting many backends. By design, most code is generic over the Backend trait, which allows us to build Burn with swappable backends. This makes composing backends possible, augmenting them with additional functionalities such as autodifferentiation and automatic kernel fusion.
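As a small illustration of that design (the function and tensor shapes are arbitrary, and NdArray is just one possible choice), code written against the Backend trait runs unchanged on any backend, so switching hardware is a matter of changing a type alias:

```rust
use burn::tensor::{backend::Backend, Distribution, Tensor};

// This function knows nothing about the concrete backend it runs on.
fn mean_of_squares<B: Backend>(x: Tensor<B, 2>) -> Tensor<B, 1> {
    (x.clone() * x).mean()
}

fn main() {
    // Swap `NdArray` for `Wgpu`, `Cuda`, etc. without touching `mean_of_squares`.
    type B = burn::backend::NdArray;

    let device = Default::default();
    let x = Tensor::<B, 2>::random([4, 4], Distribution::Default, &device);
    println!("{}", mean_of_squares(x));
}
```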

Autodiff: Backend decorator that brings backpropagation to any backend 🔄

Contrary to the aforementioned backends, Autodiff is actually a backend decorator. This means that it cannot exist by itself; it must encapsulate another backend.

The simple act of wrapping a base backend with Autodiff transparently equips it with autodifferentiation support, making it possible to call backward on your model.

```rust
use burn::backend::{Autodiff, Wgpu};
use burn::tensor::{Distribution, Tensor};

fn main() {
    type Backend = Autodiff<Wgpu>;

    let device = Default::default();
    let x: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default, &device);
    let y: Tensor<Backend, 2> =
        Tensor::random([32, 32], Distribution::Default, &device).require_grad();

    let tmp = x.clone() + y.clone();
    let tmp = tmp.matmul(x);
    let tmp = tmp.exp();

    let grads = tmp.backward();
    let y_grad = y.grad(&grads).unwrap();
    println!("{y_grad}");
}
```

Of note, it is impossible to make the mistake of calling backward on a model that runs on a backend that does not support autodiff (for inference), as this method is only offered by an Autodiff backend.

See the Autodiff Backend README for more details.

Fusion: Backend decorator that brings kernel fusion to all first-party backends

This backend decorator enhances a backend with kernel fusion, provided that the inner backend supports it. Note that you can compose this backend with other backend decorators such as Autodiff. For now, only the WGPU and CUDA backends have support for fused kernels.

```rust
use burn::backend::{Autodiff, Fusion, Wgpu};
use burn::tensor::{Distribution, Tensor};

fn main() {
    type Backend = Autodiff<Fusion<Wgpu>>;

    let device = Default::default();
    let x: Tensor<Backend, 2> = Tensor::random([32, 32], Distribution::Default, &device);
    let y: Tensor<Backend, 2> =
        Tensor::random([32, 32], Distribution::Default, &device).require_grad();

    let tmp = x.clone() + y.clone();
    let tmp = tmp.matmul(x);
    let tmp = tmp.exp();

    let grads = tmp.backward();
    let y_grad = y.grad(&grads).unwrap();
    println!("{y_grad}");
}
```

Of note, we plan to implement automatic gradient checkpointing based on compute-bound and memory-bound operations, which will work gracefully with the fusion backend to make your code run even faster during training; see this issue.

See the Fusion Backend README for more details.

Router (Beta): Backend decorator that composes multiple backends into a single one

This backend simplifies hardware interoperability, for instance if you want to execute some operations on the CPU and other operations on the GPU.

```rust
use burn::backend::{
    ndarray::NdArrayDevice, router::duo::MultiDevice, wgpu::WgpuDevice, NdArray, Router, Wgpu,
};
use burn::tensor::{Distribution, Tensor};

fn main() {
    type Backend = Router<(Wgpu, NdArray)>;

    let device_0 = MultiDevice::B1(WgpuDevice::DiscreteGpu(0));
    let device_1 = MultiDevice::B2(NdArrayDevice::Cpu);

    let tensor_gpu = Tensor::<Backend, 2>::random([3, 3], Distribution::Default, &device_0);
    let tensor_cpu = Tensor::<Backend, 2>::random([3, 3], Distribution::Default, &device_1);
}
```

Remote (Beta): Backend decorator for remote backend execution, useful for distributed computations

That backend has two parts, a client and a server. The client sends tensor operations over the network to a remote compute backend. You can use any first-party backend as the server in a single line of code:

```rust
fn main_server() {
    // Start a server on port 3000.
    burn::server::start::<burn::backend::Cuda>(Default::default(), 3000);
}

fn main_client() {
    // Create a client that communicates with the server on port 3000.
    use burn::backend::{Autodiff, RemoteBackend};

    type Backend = Autodiff<RemoteBackend>;

    let device = RemoteDevice::new("ws://localhost:3000");
    let tensor_gpu = Tensor::<Backend, 2>::random([3, 3], Distribution::Default, &device);
}
```

Training & Inference

The whole deep learning workflow is made easy with Burn, as you can monitor your training progress with an ergonomic dashboard, and run inference everywhere from embedded devices to large GPU clusters.

Burn was built from the ground up with training and inference in mind. It's also worth noting how Burn, in comparison to frameworks like PyTorch, simplifies the transition from training to deployment, eliminating the need for code changes.


Burn Train TUI

Click on the following sections to expand 👇

Training Dashboard 📈

As you can see in the previous video (click on the picture!), a new terminal UI dashboard based on the Ratatui crate allows users to follow their training with ease without having to connect to any external application.

You can visualize your training and validation metrics updating in real-time and analyze the lifelong progression or recent history of any registered metrics using only the arrow keys. Break from the training loop without crashing, allowing potential checkpoints to be fully written or important pieces of code to complete without interruption 🛡

ONNX Support 🐫

ONNX (Open Neural Network Exchange) is an open-standard format that exports both the architecture and the weights of a deep learning model.

Burn supports importing models that follow the ONNX standard, so you can easily port a model you have written in another framework like TensorFlow or PyTorch to Burn and benefit from all the advantages our framework offers.

Our ONNX support is further described in this section of the Burn Book 🔥.
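As a sketch of the usual workflow (the file path is a placeholder), the ONNX file is converted into Rust code at build time with the burn-import crate, and the generated model can then be used like any other Burn module:

```rust
// build.rs
use burn_import::onnx::ModelGen;

fn main() {
    // Generate Rust code (and a record with the weights) from the ONNX file,
    // to be included from the crate's source afterwards.
    ModelGen::new()
        .input("src/model/mnist.onnx")
        .out_dir("model/")
        .run_from_script();
}
```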

Note: This crate is in active development and currently supports a limited set of ONNX operators.

Importing PyTorch or Safetensors Models 🚚

You can load weights from PyTorch or Safetensors formats directly into your Burn-defined models. This makes it easy to reuse existing models while benefiting from Burn's performance and deployment features.

Learn more in the model import sections of the Burn Book 🔥 (PyTorch and Safetensors).
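For example, a minimal sketch of the PyTorch path using the burn-import crate (the Net module, its field names, and model.pt are placeholders; the fields must mirror the parameters of the original PyTorch model):

```rust
use burn::module::Module;
use burn::nn::{Linear, LinearConfig};
use burn::record::{FullPrecisionSettings, Recorder};
use burn::tensor::backend::Backend;
use burn_import::pytorch::{LoadArgs, PyTorchFileRecorder};

// Hypothetical module whose field names mirror the parameters of the PyTorch model.
#[derive(Module, Debug)]
struct Net<B: Backend> {
    fc: Linear<B>,
}

fn load_pretrained<B: Backend>(device: &B::Device) -> Net<B> {
    // Decode the PyTorch state dict into a Burn record...
    let record = PyTorchFileRecorder::<FullPrecisionSettings>::default()
        .load(LoadArgs::new("model.pt".into()), device)
        .expect("should decode the PyTorch file");

    // ...then initialize the module and load the imported weights into it.
    let net = Net {
        fc: LinearConfig::new(784, 10).init(device),
    };
    net.load_record(record)
}
```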

Inference in the Browser 🌐

Several of our backends can compile to WebAssembly: Candle and NdArray for CPU, and WGPU for GPU. This means that you can run inference directly within a browser. We provide several examples of this:

  • MNIST where you can draw digits and a small convnet tries to find which one it is! 2️⃣ 7️⃣ 😰
  • Image Classification where you can upload images and classify them! 🌄
Embedded: no_std support ⚙️

Burn's core components support no_std. This means they can run in bare metal environments such as embedded devices without an operating system.

As of now, only the NdArray backend can be used in a no_std environment.


Benchmarks

To evaluate performance across different backends and track improvements over time, we provide a dedicated benchmarking suite.

Run and compare benchmarks using burn-bench.

⚠️ Warning: When using one of the wgpu backends, you may encounter compilation errors related to recursive type evaluation. This is due to complex type nesting within the wgpu dependency chain. To resolve this issue, add the following line at the top of your main.rs or lib.rs file:

```rust
#![recursion_limit = "256"]
```

The default recursion limit (128) is often just below the required depth (typically 130-150) due to deeply nested associated types and trait bounds.

Getting Started

Just heard of Burn? You are at the right place! Just continue reading this section and we hope you can get on board really quickly.

The Burn Book 🔥

To begin working effectively with Burn, it is crucial to understand its key components and philosophy. This is why we highly recommend that new users read the first sections of The Burn Book 🔥. It provides detailed examples and explanations covering every facet of the framework, including building blocks like tensors, modules, and optimizers, all the way to advanced usage, like coding your own GPU kernels.

The project is constantly evolving, and we try as much as possible to keep the book up to date with new additions. However, we might miss some details sometimes, so if you see something weird, let us know! We also gladly accept Pull Requests 😄

Examples 🙏

Let's start with a code snippet that shows how intuitive the framework is to use! In the following, we declare a neural network module with some parameters along with its forward pass.

```rust
use burn::module::Module;
use burn::nn;
use burn::tensor::backend::Backend;
use burn::tensor::Tensor;

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    linear_inner: nn::Linear<B>,
    linear_outer: nn::Linear<B>,
    dropout: nn::Dropout,
    gelu: nn::Gelu,
}

impl<B: Backend> PositionWiseFeedForward<B> {
    pub fn forward<const D: usize>(&self, input: Tensor<B, D>) -> Tensor<B, D> {
        let x = self.linear_inner.forward(input);
        let x = self.gelu.forward(x);
        let x = self.dropout.forward(x);

        self.linear_outer.forward(x)
    }
}
```
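Continuing the snippet above, such a module is typically paired with a config struct that handles initialization; a sketch (the layer sizes and default dropout rate are up to you) could look like this:

```rust
use burn::config::Config;

#[derive(Config)]
pub struct PositionWiseFeedForwardConfig {
    pub d_model: usize,
    pub d_ff: usize,
    #[config(default = 0.1)]
    pub dropout: f64,
}

impl PositionWiseFeedForwardConfig {
    /// Initialize a position-wise feed-forward module on the given device.
    pub fn init<B: Backend>(&self, device: &B::Device) -> PositionWiseFeedForward<B> {
        PositionWiseFeedForward {
            linear_inner: nn::LinearConfig::new(self.d_model, self.d_ff).init(device),
            linear_outer: nn::LinearConfig::new(self.d_ff, self.d_model).init(device),
            dropout: nn::DropoutConfig::new(self.dropout).init(),
            gelu: nn::Gelu::new(),
        }
    }
}
```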

We have a somewhat large number of examples in the repository that show how to use the framework in different scenarios.

Following the book:

  • Basic Workflow: Creates a custom CNN Module to train on the MNIST dataset and use for inference.
  • Custom Training Loop: Implements a basic training loop instead of using the Learner.
  • Custom WGPU Kernel: Learn how to create your own custom operation with the WGPU backend.

Additional examples:

  • Custom CSV Dataset: Implements a dataset to parse CSV data for a regression task.
  • Regression: Trains a simple MLP on the California Housing dataset to predict the median house value for a district.
  • Custom Image Dataset: Trains a simple CNN on a custom image dataset following a simple folder structure.
  • Custom Renderer: Implements a custom renderer to display the Learner progress.
  • Image Classification Web: Image classification web browser demo using Burn, WGPU and WebAssembly.
  • MNIST Inference on Web: An interactive MNIST inference demo in the browser. The demo is available online.
  • MNIST Training: Demonstrates how to train a custom Module (MLP) with the Learner configured to log metrics and keep training checkpoints.
  • Named Tensor: Performs operations with the experimental NamedTensor feature.
  • ONNX Import Inference: Imports an ONNX model pre-trained on MNIST to perform inference on a sample image with Burn.
  • PyTorch Import Inference: Imports a PyTorch model pre-trained on MNIST to perform inference on a sample image with Burn.
  • Text Classification: Trains a text classification transformer model on the AG News or DbPedia dataset. The trained model can then be used to classify a text sample.
  • Text Generation: Trains a text generation transformer model on the DbPedia dataset.
  • Wasserstein GAN MNIST: Trains a WGAN model to generate new handwritten digits based on MNIST.

For more practical insights, you can clone the repository and run any of them directly on your computer!

Pre-trained Models 🤖

We keep an updated and curated list of models and examples built with Burn; see the tracel-ai/models repository for more details.

Don't see the model you want? Don't hesitate to open an issue, and we may prioritize it. Built a model using Burn and want to share it? You can also open a Pull Request and add your model under the community section!

Why use Rust for Deep Learning? 🦀

Deep Learning is a special form of software where you need very high level abstractions as well as extremely fast execution time. Rust is the perfect candidate for that use case since it provides zero-cost abstractions to easily create neural network modules, and fine-grained control over memory to optimize every detail.

It's important that a framework be easy to use at a high level so that its users can focus on innovating in the AI field. However, since running models relies so heavily on computations, performance can't be neglected.

To this day, the mainstream solution to this problem has been to offer APIs in Python, but rely on bindings to low-level languages such as C/C++. This reduces portability, increases complexity and creates friction between researchers and engineers. We feel that Rust's approach to abstractions makes it versatile enough to tackle this two-language dichotomy.

Rust also comes with the Cargo package manager, which makes it incredibly easy to build, test, and deploy from any environment, which is usually a pain in Python.

Although Rust has the reputation of being a difficult language at first, we strongly believe it leads to more reliable, bug-free solutions built faster (after some practice 😅)!


Deprecation Note
Since 0.14.0, the internal structure for tensor data has changed. The previous Data struct was deprecated and officially removed in 0.17.0 in favor of the new TensorData struct, which allows for more flexibility by storing the underlying data as bytes and keeping the data type as a field. If you are using Data in your code, make sure to switch to TensorData.
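For reference, a small sketch of the new struct in use (backend choice and values are arbitrary):

```rust
use burn::backend::NdArray;
use burn::tensor::{Tensor, TensorData};

fn main() {
    let device = Default::default();

    // `TensorData` stores the raw values as bytes alongside the shape and data type.
    let data = TensorData::new(vec![1.0f32, 2.0, 3.0, 4.0], [2, 2]);
    let tensor = Tensor::<NdArray, 2>::from_data(data, &device);

    println!("{tensor}");
}
```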

Loading Model Records From Previous Versions ⚠️

In the event that you are trying to load a model record saved in a version older than 0.14.0, make sure to use a compatible version (0.14, 0.15 or 0.16) with the record-backward-compat feature flag.

```toml
features = [..., "record-backward-compat"]
```

Otherwise, the record won't be deserialized correctly and you will get an error message. This error will also point you to the backward-compatible feature flag.

Backward compatibility was maintained for deserialization when loading records. Therefore, as soon as you have saved the record again, it will be saved according to the new structure and you can upgrade back to the current version.

Please note that binary formats are not backward compatible. Thus, you will need to load your record in a previous version and save it in any of the other self-describing record formats (e.g., using the NamedMpkFileRecorder) before using a compatible version (as described) with the record-backward-compat feature flag.

Community

If you are excited about the project, don't hesitate to join our Discord! We try to be as welcoming as possible to everybody from any background. You can ask your questions and share what you built with the community!


Contributing

Before contributing, please take a moment to review our code of conduct. It's also highly recommended to read the architecture overview, which explains some of our architectural decisions. Refer to our contributing guide for more details.

Status

Burn is currently in active development, and there will be breaking changes. While any resulting issues are likely to be easy to fix, there are no guarantees at this stage.

License

Burn is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details. Opening a pull request is assumed to signal agreement with these licensing terms.
