calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

License

MIT license


The old version of Emu (which used macros) is here.

Overview

Emu is a GPGPU library for Rust with a focus on portability, modularity, and performance.

It's a CUDA-esque, compute-specific abstraction over WebGPU that adds functionality to make WebGPU feel more like CUDA. Here's a quick rundown of the highlights:

Emu can run anywhere - Emu uses WebGPU to support DirectX, Metal, and Vulkan (and eventually OpenGL and the browser) as compile targets. This allows Emu to run on pretty much any user-facing platform, including desktop, mobile, and browser. By moving heavy computations to the user's device, you can reduce system latency and improve privacy.
Emu makes compute easier - Emu makes WebGPU feel like CUDA. It does this by providing...
- DeviceBox<T> as a wrapper for data that lives on the GPU (thereby ensuring type-safe data movement)
- DevicePool as a no-config auto-managed pool of devices (similar to CUDA)
- trait Cache - a no-setup-required LRU cache of JITed compute kernels.
Emu is transparent - Emu is a fully transparent abstraction. This means, at any point, you can decide to remove the abstraction and work directly with WebGPU constructs with zero overhead. For example, if you want to mix Emu with WebGPU-based graphics, you can do that with zero overhead. You can also swap out the JIT compiler artifact cache with your own cache, manage the device pool if you wish, and define your own compile-to-SPIR-V compiler that interops with Emu.
Emu is asynchronous - Emu is fully asynchronous. Most API calls are non-blocking and are synchronized by calls to DeviceBox::get, when data is read back from the device.
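To make the "LRU cache of JITed compute kernels" idea concrete, here is a minimal, std-only sketch. It is not Emu's `Cache` trait or its `GlobalCache`; `KernelCache`, `CompiledKernel`, and `get_or_compile` are hypothetical names, and the compiler is just a closure standing in for a real source-to-SPIR-V step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a compiled kernel (for example, SPIR-V words).
#[derive(Clone, Debug, PartialEq)]
pub struct CompiledKernel(pub Vec<u32>);

/// Hypothetical LRU cache keyed by a hash of the kernel source.
/// A real cache would also guard against hash collisions.
pub struct KernelCache {
    capacity: usize,
    entries: HashMap<u64, CompiledKernel>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<u64>,
}

impl KernelCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn key(source: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached kernel for `source`, compiling it only on a miss.
    pub fn get_or_compile<F>(&mut self, source: &str, compile: F) -> CompiledKernel
    where
        F: FnOnce(&str) -> CompiledKernel,
    {
        let key = Self::key(source);
        if let Some(kernel) = self.entries.get(&key).cloned() {
            // Hit: move the key to the most-recently-used position.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
            return kernel;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        let kernel = compile(source);
        self.entries.insert(key, kernel.clone());
        self.order.push_back(key);
        kernel
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The point of the design is that repeated launches of the same kernel source pay the compilation cost once, while rarely used kernels are dropped when the cache fills up.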

An example

Here's a quick example of Emu. You can find more in emu_core/examples and the most recent documentation here.

First, we just import a bunch of stuff

    use emu_glsl::*;
    use emu_core::prelude::*;
    use zerocopy::*;

We can define types of structures so that they can be safely serialized and deserialized to/from the GPU.

    #[repr(C)]
    #[derive(AsBytes, FromBytes, Copy, Clone, Default, Debug)]
    struct Rectangle {
        x: u32,
        y: u32,
        w: i32,
        h: i32,
    }
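Because the same bytes are later read through a GLSL struct made of four 4-byte scalars, it can help to sanity-check the Rust side's layout. The sketch below uses a copy of the struct without the `zerocopy` derives so that it compiles with the standard library alone; `buffer_size_bytes` is a hypothetical helper, not part of Emu.

```rust
use std::mem::{align_of, size_of};

/// Copy of the example struct, minus the zerocopy derives (std-only sketch).
#[repr(C)]
#[derive(Copy, Clone, Default, Debug)]
pub struct Rectangle {
    pub x: u32,
    pub y: u32,
    pub w: i32,
    pub h: i32,
}

/// Hypothetical helper: bytes needed to hold `n` tightly packed rectangles.
pub fn buffer_size_bytes(n: usize) -> usize {
    n * size_of::<Rectangle>()
}

/// Returns (size, alignment) of `Rectangle` in bytes.
pub fn rectangle_layout() -> (usize, usize) {
    (size_of::<Rectangle>(), align_of::<Rectangle>())
}
```

With `#[repr(C)]` and four 4-byte fields there is no padding, so each element occupies 16 bytes and 128 of them occupy 2048 bytes.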

For this example, we make this entire function async but in reality you will only want small blocks of code to be async (like a bunch of asynchronous memory transfers and computation) and these blocks will be sent off to an executor to execute. You definitely don't want to do something like this where you are blocking (by doing an entire compilation step) in your async code.

    async fn do_some_stuff() -> Result<(), Box<dyn std::error::Error>> {
        // make sure a device pool exists before we touch any device
        assert_device_pool_initialized().await;

        // first, we move a bunch of rectangles to the GPU
        let mut x: DeviceBox<[Rectangle]> = vec![Default::default(); 128].as_device_boxed()?;

        // then we compile some GLSL code using the GlslCompile compiler and
        // the GlobalCache for caching compiler artifacts
        let c = compile::<String, GlslCompile, _, GlobalCache>(
            GlslBuilder::new()
                .set_entry_point_name("main")
                .add_param_mut()
                .set_code_with_glsl(
                    r#"
    #version 450
    layout(local_size_x = 1) in; // our thread block size is 1, that is we only have 1 thread per block

    struct Rectangle {
        uint x;
        uint y;
        int w;
        int h;
    };

    // make sure to use only a single set and keep all your n parameters in n storage buffers in bindings 0 to n-1
    // you shouldn't use push constants or anything OTHER than storage buffers for passing stuff into the kernel
    // just use buffers with one buffer per binding
    layout(set = 0, binding = 0) buffer Rectangles {
        Rectangle[] rectangles;
    }; // this is used as both input and output for convenience

    Rectangle flip(Rectangle r) {
        r.x = r.x + r.w;
        r.y = r.y + r.h;
        r.w *= -1;
        r.h *= -1;
        return r;
    }

    // there should be only one entry point and it should be named "main"
    // ultimately, Emu has to kind of restrict how you use GLSL because it is compute focused
    void main() {
        uint index = gl_GlobalInvocationID.x; // this gives us the index in the x dimension of the thread space
        rectangles[index] = flip(rectangles[index]);
    }
                "#,
                ),
        )?
        .finish()?;

        // we spawn 128 threads (really 128 thread blocks)
        unsafe {
            spawn(128).launch(call!(c, &mut x));
        }

        // this is the Future we need to await to get stuff to happen
        // everything else is non-blocking in the API (except stuff like compilation)
        println!("{:?}", x.get().await?);

        Ok(())
    }
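When checking what comes back from the GPU, a CPU reference for the `flip` kernel is handy. The following std-only sketch mirrors the GLSL above under one stated assumption: the `uint + int` additions are modeled as wrapping unsigned arithmetic after converting the `int` operand to `u32`. `flip_all` is a hypothetical helper that plays the role of the 128 kernel invocations, one per index.

```rust
/// Copy of the example struct, minus the zerocopy derives (std-only sketch).
#[repr(C)]
#[derive(Copy, Clone, Default, Debug, PartialEq)]
pub struct Rectangle {
    pub x: u32,
    pub y: u32,
    pub w: i32,
    pub h: i32,
}

/// CPU version of the GLSL `flip` function.
/// Assumption: `uint + int` behaves like wrapping addition on `u32`.
pub fn flip(mut r: Rectangle) -> Rectangle {
    r.x = r.x.wrapping_add(r.w as u32);
    r.y = r.y.wrapping_add(r.h as u32);
    r.w = r.w.wrapping_neg();
    r.h = r.h.wrapping_neg();
    r
}

/// Applies `flip` to every element, like one kernel invocation per index.
pub fn flip_all(rects: &mut [Rectangle]) {
    for r in rects.iter_mut() {
        *r = flip(*r);
    }
}
```

Flipping moves the origin to the opposite corner and negates the extents, so applying it twice restores the original rectangle. For the all-default input used in the example, every field stays at zero.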

And last but certainly not least, we use an executor to execute.

    fn main() {
        futures::executor::block_on(do_some_stuff()).expect("failed to do stuff on GPU");
    }
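For readers wondering what "using an executor" amounts to, here is a minimal std-only sketch of a blocking executor. It is not how `futures::executor::block_on` is implemented; `simple_block_on` and `ThreadWaker` are hypothetical names, and the sketch only illustrates the poll-then-park loop that drives a future to completion on the current thread.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocked on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal executor: polls `fut` on the current thread and
/// parks between polls until the waker signals that progress is possible.
pub fn simple_block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Spurious wake-ups are harmless: we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}
```

In real code, small async blocks like `do_some_stuff()` would be handed to an existing executor rather than a hand-rolled one.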

Built with Emu

Emu is relatively new but has already been used for GPU acceleration in a variety of projects.

Used in toil for GPU-accelerated linear algebra
Used in ipl3hasher for hash collision finding
Used in bigbang for simulating gravitational acceleration (used older version of Emu)

Getting started

The latest stable version is on Crates.io. To start using Emu, simply add the following line to your Cargo.toml.

    [dependencies]
    emu_core = "0.1.1"

To understand how to start using Emu, check out the docs. If you have any questions, please ask in the Discord.

Contributing

Feedback, discussion, and PRs would all be very much appreciated. Some relatively high-priority, non-API-breaking things that have yet to be implemented are listed below, in rough order of priority.

Ensure that WebGPU polling is done correctly in DeviceBox::get
Add support for WGSL as input, use Naga for shader compilation
Add WASM support in Cargo.toml
Add benchmarks
Reuse staging buffers between different DeviceBoxes
Maybe use uniforms for DeviceBox<T> when T is small (maybe)

If you are interested in any of these or anything else, please don't hesitate to open an issue on GitHub or discuss more on Discord.

About

The write-once-run-anywhere GPGPU library for Rust

calebwin.github.io/emu
