Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
At ZML, we are creating exciting AI products on top of our high-performance AI inference stack. Our stack is built for production, using the amazing Zig language, MLIR, and the power of Bazel.
We're very happy to share our inference stack with the world and hope it allows you, too, to build cool and exciting AI projects.
To give you a glimpse of what you can do with ZML, here is an early demo. It shows a prototype running a LLaMA2 model sharded on 1 NVIDIA RTX 4090, 1 AMD 6800XT, and 1 Google Cloud TPU v2. All accelerators were hosted in different locations, with activations being passed over a VPN.

All processes used the same model code, cross-compiled on a Mac, and copied onto the servers.
For more inspiration, see also the examples below or check out the examples folder.
We use `bazel` to build ZML and its dependencies. The only prerequisite is `bazel`, which we recommend downloading through `bazelisk`, a version manager for `bazel`.

Please note: if you do not wish to install `bazel` system-wide, we provide `examples/bazel.sh`, which downloads it to your home folder and runs it.
Install Bazel (recommended):
```
curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-amd64'
chmod +x /usr/local/bin/bazel
```
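To check that the install worked, you can ask `bazelisk` (installed as `bazel` above) to print its version; on first run it downloads the actual `bazel` binary, so the version printed depends on your checkout:

```
# first run downloads the real bazel; prints e.g. "Build label: ..."
bazel version
```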
We have implemented a variety of example models in ZML. See our reference implementations in the examples folder.
MNIST is the classic handwritten digit recognition task. The model is tasked to recognize a handwritten digit, which has been converted to a 28x28 pixel monochrome image. Bazel will download a pre-trained model and the test dataset. The program will load the model, compile it, and classify a randomly picked example from the test dataset.
On the command line:
```
cd examples
bazel run -c opt //mnist
# or
./bazel.sh run -c opt //mnist
```
The Llama models are gated: they require approval from Meta on Hugging Face, which can take a few hours to be granted. While waiting, you can already generate an access token to log into Hugging Face from `bazel`; see the token setup below.

Once you've been granted access, you're ready to download a gated model like `Meta-Llama-3.1-8B-Instruct`!
```
# requires the token in $HOME/.cache/huggingface/token, as created by the
# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable
cd examples
bazel run -c opt //llama:Llama-3.1-8B-Instruct
bazel run -c opt //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
```
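If you haven't stored a token yet, either of the ways mentioned in the comment above should work; the token value below is a placeholder you obtain from your Hugging Face account settings:

```
# interactive login; writes the token to $HOME/.cache/huggingface/token
huggingface-cli login
# or export it for the current shell session only
export HUGGINGFACE_TOKEN=<your-token>
```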
You can also try `Llama-3.1-70B-Instruct` if you have enough memory.
Like the 8B model above, the Llama 3.2 models also require approval on Hugging Face before they can be downloaded.
```
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct
bazel run -c opt //llama:Llama-3.2-1B-Instruct -- --prompt="What is the capital of France?"
```
For a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`; a sketch of that invocation follows.
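This assumes the Bazel target follows the same naming pattern as the 1B target above; the target name is inferred, not verified here:

```
cd examples
# hypothetical target name, patterned after //llama:Llama-3.2-1B-Instruct
bazel run -c opt //llama:Llama-3.2-3B-Instruct
```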
You can compile models for accelerator runtimes by appending one or more of the following arguments to the command line when compiling / running a model:
- NVIDIA CUDA: `--@zml//runtimes:cuda=true`
- AMD ROCm: `--@zml//runtimes:rocm=true`
- Google TPU: `--@zml//runtimes:tpu=true`
- AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`
- AVOID CPU: `--@zml//runtimes:cpu=false`
The latter, avoiding compilation for CPU, cuts down compilation time.
So, to run the Llama 3.2 model from above on a host with an NVIDIA GPU, run the following:
```
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct \
  --@zml//runtimes:cuda=true \
  -- --prompt="What is the capital of France?"
```
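The runtime flags can be combined. As a sketch, here is the same run targeting CUDA only, with the CPU compile skipped to cut down compilation time:

```
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct \
  --@zml//runtimes:cuda=true \
  --@zml//runtimes:cpu=false \
  -- --prompt="What is the capital of France?"
```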
To run ZML's unit tests:

```
bazel test //zml:test
```
As a taste of the API, here is the MNIST model definition in ZML:

```zig
const std = @import("std");
const zml = @import("zml");

/// Model definition
const Mnist = struct {
    fc1: Layer,
    fc2: Layer,

    const Layer = struct {
        weight: zml.Tensor,
        bias: zml.Tensor,

        pub fn forward(self: Layer, input: zml.Tensor) zml.Tensor {
            return self.weight.matmul(input).add(self.bias).relu();
        }
    };

    /// just two linear layers + relu activation
    pub fn forward(self: Mnist, input: zml.Tensor) zml.Tensor {
        std.log.info("Compiling for target: {s}", .{@tagName(input.getContext().target())});
        var x = input.flattenAll().convert(.f32);
        const layers: []const Layer = &.{ self.fc1, self.fc2 };
        for (layers) |layer| {
            x = zml.call(layer, .forward, .{x});
        }
        return x.argMax(0, .u8).indices;
    }
};
```
And a scaled dot-product attention (SDPA) layer, using ZML's tagged tensors:

```zig
const Sdpa = struct {
    pub fn forward(_: Sdpa, ctx: *zml.Context, q_: zml.Tensor, k_: zml.Tensor, v_: zml.Tensor) zml.Tensor {
        // tag the axes (batch, heads, query/key sequence, head dim) so they
        // can be referred to by name instead of positional index
        const q = q_.withTags(.{ .b, .h, .q, .hd });
        const k = k_.withTags(.{ .b, .h, .k, .hd });
        const v = v_.withTags(.{ .b, .h, .k, .hd });
        const attn_mask = zml.nn.causalAttnMask(ctx, .{ .q = q.dim(.q), .k = k.dim(.k) }, q.dtype(), null);
        return zml.nn.sdpa(ctx, q, k, v, .{ .attn_mask = attn_mask });
    }
};
```
You might want to check out more examples, read through the documentation directly on GitHub, or, for the full rendering experience, browse the online documentation with included API reference.
For contribution guidelines, see here.
ZML is licensed under the Apache 2.0 license.