
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.


Bonjour 👋

At ZML, we are creating exciting AI products on top of our high-performance AI inference stack. Our stack is built for production, using the amazing Zig language, MLIR, and the power of Bazel.

Take me straight to getting started or give me a taste 🥐!

 

We're happy to share!

We're very happy to share our inference stack with the world and hope it allows you, too, to build cool and exciting AI projects.

To give you a glimpse of what you can do with ZML, here is an early demo:

It shows a prototype running a LLaMA2 model sharded on 1 NVIDIA RTX 4090, 1 AMD 6800XT, and 1 Google Cloud TPU v2. All accelerators were hosted in different locations, with activations being passed over a VPN.

All processes used the same model code, cross-compiled on a Mac, and copied onto the servers.

For more inspiration, see also the examples below or check out the examples folder.

Getting started

Prerequisites

We use bazel to build ZML and its dependencies. The only prerequisite is bazel, which we recommend installing via bazelisk, a version manager for bazel.

Please note: If you do not wish to install bazel system-wide, we provide examples/bazel.sh, which downloads bazel to your home folder and runs it.
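For example, any bazel invocation in this README can go through the wrapper instead; a minimal sketch, mirroring the MNIST command shown later:

```shell
# no system-wide install needed; the wrapper fetches bazel on first use
cd examples
./bazel.sh run -c opt //mnist
```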

Install Bazel (recommended):

macOS

```shell
brew install bazelisk
```

Linux

```shell
curl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-amd64'
chmod +x /usr/local/bin/bazel
```
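Either way, you can then check that the install worked; bazelisk transparently downloads the pinned bazel version on first use (a sketch, assuming bazel is now on your PATH):

```shell
# prints the resolved bazel version on success
bazel version
```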

Run a pre-packaged model

We have implemented a variety of example models in ZML. See our reference implementations in the examples folder.

MNIST

The classic handwritten digits recognition task. The model is tasked to recognize a handwritten digit, which has been converted to a 28x28 pixel monochrome image. Bazel will download a pre-trained model and the test dataset. The program will load the model, compile it, and classify a randomly picked example from the test dataset.

On the command line:

```shell
cd examples
bazel run -c opt //mnist
# or
./bazel.sh run -c opt //mnist
```

Meta Llama 3.1 8B

This model has restrictions; see here. It requires approval from Meta on Hugging Face, which can take a few hours to be granted.

While waiting, you can already generate an access token to log into Hugging Face from bazel; see here.
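Once you have a token, there are two ways to make it available; both mechanisms are the ones named in the comments of the snippet below:

```shell
# option 1: log in once; this stores the token at $HOME/.cache/huggingface/token
huggingface-cli login
# option 2: export the token for the current shell session only
export HUGGINGFACE_TOKEN=<your token>
```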

Once you've been granted access, you're ready to download a gated model like Meta-Llama-3.1-8B-Instruct!

```shell
# requires a token in $HOME/.cache/huggingface/token, as created by the
# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.
cd examples
bazel run -c opt //llama:Llama-3.1-8B-Instruct
bazel run -c opt //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
```

You can also try Llama-3.1-70B-Instruct if you have enough memory.

Meta Llama 3.2 1B

Like the 8B model above, this model also requires approval. See here for access requirements.

```shell
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct
bazel run -c opt //llama:Llama-3.2-1B-Instruct -- --prompt="What is the capital of France?"
```

For a larger 3.2 model, you can also try Llama-3.2-3B-Instruct.

Running Models on GPU / TPU

You can compile models for accelerator runtimes by appending one or more of the following arguments to the command line when compiling / running a model:

  • NVIDIA CUDA: --@zml//runtimes:cuda=true
  • AMD ROCm: --@zml//runtimes:rocm=true
  • Google TPU: --@zml//runtimes:tpu=true
  • AWS Trainium/Inferentia 2: --@zml//runtimes:neuron=true
  • AVOID CPU: --@zml//runtimes:cpu=false

The latter, avoiding compilation for CPU, cuts down compilation time.

So, to run the Llama-3.2-1B-Instruct model from above on a host sporting an NVIDIA GPU, run the following:

```shell
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct \
    --@zml//runtimes:cuda=true \
    -- --prompt="What is the capital of France?"
```
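To additionally skip the CPU backend and cut compilation time, the runtime flags combine; a sketch using only the flags listed above:

```shell
cd examples
bazel run -c opt //llama:Llama-3.2-1B-Instruct \
    --@zml//runtimes:cuda=true \
    --@zml//runtimes:cpu=false \
    -- --prompt="What is the capital of France?"
```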

Run Tests

```shell
bazel test //zml:test
```
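bazel's standard test options apply here as well; for instance, to only show output from failing tests (a generic bazel flag, not ZML-specific):

```shell
bazel test //zml:test --test_output=errors
```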

A taste of ZML

MNIST

```zig
const std = @import("std");
const zml = @import("zml");

/// Model definition
const Mnist = struct {
    fc1: Layer,
    fc2: Layer,

    const Layer = struct {
        weight: zml.Tensor,
        bias: zml.Tensor,

        pub fn forward(self: Layer, input: zml.Tensor) zml.Tensor {
            return self.weight.matmul(input).add(self.bias).relu();
        }
    };

    /// just two linear layers + relu activation
    pub fn forward(self: Mnist, input: zml.Tensor) zml.Tensor {
        std.log.info("Compiling for target: {s}", .{@tagName(input.getContext().target())});
        var x = input.flattenAll().convert(.f32);
        const layers: []const Layer = &.{ self.fc1, self.fc2 };
        for (layers) |layer| {
            x = zml.call(layer, .forward, .{x});
        }
        return x.argMax(0, .u8).indices;
    }
};
```

Tagged Tensors

```zig
const Sdpa = struct {
    pub fn forward(_: Sdpa, ctx: *zml.Context, q_: zml.Tensor, k_: zml.Tensor, v_: zml.Tensor) zml.Tensor {
        // give the raw tensors semantic dimension tags
        const q = q_.withTags(.{ .b, .h, .q, .hd });
        const k = k_.withTags(.{ .b, .h, .k, .hd });
        const v = v_.withTags(.{ .b, .h, .k, .hd });
        // dimensions are referenced by tag instead of by positional index
        const attn_mask = zml.nn.causalAttnMask(ctx, .{ .q = q.dim(.q), .k = k.dim(.k) }, q.dtype(), null);
        return zml.nn.sdpa(ctx, q, k, v, .{ .attn_mask = attn_mask });
    }
};
```

Where to go next:

You might want to check out more examples, read through the documentation directly on GitHub, or, for the full rendering experience, browse the online documentation with the included API reference.

Contributing

See here.

License

ZML is licensed under the Apache 2.0 license.

Thanks to our contributors

