
ONNX Runtime improvements (experimental native webgpu; fix iOS) #1231


Merged

xenova merged 7 commits into huggingface:ort-improvements from fs-eire:fs-eire/nodejs-support-native-webgpu-ep

Apr 25, 2025


Conversation

Contributor

fs-eire commented Mar 13, 2025 • edited by xenova

This change allows using WebGPU in transformers.js with ORT Node.js binding.

Still testing (the tests themselves need this change).

Closes #1242

AdamStrojek commented Mar 16, 2025

Wouldn't it be better to do the same thing that is done for ONNX Runtime Web?

    if (apis.IS_WEBGPU_AVAILABLE) {
        supportedDevices.push('webgpu');
    }

Electron applications can have WebGPU enabled even when Node run from a terminal does not. Also, onnxruntime-node provides only backends for native modules, whereas onnxruntime-web has bindings for WebGPU, so just adding to the supported devices will not work without switching the runtime.

Contributor, Author

fs-eire commented Mar 16, 2025

If I remember correctly, IS_WEBGPU_AVAILABLE is checked against navigator.gpu, which is only available in the browser.

For Electron, the renderer process is actually a "web" environment rather than a "node" one.
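
To make the check discussed above concrete, here is a minimal sketch of a navigator.gpu test. The function name and the injectable global object are hypothetical and chosen only for illustration; this is not the transformers.js source.

```javascript
// Hypothetical helper, not the transformers.js implementation.
// Returns true only when the given global object exposes navigator.gpu,
// which is normally the case in a browser (or an Electron renderer process)
// and not in a plain Node.js process.
function isWebGpuAvailable(globalObj = globalThis) {
  const nav = globalObj.navigator;
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}
```

Passing the global object as a parameter is only a convenience for trying the check against a fake environment; called without arguments it inspects globalThis.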

AdamStrojek commented Mar 16, 2025

Yes, you are correct, IS_WEBGPU_AVAILABLE is just a simple check against navigator.gpu. In theory, it is possible to install a 3rd-party package for WebGPU support in Node, but it is a complicated topic. Still, my comment is valid; I copied my example from a few lines higher in the same source file.

I recently ran some tests. Unfortunately, transformers.js does not detect Electron applications correctly and marks them as Node applications, so it provides only CPU. I had a lot of trouble getting it running in an Electron app. Mostly, it was picky about the path and fs packages. If I changed the target platform to Node, it generated other problems. I'm preparing a new issue report for the developers with my findings.

I have already tested your branch, and this simple change didn't enable WebGPU in Electron apps.

fs-eire added 3 commits

March 21, 2025 10:30

customize the wasm paths

c39f3dc

update implementation

ea2b574

allow using 'webgpu' in nodejs binding

f15e632

fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from a536b8d to 2dbde16

April 18, 2025 23:26

update version of onnxruntime-node

6cfeec3

fs-eire force-pushed the fs-eire/nodejs-support-native-webgpu-ep branch from 2dbde16 to 6cfeec3

April 18, 2025 23:26

Contributor, Author

fs-eire commented Apr 18, 2025

Updated the version of onnxruntime-node to 1.22.0-dev.20250418-c19a49615b. This version supports WebGPU on Windows and macOS.
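
As a rough illustration of what per-platform support could mean for a device list, here is a sketch. The function name, the always-present "cpu" entry, and the mapping itself are assumptions based only on the comment above (WebGPU on Windows and macOS); the actual list in transformers.js may differ.

```javascript
// Hypothetical helper, not the transformers.js device list.
// Maps a Node.js platform string (as returned by process.platform) to the
// devices a native binding might advertise, assuming WebGPU is offered only
// on Windows ("win32") and macOS ("darwin") as stated in the comment above.
function nodeDevicesFor(platform) {
  const devices = ["cpu"]; // assumption: CPU is always available
  if (platform === "win32" || platform === "darwin") {
    devices.push("webgpu");
  }
  return devices;
}
```

In a Node.js process this would be called as nodeDevicesFor(process.platform).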

Collaborator

xenova commented Apr 19, 2025 • edited

Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

Contributor, Author

fs-eire commented Apr 19, 2025

> Wow thanks @fs-eire! Very exciting!!! Does the browser package https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b release also add anything of significance?

No.

BTW, regarding WebGPU EP support in onnxruntime-web: there are still some perf issues when using the WebGPU EP in a WebAssembly build. If you only want to do conformance testing for the WebGPU EP (e.g. check correctness but not latency), I can offer you a private build of onnxruntime-web with the WebGPU EP.

Collaborator

xenova commented Apr 19, 2025

That would be great! Feel free to send via slack perhaps? Eventually, we can hook this into the Transformers.js CI to ensure correctness across all supported architectures.

Upgrade onnxruntime-web to same version as onnxruntime-node

0c3bc8d

Collaborator

xenova commented Apr 19, 2025 • edited

I've been testing the webgpu EP for some llama/qwen models, and running into a few correctness issues.

Here's some code to help test/debug:

    import { pipeline, TextStreamer } from "@huggingface/transformers";

    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "onnx-community/ZR1-1.5B-ONNX",
      { dtype: "q4f16", device: "webgpu" }, // device="cpu" works fine
    );

    // Define the list of messages
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Write me a poem about Machine Learning." },
    ];

    // Generate a response
    const output = await generator(messages, {
      max_new_tokens: 512,
      do_sample: false,
      streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);
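
When comparing a run with device "cpu" against one with device "webgpu", it can help to see where the two generated strings first differ. The helper below is a hypothetical debugging sketch, not part of transformers.js. Even with do_sample set to false, small numerical differences between backends may legitimately change a token, so an early divergence is a hint to investigate rather than proof of a bug.

```javascript
// Hypothetical debugging helper: returns the index of the first character at
// which two generated strings differ, or -1 when they are identical.
// If one string is a strict prefix of the other, the length of the shorter
// string is returned.
function firstDivergence(reference, candidate) {
  const limit = Math.min(reference.length, candidate.length);
  for (let i = 0; i < limit; i++) {
    if (reference[i] !== candidate[i]) return i;
  }
  return reference.length === candidate.length ? -1 : limit;
}
```

The two arguments would be the final generated text from each run, for example the value logged at the end of the script above.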

AdamStrojek mentioned this pull request

Apr 19, 2025

Allow to choose ONNX Runtime in Electron App #1240

Open

Collaborator

xenova commented Apr 19, 2025

I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.

Update list of supported devices

751e702

Contributor

guschmue commented Apr 21, 2025

> I can confirm that q4 (instead of q4f16) works correctly, so it looks to be an issue with the f16 implementation.

For webgpu-ep / DeepSeek-R1-Distill-Qwen-1.5B, we know about an open issue when GQA takes the FA2 path.
It doesn't happen on all GPUs, but I can reproduce it on NVIDIA.

If ZR1-1.5B-ONNX is similar to DeepSeek-R1-Distill-Qwen-1.5B, it might be the same issue. I have not tried DeepSeek-R1-Distill-Qwen-1.5B with fp32. Let me check on this.

Contributor

guschmue commented Apr 21, 2025

Looks like the same issue as DeepSeek when GQA uses FA2 with fp16. fp32 seems OK.
I'll put this high on my list to look at.

Collaborator

xenova commented Apr 22, 2025

Great, thanks @guschmue!

xenova mentioned this pull request

Apr 25, 2025

customize the wasm paths #1250

Merged

Merge branch 'pr/1250' into pr/1231

8f4cc0c

xenova changed the title from "[WIP] allow using 'webgpu' in nodejs binding" to "ONNX Runtime improvements (experimental native webgpu; fix iOS)"

Apr 25, 2025

xenova changed the base branch from main to ort-improvements

April 25, 2025 22:43

xenova marked this pull request as ready for review

April 25, 2025 22:43

Collaborator

xenova commented Apr 25, 2025

I'm accumulating all these changes into https://github.com/huggingface/transformers.js/tree/ort-improvements to make development and testing a bit easier (many version bumps and ort-specific changes)

xenova merged commit 747a04d into huggingface:ort-improvements

Apr 25, 2025

