ONNX Runtime improvements (experimental native WebGPU; fix iOS) #1231
AdamStrojek commented Mar 16, 2025
Wouldn't it be better to do the same thing as is done in ONNX Runtime Web?
Electron applications can have WebGPU enabled even when Node.js running in a terminal does not. Also …
If I remember correctly, IS_WEBGPU_AVAILABLE is checked against navigator.gpu, which is only available in the browser. For Electron, the renderer process is actually a "web" environment rather than a "node" one.
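A minimal sketch of the distinction being discussed, assuming runtime detection is based on `process.versions.electron`, the presence of `window`, and `navigator.gpu`. The `detectRuntime` name and its return shape are hypothetical and are not the transformers.js implementation; the global object is passed in so the logic can be tried with stand-in objects.

```javascript
// Hypothetical helper, NOT the transformers.js implementation: classifies the
// current JS runtime and reports whether navigator.gpu (WebGPU) is exposed.
// The global object is a parameter so the logic can be tried with stand-ins.
function detectRuntime(g = globalThis) {
  const versions = (g.process && g.process.versions) || {};
  const isElectron = typeof versions.electron === "string";
  const isNode = typeof versions.node === "string" && !isElectron;
  const hasWindow = typeof g.window !== "undefined";
  // navigator.gpu exists only in web contexts (browsers, Electron renderers),
  // not in plain Node.js, even on versions that define a global navigator.
  const hasNavigatorGpu = Boolean(g.navigator && g.navigator.gpu);

  let kind;
  if (isElectron) {
    // Assumption: an Electron process with a window is a renderer, i.e. a web
    // environment; without one it is the main (or another non-window) process.
    kind = hasWindow ? "electron-renderer" : "electron-main";
  } else if (isNode) {
    kind = "node";
  } else {
    kind = "browser";
  }
  return { kind, hasNavigatorGpu };
}

console.log(detectRuntime({ process: { versions: { node: "22.0.0" } } }));
// → { kind: 'node', hasNavigatorGpu: false }
```

With `contextIsolation` enabled and `nodeIntegration` disabled, `process` is typically not exposed in an Electron renderer, so this sketch would classify it as `"browser"`, which is still a web environment where `navigator.gpu` applies.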
AdamStrojek commented Mar 16, 2025
Yes, you are correct; I recently ran some tests. Unfortunately, transformers.js does not detect Electron applications correctly and marks them as Node applications, so it offers only the CPU. I had a lot of trouble getting it running in an Electron app. Mostly, it was picky about … I have already tested your branch, and this simple change didn't enable WebGPU in Electron apps.
Updated the version of …
xenova commented Apr 19, 2025
Wow, thanks @fs-eire! Very exciting!!! Does the browser package release (https://www.npmjs.com/package/onnxruntime-web/v/1.22.0-dev.20250418-c19a49615b) also add anything significant?
No. BTW, regarding WebGPU EP support in onnxruntime-web: there are still some performance issues when using the WebGPU EP in a WebAssembly build. If you only want to run conformance tests for the WebGPU EP (e.g. checking correctness but not latency), I can offer you a private build of onnxruntime-web with the WebGPU EP.
That would be great! Feel free to send it via Slack, perhaps? Eventually, we can hook this into the Transformers.js CI to ensure correctness across all supported architectures.
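For a correctness-only conformance test like the one offered here, one common approach is to run the same model on the CPU EP and on the WebGPU EP and compare the output tensors element-wise within a tolerance. A minimal sketch, assuming the outputs are available as typed arrays; `compareOutputs` and its default `atol`/`rtol` values are illustrative assumptions (not an onnxruntime-web API), and fp16 paths generally need looser tolerances than fp32.

```javascript
// Hypothetical conformance helper (not an onnxruntime-web API): compares an
// EP's output against a CPU reference element-wise, using the allclose rule
//   |actual - expected| <= atol + rtol * |expected|
function compareOutputs(actual, expected, { atol = 1e-3, rtol = 1e-3 } = {}) {
  if (actual.length !== expected.length) {
    throw new Error(`length mismatch: ${actual.length} vs ${expected.length}`);
  }
  let maxAbsDiff = 0;
  let firstMismatch = -1;
  for (let i = 0; i < actual.length; i++) {
    const a = actual[i];
    const e = expected[i];
    const diff = Math.abs(a - e);
    // Exact equality covers matching infinities; NaN never counts as close.
    const close =
      a === e || (!Number.isNaN(diff) && diff <= atol + rtol * Math.abs(e));
    if (!Number.isNaN(diff)) maxAbsDiff = Math.max(maxAbsDiff, diff);
    if (!close && firstMismatch === -1) firstMismatch = i;
  }
  return { pass: firstMismatch === -1, maxAbsDiff, firstMismatch };
}

// Stand-in outputs; in practice these would come from running the same model
// with the CPU EP and with the WebGPU EP.
const cpuOut = new Float32Array([0.1, 0.2, 0.3]);
const gpuOut = new Float32Array([0.1001, 0.2, 0.35]);
console.log(compareOutputs(gpuOut, cpuOut)); // pass: false, firstMismatch: 2
```

Reporting the first mismatching index alongside the maximum difference makes it easier to tell a localized kernel bug from a uniform precision loss.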
xenova commented Apr 19, 2025
I've been testing the WebGPU EP with some Llama/Qwen models and running into a few correctness issues. Here's some code to help test/debug:

```javascript
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ZR1-1.5B-ONNX",
  { dtype: "q4f16", device: "webgpu" }, // device="cpu" works fine
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write me a poem about Machine Learning." },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: true,
  }),
});
console.log(output[0].generated_text.at(-1).content);
```
I can confirm that q4 (instead of q4f16) works correctly, so it looks like an issue with the fp16 implementation.
For the WebGPU EP with DeepSeek-R1-Distill-Qwen-1.5B, we know about an open issue when GQA (GroupQueryAttention) takes the FA2 (FlashAttention-2) path. If ZR1-1.5B-ONNX is similar to DeepSeek-R1-Distill-Qwen-1.5B, it might be the same issue. I haven't tried DeepSeek-R1-Distill-Qwen-1.5B with fp32; let me check on this.
It looks like the same issue as with DeepSeek when GQA uses FA2 with fp16. fp32 seems OK.
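The debugging steps in this thread (q4 vs q4f16, fp16 vs fp32, WebGPU vs CPU) amount to a small test matrix. A minimal sketch of automating it, assuming a caller-supplied async `generate({ dtype, device })` function that returns the generated text (for example, a wrapper around the `pipeline` call above with `do_sample: false`); `compareDevices`, `firstDivergence`, and the stand-in generator are hypothetical names, not transformers.js or ONNX Runtime APIs.

```javascript
// Hypothetical debugging harness: for each dtype, generate once on a reference
// device and once on every other device, then report where the text diverges.
// `generate` is an assumed async function ({ dtype, device }) => string.
async function compareDevices(generate, dtypes, devices, referenceDevice = "cpu") {
  const report = [];
  for (const dtype of dtypes) {
    // Compare within a dtype: different quantizations can legitimately differ.
    const refText = await generate({ dtype, device: referenceDevice });
    for (const device of devices) {
      if (device === referenceDevice) continue;
      const text = await generate({ dtype, device });
      report.push({ dtype, device, divergesAt: firstDivergence(text, refText) });
    }
  }
  return report;
}

// Index of the first differing character, or -1 if the strings are identical.
function firstDivergence(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Stand-in generator that mimics the reported symptom: only q4f16 on webgpu
// produces different text.
const fakeGenerate = async ({ dtype, device }) =>
  dtype === "q4f16" && device === "webgpu" ? "Roses are !!!!" : "Roses are red";

compareDevices(fakeGenerate, ["q4", "q4f16"], ["cpu", "webgpu"]).then(console.log);
```

Here the report shows `divergesAt: -1` for q4 and `divergesAt: 10` for q4f16 on webgpu. With greedy decoding each configuration should be deterministic, so an early divergence from the CPU output is a stronger signal of an EP bug than a late one; small floating-point differences can eventually flip a greedy token choice even when the EP is correct.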
Great, thanks @guschmue!
I'm accumulating all these changes into https://github.com/huggingface/transformers.js/tree/ort-improvements to make development and testing a bit easier (many version bumps and ORT-specific changes).
This change allows using WebGPU in transformers.js with the ORT Node.js binding.
Still testing (the tests themselves need this change).
Closes #1242