Question
I am trying transformers.js with WebGPU. The performance is great, but I found that transformers.js returns a Float32Array even when the model is quantized to fp16:
```js
const extractor = await pipeline("feature-extraction", "bge-small-zh-v1.5", {
  device: "webgpu",
  dtype: "fp16",
  local_files_only: true,
});
// ...
const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
console.log(embeddings.data);
// -> Float32Array(5120000) [...]
```
Since the model itself has only 16-bit precision, returning a Float32Array (instead of the Float16Array that is supported in the latest browsers) seems like a waste of memory and performance. Is this observation correct, and are there plans to support Float16Array for better performance? Thanks!
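As a stopgap until the library itself can return half-precision output, the Float32Array could be down-converted after the fact. This is a hypothetical sketch (`toHalfPrecision` is not a transformers.js API): it feature-detects the recent `Float16Array` global, since not all runtimes ship it yet, and falls back to the original array otherwise. Note this only halves the memory of the result; it cannot recover the cost of the fp16→fp32 conversion inside the pipeline.

```javascript
// Hypothetical post-processing helper, not part of transformers.js:
// down-convert a Float32Array to Float16Array where the runtime supports it.
function toHalfPrecision(data) {
  if (typeof Float16Array !== "undefined") {
    // Each element is rounded to the nearest representable fp16 value.
    return new Float16Array(data);
  }
  // Runtime has no Float16Array: keep the original fp32 data.
  return data;
}

const f32 = new Float32Array([0.1, 0.2, 0.3]);
const out = toHalfPrecision(f32);
console.log(out.length, out.BYTES_PER_ELEMENT);
```

With fp16-quantized embeddings, the rounding introduced by this conversion is at or below the precision the model produced in the first place, so downstream cosine-similarity results should be essentially unchanged.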