Releases: huggingface/transformers.js
3.7.6 (4c908ec)

What's new?
- Fix issue when `temperature=0` and `do_sample=true` by @nico-martin in #1431
- Fix type errors by @nico-martin in #1436
- Add support for NanoChat in #1441
- Add support for Parakeet CTC in #1440 (see the sketch below)
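Parakeet CTC is a speech-to-text architecture, so it should be usable through the standard automatic-speech-recognition pipeline. A minimal sketch, assuming a hypothetical ONNX model ID (the release notes don't name one):

```js
import { pipeline } from "@huggingface/transformers";

// NOTE: hypothetical model ID for illustration — substitute the actual
// ONNX conversion you want to use.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/parakeet-ctc-ONNX",
);

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output.text);
```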
New Contributors
- @nico-martin made their first contribution in #1431
Full Changelog: 3.7.5...3.7.6
3.7.5 (c670bb9)
3.7.4 (d6b3998)

What's new?
- Correctly assign logits warpers in `_get_logits_processor` in #1422 (see the sketch below)
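For context, logits warpers are what implement sampling controls such as temperature and top_k during generation. A minimal sketch of a generation call that exercises this code path, reusing the LFM2 model featured later in these notes (the specific option values are illustrative, not taken from the PR):

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Sampling options like temperature and top_k are applied by the logits
// warpers assigned in _get_logits_processor.
const output = await generator("Once upon a time,", {
  max_new_tokens: 64,
  do_sample: true,
  temperature: 0.7,
  top_k: 50,
});
console.log(output[0].generated_text);
```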
Full Changelog: 3.7.3...3.7.4
3.7.2 (28852a2)

What's new?
Add support for DINOv3 in #1390

See here for the full list of supported models.
Example: Compute image embeddings
```js
import { pipeline } from '@huggingface/transformers';

const image_feature_extractor = await pipeline(
  'image-feature-extraction',
  'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
);

const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png';
const features = await image_feature_extractor(url);
console.log(features);
```
Try it out using our online demo:
[Video: dinov3.mp4]
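As a follow-up to the example above, here is a hedged sketch of comparing two images via their embeddings. We assume the pipeline accepts a batch of URLs and we treat the first output token of each image as its CLS embedding (both assumptions on our part; the second image URL is illustrative):

```js
import { pipeline, cos_sim } from '@huggingface/transformers';

const image_feature_extractor = await pipeline(
  'image-feature-extraction',
  'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
);

// Embed two images in one batch (second URL is illustrative)
const urls = [
  'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png',
  'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg',
];
const features = await image_feature_extractor(urls); // [2, num_tokens, dim]

// Compare the per-image embeddings (assumed: first token is the CLS token)
const [a, b] = features.tolist();
console.log(cos_sim(a[0], b[0]));
```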
Full Changelog: 3.7.1...3.7.2
3.7.1 (8d6c400)
3.7.0 (0feb5b7)

🚀 Transformers.js v3.7 — Voxtral, LFM2, ModernBERT Decoder
🤖 New models
This update adds support for 3 new architectures:
Voxtral
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. ONNX weights for Voxtral-Mini-3B-2507 can be found here. Learn more about Voxtral in the release blog post.
Try it out with our online demo:
[Video: Voxtral.WebGPU.demo.mp4]
Example: Audio transcription
```js
import { VoxtralForConditionalGeneration, VoxtralProcessor, TextStreamer, read_audio } from "@huggingface/transformers";

// Load the processor and model
const model_id = "onnx-community/Voxtral-Mini-3B-2507-ONNX";
const processor = await VoxtralProcessor.from_pretrained(model_id);
const model = await VoxtralForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16", // "fp32", "fp16", "q8", "q4"
    audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
    decoder_model_merged: "q4", // "q4", "q4f16"
  },
  device: "webgpu",
});

// Prepare the conversation
const conversation = [
  {
    role: "user",
    content: [
      { type: "audio" },
      { type: "text", text: "lang:en [TRANSCRIBE]" },
    ],
  },
];
const text = processor.apply_chat_template(conversation, { tokenize: false });
const audio = await read_audio(
  "http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav",
  16000,
);
const inputs = await processor(text, audio);

// Generate the response
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 256,
  streamer: new TextStreamer(processor.tokenizer, { skip_special_tokens: true, skip_prompt: true }),
});

// Decode the generated tokens
const new_tokens = generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]);
const generated_texts = processor.batch_decode(new_tokens, { skip_special_tokens: true });
console.log(generated_texts[0]);
// I have a dream that one day this nation will rise up and live out the true meaning of its creed.
```
LFM2
LFM2 is a new generation of hybrid models developed byLiquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
The models, which we have converted to ONNX, come in three different sizes: 350M, 700M, and 1.2B parameters.
Example: Text-generation with LFM2-350M:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris. It is a vibrant city known for its historical landmarks, art, fashion, and gastronomy.
```
ModernBERT Decoder
These models form part of the Ettin suite: the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.
The list of supported models can be found here.
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ettin-decoder-150m-ONNX",
  { dtype: "fp32" },
);

// Generate a response
const text = "Q: What is the capital of France?\nA:";
const output = await generator(text, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text);
```
Added in #1371.
🛠️ Other improvements
- Add special tokens in the text-generation pipeline if the tokenizer requires them in #1370
Full Changelog: 3.6.3...3.7.0
3.6.3 (467f59c)

What's new?
- Bump `@huggingface/jinja` to version 0.5.1 for new chat template functionality in #1364 (see the sketch below)
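For reference, chat templates are rendered by `@huggingface/jinja` under the hood via `apply_chat_template`. A minimal sketch, reusing a model from elsewhere in these notes (exactly which template features 0.5.1 unlocks is not spelled out here):

```js
import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-ONNX");

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Render the chat template (powered by @huggingface/jinja) to a prompt string
const prompt = tokenizer.apply_chat_template(messages, {
  tokenize: false,
  add_generation_prompt: true,
});
console.log(prompt);
```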
Full Changelog: 3.6.2...3.6.3
3.6.2 (6f026f3)

What's new?
Add support for SmolLM3 in #1359
SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.
Example:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM3-3B-ONNX",
  { dtype: "q4f16" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are SmolLM, a language model created by Hugging Face. If asked by the user, here is some information about you: SmolLM has 3 billion parameters and can converse in 6 languages: English, Spanish, German, French, Italian, and Portuguese. SmolLM is a fully open model and was trained on a diverse mix of public datasets./think" },
  { role: "user", content: "Solve the equation x^2 - 3x + 2 = 0" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 1024,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
```
Add support for ERNIE-4.5 in #1354
Example:

```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ERNIE-4.5-0.3B-ONNX",
  { dtype: "fp32" }, // Options: "fp32", "fp16", "q8", "q4", "q4f16"
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris.
```
Full Changelog: 3.6.1...3.6.2
3.6.1 (fc2847c)

What's new?
Add support for NeoBERT in #1350
```js
import { pipeline } from "@huggingface/transformers";

// Create feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Compute embeddings
const text = "NeoBERT is the most efficient model of its kind!";
const embedding = await extractor(text, { pooling: "cls" });
console.log(embedding.dims); // [1, 768]
```
Improve webworker detection to support ServiceWorker and SharedWorker by @aungKhantPaing in #1346 (see the Web Worker sketch below)
Fix optional `from_pretrained` types in #1352
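Related to the worker improvements above, here is a minimal sketch of running a pipeline inside a standard Web Worker; the file layout and message shape are our own, not from the PR:

```js
// worker.js — runs the model off the main thread
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

self.onmessage = async (event) => {
  // Embed the received text and post the result back
  const embedding = await extractor(event.data, { pooling: "cls" });
  self.postMessage(embedding.tolist());
};
```

```js
// main.js — spawn the worker and exchange messages
const worker = new Worker(new URL("./worker.js", import.meta.url), { type: "module" });
worker.postMessage("Hello from the main thread!");
worker.onmessage = (event) => console.log(event.data);
```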
New Contributors
- @aungKhantPaing made their first contribution in #1346
- @fidoriel made their first contribution in #1351
Full Changelog: 3.6.0...3.6.1
3.6.0 (7b45042)

🚀 Transformers.js v3.6 — Gemma 3n, Qwen3-Embedding, Llava-Qwen2
🤖 New models
Gemma 3n
Gemma 3n, which was announced as a preview during Google I/O, is a model that is designed from the ground up to run locally on your hardware. On top of that, it's natively multimodal, supporting image, text, audio, and video inputs 🤯
Gemma 3n models have multiple architecture innovations:
- They are available in two sizes based on effective parameters. While the raw parameter count of this model is 6B, the architecture design allows the model to be run with a memory footprint comparable to a traditional 2B model by offloading low-utilization matrices from the accelerator.
- They use a MatFormer architecture that allows nesting sub-models within the E4B model. We provide one sub-model (this model repository), or you can access a spectrum of custom-sized models using the Mix-and-Match method.

Learn more about these techniques in the technical blog post and the Gemma documentation.
As part of the release, we are releasing ONNX weights for the gemma-3n-E2B-it variant (link), making it compatible with Transformers.js:
Warning
Due to the model's large size, we currently only support Node.js, Deno, and Bun execution.
In-browser WebGPU support is actively being worked on, so stay tuned for an update!
Example: Caption an image
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q8",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "text", text: "Describe this image in detail." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg";
const image = await load_image(url);
const audio = null;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
```
See example output
The image is a close-up, slightly macro shot of a cluster of vibrant pink cosmos flowers in full bloom. The flowers are the focal point, with their delicate, slightly ruffled petals radiating outwards. They have a soft, almost pastel pink hue, and their edges are subtly veined. A small, dark-colored bee is actively visiting one of the pink flowers, its body positioned near the center of the bloom. The bee appears to be collecting pollen or nectar. The flowers are attached to slender, brownish-green stems, and some of the surrounding foliage is visible in a blurred background, suggesting a natural outdoor setting. There are also hints of other flowers in the background, including some red ones, adding a touch of contrast to the pink. The lighting in the image seems to be natural daylight, casting soft shadows and highlighting the textures of the petals and the bee. The overall impression is one of delicate beauty and the gentle activity of nature.

Example: Transcribe audio
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  TextStreamer,
} from "@huggingface/transformers";
import wavefile from "wavefile";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q4",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "audio" },
      { type: "text", text: "Transcribe this audio verbatim." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const buffer = Buffer.from(await fetch(url).then((x) => x.arrayBuffer()));
const wav = new wavefile.WaveFile(buffer);
wav.toBitDepth("32f"); // Pipeline expects input as a Float32Array
wav.toSampleRate(processor.feature_extractor.config.sampling_rate);
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
  if (audioData.length > 1) {
    // Downmix stereo to mono by merging the two channels
    for (let i = 0; i < audioData[0].length; ++i) {
      audioData[0][i] = (Math.sqrt(2) * (audioData[0][i] + audioData[1][i])) / 2;
    }
  }
  audioData = audioData[0];
}

const image = null;
const audio = audioData;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
```
See example output
And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.

Qwen3-Embedding
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model.
You can run it with Transformers.js as follows:
```js
import { pipeline, matmul } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "onnx-community/Qwen3-Embedding-0.6B-ONNX",
  {
    dtype: "fp32", // Options: "fp32", "fp16", "q8"
    // device: "webgpu",
  },
);

function get_detailed_instruct(task_description, query) {
  return `Instruct: ${task_description}\nQuery:${query}`;
}

// Each query must come with a one-sentence instruction that describes the task
const task = "Given a web search query, retrieve relevant passages that answer the query";
const queries = [
  get_detailed_instruct(task, "What is the capital of China?"),
  get_detailed_instruct(task, "Explain gravity"),
];

// No need to add instruction for retrieval documents
const documents = [
  "The capital of China is Beijing.",
  "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
];
const input_texts = [...queries, ...documents];

// Extract embeddings for queries and documents
const output = await extractor(input_texts, {
  pooling: "last_token",
  normalize: true,
});
const scores = await matmul(
  output.slice([0, queries.length]), // Query embeddings
  output.slice([queries.length, null]).transpose(1, 0), // Document embeddings
);
console.log(scores.tolist());
// [
//   [ 0.7645590305328369, 0.14142560958862305 ],
//   [ 0.13549776375293732, 0.599955141544342 ]
// ]
```
Llava-Qwen2
Finally, we also added support for Llava models with a Qwen2 text backbone:
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/FastVLM-0.5B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16",
    vision_encoder: "q4",
    decoder_model_merged: "q4",
  },
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: "<image>Describe this image in detail.",
  },
];
const prompt = processor.apply_cha...
```
