Releases: huggingface/transformers.js
3.7.6 (4c908ec)

What's new?
- Fix issue when `temperature=0` and `do_sample=true` by @nico-martin in #1431
- Fix type errors by @nico-martin in #1436
- Add support for NanoChat in #1441
- Add support for Parakeet CTC in #1440 (see the sketch below)
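Parakeet CTC is a speech-to-text architecture, so it should be usable through the standard automatic-speech-recognition pipeline. A minimal sketch, assuming a hypothetical ONNX model ID (the release notes don't name one):

```js
import { pipeline } from "@huggingface/transformers";

// NOTE: hypothetical model ID for illustration — substitute the actual
// ONNX conversion you want to use.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/parakeet-ctc-ONNX",
);

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output.text);
```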
New Contributors
- @nico-martin made their first contribution in #1431
Full Changelog: 3.7.5...3.7.6
3.7.5 (c670bb9)
3.7.4 (d6b3998)

What's new?
- Correctly assign logits warpers in `_get_logits_processor` in #1422 (see the sketch below)
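For context, logits warpers are what implement sampling controls such as temperature and top_k during generation. A minimal sketch of a generation call that exercises this code path, reusing the LFM2 model featured later in these notes (the specific option values are illustrative, not taken from the PR):

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Sampling options like temperature and top_k are applied by the logits
// warpers assigned in _get_logits_processor.
const output = await generator("Once upon a time,", {
  max_new_tokens: 64,
  do_sample: true,
  temperature: 0.7,
  top_k: 50,
});
console.log(output[0].generated_text);
```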
Full Changelog: 3.7.3...3.7.4
3.7.2 (28852a2)

What's new?
Add support for DINOv3 in #1390

See here for the full list of supported models.
Example: Compute image embeddings
```js
import { pipeline } from '@huggingface/transformers';

const image_feature_extractor = await pipeline(
  'image-feature-extraction',
  'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
);

const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png';
const features = await image_feature_extractor(url);
console.log(features);
```
Try it out using our online demo:
[Video: dinov3.mp4]
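As a follow-up to the example above, here is a hedged sketch of comparing two images via their embeddings. We assume the pipeline accepts a batch of URLs and we treat the first output token of each image as its CLS embedding (both assumptions on our part; the second image URL is illustrative):

```js
import { pipeline, cos_sim } from '@huggingface/transformers';

const image_feature_extractor = await pipeline(
  'image-feature-extraction',
  'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
);

// Embed two images in one batch (second URL is illustrative)
const urls = [
  'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png',
  'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg',
];
const features = await image_feature_extractor(urls); // [2, num_tokens, dim]

// Compare the per-image embeddings (assumed: first token is the CLS token)
const [a, b] = features.tolist();
console.log(cos_sim(a[0], b[0]));
```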
Full Changelog: 3.7.1...3.7.2
3.7.1 (8d6c400)
3.7.0 (0feb5b7)

🚀 Transformers.js v3.7 — Voxtral, LFM2, ModernBERT Decoder
🤖 New models
This update adds support for 3 new architectures:
Voxtral
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. ONNX weights for Voxtral-Mini-3B-2507 can be found here. Learn more about Voxtral in the release blog post.
Try it out with our online demo:
[Video: Voxtral.WebGPU.demo.mp4]
Example: Audio transcription
```js
import { VoxtralForConditionalGeneration, VoxtralProcessor, TextStreamer, read_audio } from "@huggingface/transformers";

// Load the processor and model
const model_id = "onnx-community/Voxtral-Mini-3B-2507-ONNX";
const processor = await VoxtralProcessor.from_pretrained(model_id);
const model = await VoxtralForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16", // "fp32", "fp16", "q8", "q4"
    audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
    decoder_model_merged: "q4", // "q4", "q4f16"
  },
  device: "webgpu",
});

// Prepare the conversation
const conversation = [
  {
    role: "user",
    content: [
      { type: "audio" },
      { type: "text", text: "lang:en [TRANSCRIBE]" },
    ],
  },
];
const text = processor.apply_chat_template(conversation, { tokenize: false });
const audio = await read_audio(
  "http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav",
  16000,
);
const inputs = await processor(text, audio);

// Generate the response
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 256,
  streamer: new TextStreamer(processor.tokenizer, { skip_special_tokens: true, skip_prompt: true }),
});

// Decode the generated tokens
const new_tokens = generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]);
const generated_texts = processor.batch_decode(new_tokens, { skip_special_tokens: true });
console.log(generated_texts[0]);
// I have a dream that one day this nation will rise up and live out the true meaning of its creed.
```
LFM2
LFM2 is a new generation of hybrid models developed byLiquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
The models, which we have converted to ONNX, come in three different sizes: 350M, 700M, and 1.2B parameters.
Example: Text-generation with LFM2-350M:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris. It is a vibrant city known for its historical landmarks, art, fashion, and gastronomy.
```
ModernBERT Decoder
These models form part of the Ettin suite: the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.
The list of supported models can be found here.
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ettin-decoder-150m-ONNX",
  { dtype: "fp32" },
);

// Generate a response
const text = "Q: What is the capital of France?\nA:";
const output = await generator(text, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text);
```
Added in #1371.
🛠️ Other improvements
- Add special tokens in the text-generation pipeline if the tokenizer requires them in #1370
Full Changelog: 3.6.3...3.7.0
3.6.3 (467f59c)

What's new?
- Bump `@huggingface/jinja` to version 0.5.1 for new chat template functionality in #1364 (see the sketch below)
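For reference, chat templates are rendered by `@huggingface/jinja` under the hood via `apply_chat_template`. A minimal sketch, reusing a model from elsewhere in these notes (exactly which template features 0.5.1 unlocks is not spelled out here):

```js
import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-ONNX");

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Render the chat template (powered by @huggingface/jinja) to a prompt string
const prompt = tokenizer.apply_chat_template(messages, {
  tokenize: false,
  add_generation_prompt: true,
});
console.log(prompt);
```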
Full Changelog: 3.6.2...3.6.3
3.6.2 (6f026f3)

What's new?
Add support for SmolLM3 in #1359
SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.
Example:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM3-3B-ONNX",
  { dtype: "q4f16" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are SmolLM, a language model created by Hugging Face. If asked by the user, here is some information about you: SmolLM has 3 billion parameters and can converse in 6 languages: English, Spanish, German, French, Italian, and Portuguese. SmolLM is a fully open model and was trained on a diverse mix of public datasets./think" },
  { role: "user", content: "Solve the equation x^2 - 3x + 2 = 0" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 1024,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
```
Add support for ERNIE-4.5 in #1354
Example:

```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ERNIE-4.5-0.3B-ONNX",
  { dtype: "fp32" }, // Options: "fp32", "fp16", "q8", "q4", "q4f16"
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris.
```
Full Changelog: 3.6.1...3.6.2
3.6.1 (fc2847c)

What's new?
Add support for NeoBERT in #1350
```js
import { pipeline } from "@huggingface/transformers";

// Create feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Compute embeddings
const text = "NeoBERT is the most efficient model of its kind!";
const embedding = await extractor(text, { pooling: "cls" });
console.log(embedding.dims); // [1, 768]
```
Improve webworker detection to support ServiceWorker and SharedWorker by @aungKhantPaing in #1346 (see the Web Worker sketch below)
Fix optional `from_pretrained` types in #1352
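Related to the worker improvements above, here is a minimal sketch of running a pipeline inside a standard Web Worker; the file layout and message shape are our own, not from the PR:

```js
// worker.js — runs the model off the main thread
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

self.onmessage = async (event) => {
  // Embed the received text and post the result back
  const embedding = await extractor(event.data, { pooling: "cls" });
  self.postMessage(embedding.tolist());
};
```

```js
// main.js — spawn the worker and exchange messages
const worker = new Worker(new URL("./worker.js", import.meta.url), { type: "module" });
worker.postMessage("Hello from the main thread!");
worker.onmessage = (event) => console.log(event.data);
```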
New Contributors
- @aungKhantPaing made their first contribution in #1346
- @fidoriel made their first contribution in #1351
Full Changelog: 3.6.0...3.6.1
3.6.0 (7b45042)

🚀 Transformers.js v3.6 — Gemma 3n, Qwen3-Embedding, Llava-Qwen2
🤖 New models
Gemma 3n
Gemma 3n, which was announced as a preview during Google I/O, is a model that is designed from the ground up to run locally on your hardware. On top of that, it's natively multimodal, supporting image, text, audio, and video inputs 🤯
Gemma 3n models have multiple architecture innovations:
- They are available in two sizes based on effective parameters. While the raw parameter count of this model is 6B, the architecture design allows the model to be run with a memory footprint comparable to a traditional 2B model by offloading low-utilization matrices from the accelerator.
- They use a MatFormer architecture that allows nesting sub-models within the E4B model. We provide one sub-model (this model repository), or you can access a spectrum of custom-sized models using the Mix-and-Match method.

Learn more about these techniques in the technical blog post and the Gemma documentation.
As part of the release, we are releasing ONNX weights for the gemma-3n-E2B-it variant (link), making it compatible with Transformers.js:
Warning
Due to the model's large size, we currently only support Node.js, Deno, and Bun execution.
In-browser WebGPU support is actively being worked on, so stay tuned for an update!
Example: Caption an image
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q8",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "text", text: "Describe this image in detail." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg";
const image = await load_image(url);
const audio = null;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
```
See example output
The image is a close-up, slightly macro shot of a cluster of vibrant pink cosmos flowers in full bloom. The flowers are the focal point, with their delicate, slightly ruffled petals radiating outwards. They have a soft, almost pastel pink hue, and their edges are subtly veined. A small, dark-colored bee is actively visiting one of the pink flowers, its body positioned near the center of the bloom. The bee appears to be collecting pollen or nectar. The flowers are attached to slender, brownish-green stems, and some of the surrounding foliage is visible in a blurred background, suggesting a natural outdoor setting. There are also hints of other flowers in the background, including some red ones, adding a touch of contrast to the pink. The lighting in the image seems to be natural daylight, casting soft shadows and highlighting the textures of the petals and the bee. The overall impression is one of delicate beauty and the gentle activity of nature.

Example: Transcribe audio
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  TextStreamer,
} from "@huggingface/transformers";
import wavefile from "wavefile";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q4",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "audio" },
      { type: "text", text: "Transcribe this audio verbatim." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const buffer = Buffer.from(await fetch(url).then((x) => x.arrayBuffer()));
const wav = new wavefile.WaveFile(buffer);
wav.toBitDepth("32f"); // Pipeline expects input as a Float32Array
wav.toSampleRate(processor.feature_extractor.config.sampling_rate);
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
  if (audioData.length > 1) {
    // Downmix stereo to mono by merging the two channels
    for (let i = 0; i < audioData[0].length; ++i) {
      audioData[0][i] = (Math.sqrt(2) * (audioData[0][i] + audioData[1][i])) / 2;
    }
  }
  audioData = audioData[0];
}

const image = null;
const audio = audioData;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
```
See example output
And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.

Qwen3-Embedding
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model.
You can run it with Transformers.js as follows:
```js
import { pipeline, matmul } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "onnx-community/Qwen3-Embedding-0.6B-ONNX",
  {
    dtype: "fp32", // Options: "fp32", "fp16", "q8"
    // device: "webgpu",
  },
);

function get_detailed_instruct(task_description, query) {
  return `Instruct: ${task_description}\nQuery:${query}`;
}

// Each query must come with a one-sentence instruction that describes the task
const task = "Given a web search query, retrieve relevant passages that answer the query";
const queries = [
  get_detailed_instruct(task, "What is the capital of China?"),
  get_detailed_instruct(task, "Explain gravity"),
];

// No need to add instruction for retrieval documents
const documents = [
  "The capital of China is Beijing.",
  "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
];
const input_texts = [...queries, ...documents];

// Extract embeddings for queries and documents
const output = await extractor(input_texts, {
  pooling: "last_token",
  normalize: true,
});
const scores = await matmul(
  output.slice([0, queries.length]), // Query embeddings
  output.slice([queries.length, null]).transpose(1, 0), // Document embeddings
);
console.log(scores.tolist());
// [
//   [ 0.7645590305328369, 0.14142560958862305 ],
//   [ 0.13549776375293732, 0.599955141544342 ]
// ]
```
Llava-Qwen2
Finally, we also added support for Llava models with a Qwen2 text backbone:
```js
import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/FastVLM-0.5B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16",
    vision_encoder: "q4",
    decoder_model_merged: "q4",
  },
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: "<image>Describe this image in detail.",
  },
];
const prompt = processor.apply_cha...
```
