unum-cloud/UForm

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

License

Apache-2.0 license

UForm

Pocket-Sized Multimodal AI
For Content Understanding and Generation

Multimodal Embeddings from 64 to 768 Dimensions • 1B Parameter Chat
Short Texts • Images • 🔜 Video Clips • 🔜 Long Documents
ONNX • CoreML • PyTorch
Python • JavaScript • Swift

Welcome to UForm, a multimodal AI library that's as versatile as it is efficient. UForm tiny embedding models will help you understand and search visual and textual content across various languages. UForm small generative models, on the other hand, not only support conversational and chat use-cases, but are also great for fast image captioning and Visual Question Answering (VQA). With compact custom pre-trained transformer models, this can run anywhere from your server farm down to your smartphone.

Features

Tiny Embeddings: 64-dimensional Matryoshka-style embeddings for extremely fast search.
Throughput: Thanks to the small size, the inference speed is 2-4x faster than competitors.
Portable: Models come with native ONNX support, making them easy to deploy on any platform.
Quantization Aware: Down-cast embeddings from f32 to i8 without losing much recall.
Multilingual: Trained on a balanced dataset, the recall is great across over 20 languages.

Models

For accuracy and speed benchmarks, refer to the evaluation page.

Embedding Models

| Model | Parameters | Languages | Architecture |
| --- | --- | --- | --- |
| `uform3-image-text-english-large` 🆕 | 365 M | 1 | 12 layer BERT, ViT-L/14 |
| `uform3-image-text-english-base` | 143 M | 1 | 4 layer BERT, ViT-B/16 |
| `uform3-image-text-english-small` 🆕 | 79 M | 1 | 4 layer BERT, ViT-S/16 |
| `uform3-image-text-multilingual-base` | 206 M | 21 | 12 layer BERT, ViT-B/16 |

Generative Models

| Model | Parameters | Purpose | Architecture |
| --- | --- | --- | --- |
| `uform-gen2-dpo` 🆕 | 1.2 B | Chat, Image Captioning, VQA | qwen1.5-0.5B, ViT-H/14 |
| `uform-gen2-qwen-500m` | 1.2 B | Chat, Image Captioning, VQA | qwen1.5-0.5B, ViT-H/14 |
| `uform-gen` ⚠️ | 1.5 B | Image Captioning, VQA | llama-1.3B, ViT-B/16 |

Quick Start Examples

Embedding Models

First, `pip install uform`. Then, load the model:

    from uform import get_model, Modality

    # Defaults to `dtype='bfloat16'` for ~2x speedup with minimal accuracy loss
    processors, models = get_model('unum-cloud/uform3-image-text-english-small', device='cuda')

    model_text = models[Modality.TEXT_ENCODER]
    model_image = models[Modality.IMAGE_ENCODER]
    processor_text = processors[Modality.TEXT_ENCODER]
    processor_image = processors[Modality.IMAGE_ENCODER]

Embed images:

    import requests
    from io import BytesIO
    from PIL import Image

    image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
    image = Image.open(BytesIO(requests.get(image_url).content))
    image_data = processor_image(image)
    image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

    text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'
    text_data = processor_text(text)
    text_features, text_embedding = model_text.encode(text_data, return_features=True)
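
Once both embeddings are available, a typical next step is to compare them. The sketch below is illustrative only: it assumes the embeddings have been converted to NumPy arrays of shape `(1, d)`, uses random vectors as stand-ins for real model outputs, and `cosine_similarity` is a hypothetical helper rather than part of the UForm API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors of the same length."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for `image_embedding` and `text_embedding` (assumed shape: 1 x 256)
rng = np.random.default_rng(0)
image_embedding = rng.normal(size=(1, 256)).astype(np.float32)
text_embedding = image_embedding + 0.1 * rng.normal(size=(1, 256)).astype(np.float32)

similarity = cosine_similarity(image_embedding, text_embedding)
print(f"similarity: {similarity:.3f}")
```

Values close to 1 indicate that the image and the text point in nearly the same direction in the shared embedding space.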

For more details check out:

Python docs on embedding models in python/README.md
JavaScript docs on embedding models in javascript/README.md
Swift docs on embedding models in swift/README.md

Generative Models

The generative models are natively compatible with the Hugging Face `transformers` library:

    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)
    processor = AutoProcessor.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)

    prompt = 'Question or Instruction'
    image = Image.open('image.jpg')

    inputs = processor(text=[prompt], images=[image], return_tensors='pt')

    with torch.inference_mode():
        output = model.generate(
            **inputs,
            do_sample=False,
            use_cache=True,
            max_new_tokens=256,
            eos_token_id=151645,
            pad_token_id=processor.tokenizer.pad_token_id
        )

    # Drop the prompt tokens and decode only the newly generated ones
    prompt_len = inputs['input_ids'].shape[1]
    decoded_text = processor.batch_decode(output[:, prompt_len:])[0]

For more details check out:

Python docs on generative models in python/README.md
JavaScript docs on generative models 🔜
Swift docs on generative models 🔜

Technical Details

Down-casting, Quantization, Matryoshka, and Slicing

Depending on the application, the embeddings can be down-cast to smaller numeric representations without losing much recall. Switching from f32 to f16 is recommended in almost all cases, unless you are running on very old hardware without half-precision support. Switching to i8 with linear scaling is also possible, but the recall loss will be noticeable on larger collections with millions of searchable entries. Similarly, for higher-dimensional embeddings (512 or 768), a common strategy is to quantize them into single-bit representations for faster search.

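    import numpy as np

    f32_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
    # Half precision: same dimensions, half the memory
    f16_embedding: np.ndarray = f32_embedding.astype(np.float16)
    # Linear scaling to 8-bit integers; requires values within [-1, 1]
    i8_embedding: np.ndarray = (f32_embedding * 127).astype(np.int8)
    # One bit per dimension (the sign), packed 8 dimensions per byte
    b1_embedding: np.ndarray = np.packbits((f32_embedding > 0).astype(np.uint8))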
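
To get a feel for how much each down-cast perturbs similarity scores, the following sketch compares cosine similarity before and after conversion. It is a rough illustration under stated assumptions: random unit-length vectors stand in for real embeddings, and the helper `cos` is hypothetical. Actual recall impact depends on the model and the collection.

```python
import numpy as np

rng = np.random.default_rng(42)

# Random unit-length vectors stand in for real embeddings (assumption)
vectors = rng.normal(size=(2, 256)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
a, b = vectors

def cos(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity, computed in f32 regardless of the input type."""
    x = x.astype(np.float32)
    y = y.astype(np.float32)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

reference = cos(a, b)
f16_error = abs(cos(a.astype(np.float16), b.astype(np.float16)) - reference)
# `astype(np.int8)` truncates toward zero, as in the snippet above
i8_error = abs(cos((a * 127).astype(np.int8), (b * 127).astype(np.int8)) - reference)

print(f"f16 error: {f16_error:.6f}, i8 error: {i8_error:.6f}")
```

Because `astype(np.int8)` truncates rather than rounds, applying `np.round` before the cast is one possible way to reduce the i8 error.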

An alternative to quantization is to use Matryoshka embeddings, where the embeddings are sliced into smaller parts and the search is performed in a hierarchical manner.

    import numpy as np

    large_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
    small_embedding: np.ndarray = large_embedding[:, :256]
    tiny_embedding: np.ndarray = large_embedding[:, :64]
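
One way to read "hierarchical" search is as a two-stage lookup: scan short prefixes first, then re-rank a few candidates with the full vectors. The sketch below shows that idea with brute-force NumPy. It is an assumption-laden illustration, not UForm's or USearch's implementation: the corpus is random, the candidate count of 50 is arbitrary, and `normalized_prefix` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in corpus: 1,000 unit-length 256-dimensional embeddings (assumption)
corpus = rng.normal(size=(1000, 256)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# The query is a slightly perturbed copy of entry 123
query = corpus[123] + 0.02 * rng.normal(size=256).astype(np.float32)
query /= np.linalg.norm(query)

def normalized_prefix(x: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and rescale to unit length."""
    sliced = x[..., :dims]
    return sliced / np.linalg.norm(sliced, axis=-1, keepdims=True)

# Stage 1: coarse scan over 64-dimensional prefixes, keep 50 candidates
coarse_scores = normalized_prefix(corpus, 64) @ normalized_prefix(query, 64)
candidates = np.argsort(-coarse_scores)[:50]

# Stage 2: re-rank the candidates with the full 256-dimensional vectors
fine_scores = corpus[candidates] @ query
best = int(candidates[np.argmax(fine_scores)])
print(best)
```

Sliced vectors are no longer unit-length, so re-normalizing them keeps dot products equal to cosine similarities.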

Both approaches are natively supported by the USearch vector-search engine and the SimSIMD numerics library. When dealing with small collections (up to millions of entries) and looking for low-latency cosine distance calculations, you can achieve 5x-2500x performance improvement over Torch, NumPy, SciPy, and vanilla Python using SimSIMD.

    from simsimd import cosine, hamming

    distance: float = cosine(f32_embedding, f32_embedding)  # 32x SciPy performance on Apple M2 CPU
    distance: float = cosine(f16_embedding, f16_embedding)  # 79x SciPy performance on Apple M2 CPU
    distance: float = cosine(i8_embedding, i8_embedding)  # 133x SciPy performance on Apple M2 CPU
    distance: float = hamming(b1_embedding, b1_embedding)  # 17x SciPy performance on Apple M2 CPU
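
For checking results on bit-packed embeddings, a plain NumPy reference for the Hamming distance can be handy. This is a slow baseline sketch, not how SimSIMD computes it; the vectors are random stand-ins and `hamming_packed` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=768).astype(np.float32)
b = rng.normal(size=768).astype(np.float32)

# Sign-based binarization: 768 floats become 96 bytes
a_bits = np.packbits((a > 0).astype(np.uint8))
b_bits = np.packbits((b > 0).astype(np.uint8))

def hamming_packed(x: np.ndarray, y: np.ndarray) -> int:
    """Count differing bits between two bit-packed uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(x, y)).sum())

distance = hamming_packed(a_bits, b_bits)
print(a_bits.nbytes, distance)
```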

Similarly, when dealing with large collections (up to billions of entries per server) and looking for high-throughput search, you can achieve 100x performance improvement over FAISS and other vector-search solutions using USearch. Here are a couple of examples:

    from usearch.index import Index

    f32_index = Index(ndim=64, metric='cos', dtype='f32')  # for Matryoshka embeddings
    f16_index = Index(ndim=64, metric='cos', dtype='f16')  # for Matryoshka embeddings
    i8_index = Index(ndim=256, metric='cos', dtype='i8')  # for quantized embeddings
    b1_index = Index(ndim=768, metric='hamming', dtype='b1')  # for binary embeddings
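
The four configurations above differ considerably in storage cost per vector. The arithmetic can be sketched as follows; it counts only the raw vector payload and ignores any index overhead, and `bytes_per_vector` is a hypothetical helper, not a USearch function.

```python
# Bits per scalar for each storage type used above
BITS_PER_SCALAR = {"f32": 32, "f16": 16, "i8": 8, "b1": 1}

def bytes_per_vector(ndim: int, dtype: str) -> int:
    """Raw payload size of one vector, ignoring index overhead."""
    return (ndim * BITS_PER_SCALAR[dtype] + 7) // 8

configs = [(64, "f32"), (64, "f16"), (256, "i8"), (768, "b1")]
for ndim, dtype in configs:
    print(f"{ndim:>4} x {dtype}: {bytes_per_vector(ndim, dtype)} bytes")
```

For comparison, an unsliced 768-dimensional f32 vector takes 3,072 bytes, so the binary variant is 32x smaller.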

Compact Packaging

PyTorch is a heavy dependency to carry, especially if you run on Edge or IoT devices. Using vanilla ONNX runtime, one can significantly reduce memory consumption and deployment latency.

    $ conda create -n uform_torch python=3.10 -y
    $ conda create -n uform_onnx python=3.10 -y
    $ conda activate uform_torch && pip install -e ".[torch]" && conda deactivate
    $ conda activate uform_onnx && pip install -e ".[onnx]" && conda deactivate
    $ du -sh $(conda info --envs | grep 'uform_torch' | awk '{print $2}')
    > 5.2G    ~/conda/envs/uform_torch
    $ du -sh $(conda info --envs | grep 'uform_onnx' | awk '{print $2}')
    > 461M    ~/conda/envs/uform_onnx

Most of that weight can be further reduced down to 100 MB for both the model and the runtime. You can pick one of many supported ONNX execution providers, which include XNNPACK, CUDA and TensorRT for Nvidia GPUs, OpenVINO on Intel, DirectML on Windows, ROCm on AMD, CoreML on Apple devices, and more to come.
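
Choosing an execution provider usually comes down to filtering a preference list by what the local ONNX Runtime build reports. The sketch below shows that selection logic on its own, without importing ONNX Runtime. The preference order and the helper `pick_providers` are assumptions for illustration; the provider names follow ONNX Runtime's naming.

```python
from typing import List, Sequence

# Preference order is an assumption; tune it for the target hardware
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "CoreMLExecutionProvider",
    "OpenVINOExecutionProvider",
    "DmlExecutionProvider",
    "XnnpackExecutionProvider",
    "CPUExecutionProvider",
)

def pick_providers(
    available: Sequence[str],
    preferred: Sequence[str] = PREFERRED_PROVIDERS,
) -> List[str]:
    """Return the preferred providers that are available, in priority order."""
    chosen = [name for name in preferred if name in available]
    # Always keep the CPU provider as a last-resort fallback
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# With ONNX Runtime installed, `available` would come from
# `onnxruntime.get_available_providers()`, and the result would be passed
# as the `providers` argument of `onnxruntime.InferenceSession`.
available = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
print(pick_providers(available))
```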

Multimodal Chat in CLI

The generative models can be used for chat-like experiences in the command line. For that, you can use the `uform-chat` CLI tool, which is available in the UForm package.

    $ pip install uform
    $ uform-chat --model unum-cloud/uform-gen2-dpo --image=zebra.jpg
    $ uform-chat --model unum-cloud/uform-gen2-dpo \
    >     --image="https://bit.ly/3tIVg9M" \
    >     --device="cuda:0" \
    >     --fp16

About

unum-cloud.github.io/UForm

Releases

40 releases. Latest: v3.1.4 (Oct 30, 2025).

Contributors

21 contributors.
