Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
unum-cloud/UForm
Multimodal Embeddings from 64 to 768 Dimensions • 1B Parameter Chat
Short Texts • Images • 🔜 Video Clips • 🔜 Long Documents
ONNX • CoreML • PyTorch
Python • JavaScript • Swift
Welcome to UForm, a multimodal AI library that's as versatile as it is efficient. UForm tiny embedding models will help you understand and search visual and textual content across various languages. UForm small generative models, on the other hand, don't only support conversational and chat use-cases, but are also great for fast image captioning and Visual Question Answering (VQA). With compact custom pre-trained transformer models, this can run anywhere from your server farm down to your smartphone.
- Tiny Embeddings: 64-dimensional Matryoshka-style embeddings for extremely fast search.
- Throughput: Thanks to the small size, the inference speed is 2-4x faster than competitors.
- Portable: Models come with native ONNX support, making them easy to deploy on any platform.
- Quantization Aware: Down-cast embeddings from `f32` to `i8` without losing much recall.
- Multilingual: Trained on a balanced dataset, the recall is great across over 20 languages.
For accuracy and speed benchmarks refer to the evaluation page.
| Model | Parameters | Languages | Architecture |
|---|---|---|---|
| `uform3-image-text-english-large` 🆕 | 365 M | 1 | 12 layer BERT, ViT-L/14 |
| `uform3-image-text-english-base` | 143 M | 1 | 4 layer BERT, ViT-B/16 |
| `uform3-image-text-english-small` 🆕 | 79 M | 1 | 4 layer BERT, ViT-S/16 |
| `uform3-image-text-multilingual-base` | 206 M | 21 | 12 layer BERT, ViT-B/16 |
| Model | Parameters | Purpose | Architecture |
|---|---|---|---|
| `uform-gen2-dpo` 🆕 | 1.2 B | Chat, Image Captioning, VQA | qwen1.5-0.5B, ViT-H/14 |
| `uform-gen2-qwen-500m` | 1.2 B | Chat, Image Captioning, VQA | qwen1.5-0.5B, ViT-H/14 |
| `uform-gen` | 1.5 B | Image Captioning, VQA | llama-1.3B, ViT-B/16 |
First, `pip install uform`. Then, load the model:
```python
from uform import get_model, Modality

# Defaults to `dtype='bfloat16'` for ~2x speedup with minimal accuracy loss
processors, models = get_model('unum-cloud/uform3-image-text-english-small', device='cuda')

model_text = models[Modality.TEXT_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]
processor_image = processors[Modality.IMAGE_ENCODER]
```
Embed images:
```python
import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)
```
Embed queries:
```python
text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'
text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)
```
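With both embeddings in hand, a common next step is to score how well the query matches the image using cosine similarity. The sketch below is a minimal NumPy version, not part of UForm: `cosine_similarity` is a hypothetical helper, and the random vectors are stand-ins for `image_embedding` and `text_embedding`, which are assumed here to be array-like with shape `(1, d)`.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings, each flattened to 1-D."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for real model outputs (assumed shape: (1, 256)).
rng = np.random.default_rng(0)
image_embedding = rng.standard_normal((1, 256)).astype(np.float32)
# A "matching" text embedding: the image embedding plus a little noise.
text_embedding = image_embedding + 0.1 * rng.standard_normal((1, 256)).astype(np.float32)

similarity = cosine_similarity(image_embedding, text_embedding)
print(f"similarity: {similarity:.3f}")
```

If the encoders return framework tensors rather than NumPy arrays, they would likely need to be moved to host memory and converted first.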
For more details check out:
- Python docs on embedding models in `python/README.md`
- JavaScript docs on embedding models in `javascript/README.md`
- Swift docs on embedding models in `swift/README.md`
The generative models are natively compatible with Hugging Face `transformers`:
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('unum-cloud/uform-gen2-dpo', trust_remote_code=True)

prompt = 'Question or Instruction'
image = Image.open('image.jpg')

inputs = processor(text=[prompt], images=[image], return_tensors='pt')

with torch.inference_mode():
    output = model.generate(
        **inputs,
        do_sample=False,
        use_cache=True,
        max_new_tokens=256,
        eos_token_id=151645,
        pad_token_id=processor.tokenizer.pad_token_id,
    )

# Drop the prompt tokens and decode only the newly generated ones
prompt_len = inputs['input_ids'].shape[1]
decoded_text = processor.batch_decode(output[:, prompt_len:])[0]
```
For more details check out:
- Python docs on generative models in `python/README.md`
- JavaScript docs on generative models 🔜
- Swift docs on generative models 🔜
Depending on the application, the embeddings can be down-cast to smaller numeric representations without losing much recall. Switching from `f32` to `f16` is recommended in almost all cases, unless you are running on very old hardware without half-precision support. Switching to `i8` with linear scaling is also possible, but the recall loss becomes noticeable on larger collections with millions of searchable entries. Similarly, for higher-dimensional embeddings (512 or 768), a common strategy is to quantize them into single-bit representations for faster search.
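```python
import numpy as np

f32_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
f16_embedding: np.ndarray = f32_embedding.astype(np.float16)
# Linear scaling to int8; stays in range only if every component lies in [-1, 1]
i8_embedding: np.ndarray = (f32_embedding * 127).astype(np.int8)
# One bit per dimension (the sign), packed 8 dimensions per byte
b1_embedding: np.ndarray = np.packbits((f32_embedding > 0).astype(np.uint8))
```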
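To make the trade-off concrete, the following sketch compares the storage cost per vector for each representation and checks how far the cosine similarity drifts after down-casting. It is an illustration only: the random unit vectors are stand-ins for real model outputs, `cosine` is a hypothetical helper, and the `i8` step rounds and clips, which is a small variation on plain truncation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a batch of model outputs: 1000 unit-length 768-d vectors.
f32 = rng.standard_normal((1000, 768)).astype(np.float32)
f32 /= np.linalg.norm(f32, axis=1, keepdims=True)

f16 = f32.astype(np.float16)
i8 = np.clip(np.round(f32 * 127), -127, 127).astype(np.int8)
b1 = np.packbits(f32 > 0, axis=1)  # 768 sign bits -> 96 bytes per vector

bytes_per_vector = {
    name: int(arr[0].nbytes)
    for name, arr in [("f32", f32), ("f16", f16), ("i8", i8), ("b1", b1)]
}
print(bytes_per_vector)  # {'f32': 3072, 'f16': 1536, 'i8': 768, 'b1': 96}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, computed in float32 regardless of input dtype."""
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = cosine(f32[0], f32[1])
f16_error = abs(cosine(f16[0], f16[1]) - reference)
i8_error = abs(cosine(i8[0], i8[1]) - reference)
print(f"f16 error: {f16_error:.6f}, i8 error: {i8_error:.6f}")
```

Running the same comparison on real embeddings and real queries is the only reliable way to tell whether a given representation keeps enough recall for a particular collection.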
An alternative to quantization is to use Matryoshka embeddings, where the embeddings are sliced into smaller parts and the search is performed in a hierarchical manner.
```python
import numpy as np

large_embedding: np.ndarray = model.encode_text(text_data, return_features=False)
small_embedding: np.ndarray = large_embedding[:, :256]
tiny_embedding: np.ndarray = large_embedding[:, :64]
```
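One way to read "hierarchical" here is a two-stage search: scan the whole collection using only a short prefix of each vector, then re-rank a small shortlist with all dimensions. The sketch below shows that idea in plain NumPy. The function names, the shortlist size, and the random data are all assumptions for illustration; random vectors are not Matryoshka-trained, so this demonstrates the mechanics only, not the recall you would see in practice. Note that a sliced prefix is generally no longer unit-length, so it is re-normalized before computing cosine scores.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_search(query, collection, coarse_dims=64, shortlist=50, top_k=5):
    """Rank by a short prefix first, then re-rank the shortlist with all dimensions."""
    # Stage 1: cheap scan over the first `coarse_dims` dimensions only.
    coarse_scores = normalize(collection[:, :coarse_dims]) @ normalize(query[:coarse_dims])
    candidates = np.argsort(-coarse_scores)[:shortlist]
    # Stage 2: full-dimensional cosine similarity on the shortlisted candidates.
    fine_scores = normalize(collection[candidates]) @ normalize(query)
    return candidates[np.argsort(-fine_scores)[:top_k]]

rng = np.random.default_rng(7)
collection = rng.standard_normal((10_000, 256)).astype(np.float32)
# A query that is a slightly perturbed copy of entry 123.
query = collection[123] + 0.05 * rng.standard_normal(256).astype(np.float32)

top = two_stage_search(query, collection)
print(top)
```

The shortlist size trades speed for recall: a larger shortlist makes it less likely that the true nearest neighbor is discarded during the coarse stage.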
Both approaches are natively supported by the USearch vector-search engine and the SimSIMD numerics library. When dealing with small collections (up to millions of entries) and looking for low-latency cosine distance calculations, you can achieve 5x-2500x performance improvement over Torch, NumPy, SciPy, and vanilla Python using SimSIMD.
```python
from simsimd import cosine, hamming

distance: float = cosine(f32_embedding, f32_embedding)  # 32x SciPy performance on Apple M2 CPU
distance: float = cosine(f16_embedding, f16_embedding)  # 79x SciPy performance on Apple M2 CPU
distance: float = cosine(i8_embedding, i8_embedding)    # 133x SciPy performance on Apple M2 CPU
distance: float = hamming(b1_embedding, b1_embedding)   # 17x SciPy performance on Apple M2 CPU
```
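For reference, the two metrics can be written out in plain NumPy. This is a slow baseline meant only to show what is being computed, assuming the usual definitions: cosine distance as one minus cosine similarity, and Hamming distance as the number of differing bits between two bit-packed arrays. The helper names are hypothetical and are not SimSIMD's API.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity, accumulated in float64."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming_distance(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    """Number of differing bits between two bit-packed uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum())

x = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0], dtype=np.float32)
y = -x  # points in exactly the opposite direction

x_bits = np.packbits(x > 0)  # sign bits 10101010 packed into one byte
y_bits = np.packbits(y > 0)  # sign bits 01010101 packed into one byte

print(cosine_distance(x, x))             # approximately 0: identical direction
print(cosine_distance(x, y))             # approximately 2: opposite direction
print(hamming_distance(x_bits, y_bits))  # 8: every sign bit differs
```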
Similarly, when dealing with large collections (up to billions of entries per server) and looking for high-throughput search, you can achieve 100x performance improvement over FAISS and other vector-search solutions using USearch. Here are a couple of examples:
```python
from usearch.index import Index

f32_index = Index(ndim=64, metric='cos', dtype='f32')      # for Matryoshka embeddings
f16_index = Index(ndim=64, metric='cos', dtype='f16')      # for Matryoshka embeddings
i8_index = Index(ndim=256, metric='cos', dtype='i8')       # for quantized embeddings
b1_index = Index(ndim=768, metric='hamming', dtype='b1')   # for binary embeddings
```
PyTorch is a heavy dependency to carry, especially if you run on Edge or IoT devices. Using vanilla ONNX runtime, one can significantly reduce memory consumption and deployment latency.
```sh
$ conda create -n uform_torch python=3.10 -y
$ conda create -n uform_onnx python=3.10 -y
$ conda activate uform_torch && pip install -e ".[torch]" && conda deactivate
$ conda activate uform_onnx && pip install -e ".[onnx]" && conda deactivate
$ du -sh $(conda info --envs | grep 'uform_torch' | awk '{print $2}')
> 5.2G    ~/conda/envs/uform_torch
$ du -sh $(conda info --envs | grep 'uform_onnx' | awk '{print $2}')
> 461M    ~/conda/envs/uform_onnx
```
Most of that weight can be further reduced, down to 100 MB for both the model and the runtime. You can pick one of many supported ONNX execution providers, including XNNPACK, CUDA and TensorRT for Nvidia GPUs, OpenVINO on Intel, DirectML on Windows, ROCm on AMD, and CoreML on Apple devices, with more to come.
The generative models can be used for chat-like experiences in the command line. For that, you can use the `uform-chat` CLI tool, which is available in the UForm package.
```sh
$ pip install uform
$ uform-chat --model unum-cloud/uform-gen2-dpo --image=zebra.jpg
$ uform-chat --model unum-cloud/uform-gen2-dpo \
>     --image="https://bit.ly/3tIVg9M" \
>     --device="cuda:0" \
>     --fp16
```
