Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

License

NotificationsYou must be signed in to change notification settings

unum-cloud/UForm

Repository files navigation

Pocket-Sized Multimodal AI
For Content Understanding and Generation


Discord     LinkedIn     Twitter     Blog     GitHub

Multimodal Embeddings from 64 to 768 Dimensions • 1B Parameter Chat
Short Texts • Images • 🔜 Video Clips • 🔜 Long Documents
ONNX • CoreML • PyTorch
PythonJavaScriptSwift


UForm Chat Preview

Welcome to UForm, amultimodal AI library that's as versatile as it is efficient.UFormtiny embedding models will help you understand and search visual and textual content across various languages.UFormsmall generative models, on the other hand, don't only support conversational and chat use-cases, but are great for fast image captioning and Visual Question Answering (VQA).With compactcustom pre-trained transformer models, this can run anywhere from your server farm down to your smartphone.

Features

  • Tiny Embeddings: 64-dimensionalMatryoshka-style embeddings for extremely fastsearch.
  • Throughput: Thanks to the small size, the inference speed is2-4x faster than competitors.
  • Portable: Models come with native ONNX support, making them easy to deploy on any platform.
  • Quantization Aware: Down-cast embeddings fromf32 toi8 without losing much recall.
  • Multilingual: Trained on a balanced dataset, the recall is great across over 20 languages.

Models

For accuracy and speed benchmarks refer to theevaluation page.

Embedding Models

ModelParametersLanguagesArchitecture
uform3-image-text-english-large 🆕365 M112 layer BERT, ViT-L/14
uform3-image-text-english-base143 M14 layer BERT, ViT-B/16
uform3-image-text-english-small 🆕79 M14 layer BERT, ViT-S/16
uform3-image-text-multilingual-base206M2112 layer BERT, ViT-B/16

Generative Models

ModelParametersPurposeArchitecture
uform-gen2-dpo 🆕1.2 BChat, Image Captioning, VQAqwen1.5-0.5B, ViT-H/14
uform-gen2-qwen-500m1.2 BChat, Image Captioning, VQAqwen1.5-0.5B, ViT-H/14
uform-gen⚠️1.5 BImage Captioning, VQAllama-1.3B, ViT-B/16

Quick Start Examples

Embedding Models

First,pip install uform.Then, load the model:

fromuformimportget_model,Modality# Defaults to `dtype='bfloat16'` for ~2x speedup with minimal accuracy lossprocessors,models=get_model('unum-cloud/uform3-image-text-english-small',device='cuda')model_text=models[Modality.TEXT_ENCODER]model_image=models[Modality.IMAGE_ENCODER]processor_text=processors[Modality.TEXT_ENCODER]processor_image=processors[Modality.IMAGE_ENCODER]

Embed images:

importrequestsfromioimportBytesIOfromPILimportImageimage_url='https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'image=Image.open(BytesIO(requests.get(image_url).content))image_data=processor_image(image)image_features,image_embedding=model_image.encode(image_data,return_features=True)

Embed queries:

text='a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'text_data=processor_text(text)text_features,text_embedding=model_text.encode(text_data,return_features=True)

For more details check out:

Generative Models

The generative models are natively compatible with

fromtransformersimportAutoModel,AutoProcessormodel=AutoModel.from_pretrained('unum-cloud/uform-gen2-dpo',trust_remote_code=True)processor=AutoProcessor.from_pretrained('unum-cloud/uform-gen2-dpo',trust_remote_code=True)prompt='Question or Instruction'image=Image.open('image.jpg')inputs=processor(text=[prompt],images=[image],return_tensors='pt')withtorch.inference_mode():output=model.generate(**inputs,do_sample=False,use_cache=True,max_new_tokens=256,eos_token_id=151645,pad_token_id=processor.tokenizer.pad_token_id    )prompt_len=inputs['input_ids'].shape[1]decoded_text=processor.batch_decode(output[:,prompt_len:])[0]

For more details check out:

  • Python docs on generative models inpython/README.md
  • JavaScript docs on generative models 🔜
  • Swift docs on generative models 🔜

Technical Details

Down-casting, Quantization, Matryoshka, and Slicing

Depending on the application, the embeddings can be down-casted to smaller numeric representations without losing much recall.Switching fromf32 tof16 is recommended in almost all cases, unless you are running on very old hardware without half-precision support.Switching toi8 with linear scaling is also possible, but will be noticeable in the recall on larger collections with millions of searchable entries.Similarly, for higher-dimensional embeddings (512 or 768), a common strategy is to quantize them into single-bit representations for faster search.

importnumpyasnpf32_embedding:np.ndarray=model.encode_text(text_data,return_features=False)f16_embedding:np.ndarray=f32_embedding.astype(np.float16)i8_embedding:np.ndarray= (f32_embedding*127).astype(np.int8)b1_embedding:np.ndarray=np.packbits((f32_embedding>0).astype(np.uint8))

Alternative approach to quantization is to use the Matryoshka embeddings, where the embeddings are sliced into smaller parts, and the search is performed in a hierarchical manner.

importnumpyasnplarge_embedding:np.ndarray=model.encode_text(text_data,return_features=False)small_embedding:np.ndarray=large_embedding[:, :256]tiny_embedding:np.ndarray=large_embedding[:, :64]

Both approaches are natively supported by theUSearch vector-search engine and theSimSIMD numerics libraries.When dealing with small collections (up to millions of entries) and looking for low-latency cosine distance calculations, you canachieve 5x-2500x performance improvement over Torch, NumPy, SciPy, and vanilla Python using SimSIMD.

fromsimsimdimportcosine,hammingdistance:float=cosine(f32_embedding,f32_embedding)# 32x SciPy performance on Apple M2 CPUdistance:float=cosine(f16_embedding,f16_embedding)# 79x SciPy performance on Apple M2 CPUdistance:float=cosine(i8_embedding,i8_embedding)# 133x SciPy performance on Apple M2 CPUdistance:float=hamming(b1_embedding,b1_embedding)# 17x SciPy performance on Apple M2 CPU

Similarly, when dealing with large collections (up to billions of entries per server) and looking for high-throughput search, you canachieve 100x performance improvement over FAISS and other vector-search solutions using USearch.Here are a couple of examples:

fromusearch.indeximportIndexf32_index=Index(ndim=64,metric='cos',dtype='f32')# for Matryoshka embeddingsf16_index=Index(ndim=64,metric='cos',dtype='f16')# for Matryoshka embeddingsi8_index=Index(ndim=256,metric='cos',dtype='i8')# for quantized embeddingsb1_index=Index(ndim=768,metric='hamming',dtype='b1')# for binary embeddings

Compact Packaging

PyTorch is a heavy dependency to carry, especially if you run on Edge or IoT devices.Using vanilla ONNX runtime, one can significantly reduce memory consumption and deployment latency.

$ conda create -n uform_torch python=3.10 -y$ conda create -n uform_onnx python=3.10 -y$ conda activate uform_torch&& pip install -e".[torch]"&& conda deactivate$ conda activate uform_onnx&& pip install -e".[onnx]"&& conda deactivate$ du -sh$(conda info --envs| grep'uform_torch'| awk'{print $2}')> 5.2G~/conda/envs/uform_torch$ du -sh$(conda info --envs| grep'uform_onnx'| awk'{print $2}')> 461M~/conda/envs/uform_onnx

Most of that weight can be further reduced down to 100 MB for both the model and the runtime.You can pick one of many supportedONNX execution providers, which includes XNNPACK, CUDA and TensorRT for Nvidia GPUs, OpenVINO on Intel, DirectML on Windows, ROCm on AMD, CoreML on Apple devices, and more to come.

Multimodal Chat in CLI

The generative models can be used for chat-like experiences in the command line.For that, you can use theuform-chat CLI tool, which is available in the UForm package.

$ pip install uform$ uform-chat --model unum-cloud/uform-gen2-dpo --image=zebra.jpg$ uform-chat --model unum-cloud/uform-gen2-dpo \>     --image="https://bit.ly/3tIVg9M" \>     --device="cuda:0" \>     --fp16

Contributors21


[8]ページ先頭

©2009-2025 Movatter.jp