PRITHIV SAKTHI U R (PRITHIVSAKTHIUR)

🎯 Focusing: Computer Vision, Multimodal AI

Hi, I'm a Machine Learning Engineer, Hugging Face Fellow ML 🤗, and Computer Vision Enthusiast.

FLUX-LoRA-DLC: FLUX.1-dev diffusion model with 255+ community LoRAs, 1.09K+ likes, 70K+ runs. [Collection]
Multimodal-OCR: OCR for images and videos using state-of-the-art vision-language models, 40K+ runs, 90K+ visits. [Collection]
Multimodal-VLM-Thinking: VLMs for captioning, OCR, reasoning, and multimodal tasks, 2.06K+ runs, 11.2K+ visits. [Collection]
Qwen3-VL-Outpost: VLM for image & video understanding with multilingual support, 6.2K+ runs, 49.9K+ visits. [Collection]
Flux Realism: Hyper-realistic image generation with FLUX.1-dev and Super Realism LoRA, 39.5K+ runs, 127.7K+ visits. [Collection]
Nano-Banana-AIO: Minimalistic Gemini API app to experience Google’s NanoBanana functionalities. [Collection]

Gliese-OCR-7B-Post1.0: Enhanced document retrieval, content extraction, and analysis, built on Camel-Doc-OCR-062825. [Collection]
DeepCaption-VLA-7B: Generates precise, descriptive image captions highlighting visual properties and object attributes. [Collection]
Camel-Doc-OCR: Document retrieval, content extraction, and analysis (v2 080125). [Collection]
SigLIP2-0.1B-DownStream: Domain-specific image classification models fine-tuned from siglip2 for multi-label tasks. [Base]
Lumian2-VLR-7B: VLM for fine-grained multimodal reasoning, image/video captioning, and document comprehension with explainable step-by-step reasoning. [Demo]

Galactic-Qwen-14B: Top mid-range 14B model, ranked 59th, overall score 43.56. [Leaderboard]
Gauss-Opus-14B: Strong in math, ranked 356th, MATH Level 5 score 57.55. [Leaderboard]
Sombrero-Opus-14B: All-rounder mid-range 14B, ranked 104th, score 42.32. [Leaderboard]
Dinobot-Opus-14B: IFEval score 82.40, ranked 132nd, overall 41.77. [Leaderboard]
Qwen2-VL-OCR-2B: Edge-device VLM for handwriting, LaTeX, bills, and receipts, 250k+ downloads. [Run Demo]

Stranger Vision: Community for model modification and experimentation, < 1K downloads. [Collection]
Stranger Zone: Illustration adapters for diffusion models, 2M+ downloads. [Collection]
Stranger Guard: Image safety-guard models, 10k+ downloads. [Collection]
Stranger Operations: Model Fostering, Operations, and Cycle
Stranger Tools: Tools, Wheels, Fun

Multimodal-Outpost-Notebooks (Public)
This repository contains a curated collection of notebooks for implementing state-of-the-art multimodal Vision-Language Models (VLMs).
Jupyter Notebook, 24 stars, 4 forks
FineTuning-SigLIP-2 (Public)
Fine-tuning SigLIP 2, a vision-language encoder model, for single-label and multi-label image classification tasks.
Jupyter Notebook, 48 stars, 7 forks
OCR-ReportLab-Notebooks (Public)
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on the free-tier T4 GPU.
Jupyter Notebook, 23 stars, 4 forks
Flux-LoRA-DLC (Public)
Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diver…
Python, 13 stars, 1 fork
Qwen-Image-Edit-2509-LoRAs-Fast (Public)
Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless ima…
Python, 15 stars, 2 forks
FLUX-REALISM (Public)
A Gradio-based web application for generating hyper-realistic images using FLUX.1-dev with Super Realism LoRA enhancement. This application provides an intuitive interface for creating high-quality…
Python, 16 stars, 4 forks