📌 Official project website for our EMNLP Findings 2025 paper: “NLKI: A Lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks”
Commonsense visual question answering often hinges on knowledge missing from the image or the question. Small vision–language models (sVLMs) such as ViLT, VisualBERT, and FLAVA lag behind their larger generative counterparts. We present NLKI, an end-to-end framework (sketched below) that:
Retrieves natural language facts
Prompts an LLM to craft explanations
Feeds both signals to sVLMs
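The sketch below illustrates these three stages at a high level. All function names, the word-overlap retriever, and the `[SEP]`-joined text input are illustrative placeholders for this README, not the released implementation:

```python
# A minimal, hypothetical sketch of the NLKI pipeline: retrieve facts,
# generate an explanation, and build the text input for the sVLM.

def retrieve_facts(question: str, fact_corpus: list[str], k: int = 5) -> list[str]:
    """Toy retriever: rank corpus facts by word overlap with the question.
    (NLKI uses a proper retriever; this stub only stands in for it.)"""
    q_words = set(question.lower().split())
    ranked = sorted(fact_corpus,
                    key=lambda f: -len(q_words & set(f.lower().split())))
    return ranked[:k]

def generate_explanation(question: str, facts: list[str]) -> str:
    """Placeholder for prompting an LLM to craft an explanation
    from the question and the retrieved facts."""
    return f"Because {facts[0]}." if facts else ""

def build_text_input(question: str, facts: list[str], explanation: str) -> str:
    """Concatenate question, facts, and explanation into the text stream
    fed to the sVLM alongside the image."""
    return " [SEP] ".join([question, *facts, explanation])

if __name__ == "__main__":
    corpus = ["umbrellas keep rain off people", "dogs are common pets"]
    q = "Why is the man holding an umbrella?"
    facts = retrieve_facts(q, corpus, k=1)
    print(build_text_input(q, facts, generate_explanation(q, facts)))
```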
Our approach improves accuracy by up to 7% across CRIC, AOKVQA, and e-SNLI-VE, and noise-robust training adds a further 2.5–5.5% gain, allowing 250M-parameter models to rival medium-sized VLMs.
✨ Citation
If you use NLKI, please cite our work:
```bibtex
@misc{dutta2025nlkilightweightnaturallanguage,
  title         = {NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks},
  author        = {Aritra Dutta and Swapnanil Mukherjee and Deepanway Ghosal and Somak Aditya},
  year          = {2025},
  eprint        = {2508.19724},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2508.19724},
}
```