RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

Published in ICCV 2025 Workshop on Knowledge-Intensive Multimodal Reasoning, 2025

Jingyuan Qi, Zhiyang Xu, Rulin Shao, Zihao Lin, Yang Chen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

Recommended citation: Jingyuan Qi, Zhiyang Xu, Rulin Shao, Zihao Lin, Yang Chen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang. "RoRA-VLM: Robust Retrieval-Augmented Vision Language Models." ICCV 2025 Workshop on Knowledge-Intensive Multimodal Reasoning.