Trials, Errors, and Breakthroughs: Our Rocky Road to OVD SOTA with Reinforcement Learning
Multimodal AI, VLM, VLA, VAM, etc.
Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs