Submitted by xianbao · GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models · 171 authors
Submitted by RyanL22 · Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off · 2 authors
Submitted by SiriusL · InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization · 13 authors
Submitted by YerbaPage · Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal · 7 authors
Submitted by MikolajZ · GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing · 4 authors
Submitted by hdong51 · Adapting Vision-Language Models Without Labels: A Comprehensive Survey · 6 authors
Submitted by KejiaRobust · MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs · 7 authors
Submitted by fsk515 · MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh · 9 authors
Submitted by huxueyu · OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use · 29 authors
Submitted by LianShuQuan · UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · 7 authors
Submitted by thebluser · LightSwitch: Multi-view Relighting with Material-guided Diffusion · 3 authors
Submitted by shijiezhou · VLM4D: Towards Spatiotemporal Awareness in Vision Language Models · 10 authors