Submitted by Iceclear · 44 upvotes · SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training · 13 authors
Submitted by imryanxu · 43 upvotes · ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development · 10 authors
Submitted by Zhoues · 36 upvotes · RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics · 11 authors
Submitted by yurakuratov · 33 upvotes · Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts · 5 authors
Submitted by thenlper · 31 upvotes · Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models · 12 authors
Submitted by hamza-hcompany · 27 upvotes · Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights · 43 authors
Submitted by stefan-it · 26 upvotes · The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text · 27 authors
Submitted by Mikivis · 24 upvotes · VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models · 7 authors
Submitted by Hanoona · 22 upvotes · VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos · 7 authors
Submitted by lulidong · 20 upvotes · AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs · 5 authors
Submitted by kuvvi · 16 upvotes · Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations · 8 authors
Submitted by Zuyan · 15 upvotes · SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs · 4 authors
Submitted by Kullpar · 15 upvotes · StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs · 4 authors
Submitted by CircleRadon · 13 upvotes · EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? · 11 authors
Submitted by xy06 · 12 upvotes · MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning · 7 authors
Submitted by lhmd · 11 upvotes · Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting · 7 authors
Submitted by lincharliesun · 11 upvotes · Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design · 11 authors
Submitted by StarYDY · 11 upvotes · FlexPainter: Flexible and Multi-View Consistent Texture Generation · 10 authors
Submitted by yiren98 · 8 upvotes · Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack · 6 authors
Submitted by wyf2020 · 5 upvotes · FreeTimeGS: Free Gaussians at Anytime and Anywhere for Dynamic Scene Reconstruction · 9 authors
Submitted by diqiu7 · 5 upvotes · SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers · 11 authors
Submitted by JJitsev · 4 upvotes · Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets · 7 authors
Submitted by wshi83 · 4 upvotes · MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale · 14 authors
Submitted by Eric-Lan · 4 upvotes · Contextual Integrity in LLMs via Reasoning and Reinforcement Learning · 8 authors
Submitted by gq2138 · 3 upvotes · Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning · 8 authors
Submitted by ZhangRC · 3 upvotes · FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation · 6 authors
Submitted by fcy99 · 3 upvotes · RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS · 8 authors
Submitted by DrChiZhang · 2 upvotes · FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing · 4 authors
Submitted by KaiChen1998 · 2 upvotes · Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning · 8 authors
Submitted by Yewandou · 2 upvotes · BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations · 6 authors
Submitted by gzzyyxy · 2 upvotes · Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving · 8 authors
Submitted by EdBianchi · 1 upvote · PATS: Proficiency-Aware Temporal Sampling for Multi-View Sports Skill Assessment · 2 authors
Submitted by 0xe69756 · 1 upvote · Watermarking Degrades Alignment in Language Models: Analysis and Mitigation · 3 authors
Submitted by zzh99 · 1 upvote · Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach · 5 authors
Submitted by levondang · 1 upvote · SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios · 6 authors
Submitted by mariannedhk · 1 upvote · What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training · 6 authors