Submitted by akhaliq 80 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing · 4 authors 147 5
Submitted by gallilmaimon 70 Slamming: Training a Speech Language Model on One GPU in a Day · 3 authors 215 2
Submitted by Canyu 53 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks · 8 authors 3
Submitted by Facico 31 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment · 7 authors 123 4
Submitted by CheeryLJH 27 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models · 18 authors 3
Submitted by amphora 26 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning · 4 authors 2
Submitted by akhaliq 20 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers · 6 authors 3
Submitted by TianjinHuang 19 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam · 11 authors 2
Submitted by xw-eric 18 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models · 8 authors 2
Submitted by irenesolaiman 16 Beyond Release: Access Considerations for Generative AI Systems · 7 authors 4
Submitted by xhyandwyy 13 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration · 7 authors 2
Submitted by jianlanluo 13 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation · 6 authors 2
Submitted by GPaolo 9 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning · 5 authors 2
Submitted by callanwu 8 Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties · 5 authors 4
Submitted by dalime 7 Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models · 6 authors 2
Submitted by peterji 6 Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation · 10 authors 2
Submitted by codezakh 5 MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use · 6 authors 2
Submitted by WillHeld 4 Mind the Gap! Static and Interactive Evaluations of Large Audio Models · 7 authors 2
Submitted by zouharvi 4 Early-Exit and Instant Confidence Translation Quality Estimation · 5 authors 2
Submitted by ludolara 2 Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures · 4 authors 2
Submitted by nielsr 2 M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment · 6 authors 2