Daily Papers

by AK and the research community

May 30

Submitted by

yilunzhao

Table-R1: Inference-Time Scaling for Table Reasoning

·
4 authors

Submitted by

Liuff23

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

·
4 authors

Submitted by

AngLv

The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason

·
5 authors

Submitted by

songtingyu

VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos

·
4 authors

Submitted by

cyyang822

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

·
14 authors

Submitted by

shizhediao

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

·
9 authors

Submitted by

lyx97

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

·
10 authors

Submitted by

maksimko123

cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

·
9 authors

Submitted by

sebgao

D-AR: Diffusion via Autoregressive Models

·
2 authors

Submitted by

lhjiang

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

·
12 authors

Submitted by

RicardoL1u

Are Reasoning Models More Prone to Hallucination?

·
8 authors

Submitted by

dlaptev

Train Sparse Autoencoders Efficiently by Utilizing Features Correlation

·
5 authors

Submitted by

chaoscodes

Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

·
11 authors

Submitted by

ydalva

LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers

·
3 authors

Submitted by

AliBehrouz

ATLAS: Learning to Optimally Memorize the Context at Test Time

·
8 authors

Submitted by

benzweijia

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

·
3 authors

Submitted by

vyokky

SWE-bench Goes Live!

·
15 authors

Submitted by

nitay

Multi-Domain Explainability of Preferences

·
3 authors

Submitted by

sy1998

VidText: Towards Comprehensive Evaluation for Video Text Understanding

·
10 authors

Submitted by

spapi

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

·
9 authors

Submitted by

gallilmaimon

StressTest: Can YOUR Speech LM Handle the Stress?

·
3 authors

Submitted by

TharinduSK

Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation

·
9 authors

Submitted by

Jiahao004

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

·
13 authors

Submitted by

d3tk

REOrdering Patches Improves Vision Models

·
5 authors

Submitted by

BryanW

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

·
11 authors

Submitted by

unilm

On-Policy RL with Optimal Reward Baseline

·
6 authors

Submitted by

Bang-UdeM-Mila

System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts

·
4 authors

Submitted by

KunlunZhu

SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

·
9 authors

Submitted by

Jang-Hyun

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction

·
6 authors

Submitted by

antonio-c

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

·
8 authors

Submitted by

wangsssssss

Differentiable Solver Search for Fast Diffusion Sampling

·
8 authors

Submitted by

dek924

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

·
8 authors

Submitted by

jefflai

Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?

·
7 authors

Submitted by

BestWishYsh

MAGREF: Masked Guidance for Any-Reference Video Generation

·
11 authors

Submitted by

Elfsong

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

·
9 authors

Submitted by

m-serious

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

·
3 authors

Submitted by

smallAI

Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction

·
6 authors

Submitted by

ttumyche

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

·
6 authors

Submitted by

ChaoHuangCS

ZeroSep: Separate Anything in Audio with Zero Training

·
9 authors

Submitted by

angtian

ATI: Any Trajectory Instruction for Controllable Video Generation

·
5 authors

Submitted by

crc5577

Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

·
5 authors

Submitted by

zgao3186

One-shot Entropy Minimization

·
4 authors

Submitted by

davidchan

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

·
6 authors

Submitted by

JRQi

When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy

·
6 authors

Submitted by

hdong51

To Trust Or Not To Trust Your Vision-Language Model's Prediction

·
5 authors

Submitted by

lyxun

UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes

·
8 authors

Submitted by

kornelhowil

CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting

·
6 authors

Submitted by

JingzeShi

Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

·
7 authors

Submitted by

RunsenXu

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

·
13 authors

Submitted by

xiaojwan

How Animals Dance (When You're Not Looking)

·
5 authors

Submitted by

lhmd

ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS

·
6 authors

Submitted by

StringChaos

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

·
6 authors

Submitted by

ahnpersie

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates

·
4 authors

Submitted by

kpzhang996

SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model

·
7 authors

Submitted by

SuperSupermoon

Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation

·
13 authors

Submitted by

Franck-Dernoncourt

ChartLens: Fine-grained Visual Attribution in Charts

·
6 authors

Submitted by

Franck-Dernoncourt

A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models

·
9 authors

Submitted by

yunjae-won

Differential Information: An Information-Theoretic Perspective on Preference Optimization

·
4 authors

Submitted by

gsarch

Grounded Reinforcement Learning for Visual Reasoning

·
7 authors

Submitted by

at676

Model-Preserving Adaptive Rounding

·
3 authors

Submitted by

Aman

Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator

·
6 authors

Submitted by

Junfeng5

TokBench: Evaluating Your Visual Tokenizer before Visual Generation

·
9 authors

Submitted by

TeddyXGZ

Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models

·
8 authors

Submitted by

gsarti

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

·
4 authors

Submitted by

pengxiang

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

·
7 authors

Submitted by

ctma

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

·
5 authors