BLEUBERI: BLEU is a surprisingly effective reward for instruction following Paper • 2505.11080 • Published 22 days ago
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published 17 days ago
HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models Paper • 2505.20444 • Published 12 days ago
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published 9 days ago
Exploring the Latent Capacity of LLMs for One-Step Text Generation Paper • 2505.21189 • Published 11 days ago
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond Paper • 2505.19641 • Published 12 days ago
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published May 5
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models Paper • 2504.20605 • Published Apr 29