FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published 4 days ago • 1
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 14 days ago • 62
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published 20 days ago • 107
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published 16 days ago • 126
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published 27 days ago • 81
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 73
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8 • 41
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10 • 4
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20 • 15
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding Paper • 2505.23922 • Published May 29
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark Paper • 2505.17104 • Published May 21