ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code Paper • 2506.02314 • Published Jun 2
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 39
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models Paper • 2407.07086 • Published Jul 9, 2024
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 99
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation Paper • 2410.02725 • Published Oct 3, 2024 • 1
Open X-Embodiment: Robotic Learning Datasets and RT-X Models Paper • 2310.08864 • Published Oct 13, 2023 • 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control Paper • 2307.15818 • Published Jul 28, 2023 • 30
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning Paper • 2408.08441 • Published Aug 15, 2024 • 8
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data Paper • 2404.14367 • Published Apr 22, 2024 • 1
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models Paper • 2403.02715 • Published Mar 5, 2024 • 3
Robotic Offline RL from Internet Videos via Value-Function Pre-Training Paper • 2309.13041 • Published Sep 22, 2023 • 8
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Paper • 2306.11698 • Published Jun 20, 2023 • 12