view article Article StackLLaMA: A hands-on guide to train LLaMA with RLHF By edbeeching and 6 others • Apr 5, 2023 • 37
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 148
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 37
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27 • 13
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30 • 12
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29 • 91
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 28
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 15
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published Nov 22, 2024 • 15
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 125
view article Article LLaVA-o1: Let Vision Language Models Reason Step-by-Step By mikelabs • Nov 19, 2024 • 12
Medical QA Datasets Collection A collection of medical question answering (QA) datasets • 23 items • Updated Feb 22 • 41
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Paper • 2410.21845 • Published Oct 29, 2024 • 14