Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

RL for LLM, Code&Math Reasoning, Function Calling, Code Interpreter, Vision-Language Pretraining

Recent Activity

liked a model 1 day ago

mistralai/Devstral-Small-2507

liked a dataset 12 days ago

princeton-nlp/prolong-data-512K

liked a dataset 13 days ago

internlm/SWE-Fixer-Train-110K

Organizations

upvoted a paper 19 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 20 days ago • 281

upvoted a paper 20 days ago

RAVine: Reality-Aligned Evaluation for Agentic Search

Paper • 2507.16725 • Published 22 days ago • 28

upvoted a paper 21 days ago

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published 22 days ago • 60

upvoted a paper 22 days ago

GR-3 Technical Report

Paper • 2507.15493 • Published 23 days ago • 45

upvoted a paper about 1 month ago

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

Paper • 2507.01663 • Published Jul 2 • 5

upvoted a collection about 1 month ago

Kimina Prover Preview

Collection

State-of-the-Art Models for Formal Mathematical Reasoning • 5 items • Updated Apr 28 • 33

upvoted a paper about 1 month ago

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 73

upvoted 4 papers about 2 months ago

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Paper • 2506.09985 • Published Jun 11 • 30

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 261

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13 • 69

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published Jun 13 • 24

upvoted a paper 2 months ago

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9 • 89

upvoted a collection 2 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 6 days ago • 71

upvoted 4 papers 2 months ago

MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4 • 77

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Paper • 2505.24298 • Published May 30 • 27

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 177

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 134

upvoted a collection 2 months ago

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Collection

7 items • Updated May 22 • 3

upvoted 2 papers 3 months ago

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 89

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 133
