Andrew Zhao

andrewzh

https://andrewzh112.github.io/

AI & ML interests

Reinforcement Learning, Agents

Recent Activity

authored a paper 24 days ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Organizations

None yet

andrewzh's activity

upvoted a paper 4 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published 5 days ago • 129

upvoted a paper 18 days ago

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Paper • 2505.13308 • Published 19 days ago • 26

upvoted a paper about 1 month ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 169

upvoted a collection about 1 month ago

Absolute Zero Reasoner

Collection

6 items • Updated 29 days ago • 53

upvoted 2 papers about 2 months ago

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Paper • 2504.13820 • Published Apr 18 • 17

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 127

upvoted a paper 3 months ago

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25 • 36

upvoted a paper 4 months ago

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

Paper • 2502.11901 • Published Feb 17 • 6

upvoted 2 papers 7 months ago

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Paper • 2411.02359 • Published Nov 4, 2024 • 13

How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 36

upvoted a paper 8 months ago

LLM-based Optimization of Compound AI Systems: A Survey

Paper • 2410.16392 • Published Oct 21, 2024 • 16

upvoted a paper 11 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11, 2024 • 21

upvoted a paper 12 months ago

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Paper • 2406.11230 • Published Jun 17, 2024 • 35

upvoted a paper about 1 year ago

DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints

Paper • 2405.19026 • Published May 29, 2024 • 8

upvoted 4 papers over 1 year ago

Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation

Paper • 2310.01320 • Published Oct 2, 2023 • 9