Adam Yanxiao Zhao

sdpkjc

https://sdpkjc.com

AI & ML interests

Reinforcement Learning

Recent Activity

new activity about 5 hours ago

Qwen/Qwen3-1.7B:Fix chat template in case of multiple assistant messages and no thinking

updated a model 16 days ago

sdpkjc/Qwen2.5-0.5B-SFT-24quiz-checkpoint-800

published a model 16 days ago

sdpkjc/Qwen2.5-0.5B-SFT-24quiz-checkpoint-800

Collections 1

Papers 2

arxiv:2403.00673

arxiv:2402.03046

Models 100

sdpkjc/Qwen2.5-0.5B-SFT-24quiz-checkpoint-800

Text Generation • Updated 16 days ago • 2

sdpkjc/Qwen2.5-0.5B-SFT-24quiz-checkpoint-300

Text Generation • Updated 16 days ago • 2

sdpkjc/Qwen2.5-1.5B-Instruct-FT-DPO

Text Generation • Updated Jan 22 • 35

sdpkjc/SmolLM2-FT-DPO

Text Generation • Updated Jan 22 • 11

sdpkjc/SmolLM2-FT-MyDataset

Text Generation • Updated Jan 21 • 8

sdpkjc/Ant-v4-ppo_fix_continuous_action-seed5

Reinforcement Learning • Updated Jan 20, 2024

sdpkjc/Ant-v4-ppo_fix_continuous_action-seed4

Reinforcement Learning • Updated Jan 20, 2024

sdpkjc/Ant-v4-ppo_fix_continuous_action-seed3

Reinforcement Learning • Updated Jan 20, 2024

sdpkjc/Ant-v4-ppo_fix_continuous_action-seed2

Reinforcement Learning • Updated Jan 20, 2024

sdpkjc/Ant-v4-ppo_fix_continuous_action-seed1

Reinforcement Learning • Updated Jan 20, 2024

Datasets 17

sdpkjc/24problems_quiz-eval-n4-1-10-24

Viewer • Updated 16 days ago • 55.5k • 66

sdpkjc/24problems_quiz-eval-5

Viewer • Updated 16 days ago • 100k • 78

sdpkjc/24problems_quiz

Viewer • Updated 16 days ago • 85.6k • 158

sdpkjc/SATQuest-RFT-3k

Viewer • Updated Apr 27 • 3k • 19

sdpkjc/SATQuest-RFT-1k

Viewer • Updated Apr 23 • 1k • 9

sdpkjc/SATQuest-Tiny

Viewer • Updated Apr 20 • 10 • 13

sdpkjc/SATQuest

Viewer • Updated Apr 17 • 140 • 16

sdpkjc/SATQuest-G

Viewer • Updated Mar 28 • 963 • 14

sdpkjc/NumBase-N01-S2g-B2g

Viewer • Updated Feb 26 • 983k • 22

sdpkjc/NumBase-N01-S2g-B28

Viewer • Updated Feb 26 • 459k • 22