Models

7,535

Active filters: arxiv: 2402.03300

spinech/qwen2.5-3b-r1-arc-train-synthetic

Text Generation • 3B • Updated Feb 4 • 4

laolaorkk/Qwen2.5-1.5B-R1-GRPO-debug

Text Generation • 2B • Updated Feb 6 • 2

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math

Text Generation • 8B • Updated Feb 4 • 8

Dongwei/Qwen-2.5-7B_Math

Text Generation • 8B • Updated Feb 4 • 2

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO_Math

Text Generation • 2B • Updated Feb 3 • 2

Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math

Text Generation • 2B • Updated Feb 3 • 3

skzxjus/Qwen2.5-7B-Open-R1-GRPO

Text Generation • 8B • Updated Feb 8 • 2

AndreasX1206/Qwen2-0.5B-countdown

Text Generation • 0.5B • Updated Feb 4 • 3

alicogniai/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 16 • 2

ununtrium/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 11 • 2

yuta0x89/llmjp13b-numinacot-epoch2-GRPO

Text Generation • 14B • Updated Feb 11 • 2

yeshsurya/Qwen2.5-7B-Math-with_50stepGRPO

Text Generation • 8B • Updated Feb 12 • 3

hyunw3/qwen-2.5-0.5b-r1-countdown_lr5e-6

Text Generation • 0.5B • Updated Jun 3 • 7

khuang2/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Feb 5 • 7 • 2

spinech/qwen2.5-3b-r1-arc-train-thinker

Text Generation • 3B • Updated Feb 5 • 3 • 1

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math_lowlr

Text Generation • 8B • Updated Feb 4 • 1

Dongwei/Qwen-2.5-7B_Math_smalllr

Text Generation • 8B • Updated Feb 4 • 1

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO_Math_smalllr

Text Generation • 2B • Updated Feb 4 • 2

Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math_smalllr

Text Generation • 2B • Updated Feb 4 • 1

Dongwei/Qwen-2.5-7B_Base_Math_smalllr

Text Generation • 8B • Updated Feb 5 • 1 • 6

jeremierostan/qwen-guiding-question

May811/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 15 • 2

spinech/qwen2.5-3b-r1-arc-train-thinker-2

Text Generation • 3B • Updated Feb 5 • 2

Dongwei/Qwen-2.5-7B_Base_Math_smallestlr

Text Generation • 8B • Updated Feb 11 • 2

Dongwei/Qwen-2.5-7B_Base_Math_smallestlr_newdata

Text Generation • 8B • Updated Feb 5 • 2

sohyunan/gemma-2-2b-it_controller-grpo

Text Generation • 3B • Updated Feb 6 • 1

zzhang1987/Qwen2.5-VL-3B-Instruct-Open-R1-Distill

Image-to-Text • 4B • Updated Mar 13 • 10

rzhao17/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Feb 7 • 7

schwamaths/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 5 • 2

Chris126/qwen-r1-aha-moment