qwen-3b-reasoning / README.md
license: apache-2.0
base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
tags:
  - unsloth
  - trl
  - grpo