SynthRL-A-MMK12-8K-7B

This model is trained based on Qwen/Qwen2.5-VL-7B-Instruct using the EasyR1 framework.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-7B-Instruct
  • Training Framework: EasyR1
  • Training Algorithm: GRPO
  • Training Data: A-MMK12-8K

Usage

System Prompt

You are a helpful assistant.

Instruction Template

{{ content | trim }} You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}.

Citation

If you find this model useful, please cite our paper:

@misc{wu2025synthrl,
     title={SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis}, 
     author={Zijian Wu and Jinjie Ni and Xiangyan Liu and Zichen Liu and Hang Yan and Michael Qizhe Shieh},
     year={2025},
     eprint={2506.02096},
     archivePrefix={arXiv},
     primaryClass={cs.LG},
     url={https://arxiv.org/abs/2506.02096}, 
}
Downloads last month
11
Safetensors
Model size
8.29B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jakumetsu/SynthRL-A-MMK12-8K-7B

Quantizations
1 model

Collection including Jakumetsu/SynthRL-A-MMK12-8K-7B