SynthRL-A-MMK12-8K-7B

This model is trained based on Qwen/Qwen2.5-VL-7B-Instruct using the EasyR1 framework.

Model Details

Base Model: Qwen/Qwen2.5-VL-7B-Instruct
Training Framework: EasyR1
Training Algorithm: GRPO
Training Data: A-MMK12-8K

Usage

System Prompt

You are a helpful assistant.

Instruction Template

{{ content | trim }} You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}.

Citation

If you find this model useful, please cite our paper:

@misc{wu2025synthrl,
     title={SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis}, 
     author={Zijian Wu and Jinjie Ni and Xiangyan Liu and Zichen Liu and Hang Yan and Michael Qizhe Shieh},
     year={2025},
     eprint={2506.02096},
     archivePrefix={arXiv},
     primaryClass={cs.LG},
     url={https://arxiv.org/abs/2506.02096}, 
}

Jakumetsu
/

SynthRL-A-MMK12-8K-7B

SynthRL-A-MMK12-8K-7B

Model Details

Usage

System Prompt

Instruction Template

Citation

Model tree for Jakumetsu/SynthRL-A-MMK12-8K-7B

Collection including Jakumetsu/SynthRL-A-MMK12-8K-7B

SynthRL