gijs committed on
Commit 01a7248 · verified · 1 Parent(s): 19d9241

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -19,6 +19,8 @@ datasets:
 
 # AudSemThinker-QA-GRPO
 
+Corresponding paper: https://arxiv.org/abs/2505.14142
+
 ## Model Description
 `AudSemThinker-QA-GRPO` is an advanced variant of `AudSemThinker`, fine-tuned using Group Relative Policy Optimization (GRPO) with Verifiable Rewards (RLVR). This approach enhances reasoning capabilities and allows for controlled thinking budget during generation. It leverages the structured reasoning framework of `AudSemThinker` (thinking, semantic elements, answer phases) but is specifically optimized for multiple-choice audio question answering. This model is designed to produce accurate answers while maintaining a controlled reasoning length in its `<think>` section.
 
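
For readers unfamiliar with the training setup named in the model description, the following is a minimal Python sketch of the general idea behind GRPO with a verifiable reward for multiple-choice answers. It is not the authors' training code. The `<answer>` tag name, the 0/1 reward, the helper names, and the use of the population standard deviation are all assumptions made for illustration; real implementations differ in these details and may add a length-related term to control the `<think>` budget.

```python
import re
import statistics
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer>...</answer>, or None if absent.

    The tag name is an assumption made for this sketch.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def verifiable_reward(completion: str, correct_choice: str) -> float:
    """Hypothetical 0/1 reward: 1.0 when the extracted answer matches exactly."""
    return 1.0 if extract_answer(completion) == correct_choice else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    GRPO scores several sampled completions for the same prompt and uses
    these group-normalized values as advantages in place of a learned critic.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one question whose correct choice is "B".
completions = [
    "<think>short reasoning</think><answer>B</answer>",
    "<think>other reasoning</think><answer>A</answer>",
    "<think>no final tag</think>",
    "<think>more reasoning</think><answer>B</answer>",
]
rewards = [verifiable_reward(c, "B") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that answer correctly receive positive advantages and the others negative ones; if every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.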