Update README.md
README.md
CHANGED
@@ -19,6 +19,8 @@ datasets:
 
 # AudSemThinker-QA-GRPO
 
+Corresponding paper: https://arxiv.org/abs/2505.14142
+
 ## Model Description
 `AudSemThinker-QA-GRPO` is an advanced variant of `AudSemThinker`, fine-tuned using Group Relative Policy Optimization (GRPO) with Verifiable Rewards (RLVR). This approach enhances reasoning capabilities and allows for controlled thinking budget during generation. It leverages the structured reasoning framework of `AudSemThinker` (thinking, semantic elements, answer phases) but is specifically optimized for multiple-choice audio question answering. This model is designed to produce accurate answers while maintaining a controlled reasoning length in its `<think>` section.
 