Spaces:
Running
Running
title: README | |
emoji: π | |
colorFrom: green | |
colorTo: purple | |
sdk: static | |
pinned: false | |
<!-- Banner -------------------------------------------------------------- --> | |
<p align="center"> | |
<b>Fine-grain evaluation & Large Reasoning Models that <i>fails in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/> | |
ConditionedMath (AIME & MATH500) Β· PuzzleTrivial Β· Zero-shot pipelines | |
</p> | |
--- | |
## π Why ReasoningTrap? | |
> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints. | |
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems. | |
* **Modified from Famous MATH Reasoning Benchmark** β AIME & MATH500 problems altered with minimal constraints to divert reasoning paths. | |
* **Puzzles Trivialized by Subtle Modifications** - Well-known puzzles where a small change transforms a challenging problem into a trivial one. | |
* **Plug-and-play** β evaluate any π€ Transformers model with vLLM in simple instructions. | |