---
title: README
emoji: 👍
colorFrom: green
colorTo: purple
sdk: static
pinned: false
---
<!-- Banner -------------------------------------------------------------- -->
<p align="center">
<b>Fine-grained evaluation of Large Reasoning Models that <i>fail at reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Zero-shot pipelines
</p>

---

## 📜 Why ReasoningTrap?
> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
* **Modified from Famous Math Reasoning Benchmarks** – AIME & MATH500 problems altered with minimal constraints that divert the standard reasoning path.
* **Puzzles Trivialized by Subtle Modifications** – Well-known puzzles where a small change transforms a challenging problem into a trivial one.
* **Plug-and-play** – evaluate any 🤗 Transformers model with vLLM in a few simple steps.
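
Because each conditioned problem is derived from a known original, one simple way to inspect a response is to compare its final answer against both the conditioned answer and the answer to the unmodified problem. A match with the latter suggests the model followed the familiar reasoning path and ignored the added constraint. The sketch below is a hypothetical illustration under that assumption, not ReasoningTrap's evaluation code: the `ConditionedProblem` record, the `normalize` and `classify` helpers, the `"rigid"` label, and the divisor example are all invented here, and the answer normalization is far cruder than a real math-answer checker.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ConditionedProblem:
    """Hypothetical record pairing a benchmark problem with its conditioned variant."""
    original_question: str
    conditioned_question: str
    original_answer: str     # answer to the unmodified problem
    conditioned_answer: str  # answer once the added constraint is respected


def normalize(answer: str) -> str:
    """Crude normalization (assumption): strip spaces, a trailing period and $...$,
    then canonicalize plain numbers such as '0.5' -> '1/2'."""
    text = answer.strip().rstrip(".").strip("$").strip()
    try:
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return text.lower()


def classify(problem: ConditionedProblem, model_answer: str) -> str:
    """Label a final answer as 'correct', 'rigid' (answered the original problem) or 'other'."""
    predicted = normalize(model_answer)
    # Check the conditioned answer first, in case both answers happen to coincide.
    if predicted == normalize(problem.conditioned_answer):
        return "correct"
    if predicted == normalize(problem.original_answer):
        return "rigid"
    return "other"


# Invented toy example (not from the dataset): adding "odd" changes the answer from 6 to 2.
toy = ConditionedProblem(
    original_question="How many positive divisors does 12 have?",
    conditioned_question="How many positive odd divisors does 12 have?",
    original_answer="6",
    conditioned_answer="2",
)

print(classify(toy, "2"))    # correct
print(classify(toy, "$6$"))  # rigid
print(classify(toy, "4"))    # other
```

Checking the conditioned answer before the original one is a deliberate choice: when a constraint happens not to change the answer, the response is credited as correct rather than flagged as rigid.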