title: README
emoji: π
colorFrom: green
colorTo: purple
sdk: static
pinned: false
Fine-grained evaluation of Large Reasoning Models that fail at reasoning due to reasoning rigidity.
ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines
Why ReasoningTrap?
Current RL-tuned Reasoning LLMs excel at producing answers but often ignore explicit user constraints.
ReasoningTrap surfaces these failure modes with carefully crafted, conditioned problems.
- Modified from famous math reasoning benchmarks: AIME & MATH500 problems altered with minimal constraints to divert reasoning paths.
- Puzzles trivialized by subtle modifications: well-known puzzles where a small change transforms a challenging problem into a trivial one.
- Plug-and-play: evaluate any Hugging Face Transformers model with vLLM by following a few simple instructions.
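
As a rough illustration of what a zero-shot evaluation loop with vLLM might look like, here is a minimal sketch. It is not the ReasoningTrap pipeline itself: the helper names (`build_prompt`, `extract_boxed`, `accuracy`, `generate_with_vllm`), the prompt wording, and the exact-match scoring on `\boxed{}` answers are all assumptions made for this example. Only `LLM`, `SamplingParams`, and `LLM.generate` come from vLLM.

```python
import re
from typing import List, Optional


def build_prompt(question: str) -> str:
    # Hypothetical zero-shot prompt; the wording is an assumption.
    return (
        f"{question}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def extract_boxed(text: str) -> Optional[str]:
    # Return the content of the last \boxed{...} in the completion.
    # Simplification: nested braces (e.g. \frac{1}{2}) are not handled.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def accuracy(completions: List[str], references: List[str]) -> float:
    # Exact string match between the extracted answer and the reference.
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    correct = sum(extract_boxed(c) == r for c, r in zip(completions, references))
    return correct / len(references)


def generate_with_vllm(model_name: str, prompts: List[str],
                       max_tokens: int = 4096) -> List[str]:
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [out.outputs[0].text for out in outputs]
```

Under these assumptions, a run would pass the conditioned questions through `build_prompt`, call `generate_with_vllm` with a model identifier, and score the completions with `accuracy` against the reference answers of the modified problems rather than those of the original ones.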