	---
	title: README
	emoji: 👁
	colorFrom: green
	colorTo: purple
	sdk: static
	pinned: false
	---

	<!-- Banner -------------------------------------------------------------- -->
	<p align="center">
	<b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
	ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines
	</p>

	---

	## 📜 Why ReasoningTrap?

	> Current RL-tuned Reasoning LLMs excel at producing answers but often ignore explicit user constraints.
	> ReasoningTrap surfaces these failure modes with carefully crafted, conditioned problems.
	* Modified from well-known math reasoning benchmarks – AIME and MATH500 problems altered with minimal added constraints that divert the usual reasoning path.
	* Puzzles trivialized by subtle modifications – well-known puzzles where a small change turns a challenging problem into a trivial one.
	* Plug-and-play – evaluate any 🤗 Transformers model with vLLM in a few simple steps.
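
The plug-and-play evaluation in the last bullet could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the `ConditionedProblem` class, the helper names, the `\boxed{}` answer convention, the three outcome labels, and the toy problem are all hypothetical. Only the vLLM calls (`LLM`, `SamplingParams`, `LLM.generate`) are the library's real API, and vLLM is imported lazily so the scoring logic runs without it.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ConditionedProblem:
    # Hypothetical record layout; the real datasets may use different fields.
    prompt: str               # problem text, including the added condition
    conditioned_answer: str   # correct answer once the condition is honored
    original_answer: str      # answer to the unmodified, well-known problem


def extract_boxed(text: str) -> Optional[str]:
    # Return the content of the last \boxed{...} in the response, or None.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def classify(problem: ConditionedProblem, response: str) -> str:
    # "rigid" means the model answered the familiar problem and ignored the condition.
    answer = extract_boxed(response)
    if answer == problem.conditioned_answer:
        return "followed_condition"
    if answer == problem.original_answer:
        return "rigid"
    return "other"


def evaluate(problems: List[ConditionedProblem],
             generate: Callable[[str], str]) -> Dict[str, int]:
    # Count outcomes over a list of problems for any prompt -> text function.
    counts = {"followed_condition": 0, "rigid": 0, "other": 0}
    for problem in problems:
        counts[classify(problem, generate(problem.prompt))] += 1
    return counts


def make_vllm_generator(model_name: str, max_tokens: int = 4096) -> Callable[[str], str]:
    # Wrap a vLLM model as a prompt -> text function (greedy decoding).
    from vllm import LLM, SamplingParams  # lazy import: only needed for real runs

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)

    def generate(prompt: str) -> str:
        return llm.generate([prompt], params)[0].outputs[0].text

    return generate


# Hypothetical toy item, NOT taken from the actual datasets.
toy = ConditionedProblem(
    prompt=("What is the last digit of 7^2025? Condition: replace the base 7 "
            "with 10. Put the final answer in \\boxed{}."),
    conditioned_answer="0",
    original_answer="7",
)

# Stand-in for a model that ignores the condition and solves the familiar problem.
rigid_model = lambda prompt: "7^4 ends in 1 and 2025 = 4*506 + 1, so \\boxed{7}"

print(evaluate([toy], rigid_model))  # {'followed_condition': 0, 'rigid': 1, 'other': 0}
```

To score a real model, pass `make_vllm_generator("<model id>")` in place of `rigid_model`. Keeping the generator a plain callable means the same scoring code works with any backend.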