Value-Guided Search for Efficient Chain-of-Thought Reasoning Paper • 2505.17373 • Published 15 days ago
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22, 2024