Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 89
Fast Transformer Decoding: One Write-Head is All You Need Paper • 1911.02150 • Published Nov 6, 2019 • 6