- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 171
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Attention Is All You Need
  Paper • 1706.03762 • Published • 64
Collections
Collections including paper arxiv:2402.18668
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 40
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 3
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19