- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 171
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Attention Is All You Need
  Paper • 1706.03762 • Published • 64
Collections
Collections including paper arxiv:2402.18668
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 40
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 3
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 19