
Collections


Collections including paper arxiv:2402.18668
LLM architecture
Collection • Jul 18, 2024 • 7 items
  • The Impact of Depth and Width on Transformer Language Model Generalization

    Paper • 2310.19956 • Published Oct 30, 2023 • 10
  • Retentive Network: A Successor to Transformer for Large Language Models

    Paper • 2307.08621 • Published Jul 17, 2023 • 171
  • RWKV: Reinventing RNNs for the Transformer Era

    Paper • 2305.13048 • Published May 22, 2023 • 19
  • Attention Is All You Need

    Paper • 1706.03762 • Published Jun 12, 2017 • 64
Attention
Collection • Mar 15 • 4 items
  • Efficient Memory Management for Large Language Model Serving with PagedAttention

    Paper • 2309.06180 • Published Sep 12, 2023 • 25
  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

    Paper • 2308.16137 • Published Aug 30, 2023 • 40
  • Scaling Transformer to 1M tokens and beyond with RMT

    Paper • 2304.11062 • Published Apr 19, 2023 • 3
  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

    Paper • 2309.14509 • Published Sep 25, 2023 • 19