Collections including paper arxiv:2310.11453

- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  Paper • 2208.07339 • Published • 5
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 8
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  Paper • 2211.10438 • Published • 6
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 55
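The four papers in this collection share a common post-training idea: store weights (and sometimes activations) as low-bit integers with a per-row or per-channel scale, and dequantize on the fly. Below is a minimal sketch of plain row-wise absmax int8 quantization, the baseline primitive that LLM.int8() extends with vector-wise scaling and mixed-precision outlier handling; the function names and NumPy implementation are illustrative, not code from any of these papers.

```python
import numpy as np

def absmax_quantize_int8(w: np.ndarray):
    """Row-wise absmax quantization to int8 (illustrative sketch).

    Each row is scaled so its largest magnitude maps to 127, then rounded.
    LLM.int8() builds on this kind of scaling and additionally keeps
    outlier feature dimensions in fp16.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-12)                      # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)
    q, s = absmax_quantize_int8(w)
    print(np.abs(w - dequantize_int8(q, s)).max())        # small reconstruction error
```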

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.03k • 1.15k
- microsoft/bitnet-b1.58-2B-4T-bf16
  Text Generation • 2B • Updated • 2.21k • 33
- microsoft/bitnet-b1.58-2B-4T-gguf
  Text Generation • 2B • Updated • 4.3k • 192
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 74

- 1.58-bit FLUX
  Paper • 2412.18653 • Published • 85
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 624
- BitNet a4.8: 4-bit Activations for 1-bit LLMs
  Paper • 2411.04965 • Published • 69
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 104
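For context on the "1-bit" and "1.58-bit" naming used throughout this collection: BitNet b1.58 constrains every weight to one of three values {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits of information per weight, using an absmean scaling rule. A minimal NumPy sketch of that ternary quantizer follows; the helper name and toy usage are illustrative, not the authors' code, and in actual BitNet training the rounding is applied on the fly in the forward pass with a straight-through estimator.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Ternary (1.58-bit) weight quantization in the style of BitNet b1.58:
    scale the weight matrix by its mean absolute value, then round and clip
    every entry to {-1, 0, +1}."""
    gamma = np.abs(w).mean()                              # per-tensor absmean scale
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1).astype(np.int8)
    return w_q, gamma

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    w_q, gamma = absmean_ternary_quantize(w)
    print(np.unique(w_q))                  # [-1  0  1] -> log2(3) ≈ 1.58 bits per weight
    print(np.abs(w - w_q * gamma).mean())  # mean error of the ternary approximation
```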

- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 28
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- Attention Is All You Need
  Paper • 1706.03762 • Published • 79
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9

- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 86
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 109
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 74
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 25

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 20
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 624
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 104
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 44