Collections including paper arxiv:2212.05055

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 20
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 49
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Training a T5 Using Lab-sized Resources
  Paper • 2208.12097 • Published • 1
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
  Paper • 2308.06512 • Published • 2
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
  Paper • 2309.04354 • Published • 15
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 86
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 27
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
  Paper • 2403.18818 • Published • 29
- TC4D: Trajectory-Conditioned Text-to-4D Generation
  Paper • 2403.17920 • Published • 18

- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 34
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 61

- Adaptive sequential Monte Carlo by means of mixture of experts
  Paper • 1108.2836 • Published • 2
- Convergence Rates for Mixture-of-Experts
  Paper • 1110.2058 • Published • 2
- Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
  Paper • 2310.12008 • Published • 2
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
  Paper • 2308.11793 • Published • 2