Collections including paper arxiv:2212.05055

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 20
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 49
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Training a T5 Using Lab-sized Resources
  Paper • 2208.12097 • Published • 1
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
  Paper • 2308.06512 • Published • 2
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
  Paper • 2309.04354 • Published • 15
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 86
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 27
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
  Paper • 2403.18818 • Published • 29
- TC4D: Trajectory-Conditioned Text-to-4D Generation
  Paper • 2403.17920 • Published • 18

- Larimar: Large Language Models with Episodic Memory Control
  Paper • 2403.11901 • Published • 34
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 61

- Adaptive sequential Monte Carlo by means of mixture of experts
  Paper • 1108.2836 • Published • 2
- Convergence Rates for Mixture-of-Experts
  Paper • 1110.2058 • Published • 2
- Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
  Paper • 2310.12008 • Published • 2
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
  Paper • 2308.11793 • Published • 2