Collections including paper arxiv:2401.02412

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 18
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 56
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 32
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 24
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 70

- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 39
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 22

- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 55
- Training Transformers with 4-bit Integers
  Paper • 2306.11987 • Published • 22
- FasterViT: Fast Vision Transformers with Hierarchical Attention
  Paper • 2306.06189 • Published • 30
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 20

- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 39
- Generative Representational Instruction Tuning
  Paper • 2402.09906 • Published • 55
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
  Paper • 2305.02301 • Published • 5
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 58

- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
- Mixtral of Experts
  Paper • 2401.04088 • Published • 160
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 39