- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 20
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 12
- Mergenetic: a Simple Evolutionary Model Merging Library
  Paper • 2505.11427 • Published • 12
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
Collections including paper arxiv:2403.13257

- INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
  Paper • 2307.03712 • Published • 1
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 53

- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 65
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 159
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 66

- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 89
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 66
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 41
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 103

- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Convergent Learning: Do different neural networks learn the same representations?
  Paper • 1511.07543 • Published • 2
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
  Paper • 1909.11299 • Published • 2
- Model Fusion via Optimal Transport
  Paper • 1910.05653 • Published • 1