-
Can LLMs Follow Simple Rules?
Paper • 2311.04235 • Published • 14 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 189 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 89
Collections
Discover the best community collections!
Collections including paper arxiv:2403.13187
-
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 16 -
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper • 2310.12823 • Published • 36 -
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Paper • 2308.10848 • Published • 1 -
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 10
-
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 6 -
Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 20 -
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Paper • 2311.02103 • Published • 22 -
Extending Context Window of Large Language Models via Semantic Compression
Paper • 2312.09571 • Published • 15
-
Qualitatively characterizing neural network optimization problems
Paper • 1412.6544 • Published • 4 -
Convergent Learning: Do different neural networks learn the same representations?
Paper • 1511.07543 • Published • 2 -
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Paper • 1909.11299 • Published • 2 -
Model Fusion via Optimal Transport
Paper • 1910.05653 • Published • 1
-
Can LLMs Follow Simple Rules?
Paper • 2311.04235 • Published • 14 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 189 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 89
-
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 6 -
Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 20 -
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Paper • 2311.02103 • Published • 22 -
Extending Context Window of Large Language Models via Semantic Compression
Paper • 2312.09571 • Published • 15
-
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 16 -
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper • 2310.12823 • Published • 36 -
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Paper • 2308.10848 • Published • 1 -
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 10
-
Qualitatively characterizing neural network optimization problems
Paper • 1412.6544 • Published • 4 -
Convergent Learning: Do different neural networks learn the same representations?
Paper • 1511.07543 • Published • 2 -
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Paper • 1909.11299 • Published • 2 -
Model Fusion via Optimal Transport
Paper • 1910.05653 • Published • 1