Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published 24 days ago • 63
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5 • 74
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3 • 35
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities Paper • 2505.01043 • Published May 2 • 10
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 71
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 293
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22 • 118
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 400
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 232
Sleep-time Compute: Beyond Inference Scaling at Test-time Paper • 2504.13171 • Published Apr 17 • 15
PyTorchModelHubMixin: Bridging the Gap for Custom AI Models on Hugging Face Article • By not-lain and 1 other • Nov 11, 2024 • 18
Don't repeat yourself - 🤗 Transformers Design Philosophy Article • By patrickvonplaten • Apr 5, 2022 • 34
MobileLLM Collection • Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) • https://arxiv.org/abs/2402.14905 • 40 items • Updated 5 days ago • 116
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 188
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 168
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 285