Article: Open-source DeepResearch – Freeing our search agents • By m-ric and 4 others • Feb 4
Paper: CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction • 2502.07316 • Published Feb 11
Paper: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • 2502.05171 • Published Feb 7
Paper: Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch • 2501.18512 • Published Jan 30
Article: Open-R1: a fully open reproduction of DeepSeek-R1 • By eliebak and 2 others • Jan 28
Article: Welcome to Inference Providers on the Hub 🔥 • By julien-c and 6 others • Jan 28
Paper: Structured 3D Latents for Scalable and Versatile 3D Generation • 2412.01506 • Published Dec 2, 2024
Paper: On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes • 2306.13649 • Published Jun 23, 2023
Paper: Cautious Optimizers: Improving Training with One Line of Code • 2411.16085 • Published Nov 25, 2024
Paper: Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency • 2409.02634 • Published Sep 4, 2024
Paper: Memory-Efficient LLM Training with Online Subspace Descent • 2408.12857 • Published Aug 23, 2024
Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • By Leyo and 2 others • Apr 15, 2024
Paper: Longhorn: State Space Models are Amortized Online Learners • 2407.14207 • Published Jul 19, 2024
Paper: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks • 2311.06242 • Published Nov 10, 2023
Paper: The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry • 2402.04347 • Published Feb 6, 2024
Paper: Towards Modular LLMs by Building and Reusing a Library of LoRAs • 2405.11157 • Published May 18, 2024
Paper: SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts • 2405.07518 • Published May 13, 2024