FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published 5 days ago • 14
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • 4 days ago • 93
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 4 days ago • 74
Changelog Xet is now the default storage option for new users and organizations 14 days ago • 58
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 23 days ago • 111
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 17 days ago • 139
Article Microsoft and Hugging Face expand collaboration By jeffboudier and 2 others • 19 days ago • 20
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29
Article Improving Hugging Face Model Access for Kaggle Users By roseberryv and 4 others • 24 days ago • 27
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 26 days ago • 417
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 53
Article A Dive into Pretraining Strategies for Vision-Language Models By adirik and 1 other • Feb 3, 2023 • 67
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation Paper • 2501.17162 • Published Jan 28 • 1
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55