Neural Vocoder is All You Need for Speech Super-resolution Paper • 2203.14941 • Published Mar 28, 2022 • 1
Article Open-Source Handwritten Signature Detection Model By samuellimabraz • Mar 14 • 113
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 426
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6 • 24
Article Using LoRA for Efficient Stable Diffusion Fine-Tuning By pcuenq and 1 other • Jan 26, 2023 • 66
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Apr 28 • 616
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to images. • 5 items • Updated Oct 31, 2023 • 11
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Paper • 2402.01831 • Published Feb 2, 2024 • 15
Article SmolVLM - small yet mighty Vision Language Model By andito and 4 others • Nov 26, 2024 • 306
Article SmolVLM Grows Smaller – Introducing the 250M & 500M Models! By andito and 2 others • Jan 23 • 180
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 232
Article State of open video generation models in Diffusers By sayakpaul and 2 others • Jan 27 • 53
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 115
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated about 15 hours ago • 40
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6, 2024 • 15