Does Liger Kernel affect training speed at all? Is it faster, slower, or no difference compared to regular GRPO?
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 23 days ago • 90
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26, 2024 • 38