Seeing Voices: Generating A-Roll Video from Audio with Mirage • Paper • 2506.08279 • Published Jun 9 • 28
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published Feb 20 • 146
LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping • Paper • 2402.18351 • Published Feb 28, 2024 • 2
Mitsua/mitsua-japanese-clip-vit-b-16 • Zero-Shot Image Classification • 0.2B • Updated Dec 9, 2024 • 1 • 7
Talking Face Generation with Multilingual TTS 👄 Generate a talking face video from text • Running • 554