Article: How to generate text: using different decoding methods for language generation with Transformers • By patrickvonplaten • Mar 1, 2020 • 237
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10 • 32
Playing with Transformer at 30+ FPS via Next-Frame Diffusion Paper • 2506.01380 • Published Jun 2 • 1
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11 • 40
VidTok: A Versatile and Open-Source Video Tokenizer Paper • 2412.13061 • Published Dec 17, 2024 • 8
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement Paper • 2406.08096 • Published Jun 12, 2024
IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI Paper • 2411.00785 • Published Oct 17, 2024 • 8
Memories are One-to-Many Mapping Alleviators in Talking Face Generation Paper • 2212.05005 • Published Dec 9, 2022
End-to-End Rate-Distortion Optimized 3D Gaussian Representation Paper • 2406.01597 • Published Apr 9, 2024
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder Paper • 2303.17550 • Published Mar 30, 2023