Commit 930a6bf (verified) · 0 Parent(s)
Committed by RedbeardNZ, frankjiang

Duplicate from acvlab/FantasyTalking

Co-authored-by: Frank Jiang <frankjiang@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +98 -0
  3. README_zh.md +96 -0
  4. fantasytalking_model.ckpt +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
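
The patterns above decide which files are stored through Git LFS. As a rough illustration, the sketch below checks a file name against a subset of those patterns. It is an approximation only: it uses Python's `fnmatch` on the basename, which mirrors how slash-free `.gitattributes` patterns behave but is not Git's own matcher, and it deliberately skips the `saved_model/**/*` path pattern. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the file's basename against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for path in ["fantasytalking_model.ckpt", "README.md", "logs/events.out.tfevents.123"]:
        print(path, is_lfs_tracked(path))
```

For an authoritative answer inside a real checkout, `git check-attr filter -- <path>` asks Git directly.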
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ license: apache-2.0
+ ---
+ [中文阅读](./README_zh.md)
+ # FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
+
+ [![Home Page](https://img.shields.io/badge/Project-<Website>-blue.svg)](https://fantasy-amap.github.io/fantasy-talking/)
+ [![arXiv](https://img.shields.io/badge/Arxiv-2504.04842-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2504.04842)
+ [![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2504.04842)
+
+ ## 🔥 Latest News!!
+ * April 28, 2025: We released the inference code and model weights for audio conditions.
+
+
+ <!-- ![Fig.1](https://github.com/Fantasy-AMAP/fantasy-talking/blob/main/assert/fig0_1_0.png) -->
+
+
+ ## Quickstart
+ ### 🛠️ Installation
+
+ Clone the repo:
+
+ ```
+ git clone https://github.com/Fantasy-AMAP/fantasy-talking.git
+ cd fantasy-talking
+ ```
+
+ Install dependencies:
+ ```
+ # Ensure torch >= 2.0.0
+ pip install -r requirements.txt
+ # Optionally install flash_attn to accelerate attention computation
+ pip install flash_attn
+ ```
+
+ ### 🧱 Model Download
+ | Models | Download Link | Notes |
+ | --------------|-------------------------------------------------------------------------------|-------------------------------|
+ | Wan2.1-I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | Base model |
+ | Wav2Vec | 🤗 [Huggingface](https://huggingface.co/facebook/wav2vec2-base-960h) 🤖 [ModelScope](https://modelscope.cn/models/AI-ModelScope/wav2vec2-base-960h) | Audio encoder |
+ | FantasyTalking model | 🤗 [Huggingface](https://huggingface.co/acvlab/FantasyTalking/) 🤖 [ModelScope](https://www.modelscope.cn/models/amap_cvlab/FantasyTalking/) | Our audio condition weights |
+
+ Download models using huggingface-cli:
+ ``` sh
+ pip install "huggingface_hub[cli]"
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./models/Wan2.1-I2V-14B-720P
+ huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
+ huggingface-cli download acvlab/FantasyTalking fantasytalking_model.ckpt --local-dir ./models
+ ```
+
+ Download models using modelscope-cli:
+ ``` sh
+ pip install modelscope
+ modelscope download Wan-AI/Wan2.1-I2V-14B-720P --local_dir ./models/Wan2.1-I2V-14B-720P
+ modelscope download AI-ModelScope/wav2vec2-base-960h --local_dir ./models/wav2vec2-base-960h
+ modelscope download amap_cvlab/FantasyTalking fantasytalking_model.ckpt --local_dir ./models
+ ```
+
+ ### 🔑 Inference
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav
+ ```
+ You can control the character's behavior through the prompt. The recommended range for both the prompt and audio CFG scales (`--prompt_cfg_scale`, `--audio_cfg_scale`) is 3 to 7.
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav --prompt "The person is speaking enthusiastically, with their hands continuously waving." --prompt_cfg_scale 5.0 --audio_cfg_scale 5.0
+ ```
+
+ The table below lists speed and VRAM requirements for different settings, measured on a single A100 (512x512, 81 frames).
+
+ |`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|
+ |-|-|-|-|
+ |torch.bfloat16|None (unlimited)|15.5s/it|40G|
+ |torch.bfloat16|7*10**9 (7B)|32.8s/it|20G|
+ |torch.bfloat16|0|42.6s/it|5G|
+
+ ### Gradio Demo
+ An online demo is available on Hugging Face.
+ To run the Gradio demo locally:
+ ``` sh
+ pip install gradio spaces
+ python app.py
+ ```
+
+ ## 🧩 Community Works
+ We ❤️ contributions from the open-source community! If your work has improved FantasyTalking, please let us know.
+ ## 🔗 Citation
+ If you find this repository useful, please consider giving it a star ⭐ and citing our paper:
+ ```
+ @article{wang2025fantasytalking,
+ title={FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis},
+ author={Wang, Mengchao and Wang, Qiang and Jiang, Fan and Fan, Yaqi and Zhang, Yunpeng and Qi, Yonggang and Zhao, Kun and Xu, Mu},
+ journal={arXiv preprint arXiv:2504.04842},
+ year={2025}
+ }
+ ```
+
+ ## Acknowledgments
+ Thanks to [Wan2.1](https://github.com/Wan-Video/Wan2.1), [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) for open-sourcing their models and code, which provided valuable references and support for this project. Their contributions to the open-source community are truly appreciated.
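
The inference commands in the README can also be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper `build_infer_command` is hypothetical, it uses only the flags shown in the README (`--image_path`, `--audio_path`, `--prompt`, `--prompt_cfg_scale`, `--audio_cfg_scale`), and it assumes the two scale flags may be omitted so that `infer.py` falls back to its own defaults. It warns, rather than fails, when a scale is outside the recommended range of 3 to 7.

```python
import warnings

CFG_MIN, CFG_MAX = 3.0, 7.0  # recommended range stated in the README


def build_infer_command(image_path, audio_path, prompt=None,
                        prompt_cfg_scale=None, audio_cfg_scale=None):
    """Build the argument list for infer.py using only flags shown in the README."""
    cmd = ["python", "infer.py", "--image_path", image_path, "--audio_path", audio_path]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    for flag, value in (("--prompt_cfg_scale", prompt_cfg_scale),
                        ("--audio_cfg_scale", audio_cfg_scale)):
        if value is None:
            continue  # assumption: infer.py supplies its own default
        if not CFG_MIN <= value <= CFG_MAX:
            warnings.warn(f"{flag}={value} is outside the recommended range 3 to 7")
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(build_infer_command(
        "./assets/images/woman.png",
        "./assets/audios/woman.wav",
        prompt="The person is speaking enthusiastically, with their hands continuously waving.",
        prompt_cfg_scale=5.0,
        audio_cfg_scale=5.0,
    ))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the root of the cloned repository. Using a list instead of a shell string avoids quoting problems with prompts that contain spaces or punctuation.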
README_zh.md ADDED
@@ -0,0 +1,96 @@
+ [中文阅读](./README_zh.md)
+ # FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
+
+ [![Home Page](https://img.shields.io/badge/Project-<Website>-blue.svg)](https://fantasy-amap.github.io/fantasy-talking/)
+ [![arXiv](https://img.shields.io/badge/Arxiv-2504.04842-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2504.04842)
+ [![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2504.04842)
+
+ ## 🔥 Latest News!!
+ * April 28, 2025: We open-sourced the inference code and model weights for audio conditions.
+
+
+ <!-- ![Fig.1](https://github.com/Fantasy-AMAP/fantasy-talking/blob/main/assert/fig0_1_0.png) -->
+
+
+ ## Quickstart
+ ### 🛠️ Installation and Dependencies
+
+ First, clone the git repository:
+
+ ```
+ git clone https://github.com/Fantasy-AMAP/fantasy-talking.git
+ cd fantasy-talking
+ ```
+
+ Install dependencies:
+ ```
+ pip install -r requirements.txt
+ # Optionally install flash_attn to accelerate attention computation
+ pip install flash_attn
+ ```
+
+ ### 🧱 Model Download
+ | Models | Download Link | Notes |
+ | --------------|-------------------------------------------------------------------------------|-------------------------------|
+ | Wan2.1-I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | Base model |
+ | Wav2Vec | 🤗 [Huggingface](https://huggingface.co/facebook/wav2vec2-base-960h) 🤖 [ModelScope](https://modelscope.cn/models/AI-ModelScope/wav2vec2-base-960h) | Audio encoder |
+ | FantasyTalking model | 🤗 [Huggingface](https://huggingface.co/acvlab/FantasyTalking/) 🤖 [ModelScope](https://www.modelscope.cn/models/amap_cvlab/FantasyTalking/) | Our audio condition weights |
+
+ Download models using huggingface-cli:
+ ``` sh
+ pip install "huggingface_hub[cli]"
+ huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./models/Wan2.1-I2V-14B-720P
+ huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./models/wav2vec2-base-960h
+ huggingface-cli download acvlab/FantasyTalking fantasytalking_model.ckpt --local-dir ./models
+ ```
+
+ Download models using modelscope-cli:
+ ``` sh
+ pip install modelscope
+ modelscope download Wan-AI/Wan2.1-I2V-14B-720P --local_dir ./models/Wan2.1-I2V-14B-720P
+ modelscope download AI-ModelScope/wav2vec2-base-960h --local_dir ./models/wav2vec2-base-960h
+ modelscope download amap_cvlab/FantasyTalking fantasytalking_model.ckpt --local_dir ./models
+ ```
+
+ ### 🔑 Inference
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav
+ ```
+ You can control the character's behavior through the prompt. The recommended range for both the prompt and audio CFG scales (`--prompt_cfg_scale`, `--audio_cfg_scale`) is 3 to 7.
+ ``` sh
+ python infer.py --image_path ./assets/images/woman.png --audio_path ./assets/audios/woman.wav --prompt "The person is speaking enthusiastically, with their hands continuously waving." --prompt_cfg_scale 5.0 --audio_cfg_scale 5.0
+ ```
+
+ The table below lists speed and VRAM requirements for different settings, measured on a single A100 (512x512, 81 frames).
+ |`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|
+ |-|-|-|-|
+ |torch.bfloat16|None (unlimited)|15.5s/it|40G|
+ |torch.bfloat16|7*10**9 (7B)|32.8s/it|20G|
+ |torch.bfloat16|0|42.6s/it|5G|
+
+ ### Gradio Demo
+ An online demo is available on Hugging Face.
+
+ To run the Gradio demo locally:
+ ``` sh
+ pip install gradio spaces
+ python app.py
+ ```
+
+ ## 🧩 Community Works
+ We ❤️ contributions from the open-source community! If your work has improved FantasyTalking, please let us know.
+
+ ## 🔗 Citation
+ If you find this repository useful, please consider giving it a star ⭐ and citing our paper:
+ ```
+ @article{wang2025fantasytalking,
+ title={FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis},
+ author={Wang, Mengchao and Wang, Qiang and Jiang, Fan and Fan, Yaqi and Zhang, Yunpeng and Qi, Yonggang and Zhao, Kun and Xu, Mu},
+ journal={arXiv preprint arXiv:2504.04842},
+ year={2025}
+ }
+ ```
+
+ ## Acknowledgments
+ Thanks to [Wan2.1](https://github.com/Wan-Video/Wan2.1), [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) for open-sourcing their models and code, which provided valuable references and support for this project. Their contributions to the open-source community are truly appreciated.
+
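
The speed and VRAM table in the README suggests a simple rule for choosing `num_persistent_param_in_dit`: take the fastest row that fits the available memory. The sketch below encodes that rule. It is illustrative only: the `Profile` type and `pick_profile` helper are hypothetical names, the "G" column is read as gigabytes, and the numbers are copied from the table (single A100, 512x512, 81 frames), so they may not transfer to other hardware or resolutions.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    num_persistent_param_in_dit: Optional[int]  # None means unlimited
    seconds_per_iteration: float
    required_vram_gb: int


# Rows copied from the README table.
PROFILES = [
    Profile(None, 15.5, 40),
    Profile(7 * 10**9, 32.8, 20),
    Profile(0, 42.6, 5),
]


def pick_profile(free_vram_gb: float) -> Optional[Profile]:
    """Return the fastest table row whose VRAM requirement fits, or None."""
    fitting = [p for p in PROFILES if p.required_vram_gb <= free_vram_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda p: p.seconds_per_iteration)


if __name__ == "__main__":
    for vram in (80, 24, 8, 4):
        print(vram, pick_profile(vram))
```

On a 24 GB card, for example, this selects the `7*10**9` row, trading roughly twice the time per iteration for half the memory of the unlimited setting.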
fantasytalking_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e75eb54d2f6e5606a4c009785dd588a6e30d0f07bdd09bf433d624f148a1b6b
+ size 3361779185
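
The three lines above are a Git LFS pointer: the repository stores this small text file, while the actual checkpoint (3361779185 bytes, about 3.13 GiB) lives in LFS storage. A downloaded checkpoint can be checked against the pointer's `oid` and `size`. The sketch below shows one way to do that with the standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0e75eb54d2f6e5606a4c009785dd588a6e30d0f07bdd09bf433d624f148a1b6b
size 3361779185
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare size and digest with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algo"], pointer["size"])
```

Assuming the checkpoint was saved as `./models/fantasytalking_model.ckpt`, calling `matches_pointer("./models/fantasytalking_model.ckpt", parse_lfs_pointer(POINTER))` returns `True` only if the download is complete and uncorrupted.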