Sayoyo committed
Commit 03fc4f1 · verified · 1 Parent(s): 1d1bb5d

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +1 -118
  2. config.json +35 -0
README.md CHANGED
@@ -3,121 +3,4 @@ license: apache-2.0
  tags:
  - music
  - text2music
- pipeline_tag: text-to-audio
- language:
- - en
- - zh
- - de
- - fr
- - es
- - it
- - pt
- - pl
- - tr
- - ru
- - cs
- - nl
- - ar
- - ja
- - hu
- - ko
- - hi
- library_name: diffusers
- ---
-
- # ACE-Step: A Step Towards Music Generation Foundation Model
-
- ![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)
-
- ## Model Description
-
- ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
-
- **Key Features:**
- - 15× faster than LLM-based baselines (20s for 4-minute music on A100)
- - Superior musical coherence across melody, harmony, and rhythm
- - full-song generation, duration control and accepts natural language descriptions
-
- ## Uses
-
- ### Direct Use
- ACE-Step can be used for:
- - Generating original music from text descriptions
- - Music remixing and style transfer
- - edit song lyrics
-
- ### Downstream Use
- The model serves as a foundation for:
- - Voice cloning applications
- - Specialized music generation (rap, jazz, etc.)
- - Music production tools
- - Creative AI assistants
-
- ### Out-of-Scope Use
- The model should not be used for:
- - Generating copyrighted content without permission
- - Creating harmful or offensive content
- - Misrepresenting AI-generated music as human-created
-
- ## How to Get Started
-
- see: https://github.com/ace-step/ACE-Step
-
- ## Hardware Performance
-
- | Device | 27 Steps | 60 Steps |
- |---------------|----------|----------|
- | NVIDIA A100 | 27.27x | 12.27x |
- | RTX 4090 | 34.48x | 15.63x |
- | RTX 3090 | 12.76x | 6.48x |
- | M2 Max | 2.27x | 1.03x |
-
- *RTF (Real-Time Factor) shown - higher values indicate faster generation*
-
-
- ## Limitations
-
- - Performance varies by language (top 10 languages perform best)
- - Longer generations (>5 minutes) may lose structural coherence
- - Rare instruments may not render perfectly
- - Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
- - Style-specific Weaknesses: Underperforms on certain genres (e.g. Chinese rap/zh_rap) Limited style adherence and musicality ceiling
- - Continuity Artifacts: Unnatural transitions in repainting/extend operations
- - Vocal Quality: Coarse vocal synthesis lacking nuance
- - Control Granularity: Needs finer-grained musical parameter control
-
- ## Ethical Considerations
-
- Users should:
- - Verify originality of generated works
- - Disclose AI involvement
- - Respect cultural elements and copyrights
- - Avoid harmful content generation
-
-
- ## Model Details
-
- **Developed by:** ACE Studio and StepFun
- **Model type:** Diffusion-based music generation with transformer conditioning
- **License:** Apache 2.0
- **Resources:**
- - [Project Page](https://ace-step.github.io/)
- - [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
- - [GitHub Repository](https://github.com/ACE-Step/ACE-Step)
-
-
- ## Citation
-
- ```bibtex
- @misc{gong2025acestep,
- title={ACE-Step: A Step Towards Music Generation Foundation Model},
- author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
- howpublished={\url{https://github.com/ace-step/ACE-Step}},
- year={2025},
- note={GitHub repository}
- }
- ```
-
- ## Acknowledgements
- This project is co-led by ACE Studio and StepFun.
-
+ ---
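
The Hardware Performance table removed from README.md in this hunk reports RTF values. As a rough sketch, assuming RTF here means seconds of audio produced per second of wall-clock time (the table footnote only says higher is faster), the numbers convert to generation times as follows. The helper name `generation_seconds` is hypothetical and used only for illustration.

```python
def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time to generate audio, assuming RTF = audio_seconds / generation_seconds."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# RTF values copied from the removed table: (27 steps, 60 steps).
RTF_TABLE = {
    "NVIDIA A100": (27.27, 12.27),
    "RTX 4090": (34.48, 15.63),
    "RTX 3090": (12.76, 6.48),
    "M2 Max": (2.27, 1.03),
}

FOUR_MINUTES = 240.0  # seconds of audio

for device, (rtf_27, rtf_60) in RTF_TABLE.items():
    t27 = generation_seconds(FOUR_MINUTES, rtf_27)
    t60 = generation_seconds(FOUR_MINUTES, rtf_60)
    print(f"{device}: 4-minute track in {t27:.1f}s (27 steps), {t60:.1f}s (60 steps)")
```

Under this reading, the A100 figure at 60 steps works out to roughly 20 seconds for a 4-minute track, which matches the "20s for 4-minute music on A100" claim in the removed Key Features list.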
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "_class_name": "ACEStepTransformer2DModel",
+   "_diffusers_version": "0.32.2",
+   "attention_head_dim": 128,
+   "in_channels": 8,
+   "inner_dim": 2560,
+   "lyric_encoder_vocab_size": 6693,
+   "lyric_hidden_size": 1024,
+   "max_height": 16,
+   "max_position": 32768,
+   "max_width": 32768,
+   "mlp_ratio": 2.5,
+   "num_attention_heads": 20,
+   "num_layers": 24,
+   "out_channels": 8,
+   "patch_size": [
+     16,
+     1
+   ],
+   "rope_theta": 1000000.0,
+   "speaker_embedding_dim": 512,
+   "ssl_encoder_depths": [
+     8,
+     8
+   ],
+   "ssl_latent_dims": [
+     1024,
+     768
+   ],
+   "ssl_names": [
+     "mert",
+     "m-hubert"
+   ],
+   "text_embedding_dim": 768
+ }
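
A minimal sketch of how the added config.json could be sanity-checked with the standard library alone. It embeds a subset of the values from this commit and does not load the model class itself. The helper `check_config` is hypothetical, and two interpretations are assumptions rather than documented behaviour: that `inner_dim` is expected to equal `num_attention_heads * attention_head_dim` (the values here do satisfy this), and that `mlp_ratio` scales `inner_dim` to give the feed-forward width.

```python
import json

# Subset of the config.json added in this commit, embedded so the sketch runs standalone.
CONFIG_TEXT = """
{
  "_class_name": "ACEStepTransformer2DModel",
  "attention_head_dim": 128,
  "inner_dim": 2560,
  "mlp_ratio": 2.5,
  "num_attention_heads": 20,
  "num_layers": 24,
  "ssl_names": ["mert", "m-hubert"],
  "ssl_latent_dims": [1024, 768],
  "ssl_encoder_depths": [8, 8]
}
"""


def check_config(cfg: dict) -> dict:
    """Hypothetical helper: derive a few sizes and check internal consistency."""
    heads_total = cfg["num_attention_heads"] * cfg["attention_head_dim"]
    # Assumption: the attention heads together span the full inner dimension.
    if heads_total != cfg["inner_dim"]:
        raise ValueError(f"heads * head_dim = {heads_total}, inner_dim = {cfg['inner_dim']}")

    # The three ssl_* lists are parallel, so they must have the same length.
    n_ssl = len(cfg["ssl_names"])
    if len(cfg["ssl_latent_dims"]) != n_ssl or len(cfg["ssl_encoder_depths"]) != n_ssl:
        raise ValueError("ssl_names, ssl_latent_dims and ssl_encoder_depths differ in length")

    return {
        "heads_total": heads_total,
        # Assumption: mlp_ratio multiplies inner_dim to give the MLP hidden width.
        "mlp_hidden_dim": int(cfg["inner_dim"] * cfg["mlp_ratio"]),
        "ssl": dict(zip(cfg["ssl_names"], cfg["ssl_latent_dims"])),
    }


summary = check_config(json.loads(CONFIG_TEXT))
print(summary)
```

A check like this catches hand-edited configs whose parallel lists or head dimensions have drifted out of sync before any weights are loaded.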