Commit 2366939 by Sayoyo · verified · 1 Parent(s): 2d83c51

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +3 -159
  2. config.json +15 -0
README.md CHANGED
@@ -1,159 +1,3 @@
- ---
- license: apache-2.0
- tags:
- - music
- - text2music
- pipeline_tag: text-to-audio
- language:
- - en
- - zh
- - de
- - fr
- - es
- - it
- - pt
- - pl
- - tr
- - ru
- - cs
- - nl
- - ar
- - ja
- - hu
- - ko
- - hi
- library_name: diffusers
- ---
-
- # 🎤 Chinese Rap LoRA for ACE-Step (Rap Machine)
-
- This is a hybrid rap voice model. We meticulously curated Chinese rap/hip-hop datasets for training, with rigorous data cleaning and recaptioning. The results demonstrate:
-
- - Improved Chinese pronunciation accuracy
- - Enhanced stylistic adherence to hip-hop and electronic genres
- - Greater diversity in hip-hop vocal expressions
-
- ## Usage Guide
-
- 1. Generate higher-quality Chinese songs
- 2. Create superior hip-hop tracks
- 3. Blend with other genres to:
- - Produce music with better vocal quality and detail
- - Add experimental flavors (e.g., underground, street culture)
- 4. Fine-tune using these parameters:
-
- **Vocal Controls**
- **`vocal_timbre`**
- - Examples: Bright, dark, warm, cold, breathy, nasal, gritty, smooth, husky, metallic, whispery, resonant, airy, smoky, sultry, light, clear, high-pitched, raspy, powerful, ethereal, flute-like, hollow, velvety, shrill, hoarse, mellow, thin, thick, reedy, silvery, twangy.
- - Describes inherent vocal qualities.
-
- **`techniques`** (List)
- - Rap styles: `mumble rap`, `chopper rap`, `melodic rap`, `lyrical rap`, `trap flow`, `double-time rap`
- - Vocal FX: `auto-tune`, `reverb`, `delay`, `distortion`
- - Delivery: `whispered`, `shouted`, `spoken word`, `narration`, `singing`
- - Other: `ad-libs`, `call-and-response`, `harmonized`
-
- ## Community Note
-
- While a Chinese rap LoRA might seem niche for non-Chinese communities, we consistently demonstrate through such projects that ACE-step - as a music generation foundation model - holds boundless potential. It doesn't just improve pronunciation in one language, but spawns new styles.
-
- The universal human appreciation of music is a precious asset. Like abstract LEGO blocks, these elements will eventually combine in more organic ways. May our open-source contributions propel the evolution of musical history forward.
-
- ---
-
- # ACE-Step: A Step Towards Music Generation Foundation Model
-
- ![ACE-Step Framework](https://github.com/ACE-Step/ACE-Step/raw/main/assets/ACE-Step_framework.png)
-
- ## Model Description
-
- ACE-Step is a novel open-source foundation model for music generation that overcomes key limitations of existing approaches through a holistic architectural design. It integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer, achieving state-of-the-art performance in generation speed, musical coherence, and controllability.
-
- **Key Features:**
- - 15× faster than LLM-based baselines (20s for 4-minute music on A100)
- - Superior musical coherence across melody, harmony, and rhythm
- - full-song generation, duration control and accepts natural language descriptions
-
- ## Uses
-
- ### Direct Use
- ACE-Step can be used for:
- - Generating original music from text descriptions
- - Music remixing and style transfer
- - edit song lyrics
-
- ### Downstream Use
- The model serves as a foundation for:
- - Voice cloning applications
- - Specialized music generation (rap, jazz, etc.)
- - Music production tools
- - Creative AI assistants
-
- ### Out-of-Scope Use
- The model should not be used for:
- - Generating copyrighted content without permission
- - Creating harmful or offensive content
- - Misrepresenting AI-generated music as human-created
-
- ## How to Get Started
-
- see: https://github.com/ace-step/ACE-Step
-
- ## Hardware Performance
-
- | Device | 27 Steps | 60 Steps |
- |---------------|----------|----------|
- | NVIDIA A100 | 27.27x | 12.27x |
- | RTX 4090 | 34.48x | 15.63x |
- | RTX 3090 | 12.76x | 6.48x |
- | M2 Max | 2.27x | 1.03x |
-
- *RTF (Real-Time Factor) shown - higher values indicate faster generation*
-
-
- ## Limitations
-
- - Performance varies by language (top 10 languages perform best)
- - Longer generations (>5 minutes) may lose structural coherence
- - Rare instruments may not render perfectly
- - Output Inconsistency: Highly sensitive to random seeds and input duration, leading to varied "gacha-style" results.
- - Style-specific Weaknesses: Underperforms on certain genres (e.g. Chinese rap/zh_rap) Limited style adherence and musicality ceiling
- - Continuity Artifacts: Unnatural transitions in repainting/extend operations
- - Vocal Quality: Coarse vocal synthesis lacking nuance
- - Control Granularity: Needs finer-grained musical parameter control
-
- ## Ethical Considerations
-
- Users should:
- - Verify originality of generated works
- - Disclose AI involvement
- - Respect cultural elements and copyrights
- - Avoid harmful content generation
-
-
- ## Model Details
-
- **Developed by:** ACE Studio and StepFun
- **Model type:** Diffusion-based music generation with transformer conditioning
- **License:** Apache 2.0
- **Resources:**
- - [Project Page](https://ace-step.github.io/)
- - [Demo Space](https://huggingface.co/spaces/ACE-Step/ACE-Step)
- - [GitHub Repository](https://github.com/ACE-Step/ACE-Step)
-
-
- ## Citation
-
- ```bibtex
- @misc{gong2025acestep,
- title={ACE-Step: A Step Towards Music Generation Foundation Model},
- author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
- howpublished={\url{https://github.com/ace-step/ACE-Step}},
- year={2025},
- note={GitHub repository}
- }
- ```
-
- ## Acknowledgements
- This project is co-led by ACE Studio and StepFun.
-
 
+ ---
+ license: apache-2.0
+ ---
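
Note on the removed Hardware Performance table: the README does not define RTF beyond "higher values indicate faster generation". The sketch below assumes RTF means audio duration divided by generation time, which is one common reading and not something the commit confirms. Under that assumption, the table's A100 figure at 60 steps gives roughly 20 seconds for a 4-minute track, which lines up with the "20s for 4-minute music on A100" claim in the Key Features list. The names `RTF_TABLE` and `generation_seconds` are illustrative only.

```python
# RTF values copied from the removed README table, keyed by device and step count.
RTF_TABLE = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time, ASSUMING rtf = audio duration / generation time."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# Estimate the time to generate a 4-minute (240 s) track on each device.
for device, by_steps in RTF_TABLE.items():
    for steps, rtf in by_steps.items():
        seconds = generation_seconds(240.0, rtf)
        print(f"{device:12s} {steps:2d} steps: {seconds:6.1f} s")
```

These are back-of-the-envelope estimates from the published ratios, not measurements.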
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "r": 256,
+   "lora_alpha": 32,
+   "target_modules": [
+     "speaker_embedder",
+     "linear_q",
+     "linear_k",
+     "linear_v",
+     "to_q",
+     "to_k",
+     "to_v",
+     "to_out.0"
+   ],
+   "use_rslora": true
+ }
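
Note on the added config.json: the keys `r`, `lora_alpha`, `target_modules`, and `use_rslora` match the field names of PEFT's `LoraConfig`, although the commit does not say which loader consumes this file, so treating it as a PEFT-style config is an assumption. In PEFT, `use_rslora` switches the adapter scaling from `lora_alpha / r` to `lora_alpha / sqrt(r)`. The sketch below uses only the standard library to show what that means for the values in this commit; `lora_scaling` is a hypothetical helper, not part of any library.

```python
import json
import math

# The config added in this commit, reproduced verbatim as a JSON string.
CONFIG_JSON = """
{
  "r": 256,
  "lora_alpha": 32,
  "target_modules": ["speaker_embedder", "linear_q", "linear_k", "linear_v",
                     "to_q", "to_k", "to_v", "to_out.0"],
  "use_rslora": true
}
"""


def lora_scaling(cfg: dict) -> float:
    """Scaling applied to the low-rank update, following PEFT's convention."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    if cfg.get("use_rslora", False):
        return alpha / math.sqrt(r)  # rank-stabilized LoRA
    return alpha / r                 # standard LoRA


cfg = json.loads(CONFIG_JSON)
print("rsLoRA scaling:  ", lora_scaling(cfg))
print("standard scaling:", lora_scaling({**cfg, "use_rslora": False}))
```

With rank 256 the two conventions differ by a factor of 16 (2.0 versus 0.125), so a loader that ignores `use_rslora` would apply this adapter far more weakly than intended.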