malvin noel committed on
Commit 8b05224 · 1 Parent(s): 1559a4e

Corrected initial push of light AI Video Generator
README copy.md ADDED
@@ -0,0 +1,112 @@
+ # 🎥 Light AI Video Generator
+
+ **Light AI Video Generator** is an all-in-one Hugging Face Space that lets you create short, compelling AI-powered videos from **just a few inputs**. It handles everything, from script writing and voice generation to background video montage and subtitles.
+
+ > ✅ No editing skills required.
+ > 🔥 Perfect for YouTube Shorts, TikTok, Reels, and more.
+
+ ---
+
+ ## 🚀 Features
+
+ ### 1. 🧠 AI Script Generation
+ Use the built-in Qwen2.5 language model to automatically write a **concise, engaging script** based on your context and instructions.
+ Prefer to write your own? Just switch to "Use my script" mode.
+
+ ### 2. 🗣️ Voice Generation with Kokoro
+ Your script is turned into a **natural-sounding AI voice** using Kokoro TTS (English voices). The voice is saved and synchronized with the final video.
+
+ ### 3. 🎞️ Background Video Compilation
+ Upload one or more `.mp4` clips to serve as the **background footage**. The app automatically stitches and trims them to match your script's duration.
+
+ ### 4. 🎨 Video Style Settings
+ Customize:
+ - **Brightness**
+ - **Contrast**
+ - **Gamma**
+
+ Useful for matching your aesthetic or making the footage stand out.
+
+ ### 5. 🎵 Optional Background Music
+ Optionally upload an `.mp3` music track to mix under the voiceover.
+
+ ### 6. 📝 Subtitles (Optional)
+ Enable dynamic subtitles, synced to the voiceover, for better accessibility and viewer retention.
+
+ ### 7. 🏷️ Title + Description + Tags
+ Get a **YouTube-ready title, description, and hashtags** automatically generated to improve discoverability.
+
+ ---
+
+ ## 🧪 How to Use
+
+ 1. Go to the **🛠️ Settings tab**:
+    - Enter your **context** and **instruction**
+    - Choose whether to auto-generate the script or input your own
+    - Set your desired video **duration** and **style**, and upload your background video(s)
+    - Optional: upload a music file and enable subtitles
+
+ 2. Click **🚀 Generate the video**
+
+ 3. Head to the **📤 Results tab** to:
+    - Watch the generated video
+    - Copy the script, title, and description with one click
+
+ ---
+
+ ## 🛠 Technologies Used
+
+ | Component        | Technology                   |
+ |------------------|------------------------------|
+ | UI               | Gradio                       |
+ | LLM              | Qwen2.5 via Hugging Face     |
+ | Voice (TTS)      | Kokoro 82M (open-weight TTS) |
+ | Audio Processing | Pydub, Soundfile             |
+ | Video Editing    | MoviePy                      |
+ | Subtitles        | Whisper (OpenAI)             |
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ .
+ ├── app.py                      # Gradio app logic & interface
+ ├── requirements.txt            # All needed dependencies
+ ├── README.md                   # This file
+ ├── scripts/
+ │   ├── generate_scripts.py     # Script, title, description generation
+ │   ├── generate_voice.py       # Kokoro voice synthesis
+ │   ├── get_footage.py          # Background montage builder
+ │   ├── edit_video.py           # Final audio/video editor
+ │   └── generate_subtitles.py   # Whisper subtitle generation
+ └── assets/                     # Stores audio, video, and outputs
+ ```
+
+ ---
+
+ ## ✅ Requirements
+
+ The app requires the following dependencies (installed automatically in Spaces):
+
+ ```txt
+ gradio
+ torch
+ transformers
+ kokoro>=0.9.4
+ soundfile
+ pydub
+ openai-whisper
+ moviepy
+ python-dotenv
+ ```
+
+ ---
+
+ ## 🤖 Credits
+
+ Built by [Your Name or Team]
+ Powered by Hugging Face 🤗, Qwen, and Kokoro TTS
+ MIT License
+
+ ---
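The same building blocks can also be driven from Python directly. A minimal sketch, assuming the project layout and requirements above (the topic string and output path are illustrative):

```python
# Generate a script, a title, and a voiceover without the Gradio UI.
from scripts.generate_scripts import generate_script, generate_title
from scripts.generate_voice import generate_voice

script = generate_script("A 60-second explainer about houseplant care")  # hypothetical topic
title = generate_title(script)
generate_voice(script, "./assets/audio/voice.mp3")  # writes the MP3 the pipeline expects
print(title)
```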
app.py ADDED
@@ -0,0 +1,218 @@
+ import gradio as gr
+ import os
+ import shutil
+ from typing import List, Optional
+
+ from scripts.generate_scripts import generate_script, generate_title, generate_description
+ from scripts.generate_voice import generate_voice
+ from scripts.get_footage import get_video_montage_from_folder
+ from scripts.edit_video import edit_video
+ from scripts.generate_subtitles import (
+     transcribe_audio_to_subs,
+     chunk_text_by_words,
+     add_subtitles_to_video,
+ )
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Constants & helper utils
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ WORDS_PER_SECOND = 2.3  # ≈ 140 wpm
+
+
+ def safe_copy(src: str, dst: str) -> str:
+     if os.path.abspath(src) == os.path.abspath(dst):
+         return src
+     shutil.copy(src, dst)
+     return dst
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Core processing pipeline
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ def process_video(
+     context: str,
+     instruction: str,
+     target_duration: int,
+     script_mode: str,
+     custom_script: Optional[str],
+     lum: float,
+     contrast: float,
+     gamma: float,
+     add_subs: bool,
+     accumulated_videos: List[str] | None = None,
+     user_music: Optional[str] = None,
+     show_progress_bar: bool = True,
+ ):
+     """Build the final video using user-defined visual parameters (brightness, contrast, gamma)."""
+
+     if not accumulated_videos:
+         raise ValueError("❌ Please upload at least one background video (.mp4) before generating.")
+
+     approx_words = int(target_duration * WORDS_PER_SECOND)
+
+     # --- 1. Script (AI or custom) ---
+     if script_mode == "Use my script":
+         if not custom_script or not custom_script.strip():
+             raise ValueError("❌ You selected 'Use my script' but the script field is empty!")
+         script = custom_script.strip()
+         title = generate_title(script)
+         description = generate_description(script)
+     else:
+         prompt = (
+             f"You are a video creation expert. Here is the context: {context.strip()}\n"
+             f"Instruction: {instruction.strip()}\n"
+             f"🔴 Strict target duration: {target_duration}s — ≈ {approx_words} words (must be respected)."
+         )
+         script = generate_script(prompt)
+         title = generate_title(script)
+         description = generate_description(script)
+
+     # --- 2. Prepare folders ---
+     for folder in ("./assets/audio", "./assets/backgrounds", "./assets/output"):
+         os.makedirs(folder, exist_ok=True)
+
+     voice_path = "./assets/audio/voice.mp3"
+     final_no_subs = "./assets/output/final_video.mp4"
+     final_with_subs = "./assets/output/final_video_subtitles.mp4"
+
+     # --- 3. Copy videos ---
+     for f in os.listdir("./assets/backgrounds"):
+         if f.lower().endswith(".mp4"):
+             os.remove(os.path.join("./assets/backgrounds", f))
+     for idx, v in enumerate(accumulated_videos):
+         if not os.path.isfile(v) or not v.lower().endswith(".mp4"):
+             raise ValueError(f"❌ Invalid file: {v}")
+         safe_copy(v, os.path.join("./assets/backgrounds", f"video_{idx:03d}.mp4"))
+
+     # --- 4. AI voice ---
+     generate_voice(script, voice_path)
+
+     # --- 5. Video montage ---
+     music_path = user_music if user_music and os.path.isfile(user_music) else None
+     _, out_no_audio = get_video_montage_from_folder(
+         folder_path="./assets/backgrounds",
+         audio_path=voice_path,
+         output_dir="./assets/video_music",
+         lum=lum,
+         contrast=contrast,
+         gamma=gamma,
+         show_progress_bar=show_progress_bar,
+     )
+
+     # --- 6. Mixing & subtitles ---
+     edit_video(out_no_audio, voice_path, music_path, final_no_subs)
+
+     if add_subs:
+         segments = transcribe_audio_to_subs(voice_path)
+         subs = chunk_text_by_words(segments, max_words=3)
+         add_subtitles_to_video(final_no_subs, subs, final_with_subs)
+         return script, title, description, final_with_subs
+     else:
+         return script, title, description, final_no_subs
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Upload helper
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ def accumulate_files(new: List[str], state: List[str] | None):
+     state = state or []
+     for f in new or []:
+         if isinstance(f, str) and os.path.isfile(f) and f.lower().endswith(".mp4") and f not in state:
+             state.append(f)
+     return state
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Gradio UI
+ # ──────────────────────────────────────────────────────────────────────────────
+
+ with gr.Blocks(theme="gradio/soft") as demo:
+     gr.Markdown("# 🎬 AI Video Generator — Advanced Controls")
+
+     # ------------------- Parameters -------------------
+     with gr.Tab("🛠️ Settings"):
+         with gr.Row():
+             context_input = gr.Textbox(label="🧠 Context", lines=4)
+             instruction_input = gr.Textbox(label="🎯 Instruction", lines=4)
+
+         # min 5 s, max 120 s, default 60 s, 1 s steps (step must be a keyword argument)
+         duration_slider = gr.Slider(5, 120, value=60, step=1, label="⏱️ Target duration (s)")
+
+         script_mode = gr.Radio(
+             ["Generate script with AI", "Use my script"],
+             value="Generate script with AI",
+             label="Script mode",
+         )
+
+         custom_script_input = gr.Textbox(label="✍️ My script", lines=8, interactive=False)
+
+         def toggle_script_input(mode):
+             return gr.update(interactive=(mode == "Use my script"))
+
+         script_mode.change(toggle_script_input, inputs=script_mode, outputs=custom_script_input)
+
+         with gr.Accordion("🎨 Video Settings (brightness/contrast/gamma)", open=False):
+             lum_slider = gr.Slider(0, 20, 6, step=0.5, label="Brightness (0–20)")
+             contrast_slider = gr.Slider(0.5, 2.0, 1.0, step=0.05, label="Contrast (0.5–2.0)")
+             gamma_slider = gr.Slider(0.5, 2.0, 1.0, step=0.05, label="Gamma (0.5–2.0)")
+
+         with gr.Row():
+             add_subs_checkbox = gr.Checkbox(label="Add dynamic subtitles", value=True)
+
+         with gr.Row():
+             show_bar = gr.Checkbox(label="Show progress bar", value=True)
+
+         # Upload videos
+         videos_dropzone = gr.Files(label="🎞️ Background videos (MP4)", file_types=[".mp4"], type="filepath")
+         videos_state = gr.State([])
+         video_list_display = gr.Textbox(label="✅ Selected videos", interactive=False, lines=4)
+         videos_dropzone.upload(accumulate_files, [videos_dropzone, videos_state], videos_state, queue=False)
+         videos_state.change(lambda s: "\n".join(os.path.basename(f) for f in s), videos_state, video_list_display, queue=False)
+
+         user_music = gr.File(label="🎵 Background music (MP3, optional)", file_types=[".mp3"], type="filepath")
+
+         generate_btn = gr.Button("🚀 Generate the video", variant="primary")
+
+     with gr.Tab("📤 Results"):
+         video_output = gr.Video(label="🎬 Generated Video")
+
+         # Script + copy button
+         script_output = gr.Textbox(label="📝 Script", lines=6, interactive=False)
+         copy_script_btn = gr.Button("📋 Copy")
+         copy_script_btn.click(
+             None,
+             inputs=[script_output],
+             outputs=None,
+             js="(text) => navigator.clipboard.writeText(text)",
+         )
+
+         # Title + copy button
+         title_output = gr.Textbox(label="🎬 Title", lines=1, interactive=False)
+         copy_title_btn = gr.Button("📋 Copy")
+         copy_title_btn.click(None, inputs=title_output, outputs=None, js="(text) => {navigator.clipboard.writeText(text);}")
+
+         # Description + copy button
+         desc_output = gr.Textbox(label="📄 Description", lines=3, interactive=False)
+         copy_desc_btn = gr.Button("📋 Copy")
+         copy_desc_btn.click(None, inputs=desc_output, outputs=None, js="(text) => {navigator.clipboard.writeText(text);}")
+
+     # ------------------- Generation Callback -------------------
+     generate_btn.click(
+         fn=process_video,
+         inputs=[
+             context_input,
+             instruction_input,
+             duration_slider,
+             script_mode,
+             custom_script_input,
+             lum_slider,
+             contrast_slider,
+             gamma_slider,
+             add_subs_checkbox,
+             videos_state,
+             user_music,
+             show_bar,
+         ],
+         outputs=[script_output, title_output, desc_output, video_output],
+     )
+
+ demo.launch()
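Because the Gradio callback is an ordinary function, the pipeline can also run headlessly. Note that `app.py` calls `demo.launch()` at import time, so you would first guard that call behind `if __name__ == "__main__":`; the invocation itself is a sketch with illustrative inputs and an assumed sample clip:

```python
from app import process_video  # assumes demo.launch() has been guarded as noted above

script, title, description, video_path = process_video(
    context="Slow-living morning routines",        # illustrative inputs
    instruction="Write an upbeat, friendly voiceover",
    target_duration=60,                            # 60 s * 2.3 wps ≈ 138 words
    script_mode="Generate script with AI",
    custom_script=None,
    lum=6.0, contrast=1.0, gamma=1.0,
    add_subs=True,
    accumulated_videos=["./clips/clip_001.mp4"],   # hypothetical background clip
    user_music=None,
)
print(title, "->", video_path)
```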
requirements.txt ADDED
Binary file (658 Bytes).
 
scripts/edit_video.py ADDED
@@ -0,0 +1,79 @@
+ # ============================
+ # edit_video.py (revision => optional music and configurable volume)
+ # ============================
+
+ """Mix the AI voice and, if provided, the background music into the video.
+
+ Example call:
+     edit_video(
+         video_path="./assets/video_music/video_silent.mp4",
+         audio_path="./assets/audio/voice.mp3",
+         music_path=None,              # or a .mp3 / .wav path
+         output_path="./assets/output/final_video.mp4",
+         music_volume=0.10,            # music volume (0-1)
+     )
+ """
+
+ from moviepy import VideoFileClip, AudioFileClip, CompositeAudioClip
+ import os
+
+
+ def edit_video(
+     video_path: str,
+     audio_path: str,
+     music_path: str | None,
+     output_path: str,
+     *,
+     music_volume: float = 0.10,
+ ):
+     video_clip = VideoFileClip(video_path)
+     voice_clip = AudioFileClip(audio_path)
+     tracks = [voice_clip]
+
+     if music_path and os.path.isfile(music_path):
+         try:
+             music_clip = (
+                 AudioFileClip(music_path)
+                 .with_volume_scaled(music_volume)
+                 .with_duration(video_clip.duration)
+             )
+             tracks.insert(0, music_clip)
+         except Exception as err:
+             print(f"⚠️ Music skipped: {err}")
+
+     final_audio = CompositeAudioClip(tracks).with_duration(video_clip.duration)
+     final_clip = video_clip.with_audio(final_audio)
+
+     final_clip.write_videofile(
+         output_path,
+         codec="libx264",
+         audio_codec="aac",
+         fps=30,
+         threads=4,
+         preset="medium",
+         ffmpeg_params=["-pix_fmt", "yuv420p"],
+     )
+     print(f"✅ Video generated: {output_path}")
+
+     video_clip.close()
+     voice_clip.close()
+     if "music_clip" in locals():
+         music_clip.close()
+     final_audio.close()
+     final_clip.close()
+
+
+ if __name__ == "__main__":
+     # Quick demo (replace the paths with your own)
+     edit_video(
+         "./assets/video_music/video_silent.mp4",
+         "./assets/audio/voice.mp3",
+         None,
+         "./assets/output/final_video.mp4",
+     )
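To include a music bed, the same call takes the optional music path plus the keyword-only volume factor; a sketch with assumed file paths:

```python
from scripts.edit_video import edit_video

edit_video(
    "./assets/video_music/video_silent.mp4",    # silent montage from get_footage.py
    "./assets/audio/voice.mp3",                 # AI voiceover
    music_path="./assets/audio/music.mp3",      # hypothetical music file
    output_path="./assets/output/final_video.mp4",
    music_volume=0.05,                          # keep the music well under the voice
)
```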
scripts/generate_scripts.py ADDED
@@ -0,0 +1,86 @@
+ import re
+ import json
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model and tokenizer
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ model_id = "Qwen/Qwen2.5-0.5B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.float32, trust_remote_code=True
+ ).to(device)
+
+
+ def generate_local(prompt: str, max_new_tokens: int = 350, temperature: float = 0.7) -> str:
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+     output_ids = model.generate(
+         **inputs,
+         max_new_tokens=max_new_tokens,
+         do_sample=True,
+         temperature=temperature,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+     # Decode only the newly generated tokens so the echoed prompt is not returned.
+     new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
+     return tokenizer.decode(new_tokens, skip_special_tokens=True)
+
+
+ def generate_script(prompt: str, word_count: int = 60) -> str:
+     system_prompt = (
+         "You are a professional video scriptwriter. "
+         f"Write a script for a short YouTube video about: {prompt.strip()}.\n"
+         f"The video must be {word_count} words long, engaging, clear, and formatted as plain text."
+     )
+     return generate_local(system_prompt)
+
+
+ def one_word(query: str) -> str:
+     prompt_final = (
+         "Extract only the unique central theme of the following text in English in JSON format like this: "
+         '{"keyword": "impact"}. Text: ' + query
+     )
+     result = generate_local(prompt_final, max_new_tokens=30, temperature=0.4)
+     try:
+         keyword_json = json.loads(result)
+         keyword = keyword_json.get("keyword", "")
+     except json.JSONDecodeError:
+         # Fall back to the first plausible word if the model did not return valid JSON.
+         matches = re.findall(r'\b[a-zA-Z]{3,}\b', result)
+         keyword = matches[0] if matches else ""
+     return keyword.lower()
+
+
+ def generate_title(text: str) -> str:
+     prompt_final = (
+         "Generate a unique title for a YouTube Short video that is engaging and informative, "
+         "maximum 100 characters, without emojis, introduction, or explanation. Content:\n" + text
+     )
+     return generate_local(prompt_final, max_new_tokens=50, temperature=0.9).strip()
+
+
+ def generate_description(text: str) -> str:
+     prompt_final = (
+         "Write only the YouTube video description in English:\n"
+         "1. A compelling opening line.\n"
+         "2. A clear summary of the video (max 3 lines).\n"
+         "3. End with 3 relevant hashtags.\n"
+         "No emojis or introductions. Here is the text:\n" + text
+     )
+     return generate_local(prompt_final, max_new_tokens=300, temperature=0.7).strip()
+
+
+ def generate_tags(text: str) -> list:
+     prompt_final = (
+         "List only the important keywords for this YouTube video, separated by commas, "
+         "maximum 10 keywords. Context: " + text
+     )
+     result = generate_local(prompt_final, max_new_tokens=100, temperature=0.5)
+     return [tag.strip() for tag in result.split(",") if tag.strip()]
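Together these helpers produce all of the metadata the app surfaces. A usage sketch (outputs vary between runs because sampling is enabled; the topic is illustrative):

```python
from scripts.generate_scripts import (
    generate_script, generate_title, generate_description, generate_tags, one_word,
)

script = generate_script("Why cold brew coffee tastes smoother", word_count=60)
print(generate_title(script))        # ≤ ~100-character title
print(generate_description(script))  # short summary ending in 3 hashtags
print(generate_tags(script))         # up to 10 keywords
print(one_word(script))              # single lowercase theme keyword
```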
scripts/generate_subtitles.py ADDED
@@ -0,0 +1,237 @@
+ # generate_subtitles.py
+
+ import random
+ import os
+ import whisper
+ from moviepy import (
+     VideoFileClip,
+     TextClip,
+     CompositeVideoClip,
+ )
+ from moviepy.video.fx import FadeIn, Resize
+
+
+ # NOTE: MoviePy 2.x resolves fonts by file path on most systems; point this at
+ # a .ttf file (e.g. a bundled font) if "Arial-Bold" cannot be found.
+ FONT_PATH = "Arial-Bold"
+
+ # Palette of flashy subtitle colors
+ SUBTITLE_COLORS = [
+     "white", "yellow", "cyan", "deeppink", "gold", "lightgreen", "magenta", "orange"
+ ]
+
+
+ def color_for_word(word: str) -> str:
+     return random.choice(SUBTITLE_COLORS)
+
+
+ def chunk_text_by_words(segments, max_words=1):
+     """
+     Split each Whisper segment into mini subtitles of at most max_words words
+     for a more dynamic display.
+     """
+     print(f"✂️ Splitting into dynamic subtitles ({max_words} words max)...")
+     subs = []
+     for seg in segments:
+         words = seg['text'].strip().split()
+         seg_duration = seg['end'] - seg['start']
+         if not words or seg_duration <= 0:
+             continue
+
+         word_duration = seg_duration / len(words)
+
+         for i in range(0, len(words), max_words):
+             chunk_words = words[i:i + max_words]
+             chunk_text = " ".join(chunk_words)
+             start_time = seg['start'] + i * word_duration
+             end_time = start_time + len(chunk_words) * word_duration
+
+             subs.append({
+                 "start": start_time,
+                 "end": end_time,
+                 "text": chunk_text
+             })
+
+     print(f"🧩 {len(subs)} dynamic subtitles created.")
+     return subs
+
+
+ def save_subtitles_to_srt(subtitles, output_path):
+     """
+     Save the subtitles in .srt format.
+     """
+     def format_timestamp(seconds):
+         h = int(seconds // 3600)
+         m = int((seconds % 3600) // 60)
+         s = int(seconds % 60)
+         ms = int((seconds - int(seconds)) * 1000)
+         return f"{h:02}:{m:02}:{s:02},{ms:03}"
+
+     with open(output_path, "w", encoding="utf-8") as f:
+         for i, sub in enumerate(subtitles, 1):
+             f.write(f"{i}\n")
+             f.write(f"{format_timestamp(sub['start'])} --> {format_timestamp(sub['end'])}\n")
+             f.write(f"{sub['text'].strip()}\n\n")
+
+
+ def transcribe_audio_to_subs(audio_path):
+     """
+     Transcribe the audio file with Whisper, return the list of
+     start/end/text segments, and save them as .srt.
+     """
+     print("🎙️ Transcribing with Whisper...")
+     model = whisper.load_model("medium")  # or "small"/"large" depending on your needs
+     result = model.transcribe(audio_path)
+
+     subtitles = [{
+         "start": seg['start'],
+         "end": seg['end'],
+         "text": seg['text']
+     } for seg in result['segments']]
+
+     print(f"📝 {len(subtitles)} subtitles generated.")
+
+     # Save as .srt
+     base_name = os.path.splitext(audio_path)[0]
+     srt_path = f"{base_name}.srt"
+     save_subtitles_to_srt(subtitles, srt_path)
+     print(f"💾 Subtitles saved to: {srt_path}")
+
+     return subtitles
+
+
+ def format_subtitle_text(text, max_chars=50):
+     """
+     Wrap the text onto at most 2 lines (~50 characters per line)
+     so it fills the vertical video without overflowing.
+     """
+     words = text.strip().split()
+     lines = []
+     current_line = ""
+
+     for word in words:
+         if len(current_line + " " + word) <= max_chars:
+             current_line += (" " + word if current_line else word)
+         else:
+             lines.append(current_line.strip())
+             current_line = word
+     # Append the last line
+     lines.append(current_line.strip())
+
+     # Return at most 2 lines
+     return "\n".join(lines[:2])
+
+
+ def create_animated_subtitle_clip(text, start, end, video_w, video_h):
+     """
+     Create a TextClip with:
+     - A random color
+     - Fade-in / pop (progressive resize)
+     - A fixed (adjustable) or slightly random vertical position
+     """
+     word = text.strip()
+     color = color_for_word(word)
+
+     # Base text clip
+     txt_clip = TextClip(
+         text=text,
+         font=FONT_PATH,
+         font_size=100,
+         color=color,
+         stroke_color="black",
+         stroke_width=6,
+         method="caption",
+         size=(int(video_w * 0.8), None),  # 80% of the width, automatic height
+         text_align="center",          # alignment inside the box
+         horizontal_align="center",    # box centered horizontally
+         vertical_align="center",      # box centered vertically
+         interline=4,
+         transparent=True
+     )
+
+     y_choices = [int(video_h * 0.45), int(video_h * 0.55), int(video_h * 0.6)]
+     base_y = random.choice(y_choices)
+
+     txt_clip = txt_clip.with_position(("center", base_y))
+     txt_clip = txt_clip.with_start(start).with_end(end)
+
+     # Apply a fade-in plus a small "pop" effect that grows the text over the chunk's duration
+     # 1) 0.2 s fade-in
+     clip_fadein = FadeIn(duration=0.2).apply(txt_clip)
+
+     # 2) progressive enlargement (1.0 → 1.07 over the duration)
+     duration_subtitle = end - start
+     def pop_effect(t):
+         if duration_subtitle > 0:
+             progress = t / duration_subtitle
+             scale = 1.0 + 0.07 * (1 - (1 - progress) ** 3)  # cubic ease-out
+         else:
+             scale = 1.0
+         return scale
+
+     resize_effect = Resize(pop_effect)
+     clip_pop = resize_effect.apply(clip_fadein)
+
+     return clip_pop
+
+
+ def add_subtitles_to_video(video_path, subtitles, output_file="./assets/output/video_with_subs.mp4"):
+     """
+     Overlay the animated, colored subtitles on the video,
+     reframe to 1080x1920 if needed, and export the result.
+     """
+     print("🎬 Inserting Shorts-optimized subtitles...")
+
+     video = VideoFileClip(video_path)
+
+     # Force the 1080×1920 vertical format if not already
+     if (video.w, video.h) != (1080, 1920):
+         print("📐 Resizing video to 1080×1920...")
+         video = video.resized((1080, 1920))
+
+     clips = [video]
+
+     for sub in subtitles:
+         start_time = sub['start']
+         end_time = sub['end']
+         text_chunk = sub['text']
+
+         animated_sub_clip = create_animated_subtitle_clip(
+             text_chunk, start_time, end_time, video_w=video.w, video_h=video.h
+         )
+         clips.append(animated_sub_clip)
+
+     final = CompositeVideoClip(clips, size=(1080, 1920)).with_duration(video.duration)
+
+     # Export as MP4 H.264 + AAC, 30 fps
+     final.write_videofile(
+         output_file,
+         codec="libx264",
+         audio_codec="aac",
+         fps=30,
+         threads=4,
+         preset="medium",
+         ffmpeg_params=["-pix_fmt", "yuv420p"]
+     )
+
+     print(f"✅ Shorts/TikTok video ready: {output_file}")
+
+
+ # test
+ if __name__ == "__main__":
+     # Example test
+     video_path = "assets/backgrounds/video_only.mp4"
+     audio_path = "assets/audio/voice.mp3"
+     subtitles = transcribe_audio_to_subs(audio_path)
+     add_subtitles_to_video(video_path, subtitles, output_file="output_with_subs.mp4")
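For reference, here is how `chunk_text_by_words` distributes timing, worked through on a fabricated Whisper-style segment:

```python
from scripts.generate_subtitles import chunk_text_by_words

# One 3-second segment of six words -> word_duration = 0.5 s per word.
segments = [{"start": 0.0, "end": 3.0, "text": "six words split into three chunks"}]
subs = chunk_text_by_words(segments, max_words=2)
# Three 2-word subtitles of 1.0 s each:
#   {'start': 0.0, 'end': 1.0, 'text': 'six words'}
#   {'start': 1.0, 'end': 2.0, 'text': 'split into'}
#   {'start': 2.0, 'end': 3.0, 'text': 'three chunks'}
```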
scripts/generate_voice.py ADDED
@@ -0,0 +1,35 @@
+ import os
+ import random
+
+ import numpy as np
+ import soundfile as sf
+ from kokoro import KPipeline
+
+ pipeline = KPipeline(lang_code='a')  # 'a' for American English
+
+ ENGLISH_VOICES = [
+     "af_heart",
+     "en_us_amy",
+     "en_deep",
+     "en_female",
+     "en_male"
+ ]
+
+ def generate_voice(text: str, path: str):
+     for voice in random.sample(ENGLISH_VOICES, len(ENGLISH_VOICES)):
+         try:
+             print(f"🔊 Trying voice: {voice}")
+             generator = pipeline(text, voice=voice)
+
+             # Concatenate every generated chunk so long scripts are not
+             # truncated to their first segment.
+             chunks = [audio for _, _, audio in generator]
+             if chunks:
+                 sf.write(path, np.concatenate(chunks), 24000)
+                 print(f"✅ Audio saved with {voice} at: {path}")
+                 return True
+         except Exception as e:
+             print(f"❌ Failed with {voice}: {e}")
+             continue
+
+     print("🛑 All voices failed.")
+     if os.path.exists(path):
+         os.remove(path)
+         print("🗑️ Removed broken file.")
+     return False
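A quick usage sketch (the voices are tried in random order, so the one that succeeds varies; the output path is illustrative):

```python
from scripts.generate_voice import generate_voice

ok = generate_voice("Hello from the Light AI Video Generator.", "./assets/audio/voice.mp3")
print("saved" if ok else "all voices failed")
```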
scripts/get_footage.py ADDED
@@ -0,0 +1,270 @@
+ # get_footage.py
+
+ import os
+ import random
+ import numpy as np
+ from moviepy.video.fx.Resize import Resize
+ from moviepy.video.fx.LumContrast import LumContrast
+ from moviepy.video.fx.CrossFadeIn import CrossFadeIn
+ from moviepy.video.fx.CrossFadeOut import CrossFadeOut
+ from moviepy.video.fx.GammaCorrection import GammaCorrection
+ from moviepy.video.fx.MultiplyColor import MultiplyColor
+ from moviepy.video.fx.MultiplySpeed import MultiplySpeed
+ from moviepy.video.fx.Scroll import Scroll
+
+ from moviepy import (
+     VideoFileClip,
+     AudioFileClip,
+     VideoClip,
+     concatenate_videoclips,
+     CompositeVideoClip
+ )
+
+
+ def add_pan_effect(clip):
+     """
+     Apply a slight, random pan effect along the X axis.
+     """
+     return clip.with_effects([Scroll(x_speed=random.uniform(-5, 5), y_speed=0)])
+
+
+ def dynamic_effect(clip, lum, contrast, gamma):
+     """
+     Apply a set of "dynamic" effects:
+     - Progressive zoom
+     - Brightness/contrast
+     - Gamma correction
+     - Subtle color filter
+     - (Optional) horizontal pan
+     - Slight speed variation
+     """
+     duration = clip.duration
+
+     # Progressive zoom
+     max_zoom_factor = random.uniform(0.02, 0.05)  # 2% to 5% total enlargement
+     zoomed_clip = clip.with_effects([
+         Resize(lambda t: 1 + max_zoom_factor * (t / duration))
+     ])
+
+     # Brightness/contrast
+     lum_clip = zoomed_clip.with_effects([
+         LumContrast(lum=lum, contrast=contrast)
+     ])
+
+     # Gamma correction
+     gamma_clip = lum_clip.with_effects([
+         GammaCorrection(gamma=gamma)
+     ])
+
+     # Subtle color filter
+     color_shift = (
+         1.0 + random.uniform(-0.02, 0.05),
+         1.0 + random.uniform(-0.03, 0.03),
+         1.0 + random.uniform(-0.05, 0.01)
+     )
+     color_clip = gamma_clip.with_effects([
+         MultiplyColor(color_shift)
+     ])
+
+     # (Optional) horizontal pan
+     # color_clip = add_pan_effect(color_clip)
+
+     # Speed variation
+     speed_factor = random.uniform(0.9, 1.2)
+     final_clip = color_clip.with_effects([
+         MultiplySpeed(speed_factor)
+     ])
+
+     return final_clip
+
+
+ def add_timer_overlay(clip):
+     """
+     Add a progress bar on top of the video.
+     """
+     duration = clip.duration
+     overlay_clips = []
+
+     # Progress bar geometry
+     bar_height = 50
+     bar_width = int(clip.w * 0.8)
+     bar_x = (clip.w - bar_width) // 2
+     bar_y = int(clip.h * 0.10)
+
+     def make_bar_frame(t):
+         progress = min(t / duration, 1.0)
+         current_width = int(bar_width * progress)
+         frame = np.zeros((bar_height, bar_width, 3), dtype=np.uint8)
+         frame[:, :current_width] = [0, 255, 0]  # green bar
+         return frame
+
+     bar_clip = VideoClip(make_bar_frame, duration=duration)
+     bar_clip = bar_clip.with_position((bar_x, bar_y))
+     overlay_clips.append(bar_clip)
+
+     # Final composition
+     final = CompositeVideoClip([clip, *overlay_clips], size=clip.size)
+     return final
+
+
+ def apply_crossfade_effects(clips, duration=0.12):
+     """
+     Apply a crossfade (fade in/out) between consecutive clips.
+     """
+     clips_with_fades = []
+     for i, clip in enumerate(clips):
+         effects = []
+         if i != 0:
+             effects.append(CrossFadeIn(duration))
+         if i != len(clips) - 1:
+             effects.append(CrossFadeOut(duration))
+         clips_with_fades.append(clip.with_effects(effects))
+     return clips_with_fades
+
+
+ def get_video_montage_from_folder(
+     folder_path: str = "./assets/videos",
+     audio_path: str = "./assets/audio/voice.mp3",
+     output_dir: str = "./assets/backgrounds",
+     lum: float = 6.0,
+     contrast: float = 1.0,
+     gamma: float = 1.0,
+     show_progress_bar: bool = True,
+ ):
+     """
+     1) Walk every video file in 'folder_path'.
+     2) Build a vertical montage (1080x1920), applying dynamic_effect()
+        and a crossfade between clips.
+     3) The total duration is capped at the audio duration (the surplus is cut).
+     4) Export two versions: with and without audio.
+     """
+
+     # Prepare output paths
+     os.makedirs(output_dir, exist_ok=True)
+     output_with_audio = os.path.join(output_dir, "video_with_audio.mp4")
+     output_no_audio = os.path.join(output_dir, "video_silent.mp4")
+
+     # Load the audio to know the target duration
+     voiceover = AudioFileClip(audio_path)
+     audio_duration = voiceover.duration
+     print(f"🎧 Audio duration: {audio_duration:.2f} s")
+
+     # List every video file in the folder
+     all_videos = [
+         f for f in os.listdir(folder_path)
+         if f.lower().endswith((".mp4", ".mov", ".avi", ".mkv"))
+     ]
+
+     if not all_videos:
+         print(f"❌ No video found in folder: {folder_path}")
+         return None, None
+
+     clips = []
+     total_duration = 0.0
+
+     # Process the videos in order
+     for video_file in all_videos:
+         video_path = os.path.join(folder_path, video_file)
+         try:
+             clip = VideoFileClip(video_path)
+
+             # Resize to 1080x1920 (vertical)
+             target_w, target_h = 1080, 1920
+             clip_ar = clip.w / clip.h
+             target_ar = target_w / target_h
+
+             if clip_ar > target_ar:
+                 # Fit the height, then crop the width
+                 clip = clip.resized(height=target_h)
+                 clip = clip.cropped(width=target_w, x_center=clip.w / 2)
+             else:
+                 # Fit the width, then crop the height
+                 clip = clip.resized(width=target_w)
+                 clip = clip.cropped(height=target_h, y_center=clip.h / 2)
+
+             # Apply the dynamic effect
+             dynamic_clip = dynamic_effect(clip, lum, contrast, gamma)
+             clips.append(dynamic_clip)
+             total_duration += dynamic_clip.duration
+
+             # Stop once the accumulated duration covers the audio
+             if total_duration >= audio_duration:
+                 break
+
+         except Exception as e:
+             print(f"⚠️ Error with file {video_file}: {e}")
+
+     if not clips:
+         print("❌ No valid clip. Montage impossible.")
+         return None, None
+
+     # Crossfade between clips
+     clips = apply_crossfade_effects(clips, duration=0.15)
+
+     # Concatenate and cap the total duration at the audio's
+     final_clip = concatenate_videoclips(clips, method="compose").subclipped(0, audio_duration)
+
+     # Overlay (e.g. progress bar)
+     if show_progress_bar:
+         final_clip = add_timer_overlay(final_clip)
+
+     # --------------------
+     # 1) Version WITH audio
+     # --------------------
+     final_clip_with_audio = final_clip.with_audio(voiceover)
+     final_clip_with_audio.write_videofile(
+         output_with_audio,
+         codec='libx264',
+         audio_codec='aac',
+         fps=30,
+         threads=4,
+         preset="medium",
+         ffmpeg_params=["-pix_fmt", "yuv420p"]
+     )
+     print(f"✅ Montage created (WITH audio): {output_with_audio}")
+
+     # --------------------
+     # 2) Version WITHOUT audio
+     # --------------------
+     final_clip.write_videofile(
+         output_no_audio,
+         codec='libx264',
+         fps=30,
+         threads=4,
+         preset="medium",
+         ffmpeg_params=["-pix_fmt", "yuv420p"],
+         audio=False
+     )
+     print(f"✅ Montage created (WITHOUT audio): {output_no_audio}")
+
+     # Free memory
+     for c in clips:
+         c.close()
+     voiceover.close()
+     final_clip.close()
+     final_clip_with_audio.close()
+
+     return output_with_audio, output_no_audio
+
+
+ # -----------------------------
+ # Local usage example
+ # -----------------------------
+ if __name__ == "__main__":
+     # Assumes you already have a voice.mp3 file
+     # and a "./assets/videos" folder containing several videos.
+     get_video_montage_from_folder(
+         folder_path="./assets/videos",
+         audio_path="./assets/audio/voice.mp3",
+         output_dir="./assets/backgrounds"
+     )
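When `app.py` drives this module it forwards the UI's style sliders; an equivalent direct call, as a sketch that assumes the folders exist and contain media:

```python
from scripts.get_footage import get_video_montage_from_folder

with_audio, silent = get_video_montage_from_folder(
    folder_path="./assets/backgrounds",
    audio_path="./assets/audio/voice.mp3",
    output_dir="./assets/video_music",
    lum=6.0, contrast=1.1, gamma=0.95,   # brightness / contrast / gamma
    show_progress_bar=False,
)
print(silent)  # path of the silent montage that edit_video() consumes
```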