File size: 8,721 Bytes
780fde8
 
 
 
 
 
 
 
 
 
 
e3729ed
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
---
title: AutoVideoEditor
emoji: πŸ“Š
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
app_file: app.py
---


# AVE - AI Video Editor

AVE is an advanced AI-powered video editing platform designed to streamline the editing process for portrait-mode, aesthetic videos (such as Instagram Reels). The application leverages Google’s Gemini generative AI, MoviePy for video/audio manipulation, and Flask for the web interface to provide a semi-automated video production process with customizable editing plans.
---

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Architecture & Workflow](#architecture--workflow)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [File Structure](#file-structure)
- [Contributing](#contributing)

---

## Overview

AVE is built to help users generate short, high-quality, portrait-mode videos that are stylistically aligned with modern, aesthetic standards. It automatically analyzes the provided media files and leverages Google Gemini's 2.5 Pro to create a comprehensive JSON editing plan. This plan details which segments to include, when to apply effects, and how to arrange clips, thereby greatly reducing manual work.

### How It Works

1. **File Upload and Caching:**  
   Users upload various media files (videos, audios, and images). Each file is hashed and either processed or retrieved from a cache if already available, ensuring efficient re-use of uploads.

2. **Editing Plan Generation:**  
   With the help of Google Gemini API, the system generates a detailed JSON editing plan. The plan follows a specified structure that includes clip order, start and end times, speed adjustments, optional muting, and overall color adjustments.

3. **Video Assembly:**  
   The application then processes the clips based on the generated plan. Using MoviePy and FFMPEG under the hood, clips are trimmed, adjusted (including speed and volume), concatenated, and optionally overlaid with transitions and filters.

4. **Progress and Notifications:**  
   During video processing, users can receive real-time progress updates. Additionally, the system logs all steps for debugging and auditing purposes.

---

## Features

- **AI-Assisted Editing Plan:**  
  Uses Gemini AI to generate a JSON-based editing plan that incorporates advanced parameters:
  - Clip selection based on aesthetics
  - Logical sequencing of clips
  - Optional speed adjustments and mute settings
  - Background audio specification and color adjustment hints

- **Media Upload and Caching:**  
  - Secure file uploads with configurable size limits
  - SHA256-based file hashing for deduplication and caching
  - Resilient file processing with automatic timeout handling

- **Video Processing and Assembly:**  
  - Uses MoviePy and FFMPEG for video processing
  - Supports effects like speed adjustment, volume control, and color grading
  - Designed for both low-resolution previews and full HQ processing

- **Progress Monitoring and Logging:**  
  - Detailed progress updates throughout the processing chain
  - Logging of each step (file upload, processing progress, plan generation, etc.)
  - Error handling with descriptive logging for better troubleshooting

- **User Interface & Experience:**  
  - Modern, responsive, and mobile-friendly web interface built with Flask and a custom HTML/CSS design 
  - Real-time progress indicators and potential for WebSocket integration in future releases

- **Extensibility:**  
  - Well-defined modular structure for adding new editing effects and transitions
  - Prepared for integration with asynchronous task queues (Celery/Redis) for scalability
  - Possible user authentication modules and file storage enhancements in later versions

---

## Architecture & Workflow

1. **Front-End:**  
   - A clean and modern web interface served using Flask and rendered by Jinja2 templates.
   - Supports file selection, form inputs for style description, target duration, and other processing parameters.

2. **Back-End:**  
   - **Flask Application:** Acts as the central point for handling API requests, file uploads, and processing triggers.
   - **File Caching & Upload Worker:** Uses threading to manage file uploads and caching using a SHA256 hash.
   - **Editing Plan Generation:** Interacts with the Gemini AI API to generate and validate a JSON editing plan based on the inputs.
   - **Video Assembly Engine:** Processes individual clips as per the JSON plan using MoviePy and FFMPEG for final video assembly.

3. **Logging & Cleanup:**  
   - Comprehensive logging using Python’s logging framework.
   - Cleanup functions are in place (or scheduled for future automation) for temporary files and cache entries.

---

## Installation

### Prerequisites

- Python 3.8 or above
- [FFmpeg](https://ffmpeg.org/download.html) installed and available in your system’s PATH
- A valid API key for the Gemini AI service

### Clone the Repository

```bash
git clone https://github.com/yourusername/AVE.git
cd AVE
```

### Create and Activate a Virtual Environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
```

### Install Dependencies

```bash
pip install -r repos/AVE/requirements.txt
```

### Setting Up the API Key

Edit the configuration in the source code (e.g., in `main.py` or via environment variables) and replace `"YOUR_API_KEY"` with your actual Gemini API key. You may also include other configuration details such as `MODEL_NAME`, `UPLOAD_FOLDER`, and target timeouts.

---

## Usage

### Running the Application

Once you have installed the dependencies and configured your API key, you can start the server with:

```bash
python repos/AVE/main.py
```

This will start the Flask server on the configured host/port (defaults to `localhost:7860`).

### Uploading Files and Generating Videos

1. **Navigate to the Web Interface:**  
   Open your web browser and go to `http://localhost:7860`.

2. **Upload Your Media Files:**  
   Use the intuitive interface to upload videos, audio tracks, or images. Ensure the files meet the allowed formats (e.g., `mp4`, `mov`, `mp3`, `jpg`, etc.).

3. **Enter Editing Details:**  
   Fill out the form with your desired style description, target duration, and (optionally) provide a sample video to guide the AI.

4. **Submit and Monitor:**  
   Submit the form. The system will start processing, and you will receive updates about each stage:
   - File upload
   - Editing plan generation
   - Video processing and assembly

5. **Preview and Download:**  
   After processing is complete, preview your generated video directly in the browser. Download the final edited video if satisfied with the result.

---

## Configuration

- **File Uploads:**  
  Configurable settings include allowed file types and maximum file sizes.

- **Timeouts & Caching:**  
  - `MAX_WAIT_TIME` controls how long the application waits for file processing.
  - `CACHE_EXPIRY_SECONDS` determines the cache duration for uploaded files.

- **Server Settings:**  
  The Flask application configuration (like `SERVER_NAME`) is set for both local development and production deployment. Adjust as required when deploying behind proxy servers or on cloud platforms.

- **Logging:**  
  Adjust logging levels in the configuration. The current setup logs INFO and ERROR levels to give detailed runtime feedback while processing files.

---

## File Structure

```
AVE/
β”œβ”€β”€ repos/
β”‚   └── AVE/
β”‚       β”œβ”€β”€ main.py           # Main Flask application and processing code
β”‚       β”œβ”€β”€ requirements.txt  # Project dependencies
β”‚       └── ...               # Other Python modules and helper functions
β”œβ”€β”€ uploads/                  # Directory for user-uploaded files
β”œβ”€β”€ output/                   # Directory for generated and final video output
β”œβ”€β”€ templates/
β”‚   └── index.html            # Main web interface template
└── README.md                 # This file
```

Each section of the code is modularized with clear responsibilities:
- `main.py` handles the overall video editing process, including file upload, plan generation, and video assembly.
- Templates and static files deliver a modern, responsive UI.
- Helper functions manage caching, progress updates, and error handling.

---

## Contributing

Contributions are welcome! If you want to contribute new features, improvements, or fixes:

1. **Fork the Repository:** Create your own fork and clone it locally.
2. **Create a Branch:** Use feature-specific branches (e.g., `feature-websocket-notifications`).
3. **Submit a Pull Request:** Provide detailed explanations of your changes and new features.