update system prompt

app.py CHANGED
@@ -223,6 +223,7 @@ Duration Guidelines:
 - Video generation: typically 60-180 seconds
 - Audio/music generation: typically 30-90 seconds
 - Model loading + inference: add 10-30s buffer
+- AoT compilation during startup: use @spaces.GPU(duration=1500) for the maximum allowed duration
 
 Functions that typically need @spaces.GPU:
 - Image generation (text-to-image, image-to-image)
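As context for the duration guidance in this hunk, here is a minimal sketch of how a budget from these guidelines is typically attached to a ZeroGPU function. The model id and the 90-second figure are illustrative assumptions, not part of this diff:

```python
import spaces
import torch
from diffusers import DiffusionPipeline

# Illustrative model id; any diffusers pipeline follows the same pattern.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
)
pipe.to('cuda')  # safe at startup on ZeroGPU; a GPU is attached per call

# Budget per the guidelines above: expected inference time plus the
# 10-30s loading/inference buffer.
@spaces.GPU(duration=90)
def generate(prompt: str):
    return pipe(prompt).images[0]
```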
@@ -231,6 +232,43 @@ Functions that typically need @spaces.GPU:
 - Model inference with transformers, diffusers
 - Any function using .to('cuda') or GPU operations
 
+## Advanced ZeroGPU Optimization (Recommended)
+
+For production Spaces with heavy models, consider ahead-of-time (AoT) compilation for 1.3x-1.8x speedups:
+
+```python
+import spaces
+import torch
+from diffusers import DiffusionPipeline
+
+pipe = DiffusionPipeline.from_pretrained(..., torch_dtype=torch.bfloat16)
+pipe.to('cuda')
+
+@spaces.GPU(duration=1500)  # Max duration for compilation
+def compile_transformer():
+    with spaces.aoti_capture(pipe.transformer) as call:
+        pipe("arbitrary example prompt")
+
+    exported = torch.export.export(
+        pipe.transformer,
+        args=call.args,
+        kwargs=call.kwargs,
+    )
+    return spaces.aoti_compile(exported)
+
+compiled_transformer = compile_transformer()
+spaces.aoti_apply(compiled_transformer, pipe.transformer)
+
+@spaces.GPU
+def generate(prompt):
+    return pipe(prompt).images
+```
+
+Optional enhancements:
+- FP8 quantization with torchao for additional 1.2x speedup (H200 compatible)
+- Dynamic shapes for variable input sizes
+- FlashAttention-3 via kernels library for attention speedups
+
 ## Complete Gradio API Reference
 
 This reference is automatically synced from https://www.gradio.app/llms.txt to ensure accuracy.
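The three `spaces.aoti_*` helpers in the snippet above form a capture/compile/apply pattern: `aoti_capture` intercepts one real transformer call to record example inputs, `torch.export.export` traces the module with them, and `aoti_apply` swaps the compiled artifact back into the pipeline. The optional enhancements layer onto that same pattern. For the FP8 option, a minimal sketch assuming a recent torchao release (the config class name has shifted across torchao versions):

```python
# Quantize the transformer weights in place BEFORE the capture/export
# steps above, so the compiled artifact bakes the FP8 kernels in.
from torchao.quantization import quantize_, Float8DynamicActivationFloat8WeightConfig

quantize_(pipe.transformer, Float8DynamicActivationFloat8WeightConfig())
# ...then run compile_transformer() exactly as in the hunk above.
```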
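For the dynamic-shapes option, `torch.export.export` accepts a `dynamic_shapes` specification so a single compiled artifact can serve a range of input sizes. Which captured input varies, and along which dimension, is model-specific; the `hidden_states` name and dimension index below are illustrative assumptions:

```python
import torch

# Symbolic dimension spanning the sizes the Space is expected to serve.
seq_len = torch.export.Dim('seq_len', min=256, max=4096)

exported = torch.export.export(
    pipe.transformer,
    args=call.args,
    kwargs=call.kwargs,
    # Keys name the captured inputs; entries map dim index -> symbol.
    # Depending on the torch version, inputs that stay static may need
    # explicit None entries.
    dynamic_shapes={"hidden_states": {1: seq_len}},
)
```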
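For the FlashAttention-3 option, the `kernels` library loads pre-built CUDA kernels straight from the Hub, with no local compilation. The repo name and the `flash_attn_func` export below follow the kernels-community packaging and are assumptions to verify against the kernel's model card:

```python
import torch
from kernels import get_kernel

# Fetch a pre-compiled FlashAttention-3 build from the Hub.
flash_attn3 = get_kernel("kernels-community/vllm-flash-attn3")

# Toy q/k/v in the (batch, seq_len, num_heads, head_dim) layout flash
# attention expects; in a Space these would come from a custom diffusers
# attention processor.
q = k = v = torch.randn(1, 1024, 24, 128, device='cuda', dtype=torch.bfloat16)

# Some builds return just the output, others (output, log-sum-exp).
result = flash_attn3.flash_attn_func(q, k, v, causal=False)
```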
@@ -278,6 +316,7 @@ Duration Guidelines:
 - Video generation: typically 60-180 seconds
 - Audio/music generation: typically 30-90 seconds
 - Model loading + inference: add 10-30s buffer
+- AoT compilation during startup: use @spaces.GPU(duration=1500) for the maximum allowed duration
 
 Functions that typically need @spaces.GPU:
 - Image generation (text-to-image, image-to-image)
@@ -286,6 +325,43 @@ Functions that typically need @spaces.GPU:
 - Model inference with transformers, diffusers
 - Any function using .to('cuda') or GPU operations
 
+## Advanced ZeroGPU Optimization (Recommended)
+
+For production Spaces with heavy models, consider ahead-of-time (AoT) compilation for 1.3x-1.8x speedups:
+
+```python
+import spaces
+import torch
+from diffusers import DiffusionPipeline
+
+pipe = DiffusionPipeline.from_pretrained(..., torch_dtype=torch.bfloat16)
+pipe.to('cuda')
+
+@spaces.GPU(duration=1500)  # Max duration for compilation
+def compile_transformer():
+    with spaces.aoti_capture(pipe.transformer) as call:
+        pipe("arbitrary example prompt")
+
+    exported = torch.export.export(
+        pipe.transformer,
+        args=call.args,
+        kwargs=call.kwargs,
+    )
+    return spaces.aoti_compile(exported)
+
+compiled_transformer = compile_transformer()
+spaces.aoti_apply(compiled_transformer, pipe.transformer)
+
+@spaces.GPU
+def generate(prompt):
+    return pipe(prompt).images
+```
+
+Optional enhancements:
+- FP8 quantization with torchao for additional 1.2x speedup (H200 compatible)
+- Dynamic shapes for variable input sizes
+- FlashAttention-3 via kernels library for attention speedups
+
 ## Complete Gradio API Reference
 
 This reference is automatically synced from https://www.gradio.app/llms.txt to ensure accuracy.