Spaces:

thecollabagepatch
/

magenta-retry

Running

App Files Files Community

magenta-retry / docs /performance.md

thecollabagepatch

extracted markdown from app.py into docs

8f1aba9 6 days ago

preview code

raw

history blame

982 Bytes

Current observations

L40S 48GB → faster than realtime. Use pace:"realtime" to avoid client over‑buffering.
L4 24GB → slightly below realtime even with pre‑roll buffering, TF32/Autotune, smaller chunks (max_decode_frames), and the base checkpoint.

Practical guidance

For consistent realtime, target ~40GB VRAM per active stream (e.g., A100 40GB, or MIG slices ≈ 35–40GB on newer GPUs).
Keep client‑side overlap‑add (25–40 ms) for seamless chunk joins.
Prefer pace:"realtime" once playback begins; use ASAP only to build a short pre‑roll if needed.
Optional knob: max_decode_frames (default 50 ≈ 2.0 s). Reducing to 36–45 can lower per‑chunk latency/VRAM, but doesn't increase frames/sec throughput.

Concurrency

This research build is designed for one active jam per GPU. Concurrency would require GPU partitioning (MIG) or horizontal scaling with a session scheduler.