Spaces:

thecollabagepatch
/

magenta-retry

Running

App Files Files Community

magenta-retry / docs /performance.md

thecollabagepatch's picture

thecollabagepatch

extracted markdown from app.py into docs

8f1aba9 6 days ago

|

history blame contribute delete

982 Bytes

	### Current observations
	- L40S 48GB → faster than realtime. Use `pace:"realtime"` to avoid client over‑buffering.
	- L4 24GB → slightly below realtime even with pre‑roll buffering, TF32/Autotune, smaller chunks (`max_decode_frames`), and the base checkpoint.

	### Practical guidance
	- For consistent realtime, target ~40GB VRAM per active stream (e.g., A100 40GB, or MIG slices ≈ 35–40GB on newer GPUs).
	- Keep client‑side overlap‑add (25–40 ms) for seamless chunk joins.
	- Prefer `pace:"realtime"` once playback begins; use ASAP only to build a short pre‑roll if needed.
	- Optional knob: `max_decode_frames` (default 50 ≈ 2.0 s). Reducing to 36–45 can lower per‑chunk latency/VRAM, but doesn't increase frames/sec throughput.

	### Concurrency
	This research build is designed for one active jam per GPU. Concurrency would require GPU partitioning (MIG) or horizontal scaling with a session scheduler.