What this is
We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.
Implemented backends
- HTTP (bar‑aligned): `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc.
- WebSocket (realtime): `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build.
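To make the WebSocket path concrete, here is a minimal client sketch. Only `ws://…/ws/jam` and `mode="rt"` come from the docs above; the URL host, the `styles` field, and the chunk message shape (`type`, base64 `audio`) are assumptions for illustration, not a documented schema.

```python
import asyncio
import base64
import json

import websockets  # pip install websockets


async def jam(url: str = "ws://localhost:8000/ws/jam") -> None:
    async with websockets.connect(url) as ws:
        # Request continuous realtime chunks (mode="rt"); the extra
        # "styles" field is a hypothetical prompt parameter.
        await ws.send(json.dumps({"mode": "rt", "styles": "warm synth loop"}))
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("type") == "chunk":  # assumed message shape
                pcm = base64.b64decode(msg["audio"])  # assumed base64 PCM payload
                handle_chunk(pcm)


def handle_chunk(pcm: bytes) -> None:
    # Placeholder: hand the samples to your audio output / ring buffer.
    print(f"received {len(pcm)} bytes of audio")


if __name__ == "__main__":
    asyncio.run(jam())
```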
What we learned (GPU notes)
- L40S 48GB: comfortably faster than realtime → we added a `pace: "realtime"` switch so the server doesn't outrun playback (see the pacing sketch after this list).
- L4 24GB: consistently just under realtime; even with pre‑roll buffering, TF32/JAX tunings, reduced chunk size, and the base checkpoint, we still see eventual under‑runs.
- Implication: for production‑quality realtime, aim for ~40 GB VRAM per user/session (e.g., A100 40GB, or MIG slices ≈ 35–40 GB on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.
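Conceptually, the pacing switch just throttles emission: generation may run ahead of playback, but chunks leave the server no faster than wall-clock rate. A minimal sketch of that throttle, where `generate_chunk` is a stand‑in for the actual model call, not our server code:

```python
import time

CHUNK_SECONDS = 2.0  # matches the default chunk length below


def stream_chunks(generate_chunk, pace: str = "realtime"):
    """Yield generated chunks, optionally throttled to playback rate."""
    next_deadline = time.monotonic()
    while True:
        chunk = generate_chunk()  # placeholder for the model call
        if pace == "realtime":
            # Don't emit faster than audio plays back: sleep until the
            # wall-clock moment this chunk is actually due.
            next_deadline += CHUNK_SECONDS
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        yield chunk
```

On an L40S the sleep branch fires on most chunks; on an L4 the deadline keeps slipping, which is the under‑run behavior noted above.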
Model / audio specs
- Model: MagentaRT (T5X; decoder RVQ depth = 16)
- Audio: 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade
- Context: 10 s rolling context window
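To make the chunk/crossfade numbers concrete, here is a sketch of splicing consecutive 2.0 s chunks with a 40 ms overlap. The equal‑power cosine/sine ramp is our assumption for illustration; MagentaRT's actual stitching may differ.

```python
import numpy as np

SAMPLE_RATE = 48_000                       # 48 kHz
XFADE_SAMPLES = int(0.040 * SAMPLE_RATE)   # 40 ms -> 1920 samples


def splice(prev: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Append `new` to `prev`, crossfading the overlapping 40 ms.

    Both arrays are float32 stereo with shape (num_samples, 2).
    """
    t = np.linspace(0.0, 1.0, XFADE_SAMPLES, dtype=np.float32)[:, None]
    fade_out = np.cos(t * np.pi / 2)   # equal-power down-ramp for old tail
    fade_in = np.sin(t * np.pi / 2)    # equal-power up-ramp for new head
    overlap = prev[-XFADE_SAMPLES:] * fade_out + new[:XFADE_SAMPLES] * fade_in
    return np.concatenate([prev[:-XFADE_SAMPLES], overlap, new[XFADE_SAMPLES:]])
```

With this stitching, each 2.0 s chunk contributes 1.96 s of new audio, since 40 ms is consumed by the overlap.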