
## What this is

We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.

## Implemented backends

- HTTP (bar‑aligned): `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc.
- WebSocket (realtime): `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build.
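A minimal sketch of the text frame a client might send to open a realtime jam over the WebSocket endpoint. Only `mode="rt"` and the `/ws/jam` path come from the docs above; the `type`, `bpm`, and `bars` fields are illustrative assumptions, not the server's actual schema.

```python
import json

def make_jam_start(mode="rt", bpm=120, bars=2):
    """Build a hypothetical JSON handshake frame for ws://…/ws/jam.

    Field names other than mode are assumptions for illustration;
    check the server's actual message schema before using this.
    """
    return json.dumps({"type": "start", "mode": mode, "bpm": bpm, "bars": bars})

# Round-trip the frame to show the shape a server would receive.
msg = json.loads(make_jam_start())
```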

## What we learned (GPU notes)

- L40S 48 GB: comfortably faster than realtime, so we added a `pace: "realtime"` switch so the server doesn't outrun playback.
- L4 24 GB: consistently just under realtime; even with pre‑roll buffering, TF32/JAX tunings, a reduced chunk size, and the base checkpoint, we still see eventual under‑runs.
- Implication: for production‑quality realtime, aim for ~40 GB of VRAM per user/session (e.g., A100 40 GB, or MIG slices of ≈ 35–40 GB on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.
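The idea behind the `pace: "realtime"` switch can be sketched as a release-time throttle: a GPU that generates faster than realtime is made to yield each chunk no earlier than its position on the audio clock. This is an assumed implementation of the concept, not the server's actual code.

```python
import time

CHUNK_SECONDS = 2.0  # matches the 2.0 s default chunk size above

def paced_chunks(generate_chunk, n_chunks, chunk_seconds=CHUNK_SECONDS):
    """Yield generated chunks no faster than realtime.

    Chunk i is released no earlier than i * chunk_seconds after the
    stream started, so a fast GPU (e.g. L40S) can't outrun playback.
    A slow GPU (e.g. L4 under-running) simply never needs to sleep.
    """
    start = time.monotonic()
    for i in range(n_chunks):
        chunk = generate_chunk()
        release_at = start + i * chunk_seconds  # audio-clock deadline
        delay = release_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # throttle: we're ahead of playback
        yield chunk

# Tiny demo with short chunks so it runs quickly.
chunks = list(paced_chunks(lambda: b"\x00" * 16, 3, chunk_seconds=0.05))
```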

## Model / audio specs

- Model: MagentaRT (T5X; decoder RVQ depth = 16)
- Audio: 48 kHz stereo; 2.0 s chunks by default with a 40 ms crossfade
- Context: 10 s rolling context window
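To make the 40 ms crossfade concrete: at 48 kHz that is 1920 samples per channel blended between the tail of one chunk and the head of the next. The sketch below joins two mono chunks with a linear crossfade; the real pipeline is stereo and may well use a different (e.g. equal-power) curve.

```python
SAMPLE_RATE = 48_000
CROSSFADE_SECONDS = 0.040            # 40 ms, per the specs above
XF = int(SAMPLE_RATE * CROSSFADE_SECONDS)  # 1920 samples per channel

def crossfade_join(prev_chunk, next_chunk, n=XF):
    """Splice two mono chunks (lists of float samples) with a linear
    crossfade over the last n samples of prev and first n of next.

    Illustrative only: shows why consecutive 2.0 s chunks can be
    stitched without a click at the boundary.
    """
    head = prev_chunk[:-n]
    tail = next_chunk[n:]
    faded = [
        prev_chunk[len(prev_chunk) - n + i] * (1 - i / n)
        + next_chunk[i] * (i / n)
        for i in range(n)
    ]
    return head + faded + tail

# Joining two 4000-sample chunks overlaps 1920 samples: 4000 + 4000 - 1920.
out = crossfade_join([1.0] * 4000, [1.0] * 4000)
```

Note that a linear crossfade preserves a constant signal exactly, since the fade-out and fade-in gains always sum to 1.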