
## What this is

We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.

## Implemented backends

- HTTP (bar‑aligned): /generate, /jam/start, /jam/next, /jam/stop, /jam/update, etc. (see the HTTP sketch after this list)
- WebSocket (realtime): ws://…/ws/jam with mode="rt" (Colab‑style continuous chunks). New in this build. (see the client sketch after this list)
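
A minimal sketch of the bar‑aligned HTTP flow using `requests`. The endpoint names come from the list above; the payload and response fields (`bpm`, `bars`, `session_id`) are assumptions for illustration and may not match the server's actual schema:

```python
# Hypothetical walkthrough of the bar-aligned HTTP endpoints.
# Endpoint paths are from the docs; payload/response fields are assumed.
import requests

BASE = "http://localhost:7860"  # assumed host/port

# Start a jam session (bpm/bars are illustrative fields only)
start = requests.post(f"{BASE}/jam/start", json={"bpm": 120, "bars": 4}).json()
session_id = start.get("session_id")  # assumed response field

# Pull a few bar-aligned chunks
for _ in range(4):
    chunk = requests.post(f"{BASE}/jam/next", json={"session_id": session_id}).json()
    # ... decode and queue the chunk's audio for playback here ...

# Adjust parameters mid-jam, then stop
requests.post(f"{BASE}/jam/update", json={"session_id": session_id, "bpm": 128})
requests.post(f"{BASE}/jam/stop", json={"session_id": session_id})
```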
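And a rough sketch of a realtime WebSocket client with the `websockets` library. The `/ws/jam` path and `mode="rt"` are from the list above; the handshake shape and whether audio arrives as binary frames or JSON are assumptions:

```python
# Hypothetical realtime client for the ws://…/ws/jam endpoint with mode="rt".
# Message shapes are assumed, not taken from the server code.
import asyncio
import json
import websockets

async def jam_rt(uri="ws://localhost:7860/ws/jam"):  # assumed host/port
    async with websockets.connect(uri) as ws:
        # Ask for Colab-style continuous chunks
        await ws.send(json.dumps({"mode": "rt"}))
        while True:
            msg = await ws.recv()
            if isinstance(msg, bytes):
                pass  # assumed: raw audio chunk — hand it to your playback buffer
            else:
                print("server event:", msg)  # assumed: JSON status messages

asyncio.run(jam_rt())
```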

## What we learned (GPU notes)

- L40S 48GB: comfortably faster than realtime → we added a pace: "realtime" switch so the server doesn't outrun playback (see the pacing sketch after this list).
- L4 24GB: consistently just under realtime; even with pre‑roll buffering, TF32/JAX tunings, reduced chunk size, and the base checkpoint, we still see eventual under‑runs.
- Implication: For production‑quality realtime, aim for ~40GB VRAM per user/session (e.g., A100 40GB, or MIG slices ≈ 35–40GB on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.
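
To make the pace: "realtime" idea concrete, here is one way a server loop can throttle a faster‑than‑realtime generator so chunks are emitted no faster than playback. This is a sketch of the general technique under stated assumptions, not the actual implementation; `generate_chunk` and `emit` are placeholders:

```python
import time

CHUNK_SECONDS = 2.0  # default chunk length from the specs below

def paced_stream(generate_chunk, emit, pace="realtime"):
    """Emit chunks no faster than wall-clock playback when pace == 'realtime'."""
    next_deadline = time.monotonic()
    while True:
        audio = generate_chunk()   # placeholder: model produces one 2.0 s chunk
        emit(audio)                # placeholder: push the chunk to the client
        if pace == "realtime":
            next_deadline += CHUNK_SECONDS
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # fast GPU (e.g. L40S) waits here
            # on a too-slow GPU (e.g. L4) delay stays negative → under-runs
```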

## Model / audio specs

- Model: MagentaRT (T5X; decoder RVQ depth = 16)
- Audio: 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade (see the crossfade sketch below)
- Context: 10 s rolling context window
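
As an illustration of the audio numbers above, a minimal NumPy sketch that stitches consecutive chunks with a 40 ms crossfade at 48 kHz. The linear fade here is an assumption; the actual crossfade curve may differ:

```python
import numpy as np

SR = 48_000              # 48 kHz sample rate
XFADE = int(0.040 * SR)  # 40 ms crossfade → 1920 samples

def append_with_crossfade(stream: np.ndarray, chunk: np.ndarray) -> np.ndarray:
    """Append a (samples, 2) stereo chunk, crossfading over the last 40 ms.

    Assumes each new chunk overlaps the previous output by XFADE samples;
    the linear fade is illustrative, not necessarily what MagentaRT uses.
    """
    fade = np.linspace(0.0, 1.0, XFADE)[:, None]  # (XFADE, 1) ramp, broadcasts to stereo
    head, tail = chunk[:XFADE], chunk[XFADE:]
    stream[-XFADE:] = stream[-XFADE:] * (1.0 - fade) + head * fade
    return np.concatenate([stream, tail], axis=0)
```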