## What this is

We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.

### Implemented backends

- **HTTP (bar‑aligned):** `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc. (a request‑flow sketch follows at the end of this section).
- **WebSocket (realtime):** `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build; see the client sketch below.

## What we learned (GPU notes)

- **L40S 48GB:** comfortably **faster than realtime** → we added a `pace: "realtime"` switch so the server doesn't outrun playback (pacing sketch below).
- **L4 24GB:** **consistently just under realtime**; even with pre‑roll buffering, TF32/JAX tunings, a reduced chunk size, and the **base** checkpoint, we still see eventual underruns.
- **Implication:** for production‑quality realtime, budget roughly **40 GB of VRAM** per user/session (e.g., an **A100 40GB**, or MIG slices of ≈ **35–40 GB** on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.

## Model / audio specs

- **Model:** MagentaRT (T5X; decoder RVQ depth = 16)
- **Audio:** 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade (see the crossfade sketch below)
- **Context:** 10 s rolling context window
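Here is a minimal sketch of the bar‑aligned HTTP flow. The endpoint paths match the list above, but the host/port and all request/response fields (`session_id`, `bpm`, `bars_per_chunk`) are illustrative assumptions, not the server's actual contract:

```python
# Hypothetical bar-aligned jam session over the HTTP backend.
# Payload and response fields below are assumptions for illustration.
import requests

BASE = "http://localhost:8000"  # assumed host/port

# Start a jam session (field names assumed).
resp = requests.post(f"{BASE}/jam/start", json={"bpm": 120, "bars_per_chunk": 4})
session_id = resp.json()["session_id"]

try:
    for _ in range(8):  # fetch eight bar-aligned chunks
        chunk = requests.post(f"{BASE}/jam/next", json={"session_id": session_id})
        audio_bytes = chunk.content  # e.g., a WAV/PCM payload
        # ... hand audio_bytes to a player or file writer ...

    # Steer the jam mid-session (field names assumed).
    requests.post(f"{BASE}/jam/update", json={"session_id": session_id, "bpm": 128})
finally:
    requests.post(f"{BASE}/jam/stop", json={"session_id": session_id})
```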
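And a client sketch for the realtime WebSocket backend. The URL path and `mode="rt"` come from the list above; the handshake shape (a JSON message) and the assumption that audio arrives as binary frames are illustrative:

```python
# Minimal realtime client sketch for ws://…/ws/jam with mode="rt".
# The message schema here is assumed, not the server's actual contract.
import asyncio
import json

import websockets  # pip install websockets

async def jam_rt(url: str = "ws://localhost:8000/ws/jam") -> None:
    async with websockets.connect(url) as ws:
        # Ask for continuous (Colab-style) realtime chunks.
        await ws.send(json.dumps({"mode": "rt"}))
        for _ in range(16):  # read a handful of frames, then stop
            msg = await ws.recv()
            if isinstance(msg, bytes):
                # Binary frame: one audio chunk (48 kHz stereo assumed).
                print(f"got {len(msg)} bytes of audio")
            else:
                print("server event:", msg)  # JSON status/control text

asyncio.run(jam_rt())
```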
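The idea behind the `pace: "realtime"` switch is simple: on a GPU that generates a 2.0 s chunk in under 2.0 s (the L40S case), the server sleeps off the surplus so the stream doesn't outrun playback. A sketch of that loop, with illustrative function names:

```python
# Sketch of realtime pacing: emit chunks no faster than playback
# consumes them. generate_chunk/send_chunk are hypothetical callables.
import time

CHUNK_SECONDS = 2.0  # default chunk length from the specs above

def paced_stream(generate_chunk, send_chunk, pace: str = "realtime") -> None:
    next_deadline = time.monotonic()
    while True:
        chunk = generate_chunk()  # may finish well under 2.0 s on an L40S
        send_chunk(chunk)
        if pace == "realtime":
            next_deadline += CHUNK_SECONDS
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # hold back until playback catches up
```

Tracking a running deadline (rather than sleeping a fixed 2.0 s each pass) keeps generation-time jitter from accumulating into drift.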
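Finally, a sketch of stitching consecutive chunks with the 40 ms crossfade from the specs: at 48 kHz that is 1920 samples. The equal‑power curves are an assumption; the server may use a different window:

```python
# Blend the tail of one chunk into the head of the next over 40 ms.
# Equal-power (cos/sin) fades are assumed for illustration.
import numpy as np

SR = 48_000                   # sample rate from the specs above
XFADE = int(0.040 * SR)       # 40 ms -> 1920 samples

def crossfade(prev_tail: np.ndarray, next_head: np.ndarray) -> np.ndarray:
    """Blend the last XFADE frames of one chunk with the first XFADE
    frames of the next. Inputs are float arrays of shape (XFADE, 2)
    for stereo; cos^2 + sin^2 = 1 keeps perceived power constant."""
    t = np.linspace(0.0, np.pi / 2, XFADE, dtype=np.float32)[:, None]
    return prev_tail * np.cos(t) + next_head * np.sin(t)
```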