thecollabagepatch committed on
Commit 8f1aba9 · 1 Parent(s): b6c83bc

extracted markdown from app.py into docs

Files changed (7)
  1. Dockerfile +3 -0
  2. app.py +17 -198
  3. docs/about_status.md +16 -0
  4. docs/api_http.md +37 -0
  5. docs/api_websocket.md +101 -0
  6. docs/changelog.md +11 -0
  7. docs/performance.md +12 -0
Dockerfile CHANGED
@@ -145,6 +145,9 @@ COPY --chown=appuser:appuser jam_worker.py /home/appuser/app/jam_worker.py
  COPY --chown=appuser:appuser one_shot_generation.py /home/appuser/app/one_shot_generation.py
  COPY --chown=appuser:appuser documentation.html /home/appuser/app/documentation.html

+ # Create docs directory and copy documentation files
+ COPY --chown=appuser:appuser docs/ /home/appuser/app/docs/
+
  USER appuser

  EXPOSE 7860
app.py CHANGED
@@ -292,6 +292,17 @@ def _patch_t5x_for_gpu_coords():
  # Call the patch immediately at import time (before MagentaRT init)
  _patch_t5x_for_gpu_coords()

+ def load_doc_content(filename: str) -> str:
+     """Load markdown content from docs directory, with fallback."""
+     try:
+         doc_path = Path(__file__).parent / "docs" / filename
+         return doc_path.read_text(encoding='utf-8')
+     except FileNotFoundError:
+         return f"⚠️ Documentation file `{filename}` not found. Please check the docs directory."
+     except Exception as e:
+         return f"⚠️ Error loading `{filename}`: {e}"
+
+
  def create_documentation_interface():
      """Create a Gradio interface for documentation and transparency"""
      with gr.Blocks(title="MagentaRT Research API", theme=gr.themes.Soft()) as interface:
@@ -311,223 +322,31 @@ continuous music either as **bar-aligned chunks over HTTP** or as **low-latency
  # About & current status
  # ------------------------------------------------------------------
  with gr.Tab("📖 About & Status"):
- gr.Markdown(
- r"""
- ## What this is
- We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.
-
- ### Implemented backends
- - **HTTP (bar‑aligned):** `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc.
- - **WebSocket (realtime):** `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build.
-
- ## What we learned (GPU notes)
- - **L40S 48GB:** comfortably **faster than realtime** → we added a `pace: "realtime"` switch so the server doesn’t outrun playback.
- - **L4 24GB:** **consistently just under realtime**; even with pre‑roll buffering, TF32/JAX tunings, reduced chunk size, and the **base** checkpoint, we still see eventual under‑runs.
- - **Implication:** For production‑quality realtime, aim for ~**40GB VRAM** per user/session (e.g., **A100 40GB**, or MIG slices ≈ **35–40GB** on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.
-
- ## Model / audio specs
- - **Model:** MagentaRT (T5X; decoder RVQ depth = 16)
- - **Audio:** 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade
- - **Context:** 10 s rolling context window
- """
- )
+ gr.Markdown(load_doc_content("about_status.md"))

  # ------------------------------------------------------------------
  # HTTP API
  # ------------------------------------------------------------------
  with gr.Tab("🔧 API (HTTP)"):
- gr.Markdown(
- r"""
- ### Single Generation
- ```bash
- curl -X POST \
- "$HOST/generate" \
- -F "loop_audio=@drum_loop.wav" \
- -F "bpm=120" \
- -F "bars=8" \
- -F "styles=acid house,techno" \
- -F "guidance_weight=5.0" \
- -F "temperature=1.1"
- ```
-
- ### Continuous Jamming (bar‑aligned, HTTP)
- ```bash
- # 1) Start a session
- echo $(curl -s -X POST "$HOST/jam/start" \
- -F "loop_audio=@loop.wav" \
- -F "bpm=120" \
- -F "bars_per_chunk=8") | jq .
- # → {"session_id":"…"}
-
- # 2) Pull next chunk (repeat)
- curl "$HOST/jam/next?session_id=$SESSION"
-
- # 3) Stop
- curl -X POST "$HOST/jam/stop" \
- -H "Content-Type: application/json" \
- -d '{"session_id":"'$SESSION'"}'
- ```
-
- ### Common parameters
- - **bpm** *(int)* – beats per minute
- - **bars / bars_per_chunk** *(int)* – musical length
- - **styles** *(str)* – comma‑separated text prompts (mixed internally)
- - **guidance_weight** *(float)* – style adherence (CFG weight)
- - **temperature / topk** – sampling controls
- - **intro_bars_to_drop** *(int, /generate)* – generate-and-trim intro
- """
- )
+ gr.Markdown(load_doc_content("api_http.md"))

  # ------------------------------------------------------------------
- # WebSocket API: realtime (rt mode)
+ # WebSocket API: realtime ('rt' mode)
  # ------------------------------------------------------------------
  with gr.Tab("🧩 API (WebSocket • rt mode)"):
- gr.Markdown(
- r"""
- Connect to `wss://…/ws/jam` and send a **JSON control stream**. In `rt` mode the server emits ~2 s WAV chunks (or binary frames) continuously.
-
- ### Start (client → server)
- ```jsonc
- {
- "type": "start",
- "mode": "rt",
- "binary_audio": false, // true → raw WAV bytes + separate chunk_meta
- "params": {
- "styles": "heavy metal", // or "jazz, hiphop"
- "style_weights": "1.0,1.0", // optional, auto‑normalized
- "temperature": 1.1,
- "topk": 40,
- "guidance_weight": 1.1,
- "pace": "realtime", // "realtime" | "asap" (default)
- "max_decode_frames": 50 // 50≈2.0s; try 36–45 on smaller GPUs
- }
- }
- ```
-
- ### Server events (server → client)
- - `{"type":"started","mode":"rt"}` – handshake
- - `{"type":"chunk","audio_base64":"…","metadata":{…}}` – base64 WAV
- - `metadata.sample_rate` *(int)* – usually 48000
- - `metadata.chunk_frames` *(int)* – e.g., 50
- - `metadata.chunk_seconds` *(float)* – frames / 25.0
- - `metadata.crossfade_seconds` *(float)* – typically 0.04
- - `{"type":"chunk_meta","metadata":{…}}` – sent **after** a binary frame when `binary_audio=true`
- - `{"type":"status",…}`, `{"type":"error",…}`, `{"type":"stopped"}`
-
- ### Update (client → server)
- ```jsonc
- {
- "type": "update",
- "styles": "jazz, hiphop",
- "style_weights": "1.0,0.8",
- "temperature": 1.2,
- "topk": 64,
- "guidance_weight": 1.0,
- "pace": "realtime", // optional live flip
- "max_decode_frames": 40 // optional; <= 50
- }
- ```
-
- ### Stop / ping
- ```json
- {"type":"stop"}
- {"type":"ping"}
- ```
-
- ### Browser quick‑start (schedules seamlessly with 25–40 ms crossfade)
- ```html
- <script>
- const XFADE = 0.025; // 25 ms
- let ctx, gain, ws, nextTime = 0;
- async function start(){
- ctx = new (window.AudioContext||window.webkitAudioContext)();
- gain = ctx.createGain(); gain.connect(ctx.destination);
- ws = new WebSocket("wss://YOUR_SPACE/ws/jam");
- ws.onopen = ()=> ws.send(JSON.stringify({
- type:"start", mode:"rt", binary_audio:false,
- params:{ styles:"warmup", temperature:1.1, topk:40, guidance_weight:1.1, pace:"realtime" }
- }));
- ws.onmessage = async ev => {
- const msg = JSON.parse(ev.data);
- if (msg.type === "chunk" && msg.audio_base64){
- const bin = atob(msg.audio_base64); const buf = new Uint8Array(bin.length);
- for (let i=0;i<bin.length;i++) buf[i] = bin.charCodeAt(i);
- const ab = buf.buffer; const audio = await ctx.decodeAudioData(ab);
- const src = ctx.createBufferSource(); const g = ctx.createGain();
- src.buffer = audio; src.connect(g); g.connect(gain);
- if (nextTime < ctx.currentTime + 0.05) nextTime = ctx.currentTime + 0.12;
- const startAt = nextTime, dur = audio.duration;
- nextTime = startAt + Math.max(0, dur - XFADE);
- g.gain.setValueAtTime(0, startAt);
- g.gain.linearRampToValueAtTime(1, startAt + XFADE);
- g.gain.setValueAtTime(1, startAt + Math.max(0, dur - XFADE));
- g.gain.linearRampToValueAtTime(0, startAt + dur);
- src.start(startAt);
- }
- };
- }
- </script>
- ```
-
- ### Python client (async)
- ```python
- import asyncio, json, websockets, base64, soundfile as sf, io
- async def run(url):
- async with websockets.connect(url) as ws:
- await ws.send(json.dumps({"type":"start","mode":"rt","binary_audio":False,
- "params": {"styles":"warmup","temperature":1.1,"topk":40,"guidance_weight":1.1,"pace":"realtime"}}))
- while True:
- msg = json.loads(await ws.recv())
- if msg.get("type") == "chunk":
- wav = base64.b64decode(msg["audio_base64"]) # bytes of a WAV
- x, sr = sf.read(io.BytesIO(wav), dtype="float32")
- print("chunk", x.shape, sr)
- elif msg.get("type") in ("stopped","error"): break
- asyncio.run(run("wss://YOUR_SPACE/ws/jam"))
- ```
- """
- )
+ gr.Markdown(load_doc_content("api_websocket.md"))

  # ------------------------------------------------------------------
  # Performance & hardware guidance
  # ------------------------------------------------------------------
  with gr.Tab("📊 Performance & Hardware"):
- gr.Markdown(
- r"""
- ### Current observations
- - **L40S 48GB** → faster than realtime. Use `pace:"realtime"` to avoid client over‑buffering.
- - **L4 24GB** → slightly **below** realtime even with pre‑roll buffering, TF32/Autotune, smaller chunks (`max_decode_frames`), and the **base** checkpoint.
-
- ### Practical guidance
- - For consistent realtime, target **~40GB VRAM per active stream** (e.g., **A100 40GB**, or MIG slices ≈ **35–40GB** on newer GPUs).
- - Keep client‑side **overlap‑add** (25–40 ms) for seamless chunk joins.
- - Prefer **`pace:"realtime"`** once playback begins; use **ASAP** only to build a short pre‑roll if needed.
- - Optional knob: **`max_decode_frames`** (default **50** ≈ 2.0 s). Reducing to **36–45** can lower per‑chunk latency/VRAM, but doesn’t increase frames/sec throughput.
-
- ### Concurrency
- This research build is designed for **one active jam per GPU**. Concurrency would require GPU partitioning (MIG) or horizontal scaling with a session scheduler.
- """
- )
+ gr.Markdown(load_doc_content("performance.md"))

  # ------------------------------------------------------------------
  # Changelog & legal
  # ------------------------------------------------------------------
  with gr.Tab("🗒️ Changelog & Legal"):
- gr.Markdown(
- r"""
- ### Recent changes
- - New **WebSocket realtime** route: `/ws/jam` (`mode:"rt"`)
- - Added server pacing flag: `pace: "realtime" | "asap"`
- - Exposed `max_decode_frames` for shorter chunks on smaller GPUs
- - Client test page now does proper **overlap‑add** crossfade between chunks
-
- ### Licensing
- This project uses MagentaRT under:
- - **Code:** Apache 2.0
- - **Model weights:** CC‑BY 4.0
- Please review the MagentaRT repo for full terms.
- """
- )
+ gr.Markdown(load_doc_content("changelog.md"))

  gr.Markdown(
  r"""
docs/about_status.md ADDED
@@ -0,0 +1,16 @@
+ ## What this is
+ We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.
+
+ ### Implemented backends
+ - **HTTP (bar‑aligned):** `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc.
+ - **WebSocket (realtime):** `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build.
+
+ ## What we learned (GPU notes)
+ - **L40S 48GB:** comfortably **faster than realtime** → we added a `pace: "realtime"` switch so the server doesn't outrun playback.
+ - **L4 24GB:** **consistently just under realtime**; even with pre‑roll buffering, TF32/JAX tunings, reduced chunk size, and the **base** checkpoint, we still see eventual under‑runs.
+ - **Implication:** For production‑quality realtime, aim for ~**40GB VRAM** per user/session (e.g., **A100 40GB**, or MIG slices ≈ **35–40GB** on newer parts). Smaller GPUs can demo, but sustained realtime is not reliable.
+
+ ## Model / audio specs
+ - **Model:** MagentaRT (T5X; decoder RVQ depth = 16)
+ - **Audio:** 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade
+ - **Context:** 10 s rolling context window
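
The chunking math these specs imply, as a quick sanity-check sketch (not part of the commit above; the 25 frames-per-second figure comes from the WebSocket notes further down, where 50 decode frames ≈ 2.0 s):

```python
# Sanity check of the audio spec above (illustrative only).
SAMPLE_RATE = 48_000        # Hz, stereo
CHUNK_SECONDS = 2.0         # default chunk length
CROSSFADE_SECONDS = 0.040   # 40 ms crossfade
FRAME_RATE = 25.0           # decoder frames per second (50 frames ≈ 2.0 s)

chunk_samples = int(SAMPLE_RATE * CHUNK_SECONDS)          # 96_000 samples per channel
crossfade_samples = int(SAMPLE_RATE * CROSSFADE_SECONDS)  # 1_920 samples per channel
frames_per_chunk = CHUNK_SECONDS * FRAME_RATE             # 50.0 decode frames

print(chunk_samples, crossfade_samples, frames_per_chunk)
```
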
docs/api_http.md ADDED
@@ -0,0 +1,37 @@
+ ### Single Generation
+ ```bash
+ curl -X POST \
+   "$HOST/generate" \
+   -F "loop_audio=@drum_loop.wav" \
+   -F "bpm=120" \
+   -F "bars=8" \
+   -F "styles=acid house,techno" \
+   -F "guidance_weight=5.0" \
+   -F "temperature=1.1"
+ ```
+
+ ### Continuous Jamming (bar‑aligned, HTTP)
+ ```bash
+ # 1) Start a session
+ echo $(curl -s -X POST "$HOST/jam/start" \
+   -F "loop_audio=@loop.wav" \
+   -F "bpm=120" \
+   -F "bars_per_chunk=8") | jq .
+ # → {"session_id":"…"}
+
+ # 2) Pull next chunk (repeat)
+ curl "$HOST/jam/next?session_id=$SESSION"
+
+ # 3) Stop
+ curl -X POST "$HOST/jam/stop" \
+   -H "Content-Type: application/json" \
+   -d '{"session_id":"'$SESSION'"}'
+ ```
+
+ ### Common parameters
+ - **bpm** *(int)* – beats per minute
+ - **bars / bars_per_chunk** *(int)* – musical length
+ - **styles** *(str)* – comma‑separated text prompts (mixed internally)
+ - **guidance_weight** *(float)* – style adherence (CFG weight)
+ - **temperature / topk** – sampling controls
+ - **intro_bars_to_drop** *(int, /generate)* – generate-and-trim intro
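
For reference, a minimal Python port of the curl flow above, as a sketch rather than official client code: it assumes the `requests` library, `HOST` is a placeholder, and since only the `/jam/start` response shape (`{"session_id": …}`) is documented here, the `/jam/next` payload is saved raw.

```python
import requests

HOST = "https://YOUR_SPACE"  # placeholder base URL

# 1) Start a session (multipart form, like the curl -F flags above)
with open("loop.wav", "rb") as f:
    r = requests.post(f"{HOST}/jam/start",
                      files={"loop_audio": f},
                      data={"bpm": 120, "bars_per_chunk": 8})
session_id = r.json()["session_id"]

# 2) Pull the next chunk (repeat as needed); the payload format is not
#    specified in this doc, so just persist the raw body.
chunk = requests.get(f"{HOST}/jam/next", params={"session_id": session_id})
with open("chunk_01.bin", "wb") as out:
    out.write(chunk.content)

# 3) Stop the session (JSON body, like the curl -d example)
requests.post(f"{HOST}/jam/stop", json={"session_id": session_id})
```
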
docs/api_websocket.md ADDED
@@ -0,0 +1,101 @@
+ Connect to `wss://…/ws/jam` and send a **JSON control stream**. In `rt` mode the server emits ~2 s WAV chunks (or binary frames) continuously.
+
+ ### Start (client → server)
+ ```jsonc
+ {
+   "type": "start",
+   "mode": "rt",
+   "binary_audio": false,          // true → raw WAV bytes + separate chunk_meta
+   "params": {
+     "styles": "heavy metal",      // or "jazz, hiphop"
+     "style_weights": "1.0,1.0",   // optional, auto‑normalized
+     "temperature": 1.1,
+     "topk": 40,
+     "guidance_weight": 1.1,
+     "pace": "realtime",           // "realtime" | "asap" (default)
+     "max_decode_frames": 50       // 50≈2.0s; try 36–45 on smaller GPUs
+   }
+ }
+ ```
+
+ ### Server events (server → client)
+ - `{"type":"started","mode":"rt"}` – handshake
+ - `{"type":"chunk","audio_base64":"…","metadata":{…}}` – base64 WAV
+   - `metadata.sample_rate` *(int)* – usually 48000
+   - `metadata.chunk_frames` *(int)* – e.g., 50
+   - `metadata.chunk_seconds` *(float)* – frames / 25.0
+   - `metadata.crossfade_seconds` *(float)* – typically 0.04
+ - `{"type":"chunk_meta","metadata":{…}}` – sent **after** a binary frame when `binary_audio=true`
+ - `{"type":"status",…}`, `{"type":"error",…}`, `{"type":"stopped"}`
+
+ ### Update (client → server)
+ ```jsonc
+ {
+   "type": "update",
+   "styles": "jazz, hiphop",
+   "style_weights": "1.0,0.8",
+   "temperature": 1.2,
+   "topk": 64,
+   "guidance_weight": 1.0,
+   "pace": "realtime",        // optional live flip
+   "max_decode_frames": 40    // optional; <= 50
+ }
+ ```
+
+ ### Stop / ping
+ ```json
+ {"type":"stop"}
+ {"type":"ping"}
+ ```
+
+ ### Browser quick‑start (schedules seamlessly with 25–40 ms crossfade)
+ ```html
+ <script>
+ const XFADE = 0.025; // 25 ms
+ let ctx, gain, ws, nextTime = 0;
+ async function start(){
+   ctx = new (window.AudioContext||window.webkitAudioContext)();
+   gain = ctx.createGain(); gain.connect(ctx.destination);
+   ws = new WebSocket("wss://YOUR_SPACE/ws/jam");
+   ws.onopen = ()=> ws.send(JSON.stringify({
+     type:"start", mode:"rt", binary_audio:false,
+     params:{ styles:"warmup", temperature:1.1, topk:40, guidance_weight:1.1, pace:"realtime" }
+   }));
+   ws.onmessage = async ev => {
+     const msg = JSON.parse(ev.data);
+     if (msg.type === "chunk" && msg.audio_base64){
+       const bin = atob(msg.audio_base64); const buf = new Uint8Array(bin.length);
+       for (let i=0;i<bin.length;i++) buf[i] = bin.charCodeAt(i);
+       const ab = buf.buffer; const audio = await ctx.decodeAudioData(ab);
+       const src = ctx.createBufferSource(); const g = ctx.createGain();
+       src.buffer = audio; src.connect(g); g.connect(gain);
+       if (nextTime < ctx.currentTime + 0.05) nextTime = ctx.currentTime + 0.12;
+       const startAt = nextTime, dur = audio.duration;
+       nextTime = startAt + Math.max(0, dur - XFADE);
+       g.gain.setValueAtTime(0, startAt);
+       g.gain.linearRampToValueAtTime(1, startAt + XFADE);
+       g.gain.setValueAtTime(1, startAt + Math.max(0, dur - XFADE));
+       g.gain.linearRampToValueAtTime(0, startAt + dur);
+       src.start(startAt);
+     }
+   };
+ }
+ </script>
+ ```
+
+ ### Python client (async)
+ ```python
+ import asyncio, json, websockets, base64, soundfile as sf, io
+ async def run(url):
+     async with websockets.connect(url) as ws:
+         await ws.send(json.dumps({"type":"start","mode":"rt","binary_audio":False,
+             "params": {"styles":"warmup","temperature":1.1,"topk":40,"guidance_weight":1.1,"pace":"realtime"}}))
+         while True:
+             msg = json.loads(await ws.recv())
+             if msg.get("type") == "chunk":
+                 wav = base64.b64decode(msg["audio_base64"])  # bytes of a WAV
+                 x, sr = sf.read(io.BytesIO(wav), dtype="float32")
+                 print("chunk", x.shape, sr)
+             elif msg.get("type") in ("stopped","error"): break
+ asyncio.run(run("wss://YOUR_SPACE/ws/jam"))
+ ```
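
A companion sketch for `binary_audio=true` (not from the original docs; a variant of the client above under the protocol description): raw WAV bytes arrive as binary WebSocket frames, each followed by a separate `chunk_meta` JSON message.

```python
import asyncio, json, websockets

async def run_binary(url):
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "start", "mode": "rt", "binary_audio": True,
            "params": {"styles": "warmup", "pace": "realtime"}
        }))
        n = 0
        async for msg in ws:
            if isinstance(msg, bytes):          # binary frame → raw WAV chunk
                n += 1
                with open(f"chunk_{n:03d}.wav", "wb") as f:
                    f.write(msg)
            else:                               # text frame → JSON control/metadata
                event = json.loads(msg)
                if event.get("type") == "chunk_meta":
                    print("meta:", event["metadata"])
                elif event.get("type") in ("stopped", "error"):
                    break

asyncio.run(run_binary("wss://YOUR_SPACE/ws/jam"))
```
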
docs/changelog.md ADDED
@@ -0,0 +1,11 @@
+ ### Recent changes
+ - New **WebSocket realtime** route: `/ws/jam` (`mode:"rt"`)
+ - Added server pacing flag: `pace: "realtime" | "asap"`
+ - Exposed `max_decode_frames` for shorter chunks on smaller GPUs
+ - Client test page now does proper **overlap‑add** crossfade between chunks
+
+ ### Licensing
+ This project uses MagentaRT under:
+ - **Code:** Apache 2.0
+ - **Model weights:** CC‑BY 4.0
+ Please review the MagentaRT repo for full terms.
docs/performance.md ADDED
@@ -0,0 +1,12 @@
+ ### Current observations
+ - **L40S 48GB** → faster than realtime. Use `pace:"realtime"` to avoid client over‑buffering.
+ - **L4 24GB** → slightly **below** realtime even with pre‑roll buffering, TF32/Autotune, smaller chunks (`max_decode_frames`), and the **base** checkpoint.
+
+ ### Practical guidance
+ - For consistent realtime, target **~40GB VRAM per active stream** (e.g., **A100 40GB**, or MIG slices ≈ **35–40GB** on newer GPUs).
+ - Keep client‑side **overlap‑add** (25–40 ms) for seamless chunk joins.
+ - Prefer **`pace:"realtime"`** once playback begins; use **ASAP** only to build a short pre‑roll if needed.
+ - Optional knob: **`max_decode_frames`** (default **50** ≈ 2.0 s). Reducing to **36–45** can lower per‑chunk latency/VRAM, but doesn't increase frames/sec throughput.
+
+ ### Concurrency
+ This research build is designed for **one active jam per GPU**. Concurrency would require GPU partitioning (MIG) or horizontal scaling with a session scheduler.
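
One way the pre‑roll advice above could look on the client (a sketch, not part of the commit; `PREROLL_CHUNKS` is an arbitrary illustrative value): start the jam with `pace:"asap"` to buffer a couple of chunks quickly, then flip to `pace:"realtime"` with an `update` message once playback can begin.

```python
import asyncio, json, websockets

PREROLL_CHUNKS = 2  # arbitrary pre-roll depth for illustration

async def jam_with_preroll(url):
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "start", "mode": "rt", "binary_audio": False,
            "params": {"styles": "warmup", "pace": "asap"}
        }))
        received = 0
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("type") == "chunk":
                received += 1
                # ...decode and queue msg["audio_base64"] for playback here...
                if received == PREROLL_CHUNKS:
                    # enough pre-roll buffered: stop outrunning playback
                    await ws.send(json.dumps({"type": "update", "pace": "realtime"}))
            elif msg.get("type") in ("stopped", "error"):
                break

asyncio.run(jam_with_preroll("wss://YOUR_SPACE/ws/jam"))
```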