magentaRT research API

exploring ways to jam • real-time streaming with http/ws • custom fine-tune model-switching support

research project

what this is

this API serves google's magentaRT in two distinct ways. first, as a backend for our iOS app (the untitled jamming app), where users create initial loops with stability ai's stable-audio-open-small and magentaRT then uses the combined audio as context. second, as a standalone web interface that connects directly to magentaRT via websockets, without any audio context.

both modes support switching between base models and custom fine-tunes hosted on Hugging Face. this is designed as a template space for duplication, letting you experiment with real-time music generation outside of google colab.

this is meant to be duplicated to your own GPU-enabled space since the iOS app is still in active development and doesn't have funding to support multiple concurrent users yet.

hardware requirements: optimal performance requires an L40S GPU (48GB VRAM) for real-time streaming. an L4 (24GB VRAM) almost works but will not achieve real-time performance (if someone knows an optimization that will solve this, please let me know).

environment variables (optional, but helpful)

you can boot this Space directly into your own finetune by setting the variables below in Settings → Variables and secrets → Variables. if you don't set them, you can still select models at runtime using /model/select from the frontend/API.

Quick start: set these to make a finetune the default on boot:
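
MRT_CKPT_REPO=thepatch/magenta-ft
MRT_CKPT_STEP=1863001
MRT_SIZE=large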

those values correspond to the example finetune in this repo (checkpoint_1863001.tgz on top of the large base).

  • MRT_CKPT_REPO: huggingface repo ID that hosts your finetune checkpoints/assets (example: thepatch/magenta-ft). set it to make that finetune the default on boot.
  • MRT_CKPT_STEP: checkpoint step number to load on boot (example: 1863001). set it if you want a specific checkpoint preselected.
  • MRT_SIZE: base model family used by the finetune (example: large). set it to match the base you finetuned from.
  • SPACE_MODE: controls readiness behavior: serve (GPU, ready to generate) vs template (CPU template for duplication). set it for explicit behavior; if unset, the server auto-detects.
alternative: select a model at runtime via API
curl -X POST https://<your-space>.hf.space/model/select \
  -H 'Content-Type: application/json' \
  -d '{
    "ckpt_repo": "thepatch/magenta-ft",
    "ckpt_step": 1863001,
    "size": "large",
    "prewarm": true
  }'

when prewarm is true, the backend performs a warmup before returning, so the first jam starts hot.

open realtime web tester

app demo video

iPhone app generating music in real-time

overview

this API revolves around google's magentaRT and is designed for real-time audio streaming using finetunes hosted on HF. it's built for iOS app integration, with websocket streaming support for web applications (and potentially VST plugins).

quick start - websocket streaming

connect to wss://<your-space>.hf.space/ws/jam for real-time audio generation:

start real-time generation

{
  "type": "start",
  "mode": "rt",
  "binary_audio": false,
  "params": {
    "styles": "electronic, ambient",
    "style_weights": "1.0, 0.8",
    "temperature": 1.1,
    "topk": 40,
    "guidance_weight": 1.1,
    "pace": "realtime",
    "style_ramp_seconds": 8.0,
    "mean": 0.0,
    "centroid_weights": "0.0, 0.0, 0.0"
  }
}

update parameters live

{
  "type": "update",
  "styles": "jazz, hiphop",
  "style_weights": "1.0, 0.8",
  "temperature": 1.2,
  "topk": 64,
  "guidance_weight": 1.0,
  "mean": 0.2,
  "centroid_weights": "0.1, 0.3, 0.0"
}

stop generation

{"type": "stop"}

API endpoints

POST /generate - generate 4–8 bars of music with input audio
POST /generate_style - generate music from style prompts only (experimental)
POST /jam/start - start continuous jamming session
GET /jam/next - get next audio chunk from session
POST /jam/consume - mark chunk as consumed
POST /jam/stop - end jamming session
WEBSOCKET /ws/jam - real-time streaming interface
POST /model/select - switch between base and fine-tuned models
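
the /jam/* routes form a simple poll-and-ack loop over http. here's a rough python sketch of that loop; the field names (session_id, the chunk payload) are illustrative assumptions, not the actual schema, so check the auto-generated API docs before using it.

# rough sketch of the http jam loop; field names like "session_id"
# are assumptions for illustration, not the documented schema.
import requests

BASE = "https://your-username-magenta-retry.hf.space"

# start a continuous jamming session
session = requests.post(f"{BASE}/jam/start", json={"styles": "electronic, ambient"}).json()
session_id = session.get("session_id")  # assumed field name

for _ in range(4):
    # get the next audio chunk from the session
    chunk = requests.get(f"{BASE}/jam/next", params={"session_id": session_id}).json()
    # ...decode / play the chunk here...
    # mark the chunk as consumed so the server can produce the next one
    requests.post(f"{BASE}/jam/consume", json={"session_id": session_id})

# end the jamming session
requests.post(f"{BASE}/jam/stop", json={"session_id": session_id})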

custom fine-tuning

train your own MagentaRT models and use them in the web app demo or the iOS app.

1. train your model

use the official MagentaRT fine-tuning notebook:

MagentaRT Fine-tuning Colab

this will create checkpoint folders like:

  • checkpoint_1861001/
  • checkpoint_1862001/
  • and steering assets: cluster_centroids.npy, mean_style_embed.npy

2. package checkpoints

checkpoints must be compressed as .tgz files to preserve .zarray files correctly.

important: do not download checkpoint folders directly from Google Drive - the .zarray files won't transfer properly.

checkpoint packaging script

use this in a Colab cell to properly package your checkpoints:

# Mount Drive to access your trained checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Set the path to your checkpoint folder
CKPT_SRC = '/content/drive/MyDrive/thepatch/checkpoint_1862001'  # Adjust path

# Copy folder to local storage (preserves dotfiles)
!rm -rf /content/checkpoint_1862001
!cp -a "$CKPT_SRC" /content/

# Verify .zarray files are present
!find /content/checkpoint_1862001 -name .zarray | wc -l

# Create properly formatted .tgz archive
!tar -C /content -czf /content/checkpoint_1862001.tgz checkpoint_1862001

# Verify critical files are in the archive
!tar -tzf /content/checkpoint_1862001.tgz | grep -c '\.zarray'

# Download the .tgz file
from google.colab import files
files.download('/content/checkpoint_1862001.tgz')

3. upload to hugging face

create a model repository and upload:

example repository: thepatch/magenta-ft
shows the correct file structure with .tgz files and .npy steering assets in the root directory.
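
if you'd rather script the upload than use the web UI, here's a minimal sketch with the huggingface_hub python library. it assumes you've already authenticated (e.g. via huggingface-cli login); the repo name and file list are placeholders for your own.

# minimal upload sketch using huggingface_hub (pip install huggingface_hub)
# assumes you're already logged in via `huggingface-cli login`
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/your-magenta-ft"  # placeholder: your model repo

# create the model repo if it doesn't exist yet
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# .tgz checkpoints and .npy steering assets go in the repo root
for path in ["checkpoint_1862001.tgz", "cluster_centroids.npy", "mean_style_embed.npy"]:
    api.upload_file(path_or_fileobj=path, path_in_repo=path, repo_id=repo_id)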

4. use in the app

in the iOS app's model selector, point to your hf repository URL. the app will automatically discover available checkpoints and allow switching between them.

technical specifications

a little more about the ios app

the ios app talks to this API over http requests rather than the websocket interface.

note: the /generate_style endpoint is experimental and may not properly adhere to BPM without additional context (we're considering metronome-based context instead of silence).

deployment

to run your own instance:

  1. duplicate this huggingface space by clicking the three dots in the top right (or pick 'run locally' if you've got a 5090 or something)
  2. ensure you have access to an L40S GPU by enabling billing
  3. point your iOS app to the new space URL (e.g., https://your-username-magenta-retry.hf.space)
  4. upload your fine-tuned models to hf as described above

support & contact

this is an active research project. for questions, technical support, or collaboration:

email: kev@thecollabagepatch.com

research status: this project is under very active development. features and the API may change. we welcome feedback and contributions from the research community. i'm just a vibe coder.

licensing

built on google's magentaRT (Apache 2.0 + CC-BY 4.0). users are responsible for their generated outputs and ensuring compliance with applicable laws and platform policies.

auto-generated API docs (for all the http requests)