magentaRT research API

exploring ways to jam • real-time streaming with http/ws • custom fine-tune model-switching support

research project

what this is

this API serves google's magentaRT in two distinct ways. first, as a backend for our iOS app (the untitled jamming app), where users create initial loops with stability ai's stable-audio-open-small and magentaRT then uses the combined audio as context. second, as a standalone web interface that connects directly to magentaRT via websockets, without any audio context.

both modes support switching between base models and custom fine-tunes hosted on Hugging Face. this is designed as a template space for duplication, letting you experiment with real-time music generation outside of google colab.

this is meant to be duplicated to your own GPU-enabled space since the iOS app is still in active development and doesn't have funding to support multiple concurrent users yet.

hardware requirements: optimal performance requires an L40S GPU (48GB VRAM) for real-time streaming. an L4 (24GB VRAM) almost works but will not achieve real-time performance (if someone knows an optimization that will solve this, please let me know).

environment variables (optional, but helpful)

you can boot this Space directly into your own finetune by setting the variables below in Settings → Variables and secrets → Variables. if you don't set them, you can still select models at runtime using /model/select from the frontend/API.

Quick start: set these to make a finetune the default on boot:
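
MRT_CKPT_REPO=thepatch/magenta-ft
MRT_CKPT_STEP=1863001
MRT_SIZE=large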

those values correspond to the example finetune in this repo (checkpoint_1863001.tgz on top of the large base).

  • MRT_CKPT_REPO: huggingface repo ID that hosts your finetune checkpoints/assets (example: thepatch/magenta-ft). set it to make that finetune the default on boot.
  • MRT_CKPT_STEP: checkpoint step number to load on boot (example: 1863001). set it if you want a specific checkpoint preselected.
  • MRT_SIZE: base model family used by the finetune (example: large). set it to match the base you finetuned from.
  • SPACE_MODE: controls readiness behavior: serve (GPU, ready to generate) vs template (CPU template for duplication). set it for explicit behavior; if unset, the server auto-detects.
alternative: select a model at runtime via API
curl -X POST https://<your-space>.hf.space/model/select \
  -H 'Content-Type: application/json' \
  -d '{
    "ckpt_repo": "thepatch/magenta-ft",
    "ckpt_step": 1863001,
    "size": "large",
    "prewarm": true
  }'

when prewarm is true, the backend performs a warmup before returning, so the first jam starts hot.

open realtime web tester

app demo video

iPhone app generating music in real-time

overview

this API revolves around google's magentaRT and is designed for real-time audio streaming using finetunes hosted on HF. it's built for iOS app integration, with websocket streaming support for web applications (and potentially VST plugins).

quick start - websocket streaming

connect to wss://<your-space>.hf.space/ws/jam for real-time audio generation:

start real-time generation

{
  "type": "start",
  "mode": "rt",
  "binary_audio": false,
  "params": {
    "styles": "electronic, ambient",
    "style_weights": "1.0, 0.8",
    "temperature": 1.1,
    "topk": 40,
    "guidance_weight": 1.1,
    "pace": "realtime",
    "style_ramp_seconds": 8.0,
    "mean": 0.0,
    "centroid_weights": "0.0, 0.0, 0.0"
  }
}

update parameters live

{
  "type": "update",
  "styles": "jazz, hiphop",
  "style_weights": "1.0, 0.8",
  "temperature": 1.2,
  "topk": 64,
  "guidance_weight": 1.0,
  "mean": 0.2,
  "centroid_weights": "0.1, 0.3, 0.0"
}

stop generation

{"type": "stop"}

API endpoints

POST /generate - generate 4–8 bars of music with input audio
POST /generate_style - generate music from style prompts only (experimental)
POST /jam/start - start continuous jamming session
GET /jam/next - get next audio chunk from session
POST /jam/consume - mark chunk as consumed
POST /jam/stop - end jamming session
WEBSOCKET /ws/jam - real-time streaming interface
POST /model/select - switch between base and fine-tuned models
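
the /jam/* routes form a simple poll-and-ack loop over http. here's a rough python sketch of that loop; the field names (session_id, the chunk payload) are illustrative assumptions, not the actual schema, so check the auto-generated API docs before using it.

# rough sketch of the http jam loop; field names like "session_id"
# are assumptions for illustration, not the documented schema.
import requests

BASE = "https://your-username-magenta-retry.hf.space"

# start a continuous jamming session
session = requests.post(f"{BASE}/jam/start", json={"styles": "electronic, ambient"}).json()
session_id = session.get("session_id")  # assumed field name

for _ in range(4):
    # get the next audio chunk from the session
    chunk = requests.get(f"{BASE}/jam/next", params={"session_id": session_id}).json()
    # ...decode / play the chunk here...
    # mark the chunk as consumed so the server can produce the next one
    requests.post(f"{BASE}/jam/consume", json={"session_id": session_id})

# end the jamming session
requests.post(f"{BASE}/jam/stop", json={"session_id": session_id})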

custom fine-tuning

train your own MagentaRT models and use them in the web app demo or the iOS app.

1. train your model

use the official MagentaRT fine-tuning notebook:

MagentaRT Fine-tuning Colab

this will create checkpoint folders like:

  • checkpoint_1861001/
  • checkpoint_1862001/
  • and steering assets: cluster_centroids.npy, mean_style_embed.npy

2. package checkpoints

checkpoints must be compressed as .tgz files to preserve .zarray files correctly.

important: do not download checkpoint folders directly from Google Drive - the .zarray files won't transfer properly.

checkpoint packaging script

use this in a Colab cell to properly package your checkpoints:

# Mount Drive to access your trained checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Set the path to your checkpoint folder
CKPT_SRC = '/content/drive/MyDrive/thepatch/checkpoint_1862001'  # Adjust path

# Copy folder to local storage (preserves dotfiles)
!rm -rf /content/checkpoint_1862001
!cp -a "$CKPT_SRC" /content/

# Verify .zarray files are present
!find /content/checkpoint_1862001 -name .zarray | wc -l

# Create properly formatted .tgz archive
!tar -C /content -czf /content/checkpoint_1862001.tgz checkpoint_1862001

# Verify critical files are in the archive
!tar -tzf /content/checkpoint_1862001.tgz | grep -c '\.zarray'

# Download the .tgz file
from google.colab import files
files.download('/content/checkpoint_1862001.tgz')

3. upload to hugging face

create a model repository and upload:

example repository: thepatch/magenta-ft
shows the correct file structure with .tgz files and .npy steering assets in the root directory.
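
if you'd rather script the upload than use the web UI, here's a minimal sketch with the huggingface_hub python library. it assumes you've already authenticated (e.g. via huggingface-cli login); the repo name and file list are placeholders for your own.

# minimal upload sketch using huggingface_hub (pip install huggingface_hub)
# assumes you're already logged in via `huggingface-cli login`
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/your-magenta-ft"  # placeholder: your model repo

# create the model repo if it doesn't exist yet
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# .tgz checkpoints and .npy steering assets go in the repo root
for path in ["checkpoint_1862001.tgz", "cluster_centroids.npy", "mean_style_embed.npy"]:
    api.upload_file(path_or_fileobj=path, path_in_repo=path, repo_id=repo_id)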

4. use in the app

in the iOS app's model selector, point to your hf repository URL. the app will automatically discover available checkpoints and allow switching between them.

technical specifications

a little more about the ios app

the ios app talks to this API over http requests rather than the websocket interface.

note: the /generate_style endpoint is experimental and may not properly adhere to BPM without additional context (we're considering metronome-based context instead of silence).

deployment

to run your own instance:

  1. duplicate this huggingface space by clicking the three dots in the top right (or pick 'run locally' if you've got a 5090 or something)
  2. ensure you have access to an L40S GPU by enabling billing
  3. point your iOS app to the new space URL (e.g., https://your-username-magenta-retry.hf.space)
  4. upload your fine-tuned models to hf as described above

support & contact

this is an active research project. for questions, technical support, or collaboration:

email: kev@thecollabagepatch.com

research status: this project is under very active development. features and the API may change. we welcome feedback and contributions from the research community. i'm just a vibe coder.

licensing

built on google's magentaRT (Apache 2.0 + CC-BY 4.0). users are responsible for their generated outputs and ensuring compliance with applicable laws and platform policies.

auto-generated API docs (for all the http requests)