Real-Time AI Sound Generation on Arm: A Personal Tool for Creative Freedom

As a software engineer and music producer, I’m always exploring how technology can expand creative expression. That curiosity recently led me to build a personal sound generation app that runs directly on-device—powered by an Arm-based CPU and open-source generative AI models. It’s fast, private, and enables me to generate studio-ready sounds from a simple prompt, all within seconds.
This project brings together the best of several worlds:
- The Stable Audio Open model from Stability AI, sourced from Hugging Face
- Execution powered by PyTorch and TorchAudio
- A fast, efficient pipeline that runs natively on Arm-based CPUs
- A seamless creative handoff to Ableton Live
A New Kind of Creative Companion
When I’m deep in a music project using Ableton Live, I don’t want to interrupt my workflow to dig through libraries or browse sound packs. I wanted a tool that could meet me where I am—right in the flow.
Now, I can simply describe the sound I’m imagining (“analog bassline,” “cinematic riser,” “lofi snare”), and within seconds, the generated .wav file appears in my Ableton browser. From there, I can tweak it, loop it, or turn it into an instrument.
Every sound is unique. No one else will generate exactly what I do. That sense of personal ownership fuels my creativity.
Powered by Arm: On-Device, On-Demand
This sound generator runs entirely on-device using Arm-based CPU technology: no GPU, no cloud inference, no network latency. Thanks to Arm's efficiency and performance-per-watt, the app stays responsive even during multi-step diffusion runs.
The generation engine is built on:
- The Stable Audio Open model by Stability AI, available via Hugging Face
- PyTorch and TorchAudio for model inference and audio handling
- Optimized multithreaded execution for smooth CPU performance
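Before any of the snippets below can run, the model and its config have to be pulled from Hugging Face. Here's a minimal loading sketch based on the stable-audio-tools example from Stability AI (the gated stabilityai/stable-audio-open-1.0 checkpoint requires a Hugging Face token, and your loading code may differ slightly):

# Load Stable Audio Open and its config from Hugging Face (sketch)
import torch
from stable_audio_tools import get_pretrained_model

device = "cpu"  # Arm CPU; see the device section below for alternatives

# Downloads the weights and model config on first run
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]   # 44.1 kHz for Stable Audio Open
sample_size = model_config["sample_size"]   # maximum number of samples per generation

model = model.to(device).to(torch.float32)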
Sample Code: Optimized CPU Generation
To maximize performance on Arm CPUs, I enabled full thread utilization:
import os, torch

# Use all available Arm CPU threads
torch.set_num_threads(os.cpu_count())
To maintain low memory usage across generations:
import gc

# Clear memory periodically (every third generation)
if gen_count % 3 == 0:
    gc.collect()
    print(f"Memory cleared at generation {gen_count}")
Core generation loop, tuned for speed and efficiency:
from stable_audio_tools.inference.generation import generate_diffusion_cond

output = generate_diffusion_cond(
    model,
    steps=7,  # Reduced step count for faster inference
    cfg_scale=1,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)
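The call returns a batched floating-point tensor, so before Ableton can see anything it still has to be flattened, normalized, and written out as a .wav. Here's a minimal post-processing sketch, again following the stable-audio-tools example (the output filename is a placeholder, and sample_rate is assumed to come from the model config loaded earlier):

# Flatten the batch, peak-normalize, convert to 16-bit PCM, and write a .wav
import torch
import torchaudio
from einops import rearrange

output = rearrange(output, "b d n -> d (b n)")   # (batch, ch, samples) -> (ch, samples)
output = output.to(torch.float32)
output = output / torch.max(torch.abs(output))   # peak normalize
output = output.clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("generated.wav", output, sample_rate)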
Device Flexibility: CPU, Metal, CUDA
Although optimized for CPU, the program can also run on Metal (Apple Silicon) or CUDA if needed:
device = "mps" # Apple Silicon
# device = "cuda" # NVIDIA
# device = "cpu" # Arm CPU (default)
model = model.to(device).to(torch.float32)
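If you'd rather not hard-code the backend, a small helper can pick the fastest available one at startup (a convenience sketch, not part of the original script):

import torch

def pick_device() -> str:
    # Prefer CUDA, then Apple Silicon's Metal backend, then fall back to the Arm CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()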
Seamless Workflow with Ableton Live
The tool outputs .wav files directly to a project folder monitored by Ableton Live. Here's a sample CLI interaction:
Enter a prompt for generating audio:
Ambient texture
Enter a tempo for the audio:
100
Generated audio saved to: Ambient texture.wav
I immediately see the file show up in my browser within Live, ready to be arranged, modulated, and transformed.
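Under the hood, the prompt and tempo from the CLI feed straight into the conditioning list that generate_diffusion_cond expects. Here's a rough sketch of how that can look (the prompt wording and the clip length are illustrative choices, not the exact code):

# Build the conditioning for generate_diffusion_cond from the CLI inputs
prompt = input("Enter a prompt for generating audio:\n").strip()
tempo = input("Enter a tempo for the audio:\n").strip()

conditioning = [{
    "prompt": f"{prompt}, {tempo} BPM",  # fold the tempo into the text prompt
    "seconds_start": 0,
    "seconds_total": 10,                 # length of the generated clip, in seconds
}]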
Why This Matters
This project is a personal prototype—but it’s also a window into the future of content creation. With efficient, on-device AI inference on Arm CPUs, artists and developers can:
- Stay in creative flow without waiting on cloud resources
- Ensure data privacy and full ownership of outputs
- Extend AI tools into edge devices, DAWs, and new creative interfaces
This is what happens when open-source innovation meets efficient compute: real-time generative power, accessible to every creator.
Explore the ecosystem that made this possible: