Papers
arxiv:2505.07235

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

Published on May 12
Authors:
,
,
,
,
,

Abstract

MUFFIN, a convolutional Neural Psychoacoustic Coding framework, achieves high audio compression quality using perceptual salience-based frequency band allocation and a transformer-inspired architecture.

AI-generated summary

Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) module that allocates bitrate across frequency bands based on perceptual salience. This design enables efficient compression while disentangling speaker identity from content using distinct codebooks. MUFFIN incorporates a transformer-inspired convolutional backbone and a modified snake activation to enhance resolution in fine-grained spectral regions. Experimental results on multiple benchmarks demonstrate that MUFFIN consistently outperforms existing approaches in reconstruction quality. A high-compression variant achieves a state-of-the-art 12.5 Hz rate with minimal loss. MUFFIN also proves effective in downstream generative tasks, highlighting its promise as a token representation for integration with language models. Audio samples and code are available.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.07235 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.07235 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.07235 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.