Audio Moment-DETR
This is Audio Moment DETR (AM-DETR), the model proposed in "Language-based Audio Moment Retrieval". Given a text query, AM-DETR retrieves the audio segments (moments) relevant to the query from a long audio recording.
Install
AM-DETR requires Lighthouse. Check its dependencies against your environment before installing; the PyTorch packages below are CUDA 11.8 (cu118) builds.
apt install ffmpeg
pip install 'git+https://github.com/line/lighthouse.git'
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 torchtext==0.16.0 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.51.3
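To check the environment after installing, a small script can compare the installed package versions against the pins above. This is a minimal sketch using only the standard library's `importlib.metadata`; the `PINNED` dictionary and the `check_pins` helper are hypothetical names introduced here, not part of Lighthouse. CUDA wheels report a local version suffix such as `+cu118`, which the sketch ignores when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install commands (hypothetical helper data).
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "torchtext": "0.16.0",
    "transformers": "4.51.3",
}


def check_pins(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Compare only the public part, e.g. "2.1.0" from "2.1.0+cu118".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins()
    if not problems:
        print("All pinned packages match.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.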
Sample script
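import io

import requests
import torch
from transformers import AutoConfig, AutoModel

# Load the AM-DETR config and model from the Hugging Face Hub (the model ships custom code).
repo_id = "lighthouse-emnlp2024/AM-DETR"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
config.device = "cpu"  # run inference on CPU
model = AutoModel.from_pretrained(repo_id, config=config, trust_remote_code=True)

# Download the example recording into an in-memory buffer.
response = requests.get("https://github.com/line/lighthouse/raw/refs/heads/main/api_example/1a-ODBWMUAE.wav")
response.raise_for_status()
audio_bytes = io.BytesIO(response.content)

# Encode the audio, then retrieve the moments that match the text query.
query = "Heavy rain falls"
feats = model.encode_audio(audio_path=audio_bytes)
prediction = model.predict(query, feats)

# Each predicted window is unpacked as (start, end, score).
for start, end, score in prediction["pred_relevant_windows"]:
    print(f"Moment, Score: {start:05.2f} - {end:05.2f}, {score:.2f}")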
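The sample script prints every predicted window. To keep only the strongest moments, or to compare a prediction with a reference segment you annotated yourself, a small post-processing step can help. This sketch assumes each entry of `prediction["pred_relevant_windows"]` is a `[start, end, score]` triple (as the loop in the sample unpacks it) with times in seconds; `top_windows`, `temporal_iou`, the score threshold, and the example windows are hypothetical and not part of the Lighthouse API. Temporal IoU is a common overlap measure in moment retrieval, not necessarily the exact metric reported in the paper.

```python
def top_windows(windows, min_score=0.5, k=3):
    """Return up to k windows with score >= min_score, highest score first."""
    kept = [w for w in windows if w[2] >= min_score]
    return sorted(kept, key=lambda w: w[2], reverse=True)[:k]


def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predictions in the [start, end, score] layout.
windows = [[0.0, 4.0, 0.9], [10.0, 14.0, 0.3], [3.0, 8.0, 0.7]]
best = top_windows(windows, min_score=0.5, k=2)
print(best)  # [[0.0, 4.0, 0.9], [3.0, 8.0, 0.7]]

# Overlap of the best window with a hypothetical reference segment.
reference = (3.0, 8.0)
print(temporal_iou(best[0][:2], reference))  # 0.125
```

With the sample script, pass `prediction["pred_relevant_windows"]` in place of `windows`.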
Citation
@inproceedings{munakata2025language,
title={Language-based Audio Moment Retrieval},
author={Munakata, Hokuto and Nishimura, Taichi and Nakada, Shota and Komatsu, Tatsuya},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}