Collections

Discover the best community collections!

Collections including paper arxiv:2410.15316

about 18 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

🍓 Ichigo v0.5

The experimental family designed to train LLMs to understand sound natively.

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 12
Menlo/Ichigo-whisper-v0.1

Audio-Text-to-Text • Updated Jan 3 • 24

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 12
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Paper • 2410.19168 • Published Oct 24, 2024 • 21
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

Paper • 2504.19838 • Published Apr 28 • 22

about 19 hours ago

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21

A collection of audio-related papers that I want to read

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

Paper • 2502.20583 • Published Feb 27 • 13
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 12
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Paper • 2503.01710 • Published Mar 3 • 6

🍓 Ichigo v0.4

The experimental family designed to train LLMs to understand sound natively.

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 12
Ichigo Llama3.1 S V0.4

Running on Zero • 6
The latest version of Ichigo v0.4
Menlo/Ichigo-llama3.1-s-instruct-v0.4

Audio-Text-to-Text • 8B • Updated Dec 13, 2024 • 17 • 20

Dailypapershackernews

Runtime error • 80
Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 45
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles

Paper • 2410.05262 • Published Oct 7, 2024 • 11
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

Paper • 2410.15316 • Published Oct 20, 2024 • 12
