Merve Noyan PRO

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

merve's activity

posted an update 1 day ago

Post

1148

Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on average across benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding, and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B

upvoted a changelog 1 day ago

Changelog

New Inference Providers Dashboard

liked a dataset 2 days ago

lmms-lab/multimodal-open-r1-8k-verified

Viewer • Updated Jan 27 • 7.69k • 973 • 55

New activity in google/paligemma-3b-pt-896 2 days ago

Update README.md

#10 opened 4 days ago by punitvara

replied to their post 2 days ago

it's 🆗

posted an update 2 days ago

Post

1390

The past week was insanely packed for open AI! 😱
Luckily we picked some highlights for you ❤️ lfg!

💬 LLMs/VLMs
> DeepSeek 🐳 released deepseek-ai/DeepSeek-R1-0528, a 671B MoE model (37B active), only 0.2 and 1.4 points behind o3 on AIME 24/25 🤯 They also released an 8B distilled version based on Qwen3 (OS) deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
> Xiaomi released MiMo-7B-RL (LLM for code and math) and MiMo-VL-7B-RL (VLM for visual reasoning, GUI agentic tasks and general use) (OS) 😍 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
> NVIDIA released a new reasoning model, nvidia/Nemotron-Research-Reasoning-Qwen-1.5B
> Dataset: MiniMax released https://huggingface.co/MiniMaxAI/SynLogic, a new dataset of 49k logical reasoning examples across 35 tasks, including cipher solving, sudoku and more!

🖼️ Image/Video Generation
> Tencent released tencent/HunyuanPortrait, a new model for consistent portrait generation, under the SVD Research license. They also released tencent/HunyuanVideo-Avatar for audio-driven avatar generation (OS)
> showlab released showlab/OmniConsistency, consistent stylization model (OS)
> Rapidata/text-2-video-human-preferences-veo3 is a new T2V preference dataset based on videos from Veo3 with 46k examples (OS)

🗣️ Audio
> https://huggingface.co/ResembleAI/Chatterbox is a new 500M text-to-speech model, preferred over ElevenLabs (OS) 😍
> PlayHT/PlayDiffusion is a new speech editing model (OS)

Other
> https://huggingface.co/NX-AI/TiReX is a new time series foundation model
> Yandex released a huge (4.79B examples!) video recommendation dataset https://huggingface.co/yandex/yambda

OS ones have Apache 2.0 or MIT licenses. Find more models and datasets here: merve/releases-30-may-6840097345e0b1e915bff843