
Dmitry Ryumin

DmitryRyumin

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Recent Activity

liked a Space 3 days ago
NihalGazi/Text-To-Speech-Unlimited
liked a Space 3 days ago
OpenSound/SoloSpeech
updated a Space 4 days ago
DmitryRyumin/BiBiER

Organizations

Gradio-Themes-Party, Gradio-Blocks-Party, Blog-explorers, New Era Artificial Intelligence, ICCV2023, ZeroGPU Explorers, Journalists on Hugging Face, Social Post Explorers, Dev Mode Explorers, LEYA Lab

DmitryRyumin's activity

reacted to AdinaY's post with 🚀 15 days ago
ByteDance is absolutely cooking lately🔥

BAGEL 🥯, a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
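
A minimal sketch (not part of the post) of pulling the announced BAGEL checkpoint from the Hub with huggingface_hub. BAGEL ships its own inference code in its repo, so this snippet only downloads the weights to a local directory.

```python
# Sketch: download the BAGEL-7B-MoT checkpoint files from the Hugging Face Hub.
# This does not run inference; BAGEL provides its own inference code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ByteDance-Seed/BAGEL-7B-MoT")
print(f"Checkpoint downloaded to: {local_dir}")
```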
upvoted an article 24 days ago

Vision Language Models (Better, Faster, Stronger)

By merve and 4 others
reacted to merterbak's post with 🔥 about 1 month ago
Qwen 3 models released🔥
It offers 2 MoE and 6 dense models with the following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B (MoE), 32B, and 235B (MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

✅ Pre-trained on 119 languages and dialects (36 trillion tokens), with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅ Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform comparably to Qwen2.5-3B/7B/14B/32B/72B.
✅ Three-stage pretraining:
• Stage 1: General language learning and knowledge building.
• Stage 2: Reasoning boost with STEM, coding, and logic skills.
• Stage 3: Long context training
✅ Supports MCP (Model Context Protocol)
✅ Strong agent skills
✅ Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template; see the sketch after this list.
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
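
A minimal sketch of how the thinking/non-thinking switch is exposed through the chat template, assuming the transformers library and the Qwen/Qwen3-8B checkpoint (one of the released dense sizes); the enable_thinking flag follows the Qwen3 model cards.

```python
# Sketch: toggle Qwen3's thinking vs. non-thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption: any released Qwen3 checkpoint works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: what is 17 * 23?"}]

# enable_thinking=True lets the model emit its reasoning before the answer;
# set it to False for fast, direct chat-style replies.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```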