
Dmitry Ryumin

DmitryRyumin

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Recent Activity

liked a Space 3 days ago
NihalGazi/Text-To-Speech-Unlimited
liked a Space 3 days ago
OpenSound/SoloSpeech
updated a Space 4 days ago
DmitryRyumin/BiBiER

Organizations

Gradio-Themes-Party, Gradio-Blocks-Party, Blog-explorers, New Era Artificial Intelligence, ICCV2023, ZeroGPU Explorers, Journalists on Hugging Face, Social Post Explorers, Dev Mode Explorers, LEYA Lab

DmitryRyumin's activity

reacted to AdinaY's post with 🚀 15 days ago
ByteDance is absolutely cooking lately🔥

BAGEL 🥯, a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
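
A minimal sketch (not part of the post) of pulling the announced BAGEL checkpoint from the Hub with huggingface_hub. BAGEL ships its own inference code in its repo, so this snippet only downloads the weights to a local directory.

```python
# Sketch: download the BAGEL-7B-MoT checkpoint files from the Hugging Face Hub.
# This does not run inference; BAGEL provides its own inference code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ByteDance-Seed/BAGEL-7B-MoT")
print(f"Checkpoint downloaded to: {local_dir}")
```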
upvoted an article 24 days ago

Vision Language Models (Better, Faster, Stronger)

By merve and 4 others
reacted to merterbak's post with 🔥 about 1 month ago
Qwen 3 models released🔥
It offers 2 MoE and 6 dense models with the following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B (MoE), 32B, and 235B (MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

✅ Pre-trained on 119 languages and dialects (36 trillion tokens), with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅ Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform comparably to Qwen2.5-3B/7B/14B/32B/72B.
✅ Three-stage pretraining:
• Stage 1: General language learning and knowledge building.
• Stage 2: Reasoning boost with STEM, coding, and logic skills.
• Stage 3: Long context training
✅ Supports MCP (Model Context Protocol)
✅ Strong agent skills
✅ Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template; see the sketch after this list.
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
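
A minimal sketch of how the thinking/non-thinking switch is exposed through the chat template, assuming the transformers library and the Qwen/Qwen3-8B checkpoint (one of the released dense sizes); the enable_thinking flag follows the Qwen3 model cards.

```python
# Sketch: toggle Qwen3's thinking vs. non-thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption: any released Qwen3 checkpoint works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: what is 17 * 23?"}]

# enable_thinking=True lets the model emit its reasoning before the answer;
# set it to False for fast, direct chat-style replies.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```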