Merve Noyan PRO

merve

AI & ML interests

VLMs, vision & co

Recent Activity

Organizations

Hugging Face, Google, SODA, Notebooks-explorers, Deprem Yapay Zeka, Deprem Private, PyTorch Image Models, Turkish NLP Dataset Creators, Templates, Demo Crafters 🤗, Keras, tensorflow, Mukayese, HugGAN Community, EPFL VILAB, Hugging Face Fellows, Huggingface.js, Tools, HuggingFaceM4, scikit-learn, JAX ♥️ Diffusers 🧨, 2023 Jan Offsite hackathon, HF Canonical Model Maintainers, scikit-learn, fastai X Hugging Face Group 2022, Huggingface Projects, boun-tabi-LMG, Kornia AI, skops-tests, Hugging Face H4, Keras Dreambooth Event, Turkish T5 - BERT - GPT-2, Blog-explorers, Hugging Face for Computer Vision, Hacktoberfest 2023, Hugging Face Smol Models Research, adept-hf-collab, Qwen, ZeroGPU Explorers, kotol, Magic Leap Community, Llava Hugging Face, MLX Community, Social Post Explorers, Top Contributors: Profile Followers, Dev Mode Explorers, Paris AI Running Club, yorg, CVPR2024, Les papiers de Merve, nltpt, s0409, Hugging Face FineVideo, mv, H company, Cookbook Authors, open/ acc, Agents, wut?, University of Sydney, smolagents, s0225, Orr and associates org, gg-hf-g, llrehf, University of Science and Technology of China, VLMs, all things vision LMs

merve's activity

posted an update 1 day ago
replied to their post 2 days ago
posted an update 2 days ago
view post
Post
1389
Past week was insanely packed for open AI! 😱
Luckily we picked some highlights for you ❤️ lfg!

💬 LLMs/VLMs
> Deepseek 🐳 released deepseek-ai/DeepSeek-R1-0528, 685B model, only 0.2 and 1.4 points behind o3 in AIME 24/25 🤯 they also released an 8B distilled version based on Qwen3 (OS) deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
> Xiaomi released MiMo-7B-RL (LLM for code and math) and MiMo-VL-7B-RL (VLM for visual reasoning, GUI agentic task and general use) (OS) 😍 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
> NVIDIA released a new reasoning model, nvidia/Nemotron-Research-Reasoning-Qwen-1.5B
> DS: MiniMax released https://huggingface.co/MiniMaxAI/SynLogic, new 49k logical reasoning examples across 35 tasks including solving cipher, sudoku and more!

🖼️ Image/Video Generation
> tencent released tencent/HunyuanPortrait, a new model for consistent portrait generation with SVD Research license. They also released tencent/HunyuanVideo-Avatar, audio driven avatar generation (OS)
> showlab released showlab/OmniConsistency, consistent stylization model (OS)
> Rapidata/text-2-video-human-preferences-veo3 is a new T2V preference dataset based on videos from Veo3 with 46k examples (OS)

Audio 🗣️
> https://huggingface.co/ResembleAI/Chatterbox is a new 500M text-to-speech model preferred more than ElevenLabs (OS) 😍
> PlayHT/PlayDiffusion is a new speech editing model (OS)

Other
> https://huggingface.co/NX-AI/TiReX is a new time series foundation model
> Yandex released a huge (4.79B examples!) video recommendation dataset https://huggingface.co/yandex/yambda

OS ones have Apache2.0 or MIT licenses, find more models and datasets here merve/releases-30-may-6840097345e0b1e915bff843
reacted to ProCreations's post with 🚀 2 days ago
view post
Post
2266
60 followers,
yay
  • 2 replies
posted an update 2 days ago
view post
Post
1313
Yesterday was the day of vision language action models (VLAs)!

> SmolVLA: open-source small VLA for robotics by Hugging Face LeRobot team 🤖
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base

> Holo-1: 3B & 7B web/computer use agentic VLAs by H Company 💻
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1
super exciting times!!
reacted to danaaubakirova's post with 🤗❤️ 2 days ago
posted an update 3 days ago
posted an update 4 days ago
replied to prithivMLmods's post 5 days ago
reacted to prithivMLmods's post with ❤️👍 5 days ago
view post
Post
4770
OpenAI, Google, Hugging Face, and Anthropic have released guides and courses on building agents, prompting techniques, scaling AI use cases, and more. Below are 10+ minimalistic guides and courses that may help you in your progress. 📖

⤷ Agents Companion : https://www.kaggle.com/whitepaper-agent-companion
⤷ Building Effective Agents : https://www.anthropic.com/engineering/building-effective-agents
⤷ Guide to building agents by OpenAI : https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
⤷ Prompt engineering by Google : https://www.kaggle.com/whitepaper-prompt-engineering
⤷ Google: 601 real-world gen AI use cases : https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders
⤷ Prompt engineering by IBM : https://www.ibm.com/think/topics/prompt-engineering-guide
⤷ Prompt Engineering by Anthropic : https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
⤷ Scaling AI use cases : https://cdn.openai.com/business-guides-and-resources/identifying-and-scaling-ai-use-cases.pdf
⤷ Prompting Guide 101 : https://services.google.com/fh/files/misc/gemini-for-google-workspace-prompting-guide-101.pdf
⤷ AI in the Enterprise by OpenAI : https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf

by HF 🤗 :
⤷ AI Agents Course by Huggingface : https://huggingface.co/learn/agents-course/unit0/introduction
⤷ Smol-agents Docs : https://huggingface.co/docs/smolagents/en/tutorials/building_good_agents
⤷ MCP Course by Huggingface : https://huggingface.co/learn/mcp-course/unit0/introduction
⤷ Other Courses (LLM, Computer Vision, Deep RL, Audio, Diffusion, Cookbooks, etc.) : https://huggingface.co/learn
  • 2 replies
posted an update 5 days ago
view post
Post
1100
New GUI model by Salesforce AI & Uni HK: Jedi
tianbaoxiexxx/Jedi xlangai/Jedi-7B-1080p 🤗
Based on Qwen2.5-VL with Apache 2.0 license

prompt with below screenshot → select "find more"
  • 3 replies
reacted to prithivMLmods's post with ❤️🤗 7 days ago
view post
Post
2148
Just made a demo for Cosmos-Reason1, a physical AI model that understands physical common sense and generates appropriate embodied decisions in natural language through long chain-of-thought reasoning. Also added video understanding support to it. 🤗🚀

✦ Try the demo here : prithivMLmods/DocScope-R1

⤷ Cosmos-Reason1-7B : nvidia/Cosmos-Reason1-7B
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ Captioner-Relaxed : Ertugrul/Qwen2.5-VL-7B-Captioner-Relaxed

⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

⤷ GitHub :
• https://github.com/PRITHIVSAKTHIUR/Cosmos-x-DocScope
• https://github.com/PRITHIVSAKTHIUR/Nvidia-Cosmos-Reason1-Demo.

To learn more, visit the model card of the respective model!!
posted an update 7 days ago
view post
Post
1942
HOT: MiMo-VL new 7B vision LMs by Xiaomi surpassing gpt-4o (Mar), competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

not only that, but also MIT license & usable with transformers 🔥
posted an update 8 days ago
view post
Post
2688
introducing: VLM vibe eval 🪭 visionLMsftw/VLMVibeEval

vision LMs are saturated over benchmarks, so we built vibe eval 💬

> compare different models with refreshed in-the-wild examples in different categories 🤠
> submit your favorite model for eval
no numbers -- just vibes!
reacted to clem's post with ❤️👀🚀 10 days ago
view post
Post
3484
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus point if they funnily reference HF/open-source). These videos are "a cat on the moon rapping "I love Hugging Face""!