merve
posted an update 13 days ago
GPT-4.1-mini level model right in your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!
merve
posted an update 15 days ago
we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯

single e2e model to extract image, convert tables, formula, and more into markdown 📝
try it MohamedRashad/Dots-OCR
merve
posted an update 16 days ago
massive releases and tons of FLUX.1 Krea LoRAs this past week!
here's some of the picks, find more models in the collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs 💬
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, 30B MoE with 3B params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)
jsulz
posted an update 21 days ago
We've crossed 1 million repositories backed by Xet storage on Hugging Face! 🚀🚀🚀

You can follow our progress converting the Hub from Git LFS to Xet at jsulz/ready-xet-go

We have a lot of repos left to migrate, which means I have plenty of time to add more animations 🤪
merve
posted an update 21 days ago
past week in open AI was insane 🔥 here's some of our picks, find more here merve/releases-july-25-688768ca47fe3693407e02d1

💬 LLMs & VLMs
> Qwen/Qwen3-235B-A22B-Thinking-2507 had a new update (OS)
> Qwen/Qwen3-Coder-480B-A35B-Instruct is out with 480B total 35B active params 🤯 (OS)
> AllenAI dropped an update to allenai/olmOCR-7B-0725 📝
> InternLM released internlm/Intern-S1 - 235B Qwen3 MoE + 6B InternViT encoder (OS)
> OmniSVG/OmniSVG is a new SVG generation VLM (OS)

🖼️ image/video/3D generation
> WanAI released Wan2.2 series - both T2V and I2V 14B models for high-quality video generation (OS) multimodalart/wan-22-688767e313337b434ed55112
> Tencent dropped tencent/HunyuanWorld-1 - image-to-3D scene generation model
merve
posted an update 23 days ago
🤯 241B VLM with apache-2.0 license internlm/Intern-S1

internlm released Intern-S1: multimodal reasoning model based on 235B MoE Qwen3 and 6B InternViT 😍

benchmarks look great (👑 best model ✅ best open model)
merve
posted an update 28 days ago
so many open LLMs and image LoRAs dropped past week, here's some picks for you 🫡 merve/releases-july-18-687e3fbd2ab9b39c51f9238b

LLMs
> ByteDance released a bunch of translation models called Seed-X-RM (7B) ByteDance-Seed/Seed-X-RM-7B
> NVIDIA released reasoning models, of which the 32B surpasses the giant Qwen3-235B, with cc-by-4.0 license 👏 nvidia/openreasoning-nemotron-687730dae0170059860f1f01
> LG released a new EXAONE model (32B) LGAI-EXAONE/EXAONE-4.0-32B

VLMs/any-to-any
> vidore/colqwen-omni-v0.1 is a new any-to-any retriever (MIT)
> HiDream-ai/HiDream-E1-1 is image+text in image+text out model (MIT)

LoRAs
> There's a bunch of LoRAs based on Flux Kontext, gotta check out the collection 🤠
merve
posted an update about 1 month ago
Fine-tune Gemma3n on videos with audio on a Colab A100 🔥
Just dropped the notebook where you can learn how to fine-tune Gemma3n on images+audio+text at the same time!

keep in mind, it's made for educational purposes 🫡 we do LoRA, audio resampling & video downsampling to be able to train in <40GB VRAM

stretch modalities and unfreeze layers as you wish! 🙏🏻 merve/smol-vision
danieldk
posted an update about 1 month ago
kernels 0.8.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.8.0

This release refines kernel selection in the kernelize function:

• You can now register kernels for certain CUDA capability ranges.
• Rather than requiring an exact match of modes, kernelize falls back to other compatible modes. If you are kernelizing for inference, but you only registered a training + torch.compile kernel, it will use that kernel since it is compatible with inference as well.
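
As a rough illustration of the fallback idea (not the kernels library's actual implementation), here is a self-contained Python sketch. The `Mode` flag, `is_compatible`, and `select_kernel` are hypothetical names made up for this example. The compatibility rule is an assumption drawn from the description above: a kernel registered for training also works for inference, and a torch.compile-compatible kernel also works without torch.compile.

```python
from enum import Flag, auto


class Mode(Flag):
    """Hypothetical mode flags, for illustration only."""
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()


def is_compatible(registered: Mode, requested: Mode) -> bool:
    """Assumed rule: the registered kernel must cover what the request needs."""
    # A request that uses torch.compile needs a compile-compatible kernel.
    if Mode.TORCH_COMPILE in requested and Mode.TORCH_COMPILE not in registered:
        return False
    # A training request needs a kernel that supports training.
    if Mode.TRAINING in requested and Mode.TRAINING not in registered:
        return False
    # Inference requests accept inference or training kernels.
    return True


def select_kernel(registry: dict, requested: Mode):
    """Prefer an exact mode match, then fall back to any compatible mode."""
    if requested in registry:
        return registry[requested]
    for registered, kernel in registry.items():
        if is_compatible(registered, requested):
            return kernel
    return None  # no compatible kernel registered


# Only a training + torch.compile kernel is registered (placeholder string).
registry = {Mode.TRAINING | Mode.TORCH_COMPILE: "training_compile_kernel"}
chosen = select_kernel(registry, Mode.INFERENCE)
print(chosen)
```

With only the training + torch.compile entry registered, an inference request still resolves to that entry, which mirrors the example in the release note. A training request against an inference-only registry would return `None` in this sketch.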
jsulz
posted an update about 1 month ago
We've moved over 20PB from Git LFS to Xet on the Hub without downtime or data loss. Having things "just work" on a migration of this scale is about as good as it gets.

Now, we're migrating the rest of the Hub https://huggingface.co/blog/migrating-the-hub-to-xet

But how did we get here?

In the early days of joining Hugging Face, we made a few key design decisions:
* There would be no "hard cut-over" from Git LFS to Xet
* A Xet-enabled repository should be able to contain both Xet and LFS files
* Repository migrations from LFS to Xet can run in the background without disrupting downloads or uploads

These were largely driven by our desire to ensure the community could keep working without interruption.

We cover the infrastructure making this all go in this post, specifically:
* An integral piece of infrastructure known internally as the Git LFS Bridge
* Background content migrations that run around the clock

To skip the wait and join Xet now, sign up here https://huggingface.co/join/xet
merve
posted an update about 1 month ago
past week had huuuge releases 💗
here's our picks 🔥 find more models, datasets, demos here merve/releases-july-11-68750452c358c98b0fa663f7

> moonshotai/Kimi-K2-Instruct is the new sota LLM with 1T total 32B active parameters 🤯

> HuggingFaceTB/SmolLM3-3B is the new best LM for its size, offers thinking mode 💭 as well as the dataset HuggingFaceTB/smoltalk2

> Alibaba-NLP/WebSailor-3B is the new agentic LLM for complex browsing

> Google DeepMind released medical vision LMs with an agentic doctor-patient app google/medgemma-release-680aade845f90bec6a3f60c4

> fal released a LoRA to improve details on face images fal/Realism-Detailer-Kontext-Dev-LoRA
danieldk
posted an update about 1 month ago
Kernels 0.7.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.7.0 🚀

This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.
merve
posted an update about 1 month ago
GitHub has refused to render notebooks for a long time now 💔

so smol-vision now lives in a Hugging Face model repository 🤗 merve/smol-vision
merve
posted an update about 1 month ago
ByteDance released Tar 1.5B and 7B: image-text in image-text out models, fully open-source 👏 ByteDance-Seed/tar-6864cf0d9fe59a3b91cc4260

They have an image tokenizer unified with text, and they de-tokenize using either of two models (LLM and diffusion)
The model is actually a full LLM (Qwen2), the tokenizer converts images to tokens 🤯
merve
posted an update about 1 month ago
Huge drops in open AI past week!
Find more models, datasets, demos here merve/releases-july-4-686bcc54ed7c45c341fbf654
Some of our picks 🫡
⏯️ BAAI/MTVCraft is a new Veo3-like text-to-video model, demo is here BAAI/MTVCraft
🧑🏻‍💻 apple/diffucoder-6868139f56672ae046fe04e8 is a new family of diffusion LLMs (7B base and instruct) for coding
🗣️ kyutai/tts-1.6b-en_fr is a new small TTS model for English and French
👀 aharley/alltracker is a new pixel tracking model by Stanford, demo is here aharley/alltracker
📖 racineai/OGC_MEGA_MultiDomain_DocRetrieval is a new large visual document retrieval dataset
merve
posted an update about 2 months ago
SOOOO MANY MODEL RELEASES 😍
Here's some picks from past week 🤗

> ByteDance/XVerse is a new identity preserving image generation model 🖼️
> google/gemma-3n-E4B-it, any-to-text model supported by transformers 🤗
> nvidia/llama-nemoretriever-colembed-3b-v1 two new state-of-the-art visual document retrievers 📑
> New version of Dia TTS model is up nari-labs/Dia-1.6B-0626
> Black Forest Labs releases Kontext benchmark black-forest-labs/kontext-bench

Find more here merve/releases-june-27-6864e8eb17f7e3a8b444083c