Loubna Ben Allal

loubnabnl

AI & ML interests

SmolLMs, ML for code, data

Organizations

Hugging Face, BigScience Workshop, BigScience Catalogue Data, BigScience Data, HuggingFaceBR4, Team 8, CodeParrot, BigCode, Hugging Face H4, Hugging Face OSS Metrics, CompVis Community, BigCode Data, LocalCodeLLMs, Need4Speed, EPFL Machine Learning and Optimization Laboratory, Code Llama, Hugging Face Smol Models Research, Hugging Face Smol Cluster, Nt3awnou, huggingPartyParis, Qwen, ZeroGPU Explorers, HF AFAIK, gg-hf, Nanotron Research, Women on Hugging Face, Hugging Face SMOL, FineData, bigcode nvidia, Social Post Explorers, Dev Mode Explorers, Cosmopedia Stories Collab, HuggingFaceFW-Dev, StarCoder2 Data, Data Agents, Argilla Warehouse, smol-explorers, swissai-hf-data, Hugging Face Science, Open R1, smol-ablations, SmolEvalData

loubnabnl's activity

reacted to danaaubakirova's post with 🤗❤️ 1 day ago
reacted to danieldk's post with 🔥🤗 1 day ago
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a proper introduction soon. But for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
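For anyone who wants to try it before the full introduction lands, here is a minimal sketch of loading a Hub kernel; the kernels-community/activation repo and the gelu_fast function follow the project README, so treat them as assumptions if you target a different kernel.

```python
# Minimal sketch: load a compute kernel from the Hub with the `kernels` library.
# The repo id and the `gelu_fast` function follow the project README; swap in
# whichever kernel repo you actually want to use.
import torch
from kernels import get_kernel

# Downloads a pre-built kernel matching the local torch/CUDA setup from the Hub
activation = get_kernel("kernels-community/activation")

x = torch.randn((16, 16), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)

# Kernel functions are exposed as attributes of the loaded module
activation.gelu_fast(y, x)
print(y)
```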
reacted to Xenova's post with 🔥 1 day ago
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔒 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
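The demo itself runs on Transformers.js in the browser, but the same cascade can be sketched server-side with the Python transformers pipelines. The snippet below is only an approximation of the idea: it covers the Whisper ASR and SmolLM2 generation stages, skips Silero VAD and Kokoro TTS, and the Whisper checkpoint is an assumption.

```python
# Rough server-side approximation of the speech -> text -> reply cascade.
# The actual demo uses Transformers.js + ONNX Runtime Web in the browser;
# this sketch omits voice activity detection (Silero VAD) and TTS (Kokoro).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base.en")
chat = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

def respond(audio_path: str) -> str:
    # 1) transcribe the user's utterance
    user_text = asr(audio_path)["text"]
    # 2) generate a reply with SmolLM2 via its chat template
    messages = [{"role": "user", "content": user_text}]
    out = chat(messages, max_new_tokens=128)
    return out[0]["generated_text"][-1]["content"]

print(respond("question.wav"))
```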
reacted to AdinaY's post with 🔥 1 day ago
OpenAudio S1-mini 🔊 a new OPEN multilingual TTS model trained on 2M+ hours of data, by FishAudio

fishaudio/openaudio-s1-mini

✨ Supports 14 languages
✨ 50+ emotions & tones
✨ RLHF-optimized
✨ Special effects: laughing, crying, shouting, etc.
reacted to merve's post with 🔥 1 day ago
Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on benchmark averages, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding, and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
reacted to clem's post with 🚀🔥 11 days ago
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!
reacted to nyuuzyou's post with 🔥 11 days ago
I recently updated the nyuuzyou/pxhere dataset, and it now contains approximately 1.1M CC0 high-resolution images.
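For anyone who wants to peek at it without downloading ~1.1M files, here is a minimal streaming sketch with 🤗 datasets; the column layout is an assumption, so check the dataset card.

```python
# Stream a few samples from the pxhere dataset instead of downloading it all.
# The exact columns are an assumption -- inspect the dataset card for details.
from datasets import load_dataset

ds = load_dataset("nyuuzyou/pxhere", split="train", streaming=True)
for i, sample in enumerate(ds):
    print(sample.keys())  # see which fields actually exist in each record
    if i >= 2:
        break
```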
reacted to merve's post with 🔥 14 days ago
Google released MedGemma at I/O '25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗

they also released a cool demo for scan reading ➡️ google/rad_explain

use with transformers ⤵️
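The original snippet isn't reproduced here, but a minimal sketch of querying the 4B instruction-tuned checkpoint through the transformers image-text-to-text pipeline might look like the following; the checkpoint id and message format are assumptions, so follow the model card for the official example.

```python
# Hedged sketch: MedGemma via the transformers image-text-to-text pipeline.
# "google/medgemma-4b-it" and the chat/message format are assumptions --
# defer to the model card for the official usage snippet.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it", device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chest_xray.png"},
            {"type": "text", "text": "Describe any notable findings in this X-ray."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```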
replied to their post 16 days ago
reacted to AdinaY's post with 🔥🚀 16 days ago
ByteDance is absolutely cooking lately 🔥

BAGEL 🥯 a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
reacted to sayakpaul's post with 🔥 16 days ago
Despite the emergence of architectures that combine an LLM and a DiT for T2I synthesis, this design remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, this design remains underexplored, which is what we want to improve on in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives; PixArt-Alpha-style attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the above findings, we arrive at FuseDiT, built on the base architecture with the following components:

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.
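For intuition only, here is a rough, hypothetical sketch of the kind of layerwise deep fusion discussed above: at each depth, image tokens attend jointly over themselves and the LLM's hidden states at the same layer. It illustrates the idea, not the actual FuseDiT implementation.

```python
# Hypothetical illustration of layerwise "deep fusion" between an LLM and a DiT:
# at each depth, image tokens attend over [LLM hidden states ; image tokens].
# This is NOT the FuseDiT code -- see the paper and repo for the real architecture.
import torch
import torch.nn as nn

class DeepFusionBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, img_tokens: torch.Tensor, llm_hidden: torch.Tensor) -> torch.Tensor:
        # Joint attention: queries are image tokens; keys/values are the
        # concatenation of this layer's LLM hidden states and the image tokens.
        ctx = torch.cat([llm_hidden, img_tokens], dim=1)
        attn_out, _ = self.attn(self.norm1(img_tokens), ctx, ctx)
        img_tokens = img_tokens + attn_out
        return img_tokens + self.mlp(self.norm2(img_tokens))

# Toy shapes: batch of 2, 64 text positions, 256 image patches, width 512
block = DeepFusionBlock(dim=512)
img = torch.randn(2, 256, 512)
txt = torch.randn(2, 64, 512)   # would come from a (frozen) Gemma layer in practice
print(block(img, txt).shape)    # torch.Size([2, 256, 512])
```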

To learn more (code and models are all available), please check out the paper:
https://lnkd.in/gg6qyqZX.
posted an update 21 days ago
reacted to merterbak's post with 🔥 21 days ago
reacted to albertvillanova's post with 🔥 21 days ago
New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs

👉 https://github.com/huggingface/smolagents/releases/tag/v1.16.0
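A quick hypothetical sketch of how these pieces could fit together; the engine="bing" argument and the exact class names are assumptions, so verify against the smolagents docs before copying.

```python
# Hedged sketch combining the v1.16.0 additions: a web-search agent backed by a
# local OpenAI-compatible server reached via api_base/api_key.
# `engine="bing"` and the class names are assumptions -- check the docs.
from smolagents import CodeAgent, WebSearchTool, OpenAIServerModel

model = OpenAIServerModel(
    model_id="qwen2.5-coder-7b-instruct",   # whatever your local server exposes
    api_base="http://localhost:8000/v1",    # e.g. a local vLLM or llama.cpp endpoint
    api_key="not-needed-locally",
)

agent = CodeAgent(
    tools=[WebSearchTool(engine="bing")],   # new: Bing as a search backend
    model=model,
)

print(agent.run("What changed in smolagents v1.16.0?"))
```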
reacted to merve's post with 🔥 21 days ago
New sota open-source depth estimation: Marigold v1-1 🌼

> normal maps, depth maps of scenes & faces prs-eth/marigold-normals prs-eth/marigold
> get albedo (true color) and BRDF (texture) maps of scenes prs-eth/marigold-intrinsics
> they even release a depth-to-3D printer format demo 😮 prs-eth/depth-to-3d-print

All models are here prs-eth/marigold-computer-vision-6669e9e3d3ee30f48214b9ba
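If you'd rather run it locally than in the demos, a minimal depth-estimation sketch with the diffusers Marigold pipeline looks roughly like this; the exact v1-1 checkpoint id is an assumption, so pick the depth model from the collection above.

```python
# Minimal sketch: monocular depth estimation with Marigold in diffusers.
# The checkpoint id is an assumption -- use the depth checkpoint from the
# prs-eth collection linked above.
import torch
from diffusers import MarigoldDepthPipeline
from diffusers.utils import load_image

pipe = MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-v1-1", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://example.com/room.jpg")  # any RGB photo
depth = pipe(image)

# Color-map the raw prediction and save a visualization
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("room_depth.png")
```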
reacted to lysandre's post with ❤️ 3 months ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
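As an example of what that looks like in practice: after installing transformers from the model tag (e.g. pip install git+https://github.com/huggingface/transformers@v4.49.0-SigLIP-2), the new model loads like any other. The checkpoint id below is an assumption, so pick one from the SigLIP-2 release.

```python
# Hedged sketch: zero-shot image/text scoring with SigLIP-2, assuming transformers
# was installed from the v4.49.0-SigLIP-2 tag (or any later regular release).
# The checkpoint id is an assumption -- pick one from the SigLIP-2 collection.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224), "white")  # stand-in for a real photo
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    padding="max_length",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores each (image, text) pair independently with a sigmoid, not a softmax
print(torch.sigmoid(outputs.logits_per_image))
```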