Nerdy Face

Enterprise
company

AI & ML interests

None defined yet.

Recent Activity

nerdyface's activity

m-ric 
posted an update 2 days ago
view post
Post
1055
If you didn't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team!
➡️ Amongst other ideas, it introduces "Async inference" to boost their robot actions.

Robots have a problem: performing the actions takes time (Unlike agents where action executions are near-instant!)
Most often, robots wait until they've finished performing actions to start thinking about hte next steps. This is a huge latency cost!

So the team decided to have the PolicyServer (aka the"thinking" part) restart early : instead of waiting for the n observations they just sent to be completed, they gather the observations after k < n steps, and start preparing the next actions based on that while the steps are running until n, to directly send their next steps.

➡️ This boosted robot throughput by ~30%! (nearly 2× tasks per time window).

gg @cadene and team! 👏

Report here: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)
AtAndDev 
posted an update 9 days ago
view post
Post
2667
deepseek-ai/DeepSeek-R1-0528

This is the end
  • 1 reply
·
m-ric 
posted an update 11 days ago
view post
Post
2560
A new research paper from KAIST builds on smolagents to push boundaries of distillation 🥳
➡️ "Distilling LLM Agent into Small Models with Retrieval and Code Tools" teaches that, when trying to distil reasoning capability from a strong LLM ("teacher") into a smaller one ("student"), it's much better to use Agent traces than CoT traces.

Advantages are:
1. Improved generalization
Intuitively, this is because your agent can encounter more "surprising" results by interacting with its environment : for example, a web research called by the LLM teacher in agent mode can bring results that the LLM teacher would not have generated in CoT.

2. Reduce hallucinations
The trace won't hallucinate tool call outputs!

Thank you @akseljoonas for mentioning this paper!
m-ric 
posted an update 24 days ago
view post
Post
2629
𝗔𝗯𝘀𝗼𝗹𝘂𝘁𝗲 𝗭𝗲𝗿𝗼: 𝗟𝗟𝗠𝘀 𝗰𝗮𝗻 𝘁𝗿𝗮𝗶𝗻 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗮𝗻𝘆 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗱𝗮𝘁𝗮 🤯

Has the "data wall" just been breached?

Recent RL paradigms often relied on a set of questions an answers that needs to be manually curated. Researchers from Tsinghua University went like "why though".

🤔 Indeed, why learn from question designed by a human teacher, when the model can start from their base knowledge and learn by experimenting in a code environment, proposing coding tasks themselves and trying to solve them?

Thus they created “Absolute Zero Reasoning” (AZR), an approach that removes any need for human curated data.

🎭 𝗗𝘂𝗮𝗹 𝗿𝗼𝗹𝗲𝘀:
‣ Proposer: Generates challenging but solvable coding tasks
‣ Solver: Attempts to solve those self-proposed tasks

🧪 𝗧𝗵𝗿𝗲𝗲 𝘁𝗮𝘀𝗸 𝘁𝘆𝗽𝗲𝘀: all types are defined as triplets of program, input and output
‣ Deduction: Give model an input and program, it must deduce the output
‣ Abduction: Give model an program and output, it must find the input that gave said output
‣ Induction: Synthesize a program from input/output pairs
Btw this reminded me of my long-forgotten philosophy classes: Aristotle was more on the induction side, learning from real-world analogies, while Plato was more on the deduction side, trying to progress quite far with just one input and his reasoning.

📊 𝗥𝗲𝘀𝘂𝗹𝘁𝘀:
‣ AZR post-training creates a nice improvement on known models like Qwen2.5-7B
‣ Shows strong cross-domain transfer: coding ↔️ math reasoning

🧐 𝗢𝘁𝗵𝗲𝗿 𝗳𝗶𝗻𝗱𝗶𝗻𝗴𝘀:
‣ Having a better base performance (general or code specific) amplify the gains from Absolute Zero Reasoning
‣ Researchers warn about "Uh-oh moments" (winking to the "aha moments" of DeepSeek) where the model generates concerning goals like "make an extremely convoluted code to outsmart all these humans": so supervision is still needed!

Paper here: Absolute Zero: Reinforced Self-play Reasoning with Zero Data (2505.03335)
m-ric 
posted an update 28 days ago
view post
Post
4423
I've made an open version of Google's NotebookLM, and it shows the superiority of the open source tech task! 💪

The app's workflow is simple. Given a source PDF or URL, it extracts the content from it, then tasks Meta's Llama 3.3-70B with writing the podcast script, with a good prompt crafted by @gabrielchua ("two hosts, with lively discussion, fun notes, insightful question etc.")
Then it hands off the text-to-speech conversion to Kokoro-82M, and there you go, you have two hosts discussion any article.

The generation is nearly instant, because:
> Llama 3.3 70B is running at 1,000 tokens/seconds with Cerebras inference
> The audio is generated in streaming mode by the tiny (yet powerful) Kokoro, generating voices faster than real-time.

And the audio generation runs for free on Zero GPUs, hosted by HF on H200s.

Overall, open source solutions rival the quality of closed-source solutions at close to no cost!

Try it here 👉👉 m-ric/open-notebooklm
·
hassenhamdi 
posted an update about 1 month ago
view post
Post
416
I want to remind community that, making tech safe and beneficial and using it responsibly is a collective responsibility and I urge every member of community that value rational and ethical use of tech, AI etc. to take action, stand and speak against any conduct that might cause things to shift toward undesired path,
Yesterday a dataset been uploaded on huggingface ,your favorite daily place for AI dataset, model, post ,research, learning about AI etc, for military applications which is against ethics and responsible AI, huggingface policies as stated in their restricted content list:

1. Unlawful or illegal Content
...
Content promoting high-risk illegal activities (weapons development, illegal substances, scams, gambling, pseudo-pharmaceuticals, plagiarism, etc.).
...
3. Harmful or Abusive Content
Terrorist Content or Content that glorifies **violence**, suffering, or humiliation.
...
we may also moderate other types of Content in response to evolving challenges posed by advancements in Machine Learning.

I urge community to report any content against ethical use of tech stack including AI , keeping huggingface good place for AI and data enthusiasts.
Here the link to the dataset: ZennyKenny/tactical-military-reasoning-v.1.0
Take action report it, let the platform stay an enjoyable place for whatever you are using it for AI , Data etc.
  • 5 replies
·
hassenhamdi 
posted an update about 1 month ago
view post
Post
426
It seems that huggingface consider deepfakes content a violation/ misconduct but not encouraging harmful activity or application such using AI for developing war technology.
How shameful.
m-ric 
posted an update about 2 months ago
view post
Post
2873
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL have been a wildly successful series of model : and the latest iteration has just taken back their crown thanks to their superior, natively multimodal vision training pipeline.

➡️ Most of the vision language models (VLMs) these days are built like Frankenstein : take a good text-only Large Language Model (LLM) backbone, stitch a specific vision transformer (ViT) on top of it. Then the training is sequential 🔢 : 1. Freeze the LLM weights while you train the ViT only to work with the LLM part, then 2. Unfreeze all weights to train all weights in order to work together.

💫 The Shanghai Lab decided to challenge this paradigm and chose this approach that they call "native". For each of their model sizes, they still start from a good LLM (mostly Qwen-2.5 series, did I tell you I'm a huge fan of Qwen? ❤️), and stitch the ViT, but they don't freeze anything : they train all weights together with interleaved text and image understanding data in a single pre-training phase 🎨.

They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
  • 2 replies
·
thomwolf 
posted an update about 2 months ago
view post
Post
5084
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!
  • 1 reply
·
AtAndDev 
posted an update 2 months ago
view post
Post
3020
Llama 4 is out...
·
m-ric 
posted an update 2 months ago
view post
Post
2406
🚀 DeepSeek R1 moment has come for GUI agents: Rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant, collecting huge datasets of screen captures from people using computers, and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified reward function that evaluates multiple responses from models, optimizing via policy algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses:
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?

Using just 136 carefully selected mobile tasks—compared to 76,000 tasks for larger models like OS-Atlas—UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only in low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)
thomwolf 
posted an update 2 months ago
view post
Post
3499
The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
  • 1 reply
·
AtAndDev 
posted an update 3 months ago
view post
Post
4292
There seems to multiple paid apps shared here that are based on models on hf, but some ppl sell their wrappers as "products" and promote them here. For a long time, hf was the best and only platform to do oss model stuff but with the recent AI website builders anyone can create a product (really crappy ones btw) and try to sell it with no contribution to oss stuff. Please dont do this, or try finetuning the models you use...
Sorry for filling yall feed with this bs but yk...
  • 6 replies
·
m-ric 
posted an update 3 months ago
view post
Post
5075
smolagents now support vLLM! 🥳

As one of the most popular local inference solutions, the community had been asking us to integrate vLLM: after a heavy refactoring of our LLM classes, we've just released smolagents 1.11.0, with a brand new VLLMModel class.

Go try it and tell us what you think!

https://github.com/huggingface/smolagents/blob/45b2c86857b7f7657daaa74e4d17d347e9e2c4a4/src/smolagents/models.py#L497
AtAndDev 
posted an update 3 months ago
view post
Post
1625
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.
thomwolf 
posted an update 3 months ago
view post
Post
2928
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions