s3nh

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

s3nh's activity

liked a model 12 days ago

ByteDance-Seed/BAGEL-7B-MoT

Any-to-Any • Updated 15 days ago • 9.94k • 977

liked a model 13 days ago

uwg/upscaler

Updated Dec 23, 2024 • 606

reacted to KnutJaegersberg's post with ❤️ 18 days ago

Post

2729

The Intelligence Curse

The document warns of the "intelligence curse," a potential consequence of advanced AI (AGI) where powerful entities lose their incentive to invest in people as AI automates work. This could lead to job displacement, reduced social mobility, and a concentration of power and wealth based on AI ownership, similar to the "resource curse" in resource-rich states. To counter this, the authors propose averting AI catastrophes to prevent centralization, diffusing AI widely to keep humans economically relevant, and democratizing institutions to remain anchored to human needs.

https://intelligence-curse.ai/intelligence-curse.pdf

reacted to loubnabnl's post with ❤️ 18 days ago

Post

2672

SmolVLM is now available on PocketPal — you can run it offline on your smartphone to interpret the world around you. 🌍📱

And check out this real-time camera demo by @ngxson , powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai

3 replies

liked a model 18 days ago

Kijai/WanVideo_comfy

Updated 3 days ago • 654

reacted to merve's post with 👍🚀 19 days ago

Post

6574

A real-time object detector that is much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥

D-FINE is the SOTA real-time object detector that runs on a T4 (free Colab) 🤩

> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352

Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩

Regular object detectors attempt to predict bounding boxes as pixel-perfect (x, y, w, h) coordinates, which is a very rigid target and hard to solve 🥲☹️

D-FINE instead formulates detection as predicting a distribution over bounding box coordinates and refines it iteratively, which makes it more accurate 🤩

Another core idea behind this model is Global Optimal Localization Self-Distillation ⤵️

The model uses the final layer's distribution output (sort of like a teacher) to distill into earlier layers, making those layers more performant.
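
The two ideas in the post can be illustrated with a toy sketch in plain Python. This is not D-FINE's actual implementation: the bin values, the function names, and the way offsets are applied to box edges are all simplifying assumptions made for illustration. Each box edge gets a distribution over a few candidate offsets, the refined edge is the original edge plus the expected offset, and a KL divergence between the final layer's distribution (teacher) and an earlier layer's distribution (student) stands in for the self-distillation signal.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def expected_offset(logits, bin_values):
    """Expected edge offset under the predicted distribution over bins."""
    log_probs = log_softmax(logits)
    return sum(math.exp(lp) * b for lp, b in zip(log_probs, bin_values))

def refine_box(box, edge_logits, bin_values):
    """Shift each of the four edges by its expected offset (hypothetical scheme)."""
    return [edge + expected_offset(logits, bin_values)
            for edge, logits in zip(box, edge_logits)]

def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student), used here as a stand-in distillation loss."""
    lp = log_softmax(teacher_logits)
    lq = log_softmax(student_logits)
    return sum(math.exp(p) * (p - q) for p, q in zip(lp, lq))

# Hypothetical setup: three candidate offsets (in pixels) per edge.
bins = [-1.0, 0.0, 1.0]
box = [10.0, 20.0, 50.0, 80.0]  # left, top, right, bottom
# Only the right edge is confidently pushed outward; the rest are undecided.
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]]
refined = refine_box(box, logits, bins)
```

Repeating `refine_box` with fresh logits from each decoder layer would mimic the iterative refinement described in the post, and summing `kl_divergence` over earlier layers would mimic distilling the final layer's output into them.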

2 replies

reacted to mrfakename's post with 👍🤗 19 days ago

Post

3382

Hi everyone,

I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.

What's new in V2:

- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard
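
As a rough illustration of how blind A/B votes could feed a personal leaderboard, here is a minimal sketch. The post does not describe how TTS Arena actually ranks models, so the plain win-rate scoring, the `leaderboard` function, and the placeholder model names below are all assumptions.

```python
from collections import defaultdict

def leaderboard(votes):
    """Rank models by win rate.

    votes: iterable of (winner, loser) model-name pairs, one per blind A/B round.
    Returns a list of (model, win_rate, rounds_played), best first.
    """
    wins = defaultdict(int)
    rounds = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        rounds[winner] += 1
        rounds[loser] += 1
    rows = [(model, wins[model] / rounds[model], rounds[model]) for model in rounds]
    # Sort by win rate (descending), then by name for a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))

# Made-up votes with placeholder names, purely for illustration.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ranking = leaderboard(votes)
```

Win rate is the simplest possible choice; a real arena would more likely use a rating scheme that accounts for opponent strength, which this sketch does not attempt.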

Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.

I'd love any feedback, feature suggestions, or ideas for models to include.

TTS-AGI/TTS-Arena-V2

4 replies

liked a model 19 days ago

calcuis/wan-gguf

Text-to-Video • Updated 14 days ago • 34.1k • 96

liked 2 Spaces 21 days ago

729

MMAudio — generating synchronized audio from video/text

🔊

Generate audio from video or text prompts

596

Di♪♪Rhythm

🎶

Blazingly Fast and Embarrassingly Simple Song Generation

liked 2 models 23 days ago

Lightricks/LTX-Video-0.9.7-dev

Text-to-Video • Updated 15 days ago • 3.79k • 10

Lightricks/LTX-Video

Text-to-Video • Updated 17 days ago • 494k • 1.65k

reacted to merve's post with 🚀👍 about 1 month ago

Post

2211

You can now easily fine-tune, quantize, and play with the SOTA vision LM InternVL3 🔥
We recently merged InternVL3 into Hugging Face transformers and released converted checkpoints 🤗

collection for converted checkpoints: merve/internvl3-hf-6814be2943b2ae0e711c92a5
notebook: https://colab.research.google.com/drive/1wAQ7cyjyaCwLXbMA_OjXZe7aCxCFm6sI?usp=sharing 📖