Aritra Roy Gosthipaty PRO

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

Recent Activity

upvoted a collection about 12 hours ago

Qwen3-Embedding

updated a dataset about 12 hours ago

model-metadata/trending_models

commented on their article about 15 hours ago

KV Cache from scratch in nanoVLM

ariG23498's activity

upvoted a collection about 12 hours ago

Qwen3-Embedding

6 items • Updated 1 day ago • 54

upvoted an article 2 days ago

Article

KV Cache from scratch in nanoVLM

By

and 4 others •

3 days ago

• 55

upvoted a paper 3 days ago

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation

Paper • 2506.01144 • Published 5 days ago • 14

upvoted an article 3 days ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

By

and 8 others •

4 days ago

• 93

upvoted a paper 3 days ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published 4 days ago • 74

upvoted 2 changelogs 11 days ago

Changelog

Xet is now the default storage option for new users and organizations

14 days ago

• 58

Changelog

Static Spaces can now have a build step

14 days ago

• 91

upvoted an article 12 days ago

Article

🐯 Liger GRPO meets TRL

By

and 5 others •

13 days ago

• 36

upvoted 2 articles 16 days ago

Article

The Transformers Library: standardizing model definitions

By

and 3 others •

23 days ago

• 111

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

By

and 6 others •

17 days ago

• 139

upvoted an article 17 days ago

Article

Microsoft and Hugging Face expand collaboration

By

and 2 others •

19 days ago

• 20

upvoted a collection 22 days ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29

upvoted an article 23 days ago

Article

Improving Hugging Face Model Access for Kaggle Users

By

and 4 others •

24 days ago

• 27

upvoted an article 25 days ago

Article

Vision Language Models (Better, Faster, Stronger)

By

and 4 others •

26 days ago

• 417

upvoted a paper 29 days ago

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 53

upvoted an article about 1 month ago

Article

A Dive into Pretraining Strategies for Vision-Language Models

By

and 1 other •

Feb 3, 2023

• 67

upvoted a paper about 1 month ago

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 127

upvoted an article about 1 month ago

Article

Vision Language Models Explained

By

and 1 other •

Apr 11, 2024

• 372

upvoted a paper about 1 month ago

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Paper • 2501.17162 • Published Jan 28 • 1

upvoted a collection about 1 month ago

D-FINE

State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55