Collections
Collections including paper arxiv:2503.08685:
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos (Paper • 2501.04001 • Published • 47)
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (Paper • 2501.03895 • Published • 53)
- An Empirical Study of Autoregressive Pre-training from Videos (Paper • 2501.05453 • Published • 42)
- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training (Paper • 2501.07556 • Published • 6)

- GenEx: Generating an Explorable World (Paper • 2412.09624 • Published • 98)
- IamCreateAI/Ruyi-Mini-7B (Image-to-Video • Updated • 91 • 612)
- Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation (Paper • 2412.06016 • Published • 20)
- Byte Latent Transformer: Patches Scale Better Than Tokens (Paper • 2412.09871 • Published • 109)

- RuCCoD: Towards Automated ICD Coding in Russian (Paper • 2502.21263 • Published • 133)
- Unified Reward Model for Multimodal Understanding and Generation (Paper • 2503.05236 • Published • 124)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (Paper • 2503.05179 • Published • 47)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning (Paper • 2503.05592 • Published • 27)

- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval (Paper • 2412.14475 • Published • 55)
- How to Synthesize Text Data without Model Collapse? (Paper • 2412.14689 • Published • 53)
- Token-Budget-Aware LLM Reasoning (Paper • 2412.18547 • Published • 47)
- WavePulse: Real-time Content Analytics of Radio Livestreams (Paper • 2412.17998 • Published • 11)