Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.13217

about 17 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 2 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 152
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 24

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 38
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18, 2024 • 13
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 31
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 30

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13, 2024 • 41
microsoft/xclip-base-patch16-zero-shot

Video Classification • 0.2B • Updated Sep 12, 2023 • 1.6k • 26
MCG-NJU/videomae-base

Video Classification • 0.1B • Updated Mar 29, 2024 • 83.7k • 47

VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks.

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
google/videoprism-base-f16r288

Video Classification • Updated 22 days ago • 28.4k • 82
google/videoprism-large-f8r288

Video Classification • Updated 22 days ago • 5.33k • 16
google/videoprism-lvt-base-f16r288

Video Classification • Updated 22 days ago • 26.4k • 8

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21 • 534
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated 7 days ago • 8 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22 • 164
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18 • 11

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37

Video Understanding, Video Embedding, Video Tasks

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 22
VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13, 2024 • 41
microsoft/xclip-base-patch16-zero-shot

Video Classification • 0.2B • Updated Sep 12, 2023 • 1.6k • 26

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 16

about 17 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks.

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
google/videoprism-base-f16r288

Video Classification • Updated 22 days ago • 28.4k • 82
google/videoprism-large-f8r288

Video Classification • Updated 22 days ago • 5.33k • 16
google/videoprism-lvt-base-f16r288

Video Classification • Updated 22 days ago • 26.4k • 8

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 2 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 152
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 24

vrgamedevgirl84/Wan14BT2VFusioniX

Text-to-Video • Updated Jun 21 • 534
TheStageAI/Elastic-mochi-1-preview

Text-to-Video • Updated 7 days ago • 8 • 2
nesaorg/animatediff-base

Text-to-Video • Updated Jun 22 • 164
4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Paper • 2506.18839 • Published Jun 18 • 11

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 38
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18, 2024 • 13
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 31
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 30

Video Understanding, Video Embedding, Video Tasks

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 22
VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13, 2024 • 41
microsoft/xclip-base-patch16-zero-shot

Video Classification • 0.2B • Updated Sep 12, 2023 • 1.6k • 26

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13, 2024 • 41
microsoft/xclip-base-patch16-zero-shot

Video Classification • 0.2B • Updated Sep 12, 2023 • 1.6k • 26
MCG-NJU/videomae-base

Video Classification • 0.1B • Updated Mar 29, 2024 • 83.7k • 47

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 37
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 16

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs