Collections including paper arxiv:2405.09798

- Many-Shot In-Context Learning in Multimodal Foundation Models (Paper • 2405.09798 • Published • 33)
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting (Paper • 2309.04269 • Published • 33)
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback (Paper • 2406.00888 • Published • 34)
- To Believe or Not to Believe Your LLM (Paper • 2406.02543 • Published • 36)

- ViTAR: Vision Transformer with Any Resolution (Paper • 2403.18361 • Published • 56)
- BRAVE: Broadening the visual encoding of vision-language models (Paper • 2404.07204 • Published • 19)
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (Paper • 2404.15653 • Published • 30)
- Chameleon: Mixed-Modal Early-Fusion Foundation Models (Paper • 2405.09818 • Published • 131)

- FAX: Scalable and Differentiable Federated Primitives in JAX (Paper • 2403.07128 • Published • 13)
- mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding (Paper • 2403.12895 • Published • 33)
- Measuring Style Similarity in Diffusion Models (Paper • 2404.01292 • Published • 17)
- JetMoE: Reaching Llama2 Performance with 0.1M Dollars (Paper • 2404.07413 • Published • 39)

- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (Paper • 2406.06525 • Published • 72)
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (Paper • 2406.06469 • Published • 30)
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models (Paper • 2406.04271 • Published • 31)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Paper • 2406.02657 • Published • 42)

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (Paper • 2403.09611 • Published • 129)
- Evolutionary Optimization of Model Merging Recipes (Paper • 2403.13187 • Published • 58)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (Paper • 2402.03766 • Published • 15)
- LLM Agent Operating System (Paper • 2403.16971 • Published • 71)

- World Model on Million-Length Video And Language With RingAttention (Paper • 2402.08268 • Published • 41)
- Improving Text Embeddings with Large Language Models (Paper • 2401.00368 • Published • 83)
- Chain-of-Thought Reasoning Without Prompting (Paper • 2402.10200 • Published • 110)
- FiT: Flexible Vision Transformer for Diffusion Model (Paper • 2402.12376 • Published • 49)

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters (Paper • 2402.04252 • Published • 29)
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models (Paper • 2402.03749 • Published • 13)
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding (Paper • 2402.04615 • Published • 44)
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss (Paper • 2402.05008 • Published • 23)