Collections including paper arxiv:2502.07864

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
  Paper • 2502.08639 • Published • 43
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57
- Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
  Paper • 2502.07737 • Published • 9
- Enhance-A-Video: Better Generated Video for Free
  Paper • 2502.07508 • Published • 21

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 32
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 105

- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 109
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57

- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57
- Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
  Paper • 2503.02495 • Published • 8
- BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
  Paper • 2504.18415 • Published • 47
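
Several of these collections group low-rank attention variants around the focal paper (Tensor Product Attention, TransMLA). For orientation, here is a minimal sketch of the Multi-head Latent Attention idea those titles refer to: cache one shared low-rank latent instead of full per-head keys and values. The dimensions, the class and variable names (`LatentAttention`, `d_latent`), and the omission of RoPE's decoupled key path are illustrative assumptions, not the papers' exact formulation.

```python
# Minimal sketch of Multi-head Latent Attention (MLA), the mechanism named in
# the TransMLA entry above. Shapes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)         # per-head queries
        self.w_down_kv = nn.Linear(d_model, d_latent)  # shared KV down-projection
        self.w_up_k = nn.Linear(d_latent, d_model)     # per-head key up-projection
        self.w_up_v = nn.Linear(d_latent, d_model)     # per-head value up-projection
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        # Only this low-rank latent needs to be cached at inference time;
        # shrinking the KV cache this way is the point of MLA.
        c_kv = self.w_down_kv(x)                       # (b, t, d_latent)
        if latent_cache is not None:                   # decode step: extend cache
            c_kv = torch.cat([latent_cache, c_kv], dim=1)
        s = c_kv.shape[1]
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_up_k(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_up_v(c_kv).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask during prefill; at decode the single new query may attend
        # to the whole cache, so no mask is needed.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out), c_kv                     # return latent for caching
```

The design point: the cache scales with `d_latent` rather than `n_heads * d_head`, since per-head keys and values are reconstructed on the fly from the shared latent.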

- deepseek-ai/DeepSeek-V3-Base
  685B • Updated • 11k • 1.67k
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 57
- Qwen2.5 Bakeneko 32b Instruct Awq
  ⚡ Generate text-based responses for chat interactions • 2
- Deepseek R1 Distill Qwen2.5 Bakeneko 32b Awq
  ⚡ Generate detailed responses based on user queries • 2

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 180
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 30
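
The last collection gathers sparse-attention decoding papers (TidalDecode, SeerAttention). As a hedged illustration of the shared idea, attending only to the highest-scoring cached positions at each decode step, and not any single paper's method, here is a minimal top-k sketch; the selection rule, `top_k` value, and tensor shapes are all assumptions.

```python
# Minimal sketch of top-k sparse attention at decode time. Illustrative only:
# real systems (e.g. the papers above) learn or persist the selection pattern.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """q: (heads, 1, d) decode-step query; k, v: (heads, t, d) cached keys/values."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (heads, 1, t)
    t = scores.shape[-1]
    if t > top_k:
        # Keep only the top_k highest-scoring positions per head; mask the rest
        # so softmax assigns them zero weight.
        kth = scores.topk(top_k, dim=-1).values[..., -1:]  # k-th largest score
        scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                   # (heads, 1, d)
```

The trade the sketch makes explicit: attention cost per step drops from the full cache length `t` to roughly `top_k` positions, at the risk of masking a position that would have mattered.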