-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 49 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 39
Collections
Collections including paper arxiv:2410.17243
-
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 95 -
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper • 2411.10836 • Published • 25 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 128 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 79
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 71 -
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 52 -
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper • 2410.23090 • Published • 56 -
RARe: Retrieval Augmented Retrieval with In-Context Examples
Paper • 2410.20088 • Published • 5
-
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 95 -
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 20 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 154
-
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Paper • 2410.17215 • Published • 17 -
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Paper • 2410.18533 • Published • 44 -
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 95 -
LongReward: Improving Long-context Large Language Models with AI Feedback
Paper • 2410.21252 • Published • 18
-
What Matters in Transformers? Not All Attention is Needed
Paper • 2406.15786 • Published • 32 -
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 95 -
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper • 2503.02130 • Published • 32 -
Transformers without Normalization
Paper • 2503.10622 • Published • 168