-
Human-like Episodic Memory for Infinite Context LLMs
Paper • 2407.09450 • Published • 63 -
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
Paper • 2407.09435 • Published • 23 -
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper • 2407.09121 • Published • 6 -
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Paper • 2407.14482 • Published • 27
Collections
Discover the best community collections!
Collections including paper arxiv:2409.03752
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 72 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 56 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 34 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 47 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62
-
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 82 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 23 -
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 5
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 61 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
-
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Paper • 2406.11289 • Published • 5 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 32 -
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper • 2407.12327 • Published • 80 -
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
Paper • 2408.08946 • Published • 12
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 29 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 56 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 26 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 73 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 79
-
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 48 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 93 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 84 -
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper • 2409.10173 • Published • 35
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 48 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 47 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 31 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 22
-
Human-like Episodic Memory for Infinite Context LLMs
Paper • 2407.09450 • Published • 63 -
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
Paper • 2407.09435 • Published • 23 -
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper • 2407.09121 • Published • 6 -
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Paper • 2407.14482 • Published • 27
-
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Paper • 2406.11289 • Published • 5 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 32 -
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper • 2407.12327 • Published • 80 -
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
Paper • 2408.08946 • Published • 12
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 72 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 56 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90
-
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 29 -
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 34 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 47 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 56 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 26 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 73 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 79
-
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 82 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 23 -
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 5
-
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 48 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 93 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 84 -
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper • 2409.10173 • Published • 35
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 61 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
-
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper • 2311.09257 • Published • 48 -
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Paper • 2312.14125 • Published • 47 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 31 -
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Paper • 2401.01256 • Published • 22