- A New Federated Learning Framework Against Gradient Inversion Attacks
  Paper • 2412.07187 • Published • 3
- Selective Aggregation for Low-Rank Adaptation in Federated Learning
  Paper • 2410.01463 • Published • 19
- Exploring Federated Pruning for Large Language Models
  Paper • 2505.13547 • Published • 13
- It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
  Paper • 2504.13173 • Published • 19

Collections including paper arxiv:2501.00663

- Internal Consistency and Self-Feedback in Large Language Models: A Survey
  Paper • 2407.14507 • Published • 47
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 1
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 13

- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 73
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
  Paper • 2309.17207 • Published
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 159

- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 30
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 122
- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Titans: Learning to Memorize at Test Time
  Paper • 2501.00663 • Published • 24
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 55
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 159

- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- LLM Critics Help Catch LLM Bugs
  Paper • 2407.00215 • Published
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Paper • 2407.21787 • Published • 13
- Generative Verifiers: Reward Modeling as Next-Token Prediction
  Paper • 2408.15240 • Published • 13

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34