DLP: Dynamic Layerwise Pruning in Large Language Models
Abstract
A dynamic layerwise pruning method adaptively determines layer importance by combining model weights and activation information to maintain performance in large language models at high sparsity.
Pruning has recently been widely adopted to reduce the parameter scale and improve the inference efficiency of Large Language Models (LLMs). Mainstream pruning techniques often rely on uniform layerwise pruning strategies, which can lead to severe performance degradation at high sparsity levels. Recognizing the varying contributions of different layers in LLMs, recent studies have shifted their focus toward non-uniform layerwise pruning. However, these approaches often rely on pre-defined values, which can result in suboptimal performance. To overcome these limitations, we propose a novel method called Dynamic Layerwise Pruning (DLP). This approach adaptively determines the relative importance of each layer by integrating model weights with input activation information, assigning pruning rates accordingly. Experimental results show that DLP effectively preserves model performance at high sparsity levels across multiple LLMs. Specifically, at 70% sparsity, DLP reduces the perplexity of LLaMA2-7B by 7.79 and improves the average accuracy by 2.7% compared to state-of-the-art methods. Moreover, DLP is compatible with various existing LLM compression techniques and can be seamlessly integrated into Parameter-Efficient Fine-Tuning (PEFT). We release the code at https://github.com/ironartisan/DLP to facilitate future research.
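To make the idea concrete, here is a minimal, hypothetical sketch of non-uniform sparsity allocation in the spirit of the abstract: each layer is scored with a Wanda-style weight-times-activation metric, and more important layers receive lower pruning rates while the average stays at the target sparsity. The scoring function, the tanh-based allocation rule, and all names (`layer_importance`, `allocate_sparsity`, `temperature`, `max_shift`) are illustrative assumptions, not the paper's actual formulation; see the released code for the real method.

```python
# Illustrative sketch only (not the authors' released implementation):
# assign per-layer sparsity from a weight-times-activation importance score.
import torch

def layer_importance(weight: torch.Tensor, act_norm: torch.Tensor) -> float:
    """Score a linear layer by |W| scaled with the L2 norms of its input
    activations (a Wanda-style metric, here averaged over the whole layer)."""
    # weight: (out_features, in_features); act_norm: (in_features,)
    return (weight.abs() * act_norm.unsqueeze(0)).mean().item()

def allocate_sparsity(scores, target_sparsity=0.7, temperature=1.0, max_shift=0.1):
    """Map importance scores to per-layer pruning rates whose mean stays near
    target_sparsity: more important layers are pruned less (assumed rule)."""
    s = torch.tensor(scores, dtype=torch.float32)
    # Normalize scores to zero mean so the average sparsity stays on target.
    z = (s - s.mean()) / (s.std() + 1e-8)
    shift = max_shift * torch.tanh(z / temperature)  # bounded per-layer deviation
    rates = (target_sparsity - shift).clamp(0.0, 0.99)
    return rates.tolist()

# Toy usage: three layers with dummy weights and calibration activation norms.
if __name__ == "__main__":
    torch.manual_seed(0)
    layers = [torch.randn(8, 16) for _ in range(3)]
    act_norms = [torch.rand(16) for _ in range(3)]
    scores = [layer_importance(w, a) for w, a in zip(layers, act_norms)]
    print(allocate_sparsity(scores, target_sparsity=0.7))
```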
Community
The following related papers were recommended by the Semantic Scholar API (via the Librarian Bot):
- Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models (2025)
- SlimLLM: Accurate Structured Pruning for Large Language Models (2025)
- ELDeR: Getting Efficient LLMs through Data-Driven Regularized Layer-wise Pruning (2025)
- ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning (2025)
- TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models (2025)
- TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning (2025)
- Mosaic: Composite Projection Pruning for Resource-efficient LLMs (2025)