Papers
arxiv:2304.04926

Life Regression based Patch Slimming for Vision Transformers

Published on Apr 11, 2023
Authors:
,
,
,
,
,

Abstract

A life regression module determines patch lifespan in vision transformers to improve inference speed without extra computation.

AI-generated summary

Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, the high inference computation cost poses a new challenge. Several methods have been proposed to address this problem, mainly by slimming patches. In the inference stage, these methods classify patches into two classes, one to keep and the other to discard in multiple layers. This approach results in additional computation at every layer where patches are discarded, which hinders inference acceleration. In this study, we tackle the patch slimming problem from a different perspective by proposing a life regression module that determines the lifespan of each image patch in one go. During inference, the patch is discarded once the current layer index exceeds its life. Our proposed method avoids additional computation and parameters in multiple layers to enhance inference speed while maintaining competitive performance. Additionally, our approach requires fewer training epochs than other patch slimming methods.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2304.04926 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2304.04926 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2304.04926 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.