Fast Vocabulary Transfer for Language Model Compression
Abstract
Vocabulary transfer combined with other compression techniques reduces model size and inference time for language models with minimal impact on performance across various domains and tasks.
Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively used in combination with other compression techniques, yielding a significant reduction in model size and inference time at the cost of only a marginal drop in performance.
Community
Amazing idea!
Thanks! If you are interested, you can also look at the code here: https://github.com/LeonidasY/fast-vocabulary-transfer