arxiv:2203.05794

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

Published on Mar 11, 2022

Abstract

BERTopic extends topic modeling by combining pre-trained transformer embeddings and class-based TF-IDF to generate coherent topics, outperforming both classical and clustering-based models.

AI-generated summary

Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach to topic modeling.
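The last step of the pipeline described above, turning clusters of documents into topic representations, can be illustrated with a minimal sketch. It assumes the cluster labels are already available (in BERTopic they would come from clustering the transformer embeddings; here they are hard-coded), uses naive whitespace tokenization, and weights each term as its in-class frequency times log(1 + A / f_t), where A is the average number of words per class and f_t is the term's frequency across all classes. That weighting reflects the paper's class-based TF-IDF as best understood and is not spelled out in the abstract; the function names `class_tfidf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Return {label: {term: weight}} using a class-based TF-IDF weighting.

    Assumption: labels come from an earlier clustering step (not shown).
    """
    # Treat all documents in a cluster as one concatenated "class document".
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # Average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    # Weight = in-class term frequency * log(1 + avg_words / overall frequency).
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(weights, n=3):
    """Return the n highest-weighted terms per class (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, w in weights.items()
    }


# Toy corpus with hard-coded cluster labels standing in for the clustering step.
docs = [
    "cat chases the mouse",
    "cat and dog chase the cat",
    "stock market fell",
    "the stock market rallied",
]
labels = [0, 0, 1, 1]

weights = class_tfidf(docs, labels)
print(top_terms(weights, n=2))
```

Because the weighting is computed per cluster rather than per document, a word such as "the" that appears in every cluster is down-weighted relative to words concentrated in a single cluster.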
