arxiv:1906.08101

Pre-Training with Whole Word Masking for Chinese BERT

Published on Jun 19, 2019
Authors: Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has shown marked improvements across various NLP tasks, and a series of successive variants has been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, it uses a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as baselines, including BERT, RoBERTa, ELECTRA, and RBT. We carried out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we report detailed ablations with several findings that may help future research. We open-source our pre-trained language models to further support the research community. Resources are available at https://github.com/ymcui/Chinese-BERT-wwm

AI-generated summary

MacBERT, a model with a new MLM-as-correction masking strategy, achieves state-of-the-art performance on various Chinese NLP tasks compared to existing models such as BERT and RoBERTa.
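
To make the whole word masking (wwm) strategy concrete: Chinese BERT tokenizes text into single characters, so standard MLM can mask one character of a multi-character word while leaving the rest visible; wwm instead masks every character of a selected word together. The following is a simplified sketch of that selection step, not the authors' implementation; it assumes the sentence has already been segmented into words (the paper relies on an external Chinese word segmenter for this), and the masking rate and example sentence are illustrative choices.

import random

MASK = "[MASK]"
MASK_RATE = 0.15  # fraction of tokens to mask, as in standard BERT MLM

def whole_word_mask(words, mask_rate=MASK_RATE, seed=0):
    """Mask whole words: every character of a chosen word is masked together."""
    rng = random.Random(seed)
    # Flatten the segmented words into character-level tokens,
    # remembering which span of tokens each word occupies.
    tokens, word_spans = [], []
    for word in words:
        start = len(tokens)
        tokens.extend(list(word))  # Chinese BERT uses character-level tokens
        word_spans.append((start, len(tokens)))

    budget = max(1, round(len(tokens) * mask_rate))
    masked = list(tokens)
    candidates = list(range(len(word_spans)))
    rng.shuffle(candidates)

    n_masked = 0
    for idx in candidates:
        if n_masked >= budget:
            break
        start, end = word_spans[idx]
        for i in range(start, end):  # mask all characters of this word
            masked[i] = MASK
        n_masked += end - start
    return tokens, masked

if __name__ == "__main__":
    # Pre-segmented example: "使用 语言 模型 来 预测 下一个 词"
    words = ["使用", "语言", "模型", "来", "预测", "下一个", "词"]
    original, masked = whole_word_mask(words)
    print(" ".join(original))
    print(" ".join(masked))

MacBERT's MLM-as-correction additionally replaces the selected positions with similar words rather than the artificial [MASK] token; that step is omitted from the sketch above. The released checkpoints can be queried through the standard transformers masked-LM interface. In the minimal usage sketch below, the model ID hfl/chinese-macbert-base is taken from the authors' Hugging Face organization and is an assumption on this page; substitute any checkpoint listed in the linked repository.

from transformers import pipeline

# Load a released checkpoint (assumed model ID) and predict the masked character.
fill_mask = pipeline("fill-mask", model="hfl/chinese-macbert-base")
for prediction in fill_mask("今天天气很[MASK]。"):
    print(prediction["token_str"], round(prediction["score"], 4))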

Models citing this paper: 11
Datasets citing this paper: 0
Spaces citing this paper: 262
Collections including this paper: 0