arxiv:2005.12592

GECToR -- Grammatical Error Correction: Tag, Not Rewrite

Published on May 26, 2020

Authors:

Abstract

AI-generated summary: A simple and efficient GEC sequence tagger built on a Transformer encoder achieves high performance with fast inference. It is pre-trained on synthetic data, then fine-tuned on errorful and error-free corpora.

In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_{0.5} of 65.3/66.5 on CoNLL-2014 (test) and F_{0.5} of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
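To illustrate what "tag, not rewrite" means in practice, here is a minimal sketch of how per-token edit tags could be applied to an input sentence. The abstract does not list the transformations, so the tag names (`$KEEP`, `$DELETE`, `$APPEND_x`, `$REPLACE_x`) and the helper `apply_tags` are assumptions made for this example. The sketch covers only basic edits and is not the released implementation.

```python
from typing import List

# Hypothetical tag vocabulary, assumed for illustration only.
KEEP = "$KEEP"
DELETE = "$DELETE"
APPEND_PREFIX = "$APPEND_"
REPLACE_PREFIX = "$REPLACE_"


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens."""
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per token")

    corrected: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == KEEP:
            # Token is already correct: copy it through.
            corrected.append(token)
        elif tag == DELETE:
            # Token is dropped from the output.
            continue
        elif tag.startswith(APPEND_PREFIX):
            # Keep the token and insert a new token right after it.
            corrected.append(token)
            corrected.append(tag[len(APPEND_PREFIX):])
        elif tag.startswith(REPLACE_PREFIX):
            # Substitute the token with the one carried by the tag.
            corrected.append(tag[len(REPLACE_PREFIX):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


source = "He go to to school every days".split()
tags = [KEEP, "$REPLACE_goes", KEEP, DELETE, KEEP, KEEP, "$REPLACE_day"]
print(" ".join(apply_tags(source, tags)))
```

Because the tagger predicts one label per input token, all positions can be labeled in a single encoder pass. This is consistent with the speed advantage the abstract reports over seq2seq decoding, which generates output tokens one at a time.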
