arxiv:2311.09122

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Published on Nov 15, 2023


Authors:

Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Nikola Ljubešić, LJ Miranda, Yuval Pinter

AI-generated summary: Universal NER (UNER), an open project, aims to standardize multilingual named entity recognition through high-quality, cross-lingually consistent benchmarks and models.

Abstract

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines in both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.
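The abstract does not say how the entity annotations are encoded or how the baselines are scored. The sketch below makes two assumptions that do not come from this page. First, entities are stored as token-level IOB2 (BIO) tags with type labels such as PER, LOC, and ORG. Second, predictions are scored with exact-match, entity-level F1. Under those assumptions, it shows one way to compare gold and predicted tag sequences. The function names `iob2_to_spans` and `span_f1` are hypothetical. Neither belongs to the UNER release.

```python
from typing import List, Optional, Set, Tuple

# (start_token, end_token_exclusive, entity_label); assumed span representation
Span = Tuple[int, int, str]


def iob2_to_spans(tags: List[str]) -> Set[Span]:
    """Convert an IOB2 tag sequence (assumed format) into labeled entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # Append a sentinel "O" so that an entity ending the sentence gets closed.
    for i, tag in enumerate(tags + ["O"]):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag != "O":
                # A B- tag, or a stray I- tag whose type differs, opens a new span.
                start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match entity-level F1 over paired sentences (assumed metric)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = iob2_to_spans(g), iob2_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same boundaries and same type
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    print(span_f1(gold, pred))  # 0.5: the PER span matches, but LOC is mislabeled as ORG
```

Exact-match scoring gives no credit for an entity whose boundaries are right but whose type is wrong. The usage example shows this: the mislabeled location counts as both a false positive and a false negative. Consistent entity-type definitions across languages make this strict comparison meaningful.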