arxiv:2208.03898

DialogSum Challenge: Results of the Dialogue Summarization Shared Task

Published on Aug 8, 2022

Authors: Yulong Chen, Naihao Deng, et al.

Abstract

AI-generated summary

The INLG 2022 shared task on summarizing real-life dialogues shows improved automatic metrics for dialogue summarization but reveals significant gaps between model outputs and human-written summaries, suggesting the need for better evaluation metrics.

We report the results of the DialogSum Challenge, the shared task on summarizing real-life scenario dialogues at INLG 2022. Four teams participated in the shared task, and three submitted system reports, exploring different methods to improve dialogue summarization performance. Although the submitted systems improve substantially over the baseline models on automatic evaluation metrics such as ROUGE, human evaluation along multiple aspects reveals a salient gap between model-generated outputs and human-annotated summaries. These findings demonstrate the difficulty of dialogue summarization and suggest that more fine-grained evaluation metrics are needed.
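ROUGE scores, mentioned above as the automatic metric on which systems improved, are based on n-gram overlap between a generated summary and a reference summary. The sketch below is a simplified ROUGE-N F1 for illustration only. It assumes lowercase whitespace tokenization with no stemming, and it is not the official scorer or configuration used in the challenge. The function name `rouge_n_f1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 (hypothetical helper, not an official scorer).

    Assumes lowercase whitespace tokenization; no stemming or stopword removal.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        # Count every contiguous n-token sequence in the text.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps min(count_cand, count_ref) per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the man asks the waiter for the bill"
    candidate = "the customer requests the check"
    # A paraphrase with similar meaning shares only two "the" tokens with the reference.
    print(rouge_n_f1(candidate, reference))  # roughly 0.31
```

In this made-up example, a plausible paraphrase receives a low unigram score because it shares few exact tokens with the reference. Lexical overlap of this kind cannot fully capture aspects that human judges assess.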
