- Chapter Captor: Text Segmentation in Novels Books are typically segmented into chapters and sections, representing coherent subnarratives and topics. We investigate the task of predicting chapter boundaries, as a proxy for the general task of segmenting long texts. We build a Project Gutenberg chapter segmentation data set of 9,126 English novels, using a hybrid approach combining neural inference and rule matching to recognize chapter title headers in books, achieving an F1-score of 0.77 on this task. Using this annotated data as ground truth after removing structural cues, we present cut-based and neural methods for chapter segmentation, achieving an F1-score of 0.453 on the challenging task of exact break prediction over book-length documents. Finally, we reveal interesting historical trends in the chapter structure of novels. 3 authors · Nov 8, 2020
- BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels This paper presents BiPaR, a bilingual parallel novel-style machine reading comprehension (MRC) dataset, developed to support multilingual and cross-lingual reading comprehension. The biggest difference between BiPaR and existing reading comprehension datasets is that each triple (Passage, Question, Answer) in BiPaR is written parallelly in two languages. We collect 3,667 bilingual parallel paragraphs from Chinese and English novels, from which we construct 14,668 parallel question-answer pairs via crowdsourced workers following a strict quality control procedure. We analyze BiPaR in depth and find that BiPaR offers good diversification in prefixes of questions, answer types and relationships between questions and passages. We also observe that answering questions of novels requires reading comprehension skills of coreference resolution, multi-sentence reasoning, and understanding of implicit causality, etc. With BiPaR, we build monolingual, multilingual, and cross-lingual MRC baseline models. Even for the relatively simple monolingual MRC on this dataset, experiments show that a strong BERT baseline is over 30 points behind human in terms of both EM and F1 score, indicating that BiPaR provides a challenging testbed for monolingual, multilingual and cross-lingual MRC on novels. The dataset is available at https://multinlp.github.io/BiPaR/. 3 authors · Oct 11, 2019
1 NovelQA: A Benchmark for Long-Range Novel Question Answering The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs with extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation, and diverse question types. Our evaluation of Long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long input with more than 100,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension and computational literary studies. 9 authors · Mar 18, 2024
1 DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels With the rapid advancement of Large Language Models (LLMs), long-context information understanding and processing have become a hot topic in academia and industry. However, benchmarks for evaluating the ability of LLMs to handle long-context information do not seem to have kept pace with the development of LLMs. Despite the emergence of various long-context evaluation benchmarks, the types of capability assessed are still limited, without new capability dimensions. In this paper, we introduce DetectiveQA, a narrative reasoning benchmark featured with an average context length of over 100K tokens. DetectiveQA focuses on evaluating the long-context reasoning ability of LLMs, which not only requires a full understanding of context but also requires extracting important evidences from the context and reasoning according to extracted evidences to answer the given questions. This is a new dimension of capability evaluation, which is more in line with the current intelligence level of LLMs. We use detective novels as data sources, which naturally have various reasoning elements. Finally, we manually annotated 600 questions in Chinese and then also provided an English edition of the context information and questions. We evaluate many long-context LLMs on DetectiveQA, including commercial and open-sourced models, and the results indicate that existing long-context LLMs still require significant advancements to effectively process true long-context dependency questions. 10 authors · Sep 4, 2024
- A Bilingual Parallel Corpus with Discourse Annotations Machine translation (MT) has almost achieved human parity at sentence-level translation. In response, the MT community has, in part, shifted its focus to document-level translation. However, the development of document-level MT systems is hampered by the lack of parallel document corpora. This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set. The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena. Our resource is freely available, and we hope it will serve as a guide and inspiration for more work in document-level machine translation. 6 authors · Oct 26, 2022
- Locations of Characters in Narratives: Andersen and Persuasion Datasets The ability of machines to grasp spatial understanding within narrative contexts is an intriguing aspect of reading comprehension that continues to be studied. Motivated by the goal to test the AI's competence in understanding the relationship between characters and their respective locations in narratives, we introduce two new datasets: Andersen and Persuasion. For the Andersen dataset, we selected fifteen children's stories from "Andersen's Fairy Tales" by Hans Christian Andersen and manually annotated the characters and their respective locations throughout each story. Similarly, for the Persuasion dataset, characters and their locations in the novel "Persuasion" by Jane Austen were also manually annotated. We used these datasets to prompt Large Language Models (LLMs). The prompts are created by extracting excerpts from the stories or the novel and combining them with a question asking the location of a character mentioned in that excerpt. Out of the five LLMs we tested, the best-performing one for the Andersen dataset accurately identified the location in 61.85% of the examples, while for the Persuasion dataset, the best-performing one did so in 56.06% of the cases. 3 authors · Apr 4
- Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding Generating long-term texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Despite the fact that the generated novels reach a sufficient length, they exhibit poor logical coherence and appeal in their plots and deficiencies in character and event depiction, ultimately compromising the overall narrative quality. In this paper, we propose a method named Extracting Excelsior and Expanding. Ex3 initially extracts structure information from raw novel data. By combining this structure information with the novel data, an instruction-following dataset is meticulously crafted. This dataset is then utilized to fine-tune the LLM, aiming for excelsior generation performance. In the final stage, a tree-like expansion method is deployed to facilitate the generation of arbitrarily long novels. Evaluation against previous methods showcases Ex3's ability to produce higher-quality long-form novels. 8 authors · Aug 15, 2024
- Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0 The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution, among other tasks. This document describes the guidelines applied during the annotation process. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating how canonical names should be defined, and which names should be considered as referring to the same entity. 2 authors · Oct 1, 2024
- Does Burrows' Delta really confirm that Rowling and Galbraith are the same author? The stylo package includes a frequency table that can be used to calculate distances between texts and thus independently solve the problem of attribution of The Cuckoo's Calling, a novel that J.K. Rowling said she wrote. However, the set of texts for this table is very vulnerable to criticism. The authors there are not modern, they wrote in a different genre. I set out to test the performance of the method on texts that are more relevant to the research question. 1 authors · Jul 14, 2024
- Romantic-Computing In this paper we compare various text generation models' ability to write poetry in the style of early English Romanticism. These models include: Character-Level Recurrent Neural Networks with Long Short-Term Memory, Hugging Face's GPT-2, OpenAI's GPT-3, and EleutherAI's GPT-NEO. Quality was measured based syllable count and coherence with the automatic evaluation metric GRUEN. Character-Level Recurrent Neural Networks performed far worse compared to transformer models. And, as parameter-size increased, the quality of transformer models' poems improved. These models are typically not compared in a creative context, and we are happy to contribute. 1 authors · Jun 1, 2022
- Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating expressions that should be marked as entities as well as expressions that should not. 2 authors · Oct 3, 2024
- Interconnected Kingdoms: Comparing 'A Song of Ice and Fire' Adaptations Across Media Using Complex Networks In this article, we propose and apply a method to compare adaptations of the same story across different media. We tackle this task by modelling such adaptations through character networks. We compare them by leveraging two concepts at the core of storytelling: the characters involved, and the dynamics of the story. We propose several methods to match characters between media and compare their position in the networks; and perform narrative matching, i.e. match the sequences of narrative units that constitute the plots. We apply these methods to the novel series A Song of Ice and Fire, by G.R.R. Martin, and its comics and TV show adaptations. Our results show that interactions between characters are not sufficient to properly match individual characters between adaptations, but that using some additional information such as character affiliation or gender significantly improves the performance. On the contrary, character interactions convey enough information to perform narrative matching, and allow us to detect the divergence between the original novels and its TV show adaptation. 5 authors · Oct 7, 2024