Paper Title

Chapter Captor: Text Segmentation in Novels

Authors

Charuta Pethe, Allen Kim, Steven Skiena

Abstract

Books are typically segmented into chapters and sections, representing coherent subnarratives and topics. We investigate the task of predicting chapter boundaries, as a proxy for the general task of segmenting long texts. We build a Project Gutenberg chapter segmentation data set of 9,126 English novels, using a hybrid approach combining neural inference and rule matching to recognize chapter title headers in books, achieving an F1-score of 0.77 on this task. Using this annotated data as ground truth after removing structural cues, we present cut-based and neural methods for chapter segmentation, achieving an F1-score of 0.453 on the challenging task of exact break prediction over book-length documents. Finally, we reveal interesting historical trends in the chapter structure of novels.
