Providing visual summaries of scientific publications can improve readers' access to information and thereby help them cope with the exponential growth in the number of scientific publications. Nonetheless, efforts to provide visual publication summaries have been few and far between, and have focused primarily on the biomedical domain. This is mainly due to the limited availability of annotated gold standards, which hampers the application of robust and high-performing supervised learning techniques. To address these problems, we create a new benchmark dataset for selecting figures to serve as visual summaries of publications based on their abstracts, covering several domains in computer science. Moreover, we develop a self-supervised learning approach based on heuristic matching of inline references to figures with figure captions. Experiments in both the biomedical and computer science domains show that our model outperforms the state of the art despite being self-supervised and therefore not relying on any annotated training data.
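The abstract does not spell out the matching heuristic, so the following is only a minimal sketch of how inline figure references might be paired with captions to obtain training pairs without manual annotation. The regular expression, the naive sentence splitter, and the names `FIG_REF`, `split_sentences`, and `build_pairs` are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import Dict, List, Tuple

# Assumed pattern: matches "Figure 3", "Fig. 3", or "Fig 3" (case-insensitive).
FIG_REF = re.compile(r"\bfig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter that does not break after the abbreviation 'Fig.'."""
    parts = re.split(r"(?<![Ff]ig\.)(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_pairs(body: str, captions: Dict[int, str]) -> List[Tuple[str, str]]:
    """Pair each sentence that mentions a figure with that figure's caption.

    The resulting (sentence, caption) pairs could serve as positive examples
    for a text-to-caption matching model (an assumption of this sketch).
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in split_sentences(body):
        # Deduplicate figure numbers within a sentence, keeping first-seen order.
        numbers = dict.fromkeys(int(n) for n in FIG_REF.findall(sentence))
        for number in numbers:
            if number in captions:
                pairs.append((sentence, captions[number]))
    return pairs


if __name__ == "__main__":
    body = (
        "We propose a model. Figure 1 shows the architecture. "
        "Results are in Fig. 2 and discussed later. No reference here."
    )
    captions = {1: "Model architecture.", 2: "Accuracy on the test set."}
    for sentence, caption in build_pairs(body, captions):
        print(f"{sentence!r} -> {caption!r}")
```

Under these assumptions, a model trained on such pairs could then score each figure caption against the abstract, with the highest-scoring figure chosen as the visual summary; how the scoring is actually done is not stated in the abstract.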