
Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering

zstt8054929
Modified 2020-12-18 11:12:05

Wissam Siblini, Mohamed Challal, Charlotte Pasqual

Open Domain Question Answering (ODQA) over a large-scale corpus of documents (e.g. Wikipedia) is a key challenge in computer science. Although transformer-based language models such as BERT have shown on SQuAD the ability to surpass humans at extracting answers from small passages of text, they suffer from their high complexity when faced with a much larger search space. The most common way to tackle this problem is to add a preliminary Information Retrieval step that heavily filters the corpus and keeps only the relevant passages. In this paper, we propose a more direct and complementary solution, which consists in applying a generic change to the architecture of transformer-based models in order to delay the attention between subparts of the input and allow a more efficient management of computations. The resulting variants are competitive with the original models on the extractive task and, in the ODQA setting, allow a significant speedup and even a performance improvement in many cases.

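The abstract only outlines the mechanism, so here is a minimal PyTorch sketch of what "delaying the attention between subparts of the input" could look like. This is an illustration inferred from the abstract, not the authors' released implementation; the class name, layer counts and dimensions are invented for the example. The idea shown: the lower encoder layers process the question and the passage independently, and only the upper "interaction" layers attend over the concatenated sequence.

```python
# Illustrative sketch (not the authors' code) of a delayed-interaction encoder:
# the first layers self-attend within each segment only, the last layers
# attend over question + passage jointly. All sizes are toy values.
import torch
import torch.nn as nn


class DelayedInteractionEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=6, n_interaction_layers=2):
        super().__init__()

        def make_layer():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
            )

        n_independent = n_layers - n_interaction_layers
        self.independent_layers = nn.ModuleList(
            [make_layer() for _ in range(n_independent)]
        )
        self.interaction_layers = nn.ModuleList(
            [make_layer() for _ in range(n_interaction_layers)]
        )

    def encode_independently(self, x):
        # Lower layers: self-attention restricted to a single segment.
        for layer in self.independent_layers:
            x = layer(x)
        return x

    def forward(self, question_emb, passage_emb):
        # question_emb: (batch, q_len, d_model), passage_emb: (batch, p_len, d_model)
        q = self.encode_independently(question_emb)
        p = self.encode_independently(passage_emb)  # could be precomputed per passage
        # Upper layers: question and passage tokens finally attend to each other.
        x = torch.cat([q, p], dim=1)
        for layer in self.interaction_layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    enc = DelayedInteractionEncoder()
    q = torch.randn(1, 16, 256)   # toy question embeddings
    p = torch.randn(1, 128, 256)  # toy passage embeddings
    print(enc(q, p).shape)        # torch.Size([1, 144, 256])
```

Because the lower layers never mix question and passage tokens, the passage-side activations could in principle be computed once per passage and cached for the whole corpus, which is presumably where the ODQA speedup reported in the abstract comes from.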

This article is a translation of a foreign-language work; see the original.

In case of infringement, please contact cloudcommunity@tencent.com for removal.

