cs.AI人工智能,共计28篇
【1】 Group-aware Contrastive Regression for Action Quality Assessment 标题:群体意识对比回归在行动质量评估中的应用 链接:https://arxiv.org/abs/2108.07797
作者:Xumin Yu,Yongming Rao,Wenliang Zhao,Jiwen Lu,Jie Zhou 机构:Department of Automation, Tsinghua University, China, State Key Lab of Intelligent Technologies and Systems, China, Beijing National Research Center for Information Science and Technology, China 备注:Accepted to ICCV 2021 摘要:由于视频之间的细微差异和分数的巨大差异,评估动作质量具有挑战性。大多数现有的方法都是通过从单个视频中回归质量分数来解决这个问题,并深受视频间分数差异大的影响。在本文中,我们发现视频之间的关系可以为训练和推理过程中更准确的动作质量评估提供重要线索。具体而言,我们将动作质量评估问题重新表述为参考另一个具有共同属性(例如类别和难度)的视频回归相对分数,而不是学习无参考的分数。基于这一表述,我们提出了一个新的对比回归(CoRe)框架,通过成对比较来学习相对分数,该框架突出了视频之间的差异,并指导模型学习评估的关键线索。为了进一步利用两个视频之间的相对信息,我们设计了一个群体感知回归树,将传统的分数回归转化为两个更简单的子问题:从粗到细的分类和小区间内的回归。为了证明CoRe的有效性,我们在三个主流AQA数据集上进行了广泛的实验,包括AQA-7、MTL-AQA和JIGSAWS。我们的方法大大优于以前的方法,并在所有三个基准上建立了新的最先进水平。 摘要:Assessing action quality is challenging due to the subtle differences between videos and large variations in scores. Most existing approaches tackle this problem by regressing a quality score from a single video, suffering a lot from the large inter-video score variations. In this paper, we show that the relations among videos can provide important clues for more accurate action quality assessment during both training and inference. Specifically, we reformulate the problem of action quality assessment as regressing the relative scores with reference to another video that has shared attributes (e.g., category and difficulty), instead of learning unreferenced scores. Following this formulation, we propose a new Contrastive Regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for assessment. In order to further exploit the relative information between two videos, we devise a group-aware regression tree to convert the conventional score regression into two easier sub-problems: coarse-to-fine classification and regression in small intervals. To demonstrate the effectiveness of CoRe, we conduct extensive experiments on three mainstream AQA datasets including AQA-7, MTL-AQA and JIGSAWS. Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
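A minimal sketch of the pairwise relative-score idea described above, assuming video features have already been extracted by some backbone; all names and dimensions (RelativeScoreHead, feat_dim, the plain MLP head) are illustrative simplifications, not the authors' released code, and the group-aware regression tree is omitted.

```python
# Pairwise relative-score regression sketch (illustrative only).
import torch
import torch.nn as nn

class RelativeScoreHead(nn.Module):
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        # The head sees the query features, the exemplar features and the exemplar score.
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, query_feat, exemplar_feat, exemplar_score):
        x = torch.cat([query_feat, exemplar_feat, exemplar_score.unsqueeze(-1)], dim=-1)
        delta = self.mlp(x).squeeze(-1)   # predicted relative score (query minus exemplar)
        return exemplar_score + delta     # absolute score of the query video

# Training would minimise e.g. an L1 loss against the ground-truth query score;
# at test time several exemplars can be sampled and the predictions averaged.
```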
【2】 Feature Recommendation for Structural Equation Model Discovery in Process Mining 标题:过程挖掘中结构方程模型发现的特征推荐 链接:https://arxiv.org/abs/2108.07795
作者:Mahnaz Sadat Qafari,Wil van der Aalst 机构:Rheinisch-Westf¨alische Technische Hochschule Aachen(RWTH), Aachen, Germany 备注:28 pages, 16 figures 摘要:流程挖掘技术可以帮助组织改进其运营流程。在发现和修改性能或法规遵从性问题的根本原因方面,企业可以从流程挖掘技术中获益。考虑到当今公司信息系统捕获的数据量和特征数量,发现根本原因分析中应考虑的特征集的任务可能相当复杂。在本文中,我们提出了一种方法来寻找(聚合)特征集,这可能会对问题产生影响。根本原因分析任务通常通过将机器学习技术应用于从支持流程的信息系统收集的数据来完成。为了防止由于将机器学习技术的发现解释为因果关系而可能发生的相关性和因果关系混淆,我们提出了一种发现过程结构方程模型的方法,该模型可用于根本原因分析。我们已经在ProM中实现了所提出的方法作为插件,并使用两个真实和合成事件日志对其进行了评估。这些实验证明了所提方法的有效性和有效性。 摘要:Process mining techniques can help organizations to improve their operational processes. Organizations can benefit from process mining techniques in finding and amending the root causes of performance or compliance problems. Considering the volume of the data and the number of features captured by the information system of today's companies, the task of discovering the set of features that should be considered in root cause analysis can be quite involving. In this paper, we propose a method for finding the set of (aggregated) features with a possible effect on the problem. The root cause analysis task is usually done by applying a machine learning technique to the data gathered from the information system supporting the processes. To prevent mixing up correlation and causation, which may happen because of interpreting the findings of machine learning techniques as causal, we propose a method for discovering the structural equation model of the process that can be used for root cause analysis. We have implemented the proposed method as a plugin in ProM and we have evaluated it using two real and synthetic event logs. These experiments show the validity and effectiveness of the proposed methods.
【3】 RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection 标题:RandomRoom:用于3D目标检测的合成形状和随机布局的无监督预训练 链接:https://arxiv.org/abs/2108.07794
作者:Yongming Rao,Benlin Liu,Yi Wei,Jiwen Lu,Cho-Jui Hsieh,Jie Zhou 机构:Tsinghua University,UCLA,University of Washington 备注:Accepted to ICCV 2021 摘要:三维点云理解近年来取得了很大进展。然而,一个主要的瓶颈是缺少带注释的真实数据集,特别是与2D对象检测任务相比,因为注释场景的真实扫描需要大量的劳动力。解决这个问题的一个很有希望的方法是更好地利用由CAD对象模型组成的合成数据集,以促进对真实数据集的学习。这可以通过预训练和微调程序实现。然而,最近关于3D预训练的工作在将合成对象上学习到的特征转移到其他实际应用中时显示出失败。在这项工作中,我们提出了一种称为随机房间的新方法来实现这一目标。特别是,我们建议利用合成CAD数据集中的对象生成场景的随机布局,并通过对同一组合成对象生成的两个随机场景应用对象级对比学习来学习3D场景表示。在以后对3D对象检测任务进行微调时,以这种方式预先训练的模型可以作为更好的初始化。从经验上看,我们在几个基础模型上显示了下游3D检测任务的持续改进,特别是当使用较少的训练数据时,这有力地证明了我们方法的有效性和泛化性。得益于合成数据中丰富的语义知识和多样的对象,我们的方法在广泛使用的3D检测基准ScanNetV2和SUN RGB-D上建立了新的技术水平。我们期望我们的尝试能够提供一个新的视角,用于连接对象和场景级别的3D理解。 摘要:3D point cloud understanding has made great progress in recent years. However, one major bottleneck is the scarcity of annotated real datasets, especially compared to 2D object detection tasks, since a large amount of labor is involved in annotating the real scans of a scene. A promising solution to this problem is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets. This can be achieved by the pre-training and fine-tuning procedure. However, recent work on 3D pre-training exhibits failure when transfer features learned on synthetic objects to other real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective. In particular, we propose to generate random layouts of a scene by making use of the objects in the synthetic CAD dataset and learn the 3D scene representation by applying object-level contrastive learning on two random scenes generated from the same set of synthetic objects. The model pre-trained in this way can serve as a better initialization when later fine-tuning on the 3D object detection task. Empirically, we show consistent improvement in downstream 3D detection tasks on several base models, especially when less training data are used, which strongly demonstrates the effectiveness and generalization of our method. Benefiting from the rich semantic knowledge and diverse objects from synthetic data, our method establishes the new state-of-the-art on widely-used 3D detection benchmarks ScanNetV2 and SUN RGB-D. We expect our attempt to provide a new perspective for bridging object and scene-level 3D understanding.
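A hedged sketch of the object-level contrastive objective described above: two random scenes are assembled from the same set of synthetic CAD objects, per-object embeddings are pooled from each scene, and matched objects form positive pairs. The temperature and function names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def object_contrastive_loss(obj_emb_a, obj_emb_b, temperature: float = 0.1):
    """obj_emb_a, obj_emb_b: (N, D) embeddings of the same N objects in two random layouts."""
    a = F.normalize(obj_emb_a, dim=-1)
    b = F.normalize(obj_emb_b, dim=-1)
    logits = a @ b.t() / temperature                        # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Object i in scene A should match object i in scene B, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```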
【4】 On Limited Non-Prioritised Belief Revision Operators with Dynamic Scope 标题:关于动态作用域的有限非优先信念修正算子 链接:https://arxiv.org/abs/2108.07769
作者:Kai Sauerwald,Gabriele Kern-Isberner,Christoph Beierle 机构:FernUniversität in Hagen, Hagen, Germany, TU Dortmund University, Dortmund, Germany 摘要:非优先修正研究的是不一定接受所有新信念的修正算子。在本文中,我们通过引入动态有限修正的概念来推进这一研究方向:动态有限修正是可由有限世界集合上的全前序表达的修正。对于一个信念变化算子,我们考虑其作用域,即由那些能使修正成功的信念组成的集合。我们证明了对于每一个满足单句闭包和析取完备性的集合,都存在一个动态有限修正,其作用域为该集合与信念集的并集。我们研究了关于信念和作用域动态的迭代公设,并针对动态有限修正对其进行了刻画。作为应用,我们利用动态有限修正研究所谓内在信念(即被主体全局接受的信念)背景下的信念修正,由此得到我们称之为内在有限(inherence-limited)的修正算子。我们给出了内在有限修正的一个表示定理,并将这些算子和动态有限修正算子与密切相关的可信有限(credible-limited)修正算子进行了比较。 摘要:The research on non-prioritized revision studies revision operators which do not accept all new beliefs. In this paper, we contribute to this line of research by introducing the concept of dynamic-limited revision, which are revisions expressible by a total preorder over a limited set of worlds. For a belief change operator, we consider the scope, which consists of those beliefs which yield success of revision. We show that for each set satisfying single sentence closure and disjunction completeness there exists a dynamic-limited revision having the union of this set with the beliefs set as scope. We investigate iteration postulates for belief and scope dynamics and characterise them for dynamic-limited revision. As an application, we employ dynamic-limited revision to studying belief revision in the context of so-called inherent beliefs, which are beliefs globally accepted by the agent. This leads to revision operators which we call inherence-limited. We present a representation theorem for inherence-limited revision, and we compare these operators and dynamic-limited revision with the closely related credible-limited revision operators.
【5】 Prediction of Students performance with Artificial Neural Network using Demographic Traits 标题:基于人口学特征的人工神经网络对学生成绩的预测 链接:https://arxiv.org/abs/2108.07717
作者:Adeniyi Jide Kehinde,Abidemi Emmanuel Adeniyi,Roseline Oluwaseun Ogundokun,Himanshu Gupta,Sanjay Misra 机构:Department of Computer Science, Landmark University Omu Aran, Nigeria, Birla Institute of Technology Pilani, Hyderabad, Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria 备注:10 pages, 7 figures, 3 Tables, Fourth International Conference on Recent Innovations in Computing (IRCIC-2021) 摘要:许多研究人员使用多种数据挖掘技术研究了有监督和无监督学习下的学生学业成绩。神经网络通常需要更多的观测数据才能获得足够的预测能力。由于成绩不佳的毕业生比例不断上升,有必要设计一个系统来帮助减少这一问题,并降低学生因成绩不佳而不得不留级或在学业中途辍学的发生率。因此,有必要研究每一种方法及其优缺点,以确定哪种方法更有效,以及在何种情况下应优先选用哪种方法。该研究的目的是开发一个利用学生人口统计特征、借助人工神经网络预测学生成绩的系统,帮助大学利用以往录取学生的学习记录挑选成功预期较高的候选人(学生),从而最终培养出高质量的毕业生。该模型以若干选定变量作为输入构建,准确率超过92.3%,显示了人工神经网络作为预测工具以及大学招生选拔标准的潜在有效性。 摘要:Many researchers have studied student academic performance in supervised and unsupervised learning using numerous data mining techniques. Neural networks often need a greater collection of observations to achieve enough predictive ability. Due to the increase in the rate of poor graduates, it is necessary to design a system that helps to reduce this menace as well as reduce the incidence of students having to repeat due to poor performance or having to drop out of school altogether in the middle of the pursuit of their career. It is therefore necessary to study each one as well as their advantages and disadvantages, so as to determine which is more efficient and in what case one should be preferred over the other. The study aims to develop a system to predict student performance with Artificial Neural Network using the student demographic traits so as to assist the university in selecting candidates (students) with a high prediction of success for admission using previous academic records of students granted admissions which will eventually lead to quality graduates of the institution. The model was developed based on certain selected variables as the input. It achieved an accuracy of over 92.3 percent, showing Artificial Neural Network potential effectiveness as a predictive tool and a selection criterion for candidates seeking admission to a university.
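A rough sketch of the kind of feed-forward network the abstract describes, trained on synthetic, already-encoded "demographic" features; the feature semantics, labels, network size and resulting accuracy are placeholders, not the authors' dataset or results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# columns: e.g. age, gender (0/1), parental education level, household income band
X = rng.normal(size=(500, 4))
y = (X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # pass/fail

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```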
【6】 Demonstrating REACT: a Real-time Educational AI-powered Classroom Tool 标题:演示反应:一个实时教育人工智能支持的课堂工具 链接:https://arxiv.org/abs/2108.07693
作者:Ajay Kulkarni,Olga Gkountouna 机构:George Mason University 备注:Published in the 14th International Conference on Educational Data Mining (EDM21) 摘要:我们展示了REACT,这是一种新的实时教育AI课堂工具,采用EDM技术支持教育者的决策过程。REACT是一种数据驱动工具,具有用户友好的图形界面。它分析学生的表现数据,并提供基于上下文的警报,以及向教育者提供课程规划建议。此外,它还结合了模型不可知论的解释,以便在决策过程中带来可解释性和可解释性。本文使用一个真实的数据集演示了我们提出的工具的一个用例场景,并给出了它的体系结构和用户界面的设计。本演示侧重于基于学生在课堂活动中的表现(即,错误的回答和使用的提示)的聚集性聚集。这种优势和劣势相似的学生群的形成可能有助于教育工作者通过识别风险学生、组建学习小组或鼓励不同优势学生之间的辅导来改进课程规划。 摘要:We present a demonstration of REACT, a new Real-time Educational AI-powered Classroom Tool that employs EDM techniques for supporting the decision-making process of educators. REACT is a data-driven tool with a user-friendly graphical interface. It analyzes students' performance data and provides context-based alerts as well as recommendations to educators for course planning. Furthermore, it incorporates model-agnostic explanations for bringing explainability and interpretability in the process of decision making. This paper demonstrates a use case scenario of our proposed tool using a real-world dataset and presents the design of its architecture and user interface. This demonstration focuses on the agglomerative clustering of students based on their performance (i.e., incorrect responses and hints used) during an in-class activity. This formation of clusters of students with similar strengths and weaknesses may help educators to improve their course planning by identifying at-risk students, forming study groups, or encouraging tutoring between students of different strengths.
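A small sketch of the clustering step described above: students are grouped by their in-class activity features (incorrect responses, hints used) with agglomerative clustering. The numbers are made up, not data from REACT.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# rows: students, columns: [incorrect_responses, hints_used]
activity = np.array([[2, 1], [10, 6], [1, 0], [9, 7], [3, 2]])
labels = AgglomerativeClustering(n_clusters=2).fit_predict(
    StandardScaler().fit_transform(activity))
print(labels)   # cluster id per student, e.g. used to flag a potential at-risk group
```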
【7】 Visual Enhanced 3D Point Cloud Reconstruction from A Single Image 标题:基于单幅图像的视觉增强三维点云重建 链接:https://arxiv.org/abs/2108.07685
作者:Guiju Ping,Mahdi Abolfazli Esfahani,Han Wang 机构:Nanyang Technological University, Singapore 备注:8 pages 摘要:解决从单个图像重建三维物体的挑战性问题,使现有技术能够使用单个单目摄像机而不需要深度传感器。近年来,由于深度学习的发展,单个图像的三维重建已经取得了令人瞩目的进展。现有的研究使用倒角距离作为损失函数来指导神经网络的训练。但是,倒角损失将为三维点云内的所有点提供相等的权重。它倾向于牺牲细粒度和薄结构,以避免产生高损失,这将导致视觉效果不理想。本文提出了一个框架,通过更多地关注边界(边缘和角点),可以从单个图像中恢复详细的三维点云。实验结果表明,该方法在定性和定量上均优于现有方法,且训练参数较少。 摘要:Solving the challenging problem of 3D object reconstruction from a single image appropriately gives existing technologies the ability to perform with a single monocular camera rather than requiring depth sensors. In recent years, thanks to the development of deep learning, 3D reconstruction of a single image has demonstrated impressive progress. Existing researches use Chamfer distance as a loss function to guide the training of the neural network. However, the Chamfer loss will give equal weights to all points inside the 3D point clouds. It tends to sacrifice fine-grained and thin structures to avoid incurring a high loss, which will lead to visually unsatisfactory results. This paper proposes a framework that can recover a detailed three-dimensional point cloud from a single image by focusing more on boundaries (edge and corner points). Experimental results demonstrate that the proposed method outperforms existing techniques significantly, both qualitatively and quantitatively, and has fewer training parameters.
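To make the loss discussion concrete, the sketch below shows a plain Chamfer distance together with an optional per-point weighting hook that would let boundary (edge/corner) points contribute more; the weighting scheme is an illustrative assumption, not the paper's exact loss.

```python
import torch

def weighted_chamfer(pred, gt, gt_weights=None):
    """pred: (N, 3) predicted points, gt: (M, 3) ground truth, gt_weights: (M,) optional."""
    d = torch.cdist(pred, gt)                   # (N, M) pairwise Euclidean distances
    pred_to_gt = d.min(dim=1).values.mean()     # every predicted point finds a GT neighbour
    gt_to_pred = d.min(dim=0).values            # (M,) every GT point finds a predicted neighbour
    if gt_weights is not None:
        gt_to_pred = gt_to_pred * gt_weights / gt_weights.mean()   # emphasise boundary points
    return pred_to_gt + gt_to_pred.mean()
```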
【8】 ImitAL: Learning Active Learning Strategies from Synthetic Data 标题:ImitAL:从合成数据中学习主动学习策略 链接:https://arxiv.org/abs/2108.07670
作者:Julius Gonsior,Maik Thiele,Wolfgang Lehner 机构:Technische Universität Dresden, Dresden, Germany 摘要:使应用监督机器学习复杂化的最大挑战之一是需要大量的标记数据。主动学习(AL)是一种众所周知的标准方法,通过基于查询策略首先标记包含最多信息的样本来有效地获取标记数据。尽管过去已经提出了许多查询策略的方法,但是还没有发现一种明显的适用于所有领域的优越方法。此外,许多策略的计算成本很高,这进一步阻碍了AL在大规模注释项目中的广泛使用。因此,我们提出了ImitAL,一种新的查询策略,它将AL编码为一个学习排序问题。为了训练底层神经网络,我们选择了模仿学习。训练所需的专家演示经验来自纯合成数据。为了显示ImitAL的普遍适用性和优越性,我们在来自广泛领域的15个不同数据集上,将我们的策略与10种不同的最先进查询策略进行了广泛比较。我们还表明,与大多数其他策略相比,我们的方法具有更高的运行时性能,特别是在非常大的数据集上。 摘要:One of the biggest challenges that complicates applied supervised machine learning is the need for huge amounts of labeled data. Active Learning (AL) is a well-known standard method for efficiently obtaining labeled data by first labeling the samples that contain the most information based on a query strategy. Although many methods for query strategies have been proposed in the past, no clear superior method that works well in general for all domains has been found yet. Additionally, many strategies are computationally expensive which further hinders the widespread use of AL for large-scale annotation projects. We, therefore, propose ImitAL, a novel query strategy, which encodes AL as a learning-to-rank problem. For training the underlying neural network we chose Imitation Learning. The required demonstrative expert experience for training is generated from purely synthetic data. To show the general and superior applicability of ImitAL, we perform an extensive evaluation comparing our strategy on 15 different datasets, from a wide range of domains, with 10 different state-of-the-art query strategies. We also show that our approach is more runtime performant than most other strategies, especially on very large datasets.
【9】 Thirty years of Epistemic Specifications 标题:三十年的认识论规范 链接:https://arxiv.org/abs/2108.07669
作者:Jorge Fandinno,Wolfgang Faber,Michael Gelfond 机构:University of Nebraska Omaha, USA, University of Potsdam, Germany, Alpen-Adria-Universität Klagenfurt, Austria, Texas Tech University, USA 备注:Under consideration in Theory and Practice of Logic Programming (TPLP) 摘要:认知规范语言和认知逻辑程序语言在稳定模型语义下扩展了析取逻辑程序,并使用称为主观文字的模态结构。使用主观文字,可以检查程序的每个或某些稳定模型中的常规文字是否为真,这些模型在本文语境中也称为信念集,被收集在称为世界视图的集合中。这允许在语言中表示某个命题是否应该根据开放世界或封闭世界的假设来理解。为了通过形式语义学来捕捉语言背后的直觉,人们进行了多次尝试,由此产生了大量的提案,使得理解当前的技术状态变得困难。在这篇文章中,我们概述了该领域的起源及其适用的知识表示与推理任务。我们还详细分析了各种已提出语义的性质,并展望了该领域未来研究需要应对的挑战。本文正在《Theory and Practice of Logic Programming (TPLP)》期刊的审议中。 摘要:The language of epistemic specifications and epistemic logic programs extends disjunctive logic programs under the stable model semantics with modal constructs called subjective literals. Using subjective literals, it is possible to check whether a regular literal is true in every or some stable models of the program, those models, in this context also called belief sets, being collected in a set called world view. This allows for representing, within the language, whether some proposition should be understood according to the open or the closed world assumption. Several attempts for capturing the intuitions underlying the language by means of a formal semantics were given, resulting in a multitude of proposals that makes it difficult to understand the current state of the art. In this paper, we provide an overview of the inception of the field and the knowledge representation and reasoning tasks it is suitable for. We also provide a detailed analysis of properties of proposed semantics, and an outlook of challenges to be tackled by future research in the area. Under consideration in Theory and Practice of Logic Programming (TPLP)
【10】 MVCNet: Multiview Contrastive Network for Unsupervised Representation Learning for 3D CT Lesions 标题:MVCNet:三维CT病变无监督表征学习的多视图对比网络 链接:https://arxiv.org/abs/2108.07662
作者:Penghua Zhai,Huaiwei Cong,Gangming Zhao,Chaowei Fang,Jinpeng Li 机构:Ting Cai, and Huiguang He, Center for Pattern Recognition and Intelligent Medicine, HwaMei Hospital, University of Chinese Academy of Sciences, Ningbo , China, Ningbo Institute of Life and Health Industry, University of Chinese Academy of 备注:This 16-page manuscript has been submitted to Meidcal Image Analysis for possible publication 摘要:随着深度学习的复兴,计算机断层扫描(CT)的自动诊断系统已经取得了许多成功的应用。然而,它们大多归因于仔细的专家注释,而在实践中通常很少。这促使我们对无监督表征学习产生兴趣。最近的研究表明,自我监督学习是学习表征的一种有效方法,但大多数研究依赖于转换和借口任务的经验设计。为了避免与这些方法相关的主观性,我们提出了MVCNet,一种新的无监督三维(3D)表示学习方法,以无变换的方式工作。我们从不同方向查看每个3D病变,以收集多个二维(2D)视图。然后,通过最小化对比损失学习嵌入函数,从而聚集相同3D病变的2D视图,分离不同病变的2D视图。我们通过在嵌入层上训练一个简单的分类头来评估表示。实验结果表明,MVCNet在LIDC-IDRI(89.55%)、LNDb(77.69%)和天池(79.96%)数据集上实现了最先进的无监督表征学习精度。当对10%的标记数据进行微调时,准确度与监督学习模型相当(在三个数据集上分别为89.46%和85.03%,73.85%和73.44%,83.56%和83.34%),表明MVCNet在有限注释的学习表示方面具有优势。代码发布于:https://github.com/penghuazhai/MVCNet. 摘要:With the renaissance of deep learning, automatic diagnostic systems for computed tomography (CT) have achieved many successful applications. However, they are mostly attributed to careful expert annotations, which are often scarce in practice. This drives our interest to the unsupervised representation learning. Recent studies have shown that self-supervised learning is an effective approach for learning representations, but most of them rely on the empirical design of transformations and pretext tasks. To avoid the subjectivity associated with these methods, we propose the MVCNet, a novel unsupervised three dimensional (3D) representation learning method working in a transformation-free manner. We view each 3D lesion from different orientations to collect multiple two dimensional (2D) views. Then, an embedding function is learned by minimizing a contrastive loss so that the 2D views of the same 3D lesion are aggregated, and the 2D views of different lesions are separated. We evaluate the representations by training a simple classification head upon the embedding layer. Experimental results show that MVCNet achieves state-of-the-art accuracies on the LIDC-IDRI (89.55%), LNDb (77.69%) and TianChi (79.96%) datasets for unsupervised representation learning. When fine-tuned on 10% of the labeled data, the accuracies are comparable to the supervised learning model (89.46% vs. 85.03%, 73.85% vs. 73.44%, 83.56% vs. 83.34% on the three datasets, respectively), indicating the superiority of MVCNet in learning representations with limited annotations. Code is released at: https://github.com/penghuazhai/MVCNet.
【11】 Learning C to x86 Translation: An Experiment in Neural Compilation 标题:学习C到x86的翻译:神经编译的实验 链接:https://arxiv.org/abs/2108.07639
作者:Jordi Armengol-Estapé,Michael F. P. O'Boyle 机构:School of Informatics, University of Edinburgh 摘要:深度学习对许多领域产生了重大影响。最近,代码到代码的神经模型已被用于代码翻译、代码精化和反编译。然而,这些模型是否能够自动完成编译的问题还有待研究。在这项工作中,我们探索神经编译,构建并评估学习如何从C代码生成x86汇编代码的Transformer模型。虽然初步结果相对较弱,但我们公开了我们的数据、模型和代码,以鼓励在这一领域的进一步研究。 摘要:Deep learning has had a significant impact on many fields. Recently, code-to-code neural models have been used in code translation, code refinement and decompilation. However, the question of whether these models can automate compilation has yet to be investigated. In this work, we explore neural compilation, building and evaluating Transformer models that learn how to produce x86 assembler from C code. Although preliminary results are relatively weak, we make our data, models and code publicly available to encourage further research in this area.
【12】 Indoor Semantic Scene Understanding using Multi-modality Fusion 标题:基于多模态融合的室内语义场景理解 链接:https://arxiv.org/abs/2108.07616
作者:Muraleekrishna Gopinathan,Giang Truong,Jumana Abu-Khalaf 机构:School of Science, Edith Cowan University, Western Australia, Australia 备注:International Conference on Digital Image Computing: Techniques and Applications (DICTA), 5 figures, 8 pages 摘要:无缝人机交互是开发服务机器人系统的最终目标。为此,机器人代理必须了解其周围环境,以便更好地完成给定任务。语义场景理解允许机器人代理提取关于环境中对象的语义知识。在这项工作中,我们提出了一个语义场景理解管道,该管道融合了2D和3D检测分支,以生成环境的语义地图。来自最先进的2D检测器的2D掩模方案反向投影到3D空间,并与来自点分割网络的3D检测相结合。与之前在收集的数据集上进行评估的工作不同,我们在一个活动的照片逼真的机器人环境——BenchBot上测试了我们的管道。我们的新颖之处包括使用投影2D检测和基于对象大小的模态融合纠正3D提议。这项工作是机器人视觉场景理解挑战(RVSU)的一部分。性能评估表明,我们的流水线改进了基线方法,没有明显的计算瓶颈。 摘要:Seamless Human-Robot Interaction is the ultimate goal of developing service robotic systems. For this, the robotic agents have to understand their surroundings to better complete a given task. Semantic scene understanding allows a robotic agent to extract semantic knowledge about the objects in the environment. In this work, we present a semantic scene understanding pipeline that fuses 2D and 3D detection branches to generate a semantic map of the environment. The 2D mask proposals from state-of-the-art 2D detectors are inverse-projected to the 3D space and combined with 3D detections from point segmentation networks. Unlike previous works that were evaluated on collected datasets, we test our pipeline on an active photo-realistic robotic environment - BenchBot. Our novelty includes rectification of 3D proposals using projected 2D detections and modality fusion based on object size. This work is done as part of the Robotic Vision Scene Understanding Challenge (RVSU). The performance evaluation demonstrates that our pipeline has improved on baseline methods without significant computational bottleneck.
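A minimal sketch of the inverse projection mentioned above, following the standard pinhole model: each pixel inside a 2D mask is back-projected using the depth map and camera intrinsics. Variable names are illustrative, and the BenchBot/RVSU pipeline details are not reproduced.

```python
import numpy as np

def backproject_mask(mask, depth, K):
    """mask: (H, W) bool, depth: (H, W) in metres, K: (3, 3) camera intrinsic matrix."""
    v, u = np.nonzero(mask)                  # pixel coordinates inside the mask
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)       # (N, 3) points in the camera frame
```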
【13】 Coalesced Multi-Output Tsetlin Machines with Clause Sharing 标题:具有子句共享的联合多输出Tsetlin机 链接:https://arxiv.org/abs/2108.07594
作者:Sondre Glimsdal,Ole-Christoffer Granmo 备注:23 pages, 9 figures 摘要:通过使用有限状态机学习模式,Tsetlin机(TM)在多个基准测试中以节俭的内存与能耗获得了具有竞争力的精度和学习速度。TM将模式表示为命题逻辑中的合取子句(AND规则),每个子句对特定输出投赞成票或反对票。虽然这种方式对单输出问题很有效,但对于多输出问题,每个输出都需要一个单独的TM。使用多个TM会阻碍模式重用,因为每个TM都各自孤立地运行。在本文中,我们引入子句共享,将多个TM合并为单个TM。每个子句通过一个权重与每个输出相关联:正权重使该子句为输出1投票,负权重使其为输出0投票。因此,这些子句联合起来产生多个输出。由此产生的联合Tsetlin机(CoTM)通过相互作用的线上随机搜索(Stochastic Searching on the Line, SSL)与Tsetlin自动机(TA)团队,同时学习每个子句的权重和组成。我们在MNIST、Fashion-MNIST和Kuzushiji-MNIST上的实证结果表明,在50到1K子句的配置下,CoTM获得了比TM显著更高的准确率,这表明其具有重新利用子句的能力。例如,当每个类使用50个子句(22 Kb内存)时,Fashion-MNIST的准确率从71.99%提高到89.66%。当每个类使用超过1K个子句时,TM和CoTM的精度相近,而在MNIST上使用8K子句时,CoTM达到峰值精度的速度快3倍。我们进一步研究了其对不平衡训练数据的鲁棒性。我们在IMDb和CIFAR10数据的不平衡版本上的评估表明,CoTM对高度的类不平衡具有鲁棒性。由于能够共享子句,我们相信CoTM将支持涉及多个输出的新TM应用领域,例如语言模型学习和自动编码。 摘要:Using finite-state machines to learn patterns, Tsetlin machines (TMs) have obtained competitive accuracy and learning speed across several benchmarks, with frugal memory- and energy footprint. A TM represents patterns as conjunctive clauses in propositional logic (AND-rules), each clause voting for or against a particular output. While efficient for single-output problems, one needs a separate TM per output for multi-output problems. Employing multiple TMs hinders pattern reuse because each TM then operates in a silo. In this paper, we introduce clause sharing, merging multiple TMs into a single one. Each clause is related to each output by using a weight. A positive weight makes the clause vote for output 1, while a negative weight makes the clause vote for output 0. The clauses thus coalesce to produce multiple outputs. The resulting coalesced Tsetlin Machine (CoTM) simultaneously learns both the weights and the composition of each clause by employing interacting Stochastic Searching on the Line (SSL) and Tsetlin Automata (TA) teams. Our empirical results on MNIST, Fashion-MNIST, and Kuzushiji-MNIST show that CoTM obtains significantly higher accuracy than TM on 50- to 1K-clause configurations, indicating an ability to repurpose clauses. E.g., accuracy goes from 71.99% to 89.66% on Fashion-MNIST when employing 50 clauses per class (22 Kb memory). While TM and CoTM accuracy is similar when using more than 1K clauses per class, CoTM reaches peak accuracy 3x faster on MNIST with 8K clauses. We further investigate robustness towards imbalanced training data. Our evaluations on imbalanced versions of IMDb- and CIFAR10 data show that CoTM is robust towards high degrees of class imbalance. Being able to share clauses, we believe CoTM will enable new TM application domains that involve multiple outputs, such as learning language models and auto-encoding.
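A tiny sketch of the clause-sharing idea: one pool of clause outputs votes for every output class through a signed weight matrix, so positive weights push an output towards 1 and negative weights towards 0. Clause evaluation and the SSL/TA learning of the weights are omitted, and the numbers are arbitrary.

```python
import numpy as np

def coalesced_vote(clause_outputs, weights):
    """clause_outputs: (C,) 0/1 clause values; weights: (C, K) signed weights per output."""
    scores = clause_outputs @ weights          # one summed vote per output
    return (scores >= 0).astype(int)           # thresholded multi-output prediction

clauses = np.array([1, 0, 1, 1])
W = np.array([[3, -1], [2, 4], [-2, 1], [1, -3]])   # 4 shared clauses, 2 outputs
print(coalesced_vote(clauses, W))                   # -> [1 0]
```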
【14】 MigrationsKB: A Knowledge Base of Public Attitudes towards Migrations and their Driving Factors 标题:移民KB:公众对移民的态度及其驱动因素的知识库 链接:https://arxiv.org/abs/2108.07593
作者:Yiyi Chen,Harald Sack,Mehwish Alam 机构:FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Germany, Karlsruhe Institute of Technology, Institute AIFB, Germany 备注:19 pages, 11 figures 摘要:随着欧洲移民话题的日益增多,公众现在更多地通过Twitter等各种平台表达自己的观点。因此,理解网络话语对于捕捉公众舆论至关重要。本研究的目标是分析社交媒体平台,以量化公众对移民的态度,并确定导致这些态度的不同因素。使用先进的主题建模技术,收集、预处理和过滤了2013年至2021年7月期间欧洲国家移民的推文。通过基于BERT的实体链接和情感分析,以及基于注意的仇恨语音检测对策划的推文进行注释。此外,还利用外部数据库查明造成人们对移徙持消极态度的潜在社会和经济因素。为了进一步促进社会科学和计算机科学跨学科领域的研究,成果被纳入知识库(KB),即移民知识库,该知识库大大扩展了现有模型,以考虑公众对移民的态度和经济指标。该知识库使用公平原则公开,可以通过SPARQL端点查询。Zenodo上提供了数据转储。 摘要:With the increasing trend in the topic of migration in Europe, the public is now more engaged in expressing their opinions through various platforms such as Twitter. Understanding the online discourses is therefore essential to capture the public opinion. The goal of this study is the analysis of social media platform to quantify public attitudes towards migrations and the identification of different factors causing these attitudes. The tweets spanning from 2013 to Jul-2021 in the European countries which are hosts to immigrants are collected, pre-processed, and filtered using advanced topic modeling technique. BERT-based entity linking and sentiment analysis, and attention-based hate speech detection are performed to annotate the curated tweets. Moreover, the external databases are used to identify the potential social and economic factors causing negative attitudes of the people about migration. To further promote research in the interdisciplinary fields of social science and computer science, the outcomes are integrated into a Knowledge Base (KB), i.e., MigrationsKB which significantly extends the existing models to take into account the public attitudes towards migrations and the economic indicators. This KB is made public using FAIR principles, which can be queried through SPARQL endpoint. Data dumps are made available on Zenodo.
【15】 The Ecosystem Path to General AI 标题:通向通用人工智能的生态系统之路 链接:https://arxiv.org/abs/2108.07578
作者:Claes Strannegård,Niklas Engsner,Pietro Ferrari,Hans Glimmerfors,Marcus Hilding Södergren,Tobias Karlsson,Birger Kleve,Victor Skoglund 机构:Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Sweden 备注:10 pages. Submitted to AGI-21 摘要:我们首先讨论生态系统模拟器与通用人工智能之间的联系。然后,我们介绍了开源生态系统模拟器Ecotwin,它基于游戏引擎Unity,在包含无生命物体(如山脉和湖泊)以及生物(如动物和植物)的生态系统上运行。动物认知通过整合三个独立的网络来建模:(i)用于硬连线反射的反射网络(reflex network);(ii)将氧气、水、能量和气味等感官数据映射为标量幸福值的幸福网络(happiness network);以及(iii)用于选择动作的策略网络(policy network)。策略网络通过强化学习(RL)进行训练,其中奖励信号定义为从一个时间步到下一个时间步的幸福值之差。所有有机体都能进行有性或无性繁殖,如果关键资源耗尽,它们就会死亡。我们报告了Ecotwin的三项研究结果,其中自然现象在没有硬编码的情况下在模型中自发涌现。首先,我们研究了一个有狼、鹿和草的陆地生态系统,其中出现了Lotka-Volterra式的种群动态。其次,我们研究了一个含有浮游植物、桡足类和磷虾的海洋生态系统,其中出现了昼夜垂直迁移行为。第三,我们研究了一个涉及致命危险的生态系统,在这个生态系统中,某些结合了RL和反射的智能体优于纯RL智能体。 摘要:We start by discussing the link between ecosystem simulators and general AI. Then we present the open-source ecosystem simulator Ecotwin, which is based on the game engine Unity and operates on ecosystems containing inanimate objects like mountains and lakes, as well as organisms such as animals and plants. Animal cognition is modeled by integrating three separate networks: (i) a reflex network for hard-wired reflexes; (ii) a happiness network that maps sensory data such as oxygen, water, energy, and smells, to a scalar happiness value; and (iii) a policy network for selecting actions. The policy network is trained with reinforcement learning (RL), where the reward signal is defined as the happiness difference from one time step to the next. All organisms are capable of either sexual or asexual reproduction, and they die if they run out of critical resources. We report results from three studies with Ecotwin, in which natural phenomena emerge in the models without being hardwired. First, we study a terrestrial ecosystem with wolves, deer, and grass, in which a Lotka-Volterra style population dynamics emerges. Second, we study a marine ecosystem with phytoplankton, copepods, and krill, in which a diel vertical migration behavior emerges. Third, we study an ecosystem involving lethal dangers, in which certain agents that combine RL with reflexes outperform pure RL agents.
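The reward definition is simple enough to state in code. The sketch below uses a stand-in linear happiness function over a few resources purely for illustration; Ecotwin's actual happiness network maps raw sensory data, and the resource names and weights here are assumptions.

```python
def happiness(obs):
    # stand-in for the happiness network: weighted sum of internal resources (assumed weights)
    return 0.4 * obs["oxygen"] + 0.3 * obs["water"] + 0.3 * obs["energy"]

def reward(prev_obs, obs):
    # RL reward = change in happiness from one time step to the next
    return happiness(obs) - happiness(prev_obs)

print(reward({"oxygen": 1.0, "water": 0.5, "energy": 0.8},
             {"oxygen": 0.9, "water": 0.6, "energy": 0.7}))
```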
【16】 Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays 标题:随机延迟强化学习的重访状态增强方法 链接:https://arxiv.org/abs/2108.07555
作者:Somjit Nath,Mayank Baranwal,Harshad Khadilkar 机构:TCS Research, Mumbai, India, IIT Bombay 备注:Accepted at CIKM'21 摘要:一些真实场景,如远程控制和传感,由行动和观测延迟组成。延迟的存在会降低强化学习(RL)算法的性能,通常会导致算法无法学习任何实质性内容。本文形式化地描述了具有随机延迟的马尔可夫决策过程(MDP)的概念,并证明了延迟MDP可以转化为具有显著简化的成本结构的等价标准MDP(无延迟)。我们利用这种等价性导出了一个无模型延迟解析RL框架,并证明了即使是建立在该框架上的简单RL算法,在行动和观测具有随机延迟的环境中也能获得接近最优的回报。延迟解析深度Q网络(DRDQN)算法在包括多步延迟和随机延迟的各种环境中进行了基准测试,与当前建立的算法相比,在实现接近最优的回报和最小化其计算开销方面,该算法具有更好的性能。 摘要:Several real-world scenarios, such as remote control and sensing, are comprised of action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is bench-marked on a variety of environments comprising of multi-step and stochastic delays and results in better performance, both in terms of achieving near-optimal rewards and minimizing the computational overhead thereof, with respect to the currently established algorithms.
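A hedged sketch of the state-augmentation idea behind delay-resolved RL: the effective state is the latest observation plus the queue of actions that have been issued but not yet executed. Only the constant-delay case is shown (the paper also covers stochastic delays); a gym-style env interface and a no-op placeholder action are assumptions.

```python
from collections import deque

class ActionDelayWrapper:
    def __init__(self, env, delay: int, noop_action=0):
        self.env, self.delay, self.noop = env, delay, noop_action

    def reset(self):
        obs = self.env.reset()
        self.pending = deque([self.noop] * self.delay)   # actions waiting to be executed
        return (obs, tuple(self.pending))                # augmented state

    def step(self, action):
        executed = self.pending.popleft()                # action issued `delay` steps ago
        self.pending.append(action)                      # current action joins the queue
        obs, rew, done, info = self.env.step(executed)
        return (obs, tuple(self.pending)), rew, done, info
```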
【17】 Neural Photofit: Gaze-based Mental Image Reconstruction 标题:神经图像拟合:基于凝视的心理图像重建 链接:https://arxiv.org/abs/2108.07524
作者:Florian Strohm,Ekta Sood,Sven Mayer,Philipp Müller,Mihai Bâce,Andreas Bulling 机构:University of Stuttgart 摘要:我们提出了一种新的方法,利用人类的注视,将一个人心目中的图像视觉解码为一个photofit(面部合成图)。我们的方法结合了三个神经网络:编码器、评分网络和解码器。编码器提取图像特征,并预测人类观察者看到的每个人脸的神经激活图。神经评分网络比较人类注意和神经注意,并预测每个提取的图像特征的相关性评分。最后,图像特征以相关性为权重线性组合,聚合成单个特征向量,再由解码器解码成最终的photofit。我们在一个新的数据集上训练神经评分网络,该数据集包含19名观看合成人脸拼贴的参与者的凝视数据。我们表明,我们的方法明显优于平均基线预测器,并报告了一项人类研究,该研究表明,我们可以解码出视觉上合理且接近观察者心理图像的photofit。 摘要:We propose a novel method that leverages human fixations to visually decode the image a person has in mind into a photofit (facial composite). Our method combines three neural networks: An encoder, a scoring network, and a decoder. The encoder extracts image features and predicts a neural activation map for each face looked at by a human observer. A neural scoring network compares the human and neural attention and predicts a relevance score for each extracted image feature. Finally, image features are aggregated into a single feature vector as a linear combination of all features weighted by relevance which a decoder decodes into the final photofit. We train the neural scoring network on a novel dataset containing gaze data of 19 participants looking at collages of synthetic faces. We show that our method significantly outperforms a mean baseline predictor and report on a human study that shows that we can decode photofits that are visually plausible and close to the observer's mental image.
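A minimal sketch of the relevance-weighted aggregation step described above: per-feature relevance scores weight a linear combination into a single vector, which the decoder would then turn into the photofit. Normalising with a softmax is an assumption made for illustration, not necessarily the paper's exact choice.

```python
import torch

def aggregate_features(features, relevance):
    """features: (num_features, feat_dim); relevance: (num_features,) scores."""
    w = torch.softmax(relevance, dim=0)              # normalised relevance weights
    return (w.unsqueeze(-1) * features).sum(dim=0)   # (feat_dim,) aggregated vector
```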
【18】 Monolithic vs. hybrid controller for multi-objective Sim-to-Real learning 标题:用于多目标仿真学习的单片控制器与混合控制器 链接:https://arxiv.org/abs/2108.07514
作者:Atakan Dag,Alexandre Angleraud,Wenyan Yang,Nataliya Strokina,Roel S. Pieters,Minna Lanz,Joni-Kristian Kamarainen 机构: Tampere University 备注:IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021 摘要:仿真到真实(Sim-to-real)是一种非常有吸引力的方法,可以为机器人任务构造控制器,这种方法比解析求解更容易模拟。对于具有明确单一目标(如“达到目标”)的任务,已演示了与实际解决方案类似的工作模式。然而,现实世界的应用程序通常由多个同时的目标组成,例如“达到目标”但“避免障碍”。在强化学习(RL)的背景下,一个简单的解决方案是将多个目标组合成一个多项奖励函数,并训练单个单片控制器。最近,提出了一种基于预训练单目标控制器和它们之间切换规则的混合解决方案。在这项工作中,我们比较了这两种方法在多目标设置的机器人机械手,以达到目标,同时避免障碍。我们的研究结果表明,与单片控制器相比,混合控制器的训练更容易,并且获得了更好的成败权衡。在模拟器中训练的控制器通过实际设置进行了验证。 摘要:Simulation to real (Sim-to-Real) is an attractive approach to construct controllers for robotic tasks that are easier to simulate than to analytically solve. Working Sim-to-Real solutions have been demonstrated for tasks with a clear single objective such as "reach the target". Real world applications, however, often consist of multiple simultaneous objectives such as "reach the target" but "avoid obstacles". A straightforward solution in the context of reinforcement learning (RL) is to combine multiple objectives into a multi-term reward function and train a single monolithic controller. Recently, a hybrid solution based on pre-trained single objective controllers and a switching rule between them was proposed. In this work, we compare these two approaches in the multi-objective setting of a robot manipulator to reach a target while avoiding an obstacle. Our findings show that the training of a hybrid controller is easier and obtains a better success-failure trade-off than a monolithic controller. The controllers trained in simulator were verified by a real set-up.
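A minimal sketch of the hybrid-controller idea compared above: two pre-trained single-objective policies and a hand-written switching rule between them. The distance threshold, observation keys and policy interfaces are assumptions for illustration.

```python
def hybrid_action(obs, reach_policy, avoid_policy, obstacle_margin: float = 0.15):
    if obs["min_obstacle_distance"] < obstacle_margin:
        return avoid_policy(obs)      # obstacle nearby: prioritise avoidance
    return reach_policy(obs)          # otherwise: head for the target

# toy usage with placeholder policies
action = hybrid_action({"min_obstacle_distance": 0.1, "target_direction": (1.0, 0.0)},
                       reach_policy=lambda o: o["target_direction"],
                       avoid_policy=lambda o: (0.0, -1.0))
```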
【19】 MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation 标题:MOI-Mixer:序贯推荐中多阶交互改进的MLP-Mixer 链接:https://arxiv.org/abs/2108.07505
作者:Hojoon Lee,Dongyoon Hwang,Sunghwan Hong,Changyeon Kim,Seungryong Kim,Jaegul Choo 机构:KAIST AI, Korea University, KAKAO 备注:9 pages 摘要:成功的顺序推荐系统依赖于准确地捕捉用户的短期和长期兴趣。尽管基于Transformer的模型在序列推荐任务中取得了最先进的性能,但它们通常需要二次内存和序列长度的时间复杂度,因此很难提取用户的长期兴趣。另一方面,基于多层感知器(MLP)的模型,以其线性记忆和时间复杂性而闻名,最近在各种任务中显示出与Transformer相比的竞争结果。考虑到大量用户行为历史的可用性,基于MLP的模型的线性内存和时间复杂性使其成为序列推荐任务中一个很有希望的替代方案。为此,我们在顺序推荐中采用了基于MLP的模型,但一致发现,尽管基于MLP的方法具有计算优势,但其性能低于Transformer的方法。从实验中,我们观察到,在MLP层中引入显式的高阶相互作用可以缓解这种性能差距。作为回应,我们提出了多阶交互(MOI)层,它能够在保持MLP层的内存和时间复杂性的同时表达输入中任意阶的交互。通过将MLP层替换为MOI层,我们的模型能够实现与基于Transformer的模型相当的性能,同时保留基于MLP的模型的计算优势。 摘要:Successful sequential recommendation systems rely on accurately capturing the user's short-term and long-term interest. Although Transformer-based models achieved state-of-the-art performance in the sequential recommendation task, they generally require quadratic memory and time complexity to the sequence length, making it difficult to extract the long-term interest of users. On the other hand, Multi-Layer Perceptrons (MLP)-based models, renowned for their linear memory and time complexity, have recently shown competitive results compared to Transformer in various tasks. Given the availability of a massive amount of the user's behavior history, the linear memory and time complexity of MLP-based models make them a promising alternative to explore in the sequential recommendation task. To this end, we adopted MLP-based models in sequential recommendation but consistently observed that MLP-based methods obtain lower performance than those of Transformer despite their computational benefits. From experiments, we observed that introducing explicit high-order interactions to MLP layers mitigates such performance gap. In response, we propose the Multi-Order Interaction (MOI) layer, which is capable of expressing an arbitrary order of interactions within the inputs while maintaining the memory and time complexity of the MLP layer. By replacing the MLP layer with the MOI layer, our model was able to achieve comparable performance with Transformer-based models while retaining the MLP-based models' computational benefits.
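The abstract does not give the exact MOI formulation; this hedged sketch only illustrates the general idea of injecting explicit higher-order interactions into an MLP-style block at linear cost, here via repeated element-wise products of linear projections. Treat it as an illustration of the principle, not the MOI layer itself.

```python
import torch
import torch.nn as nn

class HighOrderInteraction(nn.Module):
    def __init__(self, dim: int, order: int = 3):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(dim, dim) for _ in range(order))
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                 # x: (batch, seq_len, dim)
        z = self.projs[0](x)
        for proj in self.projs[1:]:
            z = z * proj(x)               # each Hadamard product raises the interaction order
        return self.out(z) + x            # residual connection, MLP-like memory/time cost
```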
【20】 A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems 标题:一种用于定制基于转导器(transducer)的语音识别系统的轻量级上下文拼写校正模型 链接:https://arxiv.org/abs/2108.07493
作者:Xiaoqiang Wang,Yanqing Liu,Sheng Zhao,Jinyu Li 机构:Microsoft, China, Microsoft, USA 备注:This paper has been accepted by Interspeech 2021 摘要:上下文信息是动态的,且在模型训练期间不可获得,因此利用上下文信息定制基于转导器(transducer)的自动语音识别(ASR)系统具有挑战性。在这项工作中,我们引入了一个轻量级的上下文拼写纠正模型,来纠正基于转导器的ASR系统中与上下文相关的识别错误。我们通过共享的上下文编码器将上下文信息融入拼写纠正模型,并使用过滤算法来处理大规模上下文列表。实验表明,该模型使基线ASR模型的相对词错误率降低约50%,也显著优于上下文LM偏置等基线方法。对于训练中未见过的词表外术语,该模型同样表现出色。 摘要:It's challenging to customize transducer-based automatic speech recognition (ASR) system with context information which is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists. Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction, which also significantly outperforms the baseline method such as contextual LM biasing. The model also shows excellent performance for out-of-vocabulary terms not seen during training.
【21】 Social influence leads to the formation of diverse local trends 标题:社会影响导致了不同的地方潮流的形成。 链接:https://arxiv.org/abs/2108.07437
作者:Ziv Epstein,Matthew Groh,Abhimanyu Dubey,Alex "Sandy" Pentland 机构:LAND, MIT Media Lab, USA 备注:18 pages, to appear in CSCW October 2021 摘要:数字平台的视觉设计如何影响用户行为和由此产生的环境?大量研究表明,在内容中引入社会信号会增加其成功的不平等性和不可预测性,但这一点仅在听音乐的环境中得到证实。为了进一步检验社会影响对媒体流行度的影响,我们通过重新调整Salganik等人的音乐实验室实验,将这项研究扩展到算法生成图像的背景。在参与者发现和管理人工智能生成的杂交动物的数字平台上,我们随机分配其他参与者行为的知识和信息的视觉呈现。我们成功地复制了音乐实验室在图像背景下的发现,社会影响导致了一个不可预测的赢家通吃市场。然而,我们也发现,社会影响可能导致当地文化趋势的出现,这些趋势与现状不同,最终更加多样化。我们讨论了这些结果对平台设计者和动物保护工作的影响。 摘要:How does the visual design of digital platforms impact user behavior and the resulting environment? A body of work suggests that introducing social signals to content can increase both the inequality and unpredictability of its success, but has only been shown in the context of music listening. To further examine the effect of social influence on media popularity, we extend this research to the context of algorithmically-generated images by re-adapting Salganik et al's Music Lab experiment. On a digital platform where participants discover and curate AI-generated hybrid animals, we randomly assign both the knowledge of other participants' behavior and the visual presentation of the information. We successfully replicate the Music Lab's findings in the context of images, whereby social influence leads to an unpredictable winner-take-all market. However, we also find that social influence can lead to the emergence of local cultural trends that diverge from the status quo and are ultimately more diverse. We discuss the implications of these results for platform designers and animal conservation efforts.
【22】 FARF: A Fair and Adaptive Random Forests Classifier 标题:FARF:一种公平自适应的随机森林分类器 链接:https://arxiv.org/abs/2108.07403
作者:Wenbin Zhang,Albert Bifet,Xiangliang Zhang,Jeremy C. Weiss,Wolfgang Nejdl 机构: University of Maryland, Baltimore County, MD , USA, University of Waikato, Hamilton , New Zealand, T´el´ecom Paris, Institut Polytechnique de Paris, Palaiseau , France, King Abdullah University of Science and Technology, Thuwal , Saudi Arabia 摘要:随着人工智能(AI)在更多应用中的应用,需要考虑和减轻学习模型的偏差。大多数开发公平学习算法的工作都集中在离线设置上。然而,在许多现实世界的应用程序中,数据以在线方式出现,需要动态处理。此外,在实际应用中,需要考虑准确性和公平性之间的折衷,但目前的方法通常具有多个超参数,并通过非平凡的交互来实现公平性。在本文中,我们提出了一种灵活的集成算法,用于在更具挑战性的在线环境中进行公平决策。该算法称为FARF(公平和自适应随机森林),基于使用在线组件分类器并根据当前分布更新它们,这也考虑了公平性和改变公平性-准确性平衡的单个超参数。在真实世界的鉴别数据流上的实验证明了FARF的实用性。 摘要:As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most works in developing fair learning algorithms focus on the offline setting. However, in many real-world applications data comes in an online fashion and needs to be processed on the fly. Moreover, in practical application, there is a trade-off between accuracy and fairness that needs to be accounted for, but current methods often have multiple hyperparameters with non-trivial interaction to achieve fairness. In this paper, we propose a flexible ensemble algorithm for fair decision-making in the more challenging context of evolving online settings. This algorithm, called FARF (Fair and Adaptive Random Forests), is based on using online component classifiers and updating them according to the current distribution, that also accounts for fairness and a single hyperparameters that alters fairness-accuracy balance. Experiments on real-world discriminated data streams demonstrate the utility of FARF.
【23】 BOBCAT: Bilevel Optimization-Based Computerized Adaptive Testing 标题:Bobcat:基于双层优化的计算机化自适应测验 链接:https://arxiv.org/abs/2108.07386
作者:Aritra Ghosh,Andrew Lan 机构:University of Massachusetts Amherst 备注:IJCAI 2021 with supplementary material 摘要:计算机自适应测试(CAT)是指针对每个学生/考生的个性化测试形式。CAT方法根据每个学生对之前问题的回答,自适应地选择下一个信息量最大的问题/项目,有效地缩短了测试长度。现有的CAT方法使用项目反应理论(IRT)模型将学生的能力与他们对问题的反应联系起来,并使用静态问题选择算法来尽快减少能力估计误差;因此,这些算法无法通过从大规模学生反应数据中学习来改进。在本文中,我们提出了一个基于双层优化的框架BOBCAT,用于CAT直接从训练数据学习数据驱动的问题选择算法。BOBCAT对潜在的学生反应模型不可知,并且在自适应测试过程中计算效率高。通过对五个真实世界的学生反应数据集的广泛实验,我们表明BOBCAT在缩短测试长度方面优于现有的CAT方法(有时显著)。 摘要:Computerized adaptive testing (CAT) refers to a form of tests that are personalized to every student/test taker. CAT methods adaptively select the next most informative question/item for each student given their responses to previous questions, effectively reducing test length. Existing CAT methods use item response theory (IRT) models to relate student ability to their responses to questions and static question selection algorithms designed to reduce the ability estimation error as quickly as possible; therefore, these algorithms cannot improve by learning from large-scale student response data. In this paper, we propose BOBCAT, a Bilevel Optimization-Based framework for CAT to directly learn a data-driven question selection algorithm from training data. BOBCAT is agnostic to the underlying student response model and is computationally efficient during the adaptive testing process. Through extensive experiments on five real-world student response datasets, we show that BOBCAT outperforms existing CAT methods (sometimes significantly) at reducing test length.
【24】 Generative Relation Linking for Question Answering over Knowledge Bases 标题:知识库问答中的产生式关系链接 链接:https://arxiv.org/abs/2108.07337
作者:Gaetano Rossiello,Nandana Mihindukulasooriya,Ibrahim Abdelaziz,Mihaela Bornea,Alfio Gliozzo,Tahira Naseem,Pavan Kapanipathi 机构:IBM Research, T.J. Watson Research Center, Yorktown Heights, NY, USA 备注:Accepted at the 20th International Semantic Web Conference (ISWC 2021) 摘要:关系链接对于在知识库上回答问题至关重要。尽管有各种各样的努力来提高关系链接性能,但目前最先进的方法并没有达到最佳效果,因此,对整体端到端问答性能产生了负面影响。在这项工作中,我们提出了一种新的关系链接方法,将其作为一个生成问题来构建,以便于使用预先训练好的序列到序列模型。我们将这种序列模型扩展到序列模型,其思想是注入来自目标知识库的结构化数据,主要是使这些模型能够处理知识库的细微差别。此外,我们训练模型的目的是生成由参数关系对列表组成的结构化输出,从而实现知识验证步骤。我们将我们的方法与来自DBpedia和Wikidata的四个不同数据集上的现有关系链接系统进行了比较。我们的方法报告了与最新技术相比的巨大改进,同时使用了一个更简单的模型,可以轻松地适应不同的知识库。 摘要:Relation linking is essential to enable question answering over knowledge bases. Although there are various efforts to improve relation linking performance, the current state-of-the-art methods do not achieve optimal results, therefore, negatively impacting the overall end-to-end question answering performance. In this work, we propose a novel approach for relation linking framing it as a generative problem facilitating the use of pre-trained sequence-to-sequence models. We extend such sequence-to-sequence models with the idea of infusing structured data from the target knowledge base, primarily to enable these models to handle the nuances of the knowledge base. Moreover, we train the model with the aim to generate a structured output consisting of a list of argument-relation pairs, enabling a knowledge validation step. We compared our method against the existing relation linking systems on four different datasets derived from DBpedia and Wikidata. Our method reports large improvements over the state-of-the-art while using a much simpler model that can be easily adapted to different knowledge bases.
【25】 Synthesizing Pareto-Optimal Interpretations for Black-Box Models 标题:综合黑箱模型的帕累托最优解释 链接:https://arxiv.org/abs/2108.07307
作者:Hazem Torfah,Shetal Shah,Supratik Chakraborty,S. Akshay,Sanjit A. Seshia 机构:University of California at Berkeley, Indian Institute of Technology, Bombay 备注:Long version of conference paper accepted at FMCAD'21 摘要:我们提出了一种新的多目标优化方法,用于综合解释“解释”黑箱机器学习模型的行为。为黑箱模型构建人类可理解的解释通常需要平衡相互冲突的目标。对于人类来说,一个简单的解释可能更容易理解,而与复杂的解释相比,它的预测更不精确。现有的综合解释方法使用单一目标函数,并且通常针对单一类别的解释进行优化。相比之下,我们提供了一个更通用的多目标综合框架,允许用户选择(1)从中合成解释的语法模板类别,以及(2)解释正确性和解释性的定量度量。对于给定的黑盒,我们的方法产生了一组关于正确性和可解释性度量的帕累托最优解释。我们证明了基本的多目标优化问题可以通过简化为定量约束求解(如加权最大可满足性)来解决。为了证明我们的方法的优点,我们将其应用于黑盒神经网络分类器的综合解释。我们的实验表明,对于现有方法所遗漏的解释,往往存在着丰富多样的选择。 摘要:We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
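Once candidate interpretations have been scored for correctness and explainability, Pareto-optimality reduces to a dominance filter. A small sketch with made-up candidates and scores follows; the synthesis itself and the weighted-MaxSAT reduction are not shown.

```python
def pareto_front(candidates):
    """candidates: list of (name, correctness, explainability), both scores to maximise."""
    front = []
    for c in candidates:
        dominated = any(o[1] >= c[1] and o[2] >= c[2] and (o[1], o[2]) != (c[1], c[2])
                        for o in candidates)
        if not dominated:
            front.append(c)
    return front

print(pareto_front([("simple rule", 0.71, 0.9), ("deep tree", 0.88, 0.4),
                    ("medium tree", 0.80, 0.6), ("noisy rule", 0.70, 0.5)]))
```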
【26】 Spatio-temporal Parking Behaviour Forecasting and Analysis Before and During COVID-19 标题:冠状病毒前后停车行为的时空预测与分析 链接:https://arxiv.org/abs/2108.07731
作者:Shuhui Gong,Xiaopeng Mo,Rui Cao,Yu Liu,Wei Tu,Ruibin Bai 机构:School of Information Engineering, China University of Geosciences, Beijing, China, School of Computer Science, University of Nottingham Ningbo, Ningbo, China, Dept. of LSGI & SCRI, The Hong Kong Polytechnic, Hong Kong, China, Institute of Remote Sensing and 备注:DeepSpatial '21: 2nd ACM SIGKDD Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems (this https URL) 摘要:近年来,停车需求预测和行为分析受到越来越多的关注,因为它们在缓解交通拥堵和了解出行行为方面起着至关重要的作用。然而,以往的研究通常只考虑时间依赖性,而忽略停车预测的停车场之间的空间相关性。这主要是由于它们之间缺乏直接的物理联系或可观察到的相互作用。因此,如何量化空间相关性仍然是一个重大挑战。为了弥补这一差距,在本研究中,我们提出了一个空间感知停车预测框架,该框架包括两个步骤,即空间连接图构建和时空预测。在中国宁波的一个案例研究中,使用了在新冠肺炎爆发前和爆发期间超过100万条记录的停车数据。结果表明,该方法在停车占用预测方面优于基线方法,特别是对于具有高度时间不规则性的病例,如在新冠病毒-19期间。我们的工作揭示了大流行对停车行为的影响,并强调了停车行为预测中空间依赖性建模的重要性,这有助于未来流行病学和人类出行行为的研究。 摘要:Parking demand forecasting and behaviour analysis have received increasing attention in recent years because of their critical role in mitigating traffic congestion and understanding travel behaviours. However, previous studies usually only consider temporal dependence but ignore the spatial correlations among parking lots for parking prediction. This is mainly due to the lack of direct physical connections or observable interactions between them. Thus, how to quantify the spatial correlation remains a significant challenge. To bridge the gap, in this study, we propose a spatial-aware parking prediction framework, which includes two steps, i.e. spatial connection graph construction and spatio-temporal forecasting. A case study in Ningbo, China is conducted using parking data of over one million records before and during COVID-19. The results show that the approach is superior on parking occupancy forecasting than baseline methods, especially for the cases with high temporal irregularity such as during COVID-19. Our work has revealed the impact of the pandemic on parking behaviour and also accentuated the importance of modelling spatial dependence in parking behaviour forecasting, which can benefit future studies on epidemiology and human travel behaviours.
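The abstract does not spell out how the spatial connection graph is built; this hedged sketch shows one plausible construction, connecting parking lots whose occupancy series are strongly correlated. The correlation measure and threshold are assumptions for illustration only.

```python
import numpy as np

def connection_graph(occupancy, threshold: float = 0.7):
    """occupancy: (num_lots, num_timesteps) array of occupancy rates."""
    corr = np.corrcoef(occupancy)                 # pairwise Pearson correlation between lots
    adj = (corr >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                    # no self-loops
    return adj                                    # adjacency matrix fed to the forecaster

demo = np.random.default_rng(0).random((5, 96))   # 5 lots, 96 time steps
print(connection_graph(demo))
```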
【27】 SURFNet: Super-resolution of Turbulent Flows with Transfer Learning using Small Datasets 标题:SURFNet:使用小数据集进行传递学习的湍流超分辨率 链接:https://arxiv.org/abs/2108.07667
作者:Octavi Obiols-Sales,Abhinav Vishnu,Nicholas Malaya,Aparna Chandramowlishwaran 机构:University of California, Irvine, Irvine, California, Advanced Micro Devices, Inc., Austin, Texas 摘要:深度学习(DL)算法正在成为计算成本高昂的CFD模拟的关键替代方案。然而,最先进的DL方法需要大量高分辨率的训练数据来学习精确的模型。此类数据集的规模和可获得性是下一代湍流数据驱动替代模型开发的主要限制。本文介绍了一种基于迁移学习的超分辨率流动网络SURFNet。SURFNet主要在低分辨率数据集上训练DL模型,再在少量高分辨率流动问题上对模型进行迁移学习,从而在不依赖输入规模的情况下加速传统数值求解器。我们提出了两种用于超分辨率任务的迁移学习方法,即一次性学习和增量学习。这两种方法都只需在一个几何体上进行迁移学习即可适应细网格流场,与粗模型的小分辨率(64x256)相比,高分辨率输入所需的训练数据减少至1/15,从而显著缩短了数据收集和训练的时间。我们通过在湍流区域求解Navier-Stokes方程,在输入分辨率高达粗模型256倍的情况下,对SURFNet的性能进行了经验评估。在四个测试几何体和八个训练期间未见过的流动配置上,我们观察到相对于OpenFOAM物理求解器的一致加速比为2-2.1x,且与测试几何体和分辨率大小(高达2048x2048)无关,证明了分辨率不变性和泛化能力。我们的方法解决了在不损失精度且仅需有限计算资源的情况下,从使用低分辨率输入训练的粗网格模型重建高分辨率解(超分辨率)的挑战。 摘要:Deep Learning (DL) algorithms are emerging as a key alternative to computationally expensive CFD simulations. However, state-of-the-art DL approaches require large and high-resolution training data to learn accurate models. The size and availability of such datasets are a major limitation for the development of next-generation data-driven surrogate models for turbulent flows. This paper introduces SURFNet, a transfer learning-based super-resolution flow network. SURFNet primarily trains the DL model on low-resolution datasets and transfer learns the model on a handful of high-resolution flow problems - accelerating the traditional numerical solver independent of the input size. We propose two approaches to transfer learning for the task of super-resolution, namely one-shot and incremental learning. Both approaches entail transfer learning on only one geometry to account for fine-grid flow fields requiring 15x less training data on high-resolution inputs compared to the tiny resolution (64x256) of the coarse model, significantly reducing the time for both data collection and training. We empirically evaluate SURFNet's performance by solving the Navier-Stokes equations in the turbulent regime on input resolutions up to 256x larger than the coarse model. On four test geometries and eight flow configurations unseen during training, we observe a consistent 2-2.1x speedup over the OpenFOAM physics solver independent of the test geometry and the resolution size (up to 2048x2048), demonstrating both resolution-invariance and generalization capabilities. Our approach addresses the challenge of reconstructing high-resolution solutions from coarse grid models trained using low-resolution inputs (super-resolution) without loss of accuracy and requiring limited computational resources.
【28】 InfoGram and Admissible Machine Learning 标题:信息图与容许机器学习 链接:https://arxiv.org/abs/2108.07380
作者:Subhadeep Mukhopadhyay 备注:Keywords: Admissible machine learning; InfoGram; L-Features; Information-theory; ALFA-testing, Algorithmic risk management; Fairness; Interpretability; COREml; FINEml 摘要:我们已经进入了一个机器学习(ML)的新时代,在这个时代,具有卓越预测能力的最精确算法甚至可能无法部署,除非它在监管约束下是可接受的。这引起了人们对开发公平、透明和可信的ML方法的极大兴趣。本文的目的是介绍一种新的信息理论学习框架(可接受的机器学习)和算法风险管理工具(信息图、L特征、阿尔法测试),可以指导分析师重新设计现成的ML方法,使其符合监管要求,同时保持良好的预测准确性。我们使用了来自金融部门、生物医学研究、营销活动和刑事司法系统的几个真实数据示例来说明我们的方法。 摘要:We have entered a new era of machine learning (ML), where the most accurate algorithm with superior predictive power may not even be deployable, unless it is admissible under the regulatory constraints. This has led to great interest in developing fair, transparent and trustworthy ML methods. The purpose of this article is to introduce a new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf ML methods to be regulatory compliant, while maintaining good prediction accuracy. We have illustrated our approach using several real-data examples from financial sectors, biomedical research, marketing campaigns, and the criminal justice system.