cs.LG: 242 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (16 papers)
【1】 Beltrami Flow and Neural Diffusion on Graphs Link: https://arxiv.org/abs/2110.09443
Authors: Benjamin Paul Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, Michael M. Bronstein Affiliations: University of Oxford; Twitter Inc. and Imperial College London Note: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021 Abstract: We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
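For intuition, the explicit-Euler discretisation common to this family of graph diffusion models (a generic sketch, not necessarily the paper's exact Beltrami scheme) updates the stacked node-feature matrix as $X^{(k+1)} = X^{(k)} + \tau \, (A(X^{(k)}) - I) \, X^{(k)}$, where $\tau$ is the step size and $A(X)$ is a feature-dependent, attention-like diffusivity matrix; the Beltrami variant additionally evolves the positional encodings jointly with the features.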
【2】 Intrusion-Free Graph Mixup Link: https://arxiv.org/abs/2110.09344
Authors: Hongyu Guo, Yongyi Mao Affiliations: National Research Council Canada, Montreal Road, Ottawa; School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, Ontario Abstract: We present a simple yet effective interpolation-based regularization technique to improve the generalization of Graph Neural Networks (GNNs). We leverage recent advances in the Mixup regularizer for vision and text, where random sample pairs and their labels are interpolated to create synthetic samples for training. Unlike images or natural sentences, which embrace a grid or linear sequence format, graphs have arbitrary structure and topology, which play a vital role in the semantic information of a graph. Consequently, even simply deleting or adding one edge can dramatically change a graph's semantic meaning. This makes interpolating graph inputs very challenging, because mixing random graph pairs may naturally create graphs with identical structure but different labels, causing the manifold intrusion issue. To cope with this obstacle, we propose the first input mixing schema for Mixup on graphs. We theoretically prove that our mixing strategy can recover the source graphs from the mixed graph, and guarantee that the mixed graphs are free of manifold intrusion. We also empirically show that our method effectively regularizes graph classification learning, resulting in superior predictive accuracy over popular graph augmentation baselines.
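For background, a minimal PyTorch sketch of the base Mixup regularizer the abstract builds on (input interpolation for grid-structured data; this is not the paper's graph-specific mixing schema):

```python
import torch

def mixup(x, y, alpha=0.2):
    # Classic input Mixup: convex combination of a random pair of
    # samples and of their one-hot labels.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

# usage on a mini-batch of feature vectors with 10 classes
x = torch.randn(32, 128)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
x_mix, y_mix = mixup(x, y)
```

Because graphs lack the shared coordinate grid this relies on, the same convex combination of two random graphs can produce a valid-looking structure with the wrong label, which is exactly the manifold intrusion the paper sets out to prevent.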
【3】 pygrank: A Python Package for Graph Node Ranking Link: https://arxiv.org/abs/2110.09274
Authors: Emmanouil Krasanakis, Symeon Papadopoulos, Ioannis Kompatsiaris, Andreas Symeonidis Affiliations: Centre for Research and Technology Hellas; Aristotle University of Thessaloniki Note: 6 pages, 1 figure, 2 tables, 3 code snippets (author preprint) Abstract: We introduce pygrank, an open source Python package to define, run and evaluate node ranking algorithms. We provide object-oriented and extensively unit-tested algorithm components, such as graph filters, post-processors, measures, benchmarks and online tuning. Computations can be delegated to numpy, tensorflow or pytorch backends and fit in back-propagation pipelines. Classes can be combined to define interoperable complex algorithms. Within the context of this paper we compare the package with related alternatives and demonstrate its flexibility and ease of use with code examples.
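As an illustration of the kind of pipeline the abstract describes, a hedged sketch of personalized node ranking with a graph filter wrapped by a postprocessor; pg.PageRank and pg.Normalize follow the component names used in the paper, but the exact call signatures here are assumptions and should be checked against the package documentation:

```python
import networkx as nx
import pygrank as pg

graph = nx.les_miserables_graph()
seeds = {"Valjean": 1}  # personalization signal: rank nodes relative to Valjean

# a graph filter (personalized PageRank) composed with a postprocessor;
# component names as in the paper, call signatures assumed
ranker = pg.Normalize(pg.PageRank(alpha=0.85))
scores = ranker(graph, seeds)
print(scores["Cosette"])  # assumes the returned signal is indexable by node
```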
【4】 Graph Partner Neural Networks for Semi-Supervised Learning on Graphs Link: https://arxiv.org/abs/2110.09182
Authors: Langzhang Liang, Cuiyun Gao, Shiyi Chen, Shishi Duan, Yu Pan, Junjin Zheng, Lei Wang, Zenglin Xu Affiliations: Harbin Institute of Technology, Shenzhen, China; NetEase Abstract: Graph Convolutional Networks (GCNs) are powerful for processing graph-structured data and have achieved state-of-the-art performance in several tasks such as node classification, link prediction, and graph classification. However, deep GCNs inevitably suffer from an over-smoothing issue, in which the representations of nodes tend to become indistinguishable after repeated graph convolution operations. To address this problem, we propose the Graph Partner Neural Network (GPNN), which incorporates a de-parameterized GCN and a parameter-sharing MLP. We provide empirical and theoretical evidence to demonstrate the effectiveness of the proposed MLP partner in tackling over-smoothing while benefiting from appropriate smoothness. To further tackle over-smoothing and regulate the learning process, we introduce a well-designed consistency contrastive loss and a KL divergence loss. Besides, we present a graph enhancement technique to improve the overall quality of edges in graphs. While most GCNs work only with shallow architectures, GPNN obtains better results by increasing model depth. Experiments on various node classification tasks have demonstrated the state-of-the-art performance of GPNN. Meanwhile, extensive ablation studies are conducted to investigate the contribution of each component to tackling over-smoothing and improving performance.
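The abstract does not spell out the consistency and KL terms; below is a generic sketch of the standard way two branches (here, a GCN and its MLP partner) are tied together through their predictive distributions. This common formulation is an assumption, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def kl_consistency(logits_a, logits_b, temperature=1.0):
    # Symmetric KL between the class distributions of the two branches,
    # encouraging the MLP partner to stay consistent with the GCN.
    p = F.log_softmax(logits_a / temperature, dim=-1)
    q = F.log_softmax(logits_b / temperature, dim=-1)
    return 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean")
                  + F.kl_div(q, p.exp(), reduction="batchmean"))

logits_gcn = torch.randn(64, 7)  # GCN branch outputs (64 nodes, 7 classes)
logits_mlp = torch.randn(64, 7)  # MLP partner outputs
loss = kl_consistency(logits_gcn, logits_mlp)
```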
【5】 Capsule Graph Neural Networks with EM Routing Link: https://arxiv.org/abs/2110.09039
Authors: Yu Lei, Jing Zhang Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China Abstract: To effectively classify graph instances, graph neural networks need the capability to capture the part-whole relationships present in a graph. A capsule is a group of neurons representing complicated properties of entities, which has shown its advantages in traditional convolutional neural networks. This paper proposes novel Capsule Graph Neural Networks that use the EM routing mechanism (CapsGNNEM) to generate high-quality graph embeddings. Experimental results on a number of real-world graph datasets demonstrate that the proposed CapsGNNEM outperforms nine state-of-the-art models in graph classification tasks.
【6】 Temporal Knowledge Graph Reasoning Triggered by Memories Link: https://arxiv.org/abs/2110.08765
Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin Abstract: Inferring missing facts in temporal knowledge graphs is a critical task that has been widely explored. Extrapolation in temporal reasoning tasks is more challenging and has gradually attracted the attention of researchers, since no direct history facts are available for prediction. Previous works attempted to apply evolutionary representation learning to solve the extrapolation problem. However, these techniques do not explicitly leverage various time-aware attribute representations, i.e., reasoning performance is significantly affected by the history length. To alleviate the time dependence when reasoning about future missing facts, we propose a memory-triggered decision-making (MTDM) network, which incorporates transient memories, long-short-term memories, and deep memories. Specifically, the transient learning network treats transient memories as a static knowledge graph, and the time-aware recurrent evolution network learns representations through a sequence of recurrent evolution units from long-short-term memories. Each evolution unit consists of a structural encoder to aggregate edge information and a time encoder with a gating unit to update the attribute representations of entities. MTDM utilizes a crafted residual multi-relational aggregator as the structural encoder to solve the multi-hop coverage problem. We also introduce a dissolution learning constraint for better understanding the event dissolution process. Extensive experiments demonstrate that MTDM alleviates the history dependence and achieves state-of-the-art prediction performance. Moreover, compared with the most advanced baseline, MTDM shows faster convergence and training speed.
【7】 Adapting Membership Inference Attacks to GNN for Graph Classification: Approaches and Implications Link: https://arxiv.org/abs/2110.08760
Authors: Bang Wu, Xiangwen Yang, Shirui Pan, Xingliang Yuan Affiliations: Monash University, Melbourne, Australia Note: The short version of this paper has been published in the IEEE International Conference on Data Mining (ICDM) 2021 Abstract: Graph Neural Networks (GNNs) are widely adopted to analyse non-Euclidean data, such as chemical networks, brain networks, and social networks, modelling complex relationships and interdependency between objects. Recently, Membership Inference Attack (MIA) against GNNs raises severe privacy concerns, where training data can be leaked from trained GNN models. However, prior studies focus on inferring the membership of only the components in a graph, e.g., an individual node or edge. How to infer the membership of an entire graph record is yet to be explored. In this paper, we take the first step in MIA against GNNs for graph-level classification. Our objective is to infer whether a graph sample has been used for training a GNN model. We present and implement two types of attacks, i.e., training-based attacks and threshold-based attacks, from different adversarial capabilities. We perform comprehensive experiments to evaluate our attacks in seven real-world datasets using five representative GNN models. Both our attacks are shown to be effective and can achieve high performance, i.e., reaching over 0.7 attack F1 scores in most cases. Furthermore, we analyse the implications behind the MIA against GNNs. Our findings confirm that GNNs can be even more vulnerable to MIA than models with non-graph structures. And unlike the node-level classifier, MIAs on graph-level classification tasks are more correlated with the overfitting level of GNNs than with the statistical properties of their training graphs.
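Of the two attack families, the threshold-based one is the simplest to picture. A generic sketch (not the paper's exact procedure) that guesses membership from the target model's confidence on a queried graph, which is where the reported link to overfitting comes from:

```python
import numpy as np

def threshold_mia(confidences, threshold=0.9):
    # Generic threshold-based membership inference: samples on which the
    # target model is highly confident are guessed to be training members.
    # `confidences` holds the top predicted class probability per graph.
    return confidences >= threshold

# toy example: members tend to receive higher confidence than non-members
conf = np.array([0.99, 0.95, 0.62, 0.97, 0.55])
print(threshold_mia(conf))  # [ True  True False  True False]
```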
【8】 Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation Link: https://arxiv.org/abs/2110.08727
Authors: Shichang Zhang, Yozen Liu, Yizhou Sun, Neil Shah Affiliations: University of California, Los Angeles; Snap Inc. Abstract: Graph Neural Networks (GNNs) have recently become popular for graph machine learning and have shown great results on a wide range of node classification tasks. Yet, GNNs are less popular for practical deployments in industry owing to their scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching these nodes burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs to some extent by reducing Multiplication-and-ACcumulation (MAC) operations. However, their improvements are limited given that the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no dependency on graph data and infer much faster than GNNs, even though they are generally less accurate than GNNs for node classification. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNNs with competitive performance infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Meanwhile, under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over standalone MLPs by 12.36% on average and match GNNs on 6/7 datasets. A comprehensive analysis of GLNNs shows when and why GLNNs can achieve results competitive with GNNs and suggests GLNN as a handy choice for latency-constrained applications.
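The distillation step follows the usual soft-target recipe; a minimal PyTorch sketch of training an MLP student against a frozen GNN teacher's logits, assuming the standard Hinton-style objective (GLNN's exact training details may differ):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, lam=0.5, T=1.0):
    # Cross-entropy on true labels plus KL to the teacher's softened
    # distribution; the trained MLP needs no graph at inference time.
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return lam * ce + (1 - lam) * kd

student_logits = torch.randn(128, 40, requires_grad=True)  # MLP on node features
teacher_logits = torch.randn(128, 40)                      # frozen GNN outputs
labels = torch.randint(0, 40, (128,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```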
【9】 Deep Learning and Spectral Embedding for Graph Partitioning Link: https://arxiv.org/abs/2110.08614
Authors: Alice Gatti, Zhixiong Hu, Tess Smidt, Esmond G. Ng, Pieter Ghysels Abstract: We present a graph bisection and partitioning algorithm based on graph neural networks. For each node in the graph, the network outputs probabilities for each of the partitions. The graph neural network consists of two modules: an embedding phase and a partitioning phase. The embedding phase is trained first by minimizing a loss function inspired by spectral graph theory. The partitioning module is trained through a loss function that corresponds to the expected value of the normalized cut. Both parts of the neural network rely on SAGE convolutional layers and graph coarsening using heavy edge matching. The multilevel structure of the neural network is inspired by the multigrid algorithm. Our approach generalizes very well to bigger graphs and has partition quality comparable to METIS, Scotch and spectral partitioning, with shorter runtime compared to METIS and spectral partitioning.
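The partitioning loss (the expected value of the normalized cut under soft assignments) has a standard differentiable form; a sketch assuming that common relaxation, not necessarily the paper's exact expression:

```python
import torch

def expected_normalized_cut(Y, A):
    # Y: (n, g) row-stochastic partition probabilities from the network.
    # A: (n, n) dense adjacency matrix (dense only for illustration).
    # E[Ncut] = sum_k  Y[:, k]^T A (1 - Y[:, k]) / (d^T Y[:, k])
    d = A.sum(dim=1)                          # node degrees
    cut = ((A @ (1.0 - Y)) * Y).sum(dim=0)    # expected cut per partition
    vol = Y.t() @ d                           # expected volume per partition
    return (cut / vol.clamp_min(1e-9)).sum()

n, g = 100, 2
A = (torch.rand(n, n) < 0.05).float()
A = torch.triu(A, 1); A = A + A.t()           # symmetric, no self-loops
Y = torch.softmax(torch.randn(n, g), dim=1)   # stand-in for network outputs
print(expected_normalized_cut(Y, A))
```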
【10】 Dynamic Graph Echo State Networks Link: https://arxiv.org/abs/2110.08565
Authors: Domenico Tortorella, Alessio Micheli Affiliations: University of Pisa, Department of Computer Science, Largo B. Pontecorvo, Pisa, Italy Note: Accepted for oral presentation at ESANN 2021 Abstract: Dynamic temporal graphs represent evolving relations between entities, e.g. interactions between social network users or infection spreading. We propose an extension of graph echo state networks for the efficient processing of dynamic temporal graphs, with a sufficient condition for their echo state property, and an experimental analysis of reservoir layout impact. Compared to temporal graph kernels that need to hold the entire history of vertex interactions, our model provides a vector encoding for the dynamic graph that is updated at each time step without requiring training. Experiments show accuracy comparable to approximate temporal graph kernels on twelve dissemination process classification tasks.
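The core of a graph echo state network is an untrained reservoir update iterated over the (here, time-varying) adjacency; a minimal numpy sketch of that idea, with a simple spectral rescaling standing in for the paper's sufficient condition for the echo state property:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_in, n_res = 5, 3, 16

# fixed random reservoir weights, rescaled for stability
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W_hat = rng.uniform(-1.0, 1.0, (n_res, n_res))
W_hat *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_hat)))

x = np.zeros((n_nodes, n_res))  # reservoir state per vertex
for t in range(10):
    A_t = (rng.random((n_nodes, n_nodes)) < 0.3).astype(float)  # graph at time t
    U_t = rng.standard_normal((n_nodes, n_in))                  # vertex inputs
    # untrained update: each vertex mixes its input with neighbours' states
    x = np.tanh(U_t @ W_in.T + A_t @ x @ W_hat.T)

embedding = x.mean(axis=0)  # readout-ready encoding of the dynamic graph
```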
【11】 A Heterogeneous Graph Based Framework for Multimodal Neuroimaging Fusion Learning Link: https://arxiv.org/abs/2110.08465
Authors: Gen Shi, Yifan Zhu, Wenjin Liu, Xuesong Li Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA Abstract: Here, we present a Heterogeneous Graph neural network for Multimodal neuroimaging fusion learning (HGM). Traditional GNN-based models usually assume the brain network is a homogeneous graph with a single type of nodes and edges. However, a vast literature has shown the heterogeneity of the human brain, especially between the two hemispheres. A homogeneous brain network is insufficient to model the complicated brain state. Therefore, in this work we first model the brain network as a heterogeneous graph with multi-type nodes (i.e., left and right hemispheric nodes) and multi-type edges (i.e., intra- and inter-hemispheric edges). Besides, we also propose a self-supervised pre-training strategy based on the heterogeneous brain network to address the overfitting problem caused by the complex model and small sample size. Our results on two datasets show the superiority of the proposed model over other multimodal methods for the disease prediction task. Besides, ablation experiments show that our model with the pre-training strategy can alleviate the problem of limited training sample size.
【12】 Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining Link: https://arxiv.org/abs/2110.08450
Authors: Tim Kaler, Nickolas Stathas, Anne Ouyang, Alexandros-Stavros Iliopoulos, Tao B. Schardl, Charles E. Leiserson, Jie Chen Abstract: Improving the training and inference performance of graph neural networks (GNNs) is faced with a challenge uncommon in general neural networks: creating mini-batches requires a lot of computation and data movement due to the exponential growth of multi-hop graph neighborhoods along network layers. Such a unique challenge gives rise to a diverse set of system design choices. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment, under which we identify major performance bottlenecks hitherto under-explored by developers: mini-batch preparation and transfer. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler, a shared-memory parallelization strategy, and the pipelining of batch transfer with GPU computation. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised. Such an observation unifies training and inference, simplifying model implementation. We report comprehensive experimental results with several benchmark data sets and GNN architectures, including a demonstration that, for the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over a standard PyTorch-Geometric implementation with a single GPU and a further 8x parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.
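The quoted fanouts map directly onto standard neighborhood-sampling loaders; a sketch with vanilla PyTorch Geometric (the baseline the paper accelerates, not the SALIENT system itself), assuming PyG 2.x's NeighborLoader:

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

data = Planetoid(root="/tmp/Cora", name="Cora")[0]

# sample 15 neighbors at hop 1, 10 at hop 2, 5 at hop 3: the
# (15, 10, 5) training fanout quoted in the abstract
train_loader = NeighborLoader(
    data,
    num_neighbors=[15, 10, 5],
    batch_size=64,
    input_nodes=data.train_mask,
    shuffle=True,
)

for batch in train_loader:
    # each batch is a sampled subgraph whose first nodes are the seeds;
    # a 3-layer GraphSAGE model would consume batch.x and batch.edge_index
    print(batch.num_nodes)
    break
```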
【13】 Self-supervised Contrastive Attributed Graph Clustering Link: https://arxiv.org/abs/2110.08264
Authors: Wei Xia, Quanxue Gao, Ming Yang, Xinbo Gao Affiliations: Xidian University; Westfield State University; Chongqing University of Posts and Telecommunications Abstract: Attributed graph clustering, which learns node representations from node attributes and the topological graph for clustering, is a fundamental but challenging task for graph analysis. Recently, methods based on graph contrastive learning (GCL) have obtained impressive clustering performance on this task. Yet, we observe that existing GCL-based methods 1) fail to benefit from imprecise clustering labels; 2) require a post-processing operation to get clustering labels; and 3) cannot solve the out-of-sample (OOS) problem. To address these issues, we propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, which aims to maximize the similarities of intra-cluster nodes while minimizing the similarities of inter-cluster nodes, is designed for node representation learning. Meanwhile, a clustering module is built to directly output clustering labels by contrasting the representations of different clusters. Thus, for OOS nodes, SCAGC can directly calculate their clustering labels. Extensive experimental results on four benchmark datasets have shown that SCAGC consistently outperforms 11 competitive clustering methods.
【14】 SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network Link: https://arxiv.org/abs/2110.09413
Authors: Ziyi Liu, Minghui Liao, Fulin Luo, Bo Du Affiliations: National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science, and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, China Note: 6 pages body + 2 pages references Abstract: Single-cell sequencing plays a significant role in exploring biological processes such as embryonic development, cancer evolution, and cell differentiation. These biological properties can be presented by a two-dimensional scatter plot. However, single-cell sequencing data generally have very high dimensionality. Therefore, dimensionality reduction should be used to process the high-dimensional sequencing data for 2D visualization and subsequent biological analysis. Traditional dimensionality reduction methods, which do not consider the structural characteristics of single-cell sequencing data, struggle to reveal the data structure in the 2D representation. In this paper, we develop a 2D feature representation method based on graph convolutional networks (GCN) for the visualization of single-cell data, termed single-cell sequencing graph embedding networks (SGEN). This method constructs the graph from the similarity relationships between cells and adopts GCN to analyze the neighbor embedding information of samples, which places similar cells closer to each other on the 2D scatter plot. The results show SGEN achieves a clear 2D distribution and preserves the high-dimensional relationships of different cells. Meanwhile, similar cell clusters have spatial continuity rather than relying heavily on random initialization, so the scatter plot can reflect the trajectory of cell development.
【15】 Graph-based Local Climate Classification in Iran Link: https://arxiv.org/abs/2110.09209
Authors: Neda Akrami, Koorush Ziarati, Soumyabrata Dev Affiliations: Department of Computer Science and Information Technology, Shiraz University, Shiraz, Iran; ADAPT SFI Research Centre, University College Dublin, Ireland Note: Accepted in International Journal of Climatology, 2021 Abstract: In this paper, we introduce a novel graph-based method to classify the regions with similar climate in a local area. We refer to our proposed method as the Graph Partition Based Method (GPBM). Our proposed method attempts to overcome the shortcomings of the current state-of-the-art methods in the literature. It has no limit on the number of variables that can be used and also preserves the nature of climate data. To illustrate the capability of our proposed algorithm, we benchmark its performance against other state-of-the-art climate classification techniques. The climate data is collected from 24 synoptic stations in Fars province in southern Iran. The data includes seven climate variables stored as time series from 1951 to 2017. Our results show that our proposed method performs a more realistic climate classification with less computational time. It can preserve more information during the climate classification process and is therefore efficient in further data analysis. Furthermore, using our method, we can introduce seasonal graphs to better investigate seasonal climate changes. To the best of our knowledge, our proposed method is the first graph-based climate classification system.
【16】 On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs Link: https://arxiv.org/abs/2110.08500
Authors: Xiangming Meng, Tomoyuki Obuchi, Yoshiyuki Kabashima Affiliations: Institute for Physics of Intelligence and Department of Physics, Graduate School of Science, The University of Tokyo, Tokyo, Japan; Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan Note: 30 pages, 4 figures Abstract: We consider the problem of high-dimensional Ising model selection using neighborhood-based least absolute shrinkage and selection operator (Lasso). It is rigorously proved that under some mild coherence conditions on the population covariance matrix of the Ising model, consistent model selection can be achieved with sample sizes $n=\Omega(d^3\log p)$ for any tree-like graph in the paramagnetic phase, where $p$ is the number of variables and $d$ is the maximum node degree. When the same conditions are imposed directly on the sample covariance matrices, it is shown that a reduced sample size $n=\Omega(d^2\log p)$ suffices. The obtained sufficient conditions for consistent model selection with Lasso have the same scaling of the sample complexity as that of $\ell_1$-regularized logistic regression. Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
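Neighborhood-based Lasso estimates the graph by an l1-penalized logistic regression of each spin on all the others, reading neighbors off the nonzero coefficients; a sketch of that selection procedure with scikit-learn (regularization strength and threshold are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood_lasso(S, C=0.1):
    # S: (num_samples, p) matrix of +/-1 spins. For each variable i, fit
    # an l1-penalized logistic regression of s_i on the remaining spins;
    # nonzero coefficients are the estimated neighbors.
    n, p = S.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        X = np.delete(S, i, axis=1)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X, S[:, i])
        coef = np.insert(clf.coef_.ravel(), i, 0.0)  # re-align to p entries
        adj[i] = np.abs(coef) > 1e-6
    return adj & adj.T  # symmetrize with the AND rule (OR is also common)

S = np.sign(np.random.randn(2000, 10))  # placeholder samples, not true Ising draws
print(ising_neighborhood_lasso(S).sum())
```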
Transformer (8 papers)
【1】 SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs Link: https://arxiv.org/abs/2110.09454
Authors: Jon Chun Affiliations: Digital Humanities Colab, Integrated Program for Humane Studies, Kenyon College, Gambier, OH Note: 87 pages, 97 figures Abstract: SOTA Transformer and DNN short text sentiment classifiers report over 97% accuracy on narrow domains like IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and the model solves the generalization problem. Simple visualizations exploit the temporal structure in narratives so domain experts can quickly spot trends, identify key features, and note anomalies over hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.
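At its core a sentiment arc is a smoothed per-sentence sentiment series over a narrative; a sketch of that step with a single lexicon model (VADER, standing in for the paper's large model ensemble):

```python
import numpy as np
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentences = [
    "It was the best of times.",
    "It was the worst of times.",
    "Hope flickered faintly.",
    "Everything fell apart.",
    "At last, they were free.",
]

# per-sentence sentiment in [-1, 1], then a rolling mean to expose the arc
scores = np.array([analyzer.polarity_scores(s)["compound"] for s in sentences])
window = 3
arc = np.convolve(scores, np.ones(window) / window, mode="valid")
print(arc)  # the smoothed narrative arc a domain expert would inspect
```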
【2】 Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches Link: https://arxiv.org/abs/2110.09338
Authors: Ravindra Nayak, Raviraj Joshi Affiliations: Sri Jayachamarajendra College of Engineering, Mysore; Indian Institute of Technology Madras, Chennai Note: Accepted at HASOC @ Forum for Information Retrieval Evaluation (FIRE) 2021 Abstract: In the recent past, social media platforms have helped people in connecting and communicating to a wider audience. But this has also led to a drastic increase in cyberbullying. It is essential to detect and curb hate speech to keep the sanity of social media platforms. Also, code-mixed text containing more than one language is frequently used on these platforms. We, therefore, propose automated techniques for hate speech detection in code-mixed text from scraped Twitter. We specifically focus on code-mixed English-Hindi text and transformer-based approaches. While regular approaches analyze the text independently, we also make use of context text in the form of parent tweets. We evaluate the performance of multilingual BERT and Indic-BERT in single-encoder and dual-encoder settings. The first approach is to concatenate the target text and context text using a separator token and get a single representation from the BERT model. The second approach encodes the two texts independently using a dual BERT encoder, and the corresponding representations are averaged. We show that the dual-encoder approach using independent representations yields better performance. We also employ simple ensemble methods to further improve the performance. Using these methods we were able to achieve the best F1 score of 73.07% on the HASOC 2021 ICHCL code-mixed data set.
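The better-performing dual-encoder setup encodes the target tweet and its parent independently and averages the representations; a sketch with Hugging Face transformers (classification head and fine-tuning loop omitted; the example tweets are hypothetical):

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
bert = AutoModel.from_pretrained(name)

def encode(text):
    # [CLS] representation of one text
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return bert(**batch).last_hidden_state[:, 0]

target = "tum sab log pagal ho"        # code-mixed target tweet
context = "kya hua bhai? sab theek?"   # parent tweet providing context

# independent representations, averaged before the classification head,
# the fusion the paper reports working better than concatenation
fused = 0.5 * (encode(target) + encode(context))
```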
【3】 Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention Link: https://arxiv.org/abs/2110.09310
Authors: Zhe Zhou, Junlin Liu, Zhenyu Gu, Guangyu Sun Affiliations: Peking University Abstract: In recent years, transformer models have revolutionized Natural Language Processing (NLP) and also shown promising performance on Computer Vision (CV) tasks. Despite their effectiveness, transformers' attention operations are hard to accelerate due to complicated data movement and quadratic computational complexity, prohibiting real-time inference on resource-constrained edge-computing platforms. To tackle this challenge, we propose Energon, an algorithm-architecture co-design approach that accelerates various transformers using dynamic sparse attention. With the observation that attention results only depend on a few important query-key pairs, we propose a multi-round filtering algorithm to dynamically identify such pairs at runtime. We adopt low bitwidth in each filtering round and only use high-precision tensors in the attention stage to reduce overall complexity. By this means, we significantly mitigate the computational cost with negligible accuracy loss. To enable such an algorithm with lower latency and better energy-efficiency, we also propose an Energon co-processor architecture. Elaborated pipelines and specialized optimizations jointly boost the performance and reduce power consumption. Extensive experiments on both NLP and CV benchmarks demonstrate that Energon achieves $161\times$ and $8.4\times$ geo-mean speedup and up to $10^4\times$ and $10^3\times$ energy reduction compared with Intel Xeon 5220 CPU and NVIDIA V100 GPU. Compared to state-of-the-art attention accelerators SpAtten and $A^3$, Energon also achieves $1.7\times, 1.25\times$ speedup and $1.6\times, 1.5\times$ higher energy efficiency.
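The algorithmic core (cheaply identify the few important query-key pairs, then attend in full precision over the survivors) can be sketched in a single-round top-k form; the multi-round low-bitwidth filtering and the co-processor are Energon's contributions and are not reproduced here:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=8):
    # Score all keys per query (in a real pipeline this pass would run at
    # low precision), keep the top-`keep`, and run softmax attention over
    # the surviving pairs only.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    kept = scores.topk(keep, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, kept, 0.0)             # 0 where kept, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)  # pruned pairs get zero weight
    return attn @ v

n, d = 64, 32
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
out = topk_sparse_attention(q, k, v)
```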
【4】 Transformer with a Mixture of Gaussian Keys Link: https://arxiv.org/abs/2110.08678
Authors: Tam Nguyen, Tan M. Nguyen, Dung Le, Khuong Nguyen, Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher Affiliations: FPT Software, Vietnam; University of California, Los Angeles, USA; Rice University, Houston, USA; University of Texas, Austin, USA Note: 21 pages, 8 figures, 4 tables Abstract: Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embeddings, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. These mixtures of keys follow a Gaussian mixture model and allow each attention head to focus on different parts of the input sequence efficiently. Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires fewer FLOPs to compute while achieving comparable or better accuracy across tasks. Transformer-MGK can also be easily extended for use with linear attention. We empirically demonstrate the advantage of Transformer-MGK in a range of practical applications, including language modeling and tasks that involve very long sequences. On the Wikitext-103 and Long Range Arena benchmarks, Transformer-MGKs with 4 heads attain comparable or better performance than baseline transformers with 8 heads.
【5】 On Learning the Transformer Kernel Link: https://arxiv.org/abs/2110.08323
Authors: Sankalan Pal Chowdhury, Adamos Solomou, Avinava Dubey, Mrinmaya Sachan Affiliations: Department of Computer Science, ETH Zürich; Google Research, Mountain View, CA Note: 26 pages, of which 11 form the appendix; 6 figures, of which 2 are part of the appendix Abstract: In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy and computational efficiency. Our study also demonstrates that the choice of the kernel has a substantial impact on performance, and kernel learning variants are competitive alternatives to fixed kernel Transformers, in both long- and short-sequence tasks.
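The linear-complexity trick behind kernelized attention is to replace softmax(QK^T)V with phi(Q)(phi(K)^T V); a sketch with a fixed elu+1 feature map, standard for linear transformers (the paper instead learns the feature map through its spectral distribution):

```python
import torch
import torch.nn.functional as F

def phi(x):
    # a simple positive feature map; the paper learns this map instead
    return F.elu(x) + 1.0

def linear_attention(q, k, v):
    # associativity lets us form phi(K)^T V once (d x d) instead of the
    # n x n attention matrix, giving linear time and memory in n
    kv = phi(k).transpose(-2, -1) @ v                                # (d, d)
    z = phi(q) @ phi(k).sum(dim=-2, keepdim=True).transpose(-2, -1)  # (n, 1)
    return (phi(q) @ kv) / z.clamp_min(1e-6)

n, d = 1024, 64
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
out = linear_attention(q, k, v)  # never materializes the n x n matrix
```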
【6】 From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation Link: https://arxiv.org/abs/2110.08270
Authors: Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond Affiliations: INRIA Sophia Antipolis - Méditerranée, France; Indian Institute of Information Technology, Allahabad, India; Université Côte d'Azur, France Note: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 8 pages Abstract: Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resources demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, the levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, reduced the number of parameters by 2.5 times and the inference time by 22%. Such a performance-computation tradeoff can be exploited in many applications, and we aim at opening a new research area where the deployment of complex models with limited resources is required.
【7】 Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting Link: https://arxiv.org/abs/2110.08255
Authors: Kiran Madhusudhanan, Johannes Burchert, Nghia Duong-Trung, Stefan Born, Lars Schmidt-Thieme Affiliations: Department of Computer Science, University of Hildesheim, Hildesheim, Germany; Technische Universität Berlin, Berlin, Germany Abstract: Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, the current state-of-the-art sparse Transformer architectures fail to couple down- and upsampling procedures to produce outputs at a resolution similar to the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connections from the downscaled encoder layers to the corresponding upsampled decoder layers in a U-Net inspired architecture, (2) combines downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments have been conducted with relevant baselines on four benchmark datasets, demonstrating average improvements of 19.82% and 18.41% in MSE and 13.62% and 11.85% in MAE over the current state of the art for the univariate and multivariate settings, respectively.
【8】 CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans Link: https://arxiv.org/abs/2110.08721
Authors: Shahin Heidarian, Parnian Afshar, Anastasia Oikonomou, Konstantinos N. Plataniotis, Arash Mohammadi Affiliations: Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada; Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada Abstract: Lung cancer is the leading cause of mortality from cancer worldwide and has various histologic types, among which Lung Adenocarcinoma (LAUC) has recently been the most prevalent. Lung adenocarcinomas are classified as pre-invasive, minimally invasive, and invasive adenocarcinomas. Timely and accurate knowledge of the invasiveness of lung nodules leads to a proper treatment plan and reduces the risk of unnecessary or late surgeries. Currently, the primary imaging modality to assess and predict the invasiveness of LAUCs is the chest CT. The results based on CT images, however, are subjective and suffer from a low accuracy compared to the ground truth pathological reviews provided after surgical resections. In this paper, a predictive transformer-based framework, referred to as the "CAE-Transformer", is developed to classify LAUCs. The CAE-Transformer utilizes a Convolutional Auto-Encoder (CAE) to automatically extract informative features from CT slices, which are then fed to a modified transformer model to capture global inter-slice relations. Experimental results on our in-house dataset of 114 pathologically proven Sub-Solid Nodules (SSNs) demonstrate the superiority of the CAE-Transformer over the histogram/radiomics-based models and its deep learning-based counterparts, achieving an accuracy of 87.73%, sensitivity of 88.67%, specificity of 86.33%, and AUC of 0.913, using 10-fold cross-validation.
GAN | Adversarial | Attack | Generation (21 papers)
【1】 Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text Link: https://arxiv.org/abs/2110.09495
Authors: Rishi Balakrishnan, Stephen Sloan, Anil Aswani Affiliations: University of California, Berkeley Abstract: With Internet users constantly leaving a trail of text, whether through blogs, emails, or social media posts, the ability to write and protest anonymously is being eroded, because artificial intelligence, when given a sample of previous work, can match text with its author out of hundreds of possible candidates. Existing approaches to authorship anonymization, also known as authorship obfuscation, often focus on protecting binary demographic attributes rather than identity as a whole. Even those that do focus on obfuscating identity require manual feedback, lose the coherence of the original sentence, or only perform well given a limited subset of authors. In this paper, we develop a new approach to authorship anonymization by constructing a generative adversarial network that protects identity and optimizes for three different losses corresponding to anonymity, fluency, and content preservation. Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency, but greatly outperforms baselines with regard to anonymization. Moreover, our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
【2】 Don't Judge Me by My Face: An Indirect Adversarial Approach to Remove Sensitive Information From Multimodal Neural Representation in Asynchronous Job Video Interviews Link: https://arxiv.org/abs/2110.09424
Authors: Léo Hemamou, Arthur Guillon, Jean-Claude Martin, Chloé Clavel Affiliations: EASYRECRUE, Paris, France; LIMSI-LISN, CNRS, Paris-Sud University, Paris-Saclay University, Orsay, France; Télécom-Paris, IP-Paris, Paris, France Note: Published in ACII 2021 Abstract: Use of machine learning for automatic analysis of job interview videos has recently seen increased interest. Despite claims of fair output regarding sensitive information such as gender or ethnicity of the candidates, the current approaches rarely provide proof of unbiased decision-making, or that sensitive information is not used. Recently, adversarial methods have been proved to effectively remove sensitive information from the latent representation of neural networks. However, these methods rely on the use of explicitly labeled protected variables (e.g. gender), which cannot be collected in the context of recruiting in some countries (e.g. France). In this article, we propose a new adversarial approach to remove sensitive information from the latent representation of neural networks without the need to collect any sensitive variable. Using only a few frames of the interview, we train our model to not be able to find the face of the candidate related to the job interview in the inner layers of the model. This, in turn, allows us to remove relevant private information from these layers. Comparing our approach to a standard baseline on a public dataset with gender and ethnicity annotations, we show that it effectively removes sensitive information from the main network. Moreover, to the best of our knowledge, this is the first application of adversarial techniques for obtaining a multimodal fair representation in the context of video job interviews. In summary, our contributions aim at improving the fairness of upcoming automatic systems processing videos of job interviews, for equality in job selection.
【3】 A Prior Guided Adversarial Representation Learning and Hypergraph Perceptual Network for Predicting Abnormal Connections of Alzheimer's Disease Link: https://arxiv.org/abs/2110.09302
Authors: Qiankun Zuo, Baiying Lei, Shuqiang Wang, Yong Liu, Bingchuan Wang, Yanyan Shen Affiliations: Shenzhen University, China; Gaoling School of Artificial Intelligence, Renmin University of China Abstract: Alzheimer's disease is characterized by alterations of the brain's structural and functional connectivity during its progressive degenerative processes. Existing auxiliary diagnostic methods have accomplished the classification task, but few of them can accurately evaluate the changing characteristics of brain connectivity. In this work, a prior guided adversarial representation learning and hypergraph perceptual network (PGARL-HPN) is proposed to predict abnormal brain connections using triple-modality medical images. Concretely, a prior distribution from anatomical knowledge is estimated to guide multimodal representation learning using an adversarial strategy. Also, a pairwise collaborative discriminator structure is further utilized to narrow the difference between representation distributions. Moreover, the hypergraph perceptual network is developed to effectively fuse the learned representations while establishing high-order relations within and between multimodal images. Experimental results demonstrate that the proposed model outperforms other related methods in analyzing and predicting Alzheimer's disease progression. More importantly, the identified abnormal connections are partly consistent with previous neuroscience discoveries. The proposed model can evaluate the characteristics of abnormal brain connections at different stages of Alzheimer's disease, which is helpful for cognitive disease study and early treatment.
【4】 BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation Link: https://arxiv.org/abs/2110.09147
Authors: Thomas Scialom, Felix Hill Affiliations: Sorbonne Université, CNRS, LIP6, Paris, France; reciTAL, Paris, France; DeepMind Abstract: Natural language processing (NLP) systems are increasingly trained to generate open-ended text rather than classifying between responses. This makes research on evaluation metrics for generated language -- functions that score system output given the context and/or human reference responses -- of critical importance. However, different metrics have different strengths and biases, and reflect human intuitions better on some tasks than on others. There is currently no simple, unified way to compare, analyse or evaluate metrics across a representative set of tasks. Here, we describe the Benchmark to Evaluate Automatic Metrics (BEAMetrics), a resource to make research into new metrics itself easier to evaluate. BEAMetrics users can quickly compare existing and new metrics with human judgements across a diverse set of tasks, quality dimensions (fluency vs. coherence vs. informativeness etc.), and languages. As generation experts might predict, BEAMetrics reveals stark task-dependent differences between existing metrics, and consistently poor performance on tasks with complex answer spaces or high reliance on general knowledge. While this analysis highlights a critical issue facing current research practice, BEAMetrics also contributes to its resolution by facilitating research into better metrics -- particularly those that can account for the complex interaction between context and general knowledge inherent to many modern NLP applications. BEAMetrics is available under the MIT License: https://github.com/ThomasScialom/BEAMetrics
【5】 Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training Link: https://arxiv.org/abs/2110.08956
Authors: Alexander Pan, Yongkyun Lee, Huan Zhang, Yize Chen, Yuanyuan Shi Affiliations: Huan Zhang: Department of Computer Science; Yuanyuan Shi: Department of Electrical and Computer Engineering, University of California San Diego Note: Published at the 2021 ICML RL4RL Workshop; submitted to 2022 PSCC Abstract: Due to the proliferation of renewable energy and its intrinsic intermittency and stochasticity, current power systems face severe operational challenges. Data-driven decision-making algorithms from reinforcement learning (RL) offer a solution towards efficiently operating a clean energy system. Although RL algorithms achieve promising performance compared to model-based control models, there has been limited investigation of RL robustness in safety-critical physical systems. In this work, we first show that several competition-winning, state-of-the-art RL agents proposed for power system control are vulnerable to adversarial attacks. Specifically, we use an adversary Markov Decision Process to learn an attack policy, and demonstrate the potency of our attack by successfully attacking multiple winning agents from the Learning To Run a Power Network (L2RPN) challenge, under both white-box and black-box attack settings. We then propose to use adversarial training to increase the robustness of RL agents against attacks and avoid infeasible operational decisions. To the best of our knowledge, our work is the first to highlight the fragility of grid control RL algorithms, and contributes an effective defense scheme towards improving their robustness and security.
【6】 Poisoning Attacks on Fair Machine Learning Link: https://arxiv.org/abs/2110.08932
Authors: Minh-Hao Van, Wei Du, Xintao Wu, Aidong Lu Affiliations: University of Arkansas at Fayetteville; University of North Carolina at Charlotte Abstract: Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received less attention. In this paper, we present a framework that seeks to effectively generate poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group-based fairness notions such as demographic parity and equalized odds. We develop three online attacks: adversarial sampling, adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples via sampling, labeling, or modifying a fraction of the training data in order to reduce the test accuracy. Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness and accurately quantify the impact of each candidate point on both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.
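The group fairness notions targeted by the attacks have simple empirical estimators; a sketch of the demographic parity and equalized-odds gaps, the quantities a poisoning attacker would try to inflate alongside test error:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    # |P(yhat = 1 | g = 0) - P(yhat = 1 | g = 1)|
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_pred, y_true, group):
    # max gap in positive-prediction rates across groups, conditioned on
    # the true label (covers both TPR and FPR differences)
    gaps = []
    for y in (0, 1):
        m0 = (group == 0) & (y_true == y)
        m1 = (group == 1) & (y_true == y)
        gaps.append(abs(y_pred[m0].mean() - y_pred[m1].mean()))
    return max(gaps)

y_pred = np.array([1, 0, 1, 1, 0, 1])
y_true = np.array([1, 0, 1, 0, 0, 1])
group = np.array([0, 0, 0, 1, 1, 1])
print(demographic_parity_gap(y_pred, group),
      equalized_odds_gap(y_pred, y_true, group))
```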
【7】 Taming Visually Guided Sound Generation
Link: https://arxiv.org/abs/2110.08791
Authors: Vladimir Iashin, Esa Rahtu
Affiliations: Computing Sciences, Tampere University, Tampere, Finland
Note: Accepted as an oral presentation at BMVC 2021. Code: this https URL; project page: this https URL
Abstract: Recent advances in visually-induced audio generation are based on sampling short, low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the-art model takes minutes on a high-end GPU. In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos, in less time than it takes to play the sound, on a single GPU. We train a transformer to sample a new spectrogram from a pre-trained spectrogram codebook given a set of video features. The codebook is obtained using a variant of VQGAN trained to produce a compact sampling space with a novel spectrogram-based perceptual loss. The generated spectrogram is transformed into a waveform using a window-based GAN that significantly speeds up generation. Considering the lack of metrics for automatic evaluation of generated spectrograms, we also build a family of metrics called FID and MKL. These metrics are based on a novel sound classifier, called Melception, and designed to evaluate the fidelity and relevance of open-domain samples. Both qualitative and quantitative studies are conducted on small- and large-scale datasets to evaluate the fidelity and relevance of generated samples. We also compare our model to the state of the art and observe a substantial improvement in quality, size, and computation time. Code, demo, and samples: v-iashin.github.io/SpecVQGAN
【8】 Towards Better Long-range Time Series Forecasting using Generative Adversarial Networks
Link: https://arxiv.org/abs/2110.08770
Authors: Shiyu Liu, Mehul Motani
Affiliations: Department of Electrical and Computer Engineering, National University of Singapore
Note: 7-page main paper with a 4-page appendix
Abstract: Accurate long-range forecasting of time series data is an important problem in many sectors, such as energy, healthcare, and finance. In recent years, Generative Adversarial Networks (GANs) have provided a revolutionary approach to many problems. However, the use of GANs to improve long-range time series forecasting remains relatively unexplored. In this paper, we utilize a Conditional Wasserstein GAN (CWGAN) and augment it with an error penalty term, leading to a new generative model which aims to generate high-quality synthetic time series data, called CWGAN-TS. Using such synthetic data, we develop a long-range forecasting approach, called Generative Forecasting (GenF), consisting of three components: (i) CWGAN-TS, to generate synthetic data for the next few time steps; (ii) a predictor, which makes long-range predictions based on generated and observed data; and (iii) an information-theoretic clustering (ITC) algorithm, to better train the CWGAN-TS and the predictor. Our experimental results on three public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. In most cases, we find a 6% - 12% improvement in predictive performance (mean absolute error) and a 37% reduction in parameters compared to the best performing benchmark. Lastly, we conduct an ablation study to demonstrate the effectiveness of the CWGAN-TS and the ITC algorithm.
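As a concrete illustration of the error-penalty idea, here is a minimal sketch of a CWGAN-TS-style generator objective: the usual conditional Wasserstein critic term plus a supervised penalty on the generated forecast. The MLP networks, dimensions, noise size, and the weight lam are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

hist_len, pred_len, noise_dim, lam = 24, 4, 8, 10.0
G = nn.Sequential(nn.Linear(hist_len + noise_dim, 64), nn.ReLU(), nn.Linear(64, pred_len))
D = nn.Sequential(nn.Linear(hist_len + pred_len, 64), nn.ReLU(), nn.Linear(64, 1))

def generator_loss(history, future, noise):
    fake = G(torch.cat([history, noise], dim=-1))           # conditional forecast
    critic = D(torch.cat([history, fake], dim=-1)).mean()   # Wasserstein critic score
    penalty = (fake - future).pow(2).mean()                 # error penalty on the forecast
    return -critic + lam * penalty

history = torch.randn(32, hist_len)
future = torch.randn(32, pred_len)
loss = generator_loss(history, future, torch.randn(32, noise_dim))
loss.backward()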
【9】 Black-box Adversarial Attacks on Network-wide Multi-step Traffic State Prediction Models
Link: https://arxiv.org/abs/2110.08712
Authors: Bibek Poudel, Weizi Li
Affiliations: Bibek Poudel and Weizi Li are with the Department of Computer Science, University of Memphis
Note: Accepted to IEEE International Conference on Intelligent Transportation Systems (ITSC), 2021
Abstract: Traffic state prediction is necessary for many Intelligent Transportation Systems applications. Recent developments of the topic have focused on network-wide, multi-step prediction, where state-of-the-art performance is achieved via deep learning models, in particular, graph neural network-based models. While the prediction accuracy of deep learning models is high, these models' robustness has raised many safety concerns, given that imperceptible perturbations added to the input can substantially degrade model performance. In this work, we propose an adversarial attack framework by treating the prediction model as a black box, i.e., assuming no knowledge of the model architecture, training data, and (hyper)parameters. However, we assume that the adversary can query the prediction model as an oracle with any input and obtain the corresponding output. Next, the adversary can train a substitute model using input-output pairs and generate adversarial signals based on the substitute model. To test the attack effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined. As a result, the adversary can degrade the target model's prediction accuracy by up to 54%. In comparison, two conventional statistical models (linear regression and historical average) are also examined. While these two models do not produce high prediction accuracy, they are either influenced negligibly (less than 3%) or are immune to the adversary's attack.
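The substitute-model pipeline in the abstract can be sketched in a few lines: query the black box to collect input-output pairs, fit a substitute, then craft perturbations with the substitute's gradients (one FGSM-style step here). The toy oracle, epsilon, and architecture are assumptions, not the paper's traffic models.

import torch
import torch.nn as nn

def oracle(x):                      # black-box traffic predictor (placeholder)
    return torch.sin(x).sum(dim=-1, keepdim=True)

sub = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(sub.parameters(), lr=1e-3)

# 1) Train the substitute on oracle input-output pairs.
for _ in range(500):
    x = torch.randn(64, 16)
    loss = (sub(x) - oracle(x)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Generate an adversarial signal using the substitute's gradients.
eps = 0.1
x = torch.randn(8, 16, requires_grad=True)
sub(x).sum().backward()             # push predictions in the attack direction
x_adv = x + eps * x.grad.sign()     # transfer this perturbation to the black box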
【10】 Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments
Link: https://arxiv.org/abs/2110.08586
Authors: Gustavo Claudio Karl Couto, Eric Aislan Antonelo
Affiliations: Automation and Systems Engineering Department, Federal University of Santa Catarina, Florianopolis, Brazil
Abstract: Autonomous driving is a complex task which has been tackled since the first self-driving car, ALVINN, in 1989, with a supervised learning approach, or behavioral cloning (BC). In BC, a neural network is trained with state-action pairs that constitute the training set made by an expert, i.e., a human driver. However, this type of imitation learning does not take into account the temporal dependencies that might exist between actions taken at different moments of a navigation trajectory. These types of tasks are better handled by reinforcement learning (RL) algorithms, which need a reward function to be defined. On the other hand, more recent approaches to imitation learning, such as Generative Adversarial Imitation Learning (GAIL), can train policies without explicitly requiring a reward function to be defined, allowing an agent to learn by trial and error directly on a training set of expert trajectories. In this work, we propose two variations of GAIL for autonomous navigation of a vehicle in the realistic CARLA simulation environment for urban scenarios. Both use the same network architecture, which processes high-dimensional image input from three frontal cameras and nine other continuous inputs representing the velocity, the next point of the sparse trajectory, and a high-level driving command. We show that both are capable of imitating the expert trajectory from start to end after training ends, but the variant whose GAIL loss function is augmented with BC outperforms the other in terms of convergence time and training stability.
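Below is a hedged sketch of the loss structure for the BC-augmented GAIL variant: a discriminator that separates expert from policy state-action pairs, and a policy loss combining an adversarial term with a behavioral-cloning term. For brevity the policy term backpropagates through a differentiable discriminator rather than using the policy-gradient update a real GAIL implementation needs; the networks, bc_weight, and plain tensors standing in for CARLA rollouts are all assumptions.

import torch
import torch.nn as nn

obs_dim, act_dim, bc_weight = 32, 2, 0.5
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
disc = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(exp_s, exp_a, pol_s, pol_a):
    # Discriminator separates expert pairs (label 1) from policy pairs (label 0).
    d_exp = disc(torch.cat([exp_s, exp_a], -1))
    d_pol = disc(torch.cat([pol_s, pol_a], -1))
    return bce(d_exp, torch.ones_like(d_exp)) + bce(d_pol, torch.zeros_like(d_pol))

def policy_loss(exp_s, exp_a, pol_s):
    pol_a = policy(pol_s)
    adv_term = -disc(torch.cat([pol_s, pol_a], -1)).mean()  # fool the discriminator
    bc_term = (policy(exp_s) - exp_a).pow(2).mean()         # stay close to the expert
    return adv_term + bc_weight * bc_term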
【11】 Multimodal Dialogue Response Generation
Link: https://arxiv.org/abs/2110.08515
Authors: Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang
Affiliations: Microsoft STC Asia; Microsoft Research Asia
Note: This paper was submitted before 15th October @ 11:59pm AoE (UTC-12)
Abstract: Responding with an image has been recognized as an important capability for an intelligent conversational agent. Yet existing works only focus on exploring multimodal dialogue models that depend on retrieval-based methods, neglecting generation methods. To fill in the gap, we first present a multimodal dialogue generation model, which takes the dialogue history as input and then generates a textual sequence or an image as response. Learning such a model often requires multimodal dialogues containing both texts and images, which are difficult to obtain. Motivated by this challenge in practice, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate the parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, and then the whole set of parameters can be well fitted using the limited training examples. Extensive experiments demonstrate that our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.
【12】 Analyzing Dynamic Adversarial Training Data in the Limit
Link: https://arxiv.org/abs/2110.08514
Authors: Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela
Affiliations: UC Berkeley; Facebook AI Research; USC
Abstract: To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can help models fix some error types, but it does not necessarily lead to better generalization beyond adversarial test data. We argue that running DADC over many rounds maximizes its training-time benefits, as the different rounds can together cover many of the task-relevant phenomena. We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs, with both adversarial and non-adversarial approaches. Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data. Our analysis shows that DADC yields examples that are more difficult, more lexically and syntactically diverse, and contain fewer annotation artifacts compared to non-adversarial examples.
【13】 FedMM: Saddle Point Optimization for Federated Adversarial Domain Adaptation
Link: https://arxiv.org/abs/2110.08477
Authors: Yan Shen, Jian Du, Hao Zhang, Benyu Zhang, Zhanghexuan Ji, Mingchen Gao
Affiliations: Department of Computer Science and Engineering, University at Buffalo; Department of Computer Science, University of Illinois at Urbana-Champaign
Abstract: Federated adversary domain adaptation is a unique distributed minimax training task due to the prevalence of label imbalance among clients, with each client only seeing a subset of the classes of labels required to train a global model. To tackle this problem, we propose a distributed minimax optimizer referred to as FedMM, designed specifically for the federated adversary domain adaptation problem. It works well even in the extreme case where each client has different label classes and some clients only have unsupervised tasks. We prove that FedMM ensures convergence to a stationary point with domain-shifted unsupervised data. On a variety of benchmark datasets, extensive experiments show that FedMM consistently achieves either significant communication savings or significant accuracy improvements over federated optimizers based on the gradient descent ascent (GDA) algorithm. When training from scratch, for example, it outperforms other GDA-based federated averaging methods by around 20% in accuracy over the same communication rounds; and it consistently outperforms them when training from pre-trained models, with accuracy improvements from 5.4% to 9% across different networks.
【14】 TESDA: Transform Enabled Statistical Detection of Attacks in Deep Neural Networks
Link: https://arxiv.org/abs/2110.08447
Authors: Chandramouli Amarnath, Aishwarya H. Balwani, Kwondo Ma, Abhijit Chatterjee
Affiliations: Department of ECE, Georgia Institute of Technology
Note: 10 pages, 2 reference pages, 2 appendix pages, 14 figures, 2 tables
Abstract: Deep neural networks (DNNs) are now the de facto choice for computer vision tasks such as image classification. However, their complexity and "black box" nature often render the systems they are deployed in vulnerable to a range of security threats. Successfully identifying such threats, especially in safety-critical real-world applications, is thus of utmost importance, but still very much an open problem. We present TESDA, a low-overhead, flexible, and statistically grounded method for online detection of attacks by exploiting the discrepancies they cause in the distributions of intermediate layer features of DNNs. Unlike most prior work, we require neither dedicated hardware to run in real time, nor the presence of a Trojan trigger to detect discrepancies in behavior. We empirically establish our method's usefulness and practicality across multiple architectures, datasets and diverse attacks, consistently achieving detection coverages of above 95% with operation count overheads as low as 1-2%.
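A rough sketch of the statistical idea behind this kind of online detection: compare the distribution of an intermediate layer's features on incoming batches against statistics collected on clean data, and raise a flag when the discrepancy is large. The z-score statistic and threshold below are illustrative assumptions, not the paper's exact test.

import numpy as np

rng = np.random.default_rng(0)
clean_feats = rng.normal(0.0, 1.0, size=(10_000, 128))    # stand-in for clean-layer features
mu, sigma = clean_feats.mean(axis=0), clean_feats.std(axis=0) + 1e-8

def discrepancy(batch_feats):
    # Mean absolute z-score of the batch means against clean statistics.
    z = (batch_feats.mean(axis=0) - mu) / sigma
    return np.abs(z).mean()

threshold = 0.5
attack_batch = rng.normal(0.8, 1.0, size=(64, 128))        # shifted features under attack
print(discrepancy(attack_batch) > threshold)                # -> True: flag the batch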
【15】 Unsupervised Natural Language Inference Using PHL Triplet Generation
Link: https://arxiv.org/abs/2110.08438
Authors: Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral
Affiliations: Arizona State University
Note: 9 pages, 2 figures, 8 tables
Abstract: Transformer-based models have achieved impressive performance on various Natural Language Inference (NLI) benchmarks when trained on the respective training datasets. However, in certain cases, training samples may not be available, or collecting them could be time-consuming and resource-intensive. In this work, we address this challenge and present an exploratory study of unsupervised NLI, a paradigm in which no human-annotated training samples are available. We investigate NLI under three challenging settings: PH, P, and NPH, which differ in the extent of unlabeled data available for learning. As a solution, we propose a procedural data generation approach that leverages a set of sentence transformations to collect PHL (Premise, Hypothesis, Label) triplets for training NLI models, bypassing the need for human-annotated training datasets. Comprehensive experiments show that this approach results in accuracies of 66.75%, 65.9%, and 65.39% in the PH, P, and NPH settings respectively, outperforming all existing baselines. Furthermore, fine-tuning our models with as little as ~0.1% of the training dataset (500 samples) leads to 12.2% higher accuracy than the model trained from scratch on the same 500 instances.
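An illustrative sketch of procedural PHL generation: apply simple sentence transformations to an unlabeled premise to manufacture entailment, contradiction, and neutral triplets. The three rules below are toy assumptions standing in for the paper's transformation set.

import random

def make_triplets(premise: str):
    words = premise.rstrip(".").split()
    triplets = []
    # Entailment: drop a trailing phrase, keeping a weaker statement.
    if len(words) > 5:
        triplets.append((premise, " ".join(words[:-3]) + ".", "entailment"))
    # Contradiction: naive negation by inserting "never" before the verb slot.
    triplets.append((premise, " ".join(words[:2] + ["never"] + words[2:]) + ".", "contradiction"))
    # Neutral: append unverifiable extra information.
    extra = random.choice(["yesterday", "near the station", "with great care"])
    triplets.append((premise, premise.rstrip(".") + " " + extra + ".", "neutral"))
    return triplets

for t in make_triplets("Two dogs chased the ball in the park."):
    print(t)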
【16】 Control Prefixes for Text Generation
Link: https://arxiv.org/abs/2110.08329
Authors: Jordan Clive, Kris Cao, Marek Rei
Affiliations: Imperial College London; DeepMind, London, UK
Abstract: Prompt learning methods adapt pre-trained language models to downstream applications by using a task-specific prompt together with the input. Most of the current work on prompt learning in text generation relies on a shared dataset-level prompt for all examples in the dataset. We extend this approach and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information in each prompt. Control Prefixes is at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). We present state-of-the-art results on several data-to-text datasets, including WebNLG.
【17】 Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture
Link: https://arxiv.org/abs/2110.08324
Authors: Xinyu Tang, Saeed Mahloujifar, Liwei Song, Virat Shejwalkar, Milad Nasr, Amir Houmansadr, Prateek Mittal
Affiliations: Princeton University; University of Massachusetts Amherst
Abstract: Membership inference attacks are a key measure to evaluate privacy leakage in machine learning (ML) models. These attacks aim to distinguish training members from non-members by exploiting the differential behavior of the models on member and non-member inputs. The goal of this work is to train ML models that have high membership privacy while largely preserving their utility; we therefore aim for an empirical membership privacy guarantee, as opposed to the provable privacy guarantees provided by techniques like differential privacy, since such techniques are shown to deteriorate model utility. Specifically, we propose a new framework to train privacy-preserving models that induce similar behavior on member and non-member inputs to mitigate membership inference attacks. Our framework, called SELENA, has two major components. The first component and the core of our defense is a novel ensemble architecture for training. This architecture, which we call Split-AI, splits the training data into random subsets and trains a model on each subset of the data. We use an adaptive inference strategy at test time: our ensemble architecture aggregates the outputs of only those models that did not contain the input sample in their training data. We prove that our Split-AI architecture defends against a large family of membership inference attacks; however, it is susceptible to new adaptive attacks. Therefore, we use a second component in our framework, called Self-Distillation, to protect against such stronger attacks. The Self-Distillation component (self-)distills the training dataset through our Split-AI ensemble, without using any external public datasets. Through extensive experiments on major benchmark datasets we show that SELENA presents a superior trade-off between membership privacy and utility compared to the state of the art.
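A sketch of Split-AI's adaptive inference rule: train K models on K data splits and, at query time, aggregate only the models whose training split did NOT contain the query, so members and non-members receive statistically similar outputs. The stubbed random-softmax models, the round-robin split, and the mean aggregation are simplifying assumptions in the spirit of the abstract.

import numpy as np

n_train, K, n_classes = 1000, 5, 10
# Model k trains on every sample except those with index i % K == k.
subsets = [{i for i in range(n_train) if i % K != k} for k in range(K)]

def stub_model(k):
    def predict(x_idx):                        # placeholder per-model softmax output
        r = np.random.default_rng(hash((k, x_idx)) % (2**32))
        p = r.random(n_classes)
        return p / p.sum()
    return predict

models = [stub_model(k) for k in range(K)]

def adaptive_predict(x_idx):
    # Members are answered only by models that never trained on them;
    # non-members (x_idx >= n_train) are answered by all K models.
    eligible = [m for m, s in zip(models, subsets) if x_idx not in s]
    return np.mean([m(x_idx) for m in eligible], axis=0)

print(adaptive_predict(3).round(3))            # member: uses the one excluded model
print(adaptive_predict(5000).round(3))         # non-member: uses all K models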
【18】 Memory-augmented Adversarial Autoencoders for Multivariate Time-series Anomaly Detection with Deep Reconstruction and Prediction
Link: https://arxiv.org/abs/2110.08306
Authors: Qinfeng Xiao, Shikuan Shao, Jing Wang
Affiliations: Beijing Jiaotong University
Abstract: Detecting anomalies in multivariate time-series without manual supervision remains a challenging problem, due to the increased scale of dimensions and complexity of today's IT monitoring systems. Recent progress in unsupervised time-series anomaly detection mainly uses deep autoencoders to solve this problem, i.e., training on normal samples and producing significant reconstruction error on abnormal inputs. However, in practice, autoencoders can reconstruct anomalies quite well, due to the powerful capabilities of neural networks. Besides, these approaches can be ineffective for identifying non-point anomalies, e.g., contextual anomalies and collective anomalies, since they solely utilize a point-wise reconstruction objective. To tackle the above issues, we propose MemAAE (Memory-augmented Adversarial Autoencoders with Deep Reconstruction and Prediction), a novel unsupervised anomaly detection method for time-series. By jointly training two complementary proxy tasks, reconstruction and prediction, with a shared network architecture, we show that detecting anomalies via multiple tasks obtains superior performance compared to single-task training. Additionally, a compressive memory module is introduced to preserve normal patterns, avoiding unexpected generalization on abnormal inputs. Through extensive experiments, MemAAE achieves an overall F1 score of 0.90 on four public datasets, significantly outperforming the best baseline by 0.02.
【19】 Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness
Link: https://arxiv.org/abs/2110.08256
Authors: Xiao Yang, Yinpeng Dong, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu
Affiliations: Department of Computer Science & Technology, Tsinghua University; Shanghai Jiao Tong University
Abstract: The vulnerability of deep neural networks to adversarial examples has motivated an increasing number of defense strategies for promoting model robustness. However, progress is usually hampered by insufficient robustness evaluations. As the de facto standard for evaluating adversarial robustness, adversarial attacks typically solve an optimization problem of crafting adversarial examples with an iterative process. In this work, we propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network, which is trained over a class of data samples and defenses to produce effective update directions during adversarial example generation. Furthermore, we develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses. Our approach can be flexibly incorporated with various attacks, and it consistently improves performance with little extra computational cost. Extensive experiments demonstrate the effectiveness of the attacks learned by MAMA compared to state-of-the-art attacks on different defenses, leading to a more reliable evaluation of adversarial robustness.
【20】 BAPGAN: GAN-based Bone Age Progression of Femur and Phalange X-ray Images
Link: https://arxiv.org/abs/2110.08509
Authors: Shinji Nakazawa, Changhee Han, Joe Hasei, Ryuichi Nakahara, Toshifumi Ozaki
Affiliations: LPIXEL Inc., Tokyo, Japan; Saitama Prefectural University, Saitama, Japan; Okayama City General Medical Center, Okayama City Hospital, Okayama, Japan; Dept. of Orthopaedic Surgery, Grad. School of Medicine, Dentistry and Pharmaceutical Sciences
Note: 6 pages, 5 figures, accepted to SPIE Medical Imaging 2022
Abstract: Convolutional Neural Networks play a key role in bone age assessment for investigating endocrine, genetic, and growth disorders under various modalities and body regions. However, no researcher has tackled bone age progression/regression despite its valuable potential applications: bone-related disease diagnosis, clinical knowledge acquisition, and museum education. Therefore, we propose the Bone Age Progression Generative Adversarial Network (BAPGAN) to progress/regress both femur and phalange X-ray images while preserving identity and realism. We exhaustively confirm BAPGAN's clinical potential via the Fréchet Inception Distance, a Visual Turing Test by two expert orthopedists, and t-Distributed Stochastic Neighbor Embedding.
【21】 Adversarial Attacks on Gaussian Process Bandits
Link: https://arxiv.org/abs/2110.08449
Authors: Eric Han, Jonathan Scarlett
Affiliations: School of Computing, National University of Singapore; Department of Mathematics & Institute of Data Science, National University of Singapore
Abstract: Gaussian processes (GPs) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. In this paper, we study this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from both a theoretical and a practical perspective. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and a Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we compare our attacks' performance and efficiency on several real and synthetic functions.
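To make the white-box perturbation idea tangible, here is a toy 1-D version of a Clipping-style attack: cap $f$ outside the target region so that the perturbed function's optimum falls inside $\mathcal{R}_{\rm target}$. The sine function, the region, and the cap level are assumptions; the paper's attacks additionally respect a perturbation budget, which this toy ignores.

import numpy as np

xs = np.linspace(0.0, 1.0, 201)
f = np.sin(6 * xs)                           # true function, optimum near x ~ 0.26
target = (xs >= 0.7) & (xs <= 0.8)           # attacker's desired region R_target

cap = f[target].max() - 0.1                  # cap f outside R_target below its best value
f_tilde = np.where(target, f, np.minimum(f, cap))

print(xs[np.argmax(f)], xs[np.argmax(f_tilde)])   # the optimum moves into R_target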
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (13 papers)
【1】 Unsupervised Finetuning
Link: https://arxiv.org/abs/2110.09510
Authors: Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu
Affiliations: University of Science and Technology of China; Microsoft Cloud & AI
Abstract: This paper studies "unsupervised finetuning", the symmetrical problem of the well-known "supervised finetuning". Given a pretrained model and small-scale unlabeled target data, unsupervised finetuning adapts the representation pretrained on the source domain to the target domain so that better transfer performance can be obtained. This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly to unsupervised learning, leading to damage to the pretrained representation and poor representations in the target domain. In this paper, we find that the source data is crucial when shifting the finetuning paradigm from supervised to unsupervised, and propose two simple and effective strategies to combine source and target data in unsupervised finetuning: "sparse source data replaying" and "data mixing". The motivation of the former strategy is to add a small portion of source data back to occupy their pretrained representation space and help push the target data to reside in a smaller compact space; the motivation of the latter strategy is to increase the data density and help learn a more compact representation. To demonstrate the effectiveness of our proposed "unsupervised finetuning" strategy, we conduct extensive experiments on multiple different target datasets, which show better transfer performance than the naive strategy.
【2】 Understanding Dimensional Collapse in Contrastive Self-supervised Learning
Link: https://arxiv.org/abs/2110.09348
Authors: Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian
Affiliations: Facebook AI Research
Note: 15 pages, 10 figures
Abstract: Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. The joint embedding approach is based on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that lead to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
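Since DirectCLR's core change is easy to state, here is a minimal sketch: apply the InfoNCE loss directly to a leading sub-vector of the backbone representation, with no trainable projector. The sub-vector size d0 and the temperature are illustrative hyperparameter assumptions.

import torch
import torch.nn.functional as F

def directclr_loss(r1, r2, d0=64, tau=0.1):
    # r1, r2: backbone representations of two views, shape (batch, dim).
    z1 = F.normalize(r1[:, :d0], dim=1)       # take the leading d0 dimensions
    z2 = F.normalize(r2[:, :d0], dim=1)
    logits = z1 @ z2.t() / tau                # cosine similarities of all pairs
    labels = torch.arange(z1.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

r1, r2 = torch.randn(32, 512), torch.randn(32, 512)
print(directclr_loss(r1, r2).item())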
【3】 Self-Supervised Representation Learning: Introduction, Advances and Challenges
Link: https://arxiv.org/abs/2110.09327
Authors: Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales
Abstract: Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area, including key concepts, the four main families of approach and the associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.
【4】 Utilizing Active Machine Learning for Quality Assurance: A Case Study of Virtual Car Renderings in the Automotive Industry
Link: https://arxiv.org/abs/2110.09023
Authors: Patrick Hemmer, Niklas Kühl, Jakob Schöffer
Affiliations: Karlsruhe Institute of Technology
Note: Hawaii International Conference on System Sciences 2022 (HICSS-55)
Abstract: Computer-generated imagery of car models has become an indispensable part of car manufacturers' advertisement concepts. They are for instance used in car configurators to offer customers the possibility to configure their car online according to their personal preferences. However, human-led quality assurance faces the challenge of keeping up with high-volume visual inspections due to the car models' increasing complexity. Even though the application of machine learning to many visual inspection tasks has demonstrated great success, its need for large labeled data sets remains a central barrier to using such systems in practice. In this paper, we propose an active machine learning-based quality assurance system that requires significantly fewer labeled instances to identify defective virtual car renderings without compromising performance. By employing our system at a German automotive manufacturer, start-up difficulties can be overcome, the inspection process efficiency can be increased, and thus economic advantages can be realized.
【5】 Demystifying How Self-Supervised Features Improve Training from Noisy Labels
Link: https://arxiv.org/abs/2110.09022
Authors: Hao Cheng, Zhaowei Zhu, Xing Sun, Yang Liu
Affiliations: Computer Science and Engineering, University of California, Santa Cruz; Tencent YouTu Lab
Abstract: The advancement of self-supervised learning (SSL) motivates researchers to apply SSL to other tasks, such as learning with noisy labels. Recent literature indicates that methods built on SSL features can substantially improve the performance of learning with noisy labels. Nonetheless, the deeper reasons why (and how) SSL features benefit training from noisy labels are less understood. In this paper, we study why and how self-supervised features help networks resist label noise, using both theoretical analyses and numerical experiments. Our result shows that, given a quality encoder pre-trained with SSL, a simple linear layer trained by the cross-entropy loss is theoretically robust to symmetric label noise. Further, we provide insights into how knowledge distilled from SSL features can alleviate the over-fitting problem. We hope our work provides a better understanding of learning with noisy labels from the perspective of self-supervised learning and can potentially serve as a guideline for further research. Code is available at github.com/UCSC-REAL/SelfSup_NoisyLabel.
【6】 Self-Supervised Learning for Binary Networks by Joint Classifier Training
Link: https://arxiv.org/abs/2110.08851
Authors: Dahyun Kim, Jonghyun Choi
Affiliations: Gwangju Institute of Science and Technology (GIST), South Korea; NAVER AI Lab
Abstract: Despite the great success of self-supervised learning with large floating-point networks, such networks are not readily deployable to edge devices. To accelerate the deployment of models to edge devices for various downstream tasks by unsupervised representation learning, we propose a self-supervised learning method for binary networks. In particular, we propose to use a randomly initialized classifier attached to a pretrained floating-point feature extractor as targets and to jointly train it with a binary network. For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training. We call our method BSSL. Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks on various downstream tasks and outperforms supervised pretraining on certain tasks.
【7】 Deep Active Learning by Leveraging Training Dynamics
Link: https://arxiv.org/abs/2110.08611
Authors: Haonan Wang, Wei Huang, Andrew Margenot, Hanghang Tong, Jingrui He
Affiliations: University of Illinois at Urbana-Champaign; University of Technology Sydney
Abstract: Active learning theories and methods have been extensively studied in classical statistical learning settings. However, deep active learning, i.e., active learning with deep learning models, is usually based on empirical criteria without solid theoretical justification, and thus suffers from heavy doubts when some of those criteria fail to provide benefits in applications. In this paper, by exploring the connection between generalization performance and training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. In particular, we prove that the convergence speed of training and the generalization performance are positively correlated under the ultra-wide condition, and we show that maximizing the training dynamics leads to better generalization performance. Furthermore, to scale up to large deep neural networks and data sets, we introduce two relaxations for the subset selection problem and reduce the time complexity from polynomial to constant. Empirical results show that dynamicAL not only outperforms the other baselines consistently but also scales well to large deep learning models. We hope our work inspires more attempts at bridging the theoretical findings of deep networks and the practical impact of deep active learning applications.
【8】 Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty
Link: https://arxiv.org/abs/2110.08607
Authors: Wei Liu, Zhilu Lai, Kiran Bacsa, Eleni Chatzi
Affiliations: Chair of Structural Mechanics and Monitoring, Department of Civil, Environmental and Geomatic Engineering, ETH Zürich, Zürich, Switzerland; Future Resilient Systems, Singapore-ETH Centre, Singapore
Abstract: In this paper, we propose a probabilistic physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM). The framework is especially targeted at the inference of the characteristics and latent structure of nonlinear dynamical systems from measurement data, where it is typically intractable to perform exact inference of latent variables. A recently surfaced option pertains to leveraging variational inference to perform approximate inference. In such a scheme, the transition and emission functions of the system are parameterized via feed-forward neural networks (deep generative models). However, due to the generalized and highly versatile formulation of neural network functions, the learned latent space is often prone to lack physical interpretation and structured representation. To address this, we bridge physics-based state space models with Deep Markov Models, thus delivering a hybrid modeling framework for unsupervised learning and identification of nonlinear dynamical systems. Specifically, the transition process can be modeled as a physics-based model enhanced with an additive neural network component, which aims to learn the discrepancy between the physics-based model and the actual dynamical system being monitored. The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system by imposing physics-driven restrictions on the side of the latent space. We demonstrate the benefits of such a fusion in terms of achieving improved performance on illustrative simulation examples and experimental case studies of nonlinear systems. Our results indicate that the physics-based models involved in the employed transition and emission functions essentially enforce a more structured and physically interpretable latent space, which is essential for generalization and prediction capabilities.
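A sketch of the hybrid transition described in the abstract: a physics-based state-transition model augmented with an additive neural-network component that learns the discrepancy between the physics model and the monitored system. The linear damped-oscillator physics, step size, and network size are assumptions for illustration.

import torch
import torch.nn as nn

dt = 0.01
A = torch.tensor([[1.0, dt], [-dt * 4.0, 1.0 - dt * 0.2]])   # damped oscillator (physics prior)
residual = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def transition(x):
    # x: latent state (batch, 2) = [position, velocity]
    return x @ A.t() + residual(x)    # physics step + learned discrepancy

x = torch.randn(16, 2)
x_next = transition(x)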
【9】 Knowledge-driven Active Learning
Link: https://arxiv.org/abs/2110.08265
Authors: Gabriele Ciravegna, Frederic Precioso, Marco Gori
Affiliations: Università di Firenze, Italy; Université Côte d'Azur, France; Università di Siena, Italy
Note: Submitted to the ICLR 2022 conference
Abstract: In the last few years, Deep Learning models have become increasingly popular. However, their deployment is still precluded in those contexts where the amount of supervised data is limited and manual labelling is expensive. Active learning strategies aim at solving this problem by requiring supervision on only a few unlabelled samples, those which improve model performance the most once added to the training set. Most strategies are based on uncertain sample selection, often even restricted to samples lying close to the decision boundary. Here we propose a very different approach, taking domain knowledge into consideration. Indeed, in the case of multi-label classification, the relationships among classes offer a way to spot incoherent predictions, i.e., predictions where the model most likely needs supervision. We have developed a framework where first-order-logic knowledge is converted into constraints and their violation is checked as a natural guide for sample selection. We empirically demonstrate that the knowledge-driven strategy outperforms standard strategies, particularly on those datasets where domain knowledge is complete. Furthermore, we show how the proposed approach enables discovering data distributions lying far from the training data. Finally, the proposed knowledge-driven strategy can also be easily used in object-detection problems, where standard uncertainty-based techniques are difficult to apply.
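An illustrative sketch of scoring unlabeled samples by first-order-logic constraint violation, e.g. the rule "cat(x) -> animal(x)" over multi-label predictions: a sample whose predicted p(cat) exceeds p(animal) is incoherent and a natural candidate for labeling. The rule and probabilities are toy assumptions standing in for the paper's converted constraints.

import numpy as np

# Columns: p(cat), p(animal) for a batch of unlabeled samples.
probs = np.array([[0.9, 0.95],   # coherent
                  [0.8, 0.30],   # violates cat -> animal
                  [0.1, 0.70]])  # coherent

def violation(p):
    # Soft penalty for how much the implication cat -> animal is broken.
    return np.maximum(0.0, p[:, 0] - p[:, 1])

scores = violation(probs)
query_order = np.argsort(-scores)     # most incoherent samples queried first
print(scores, query_order)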
【10】 FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling
Link: https://arxiv.org/abs/2110.08263
Authors: Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki
Affiliations: Tokyo Institute of Technology; Microsoft Research Asia
Note: Accepted by NeurIPS 2021; 16 pages with appendix; code: this https URL
Abstract: The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select the unlabeled data that contribute to training, thus failing to consider the different learning statuses and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach to leverage unlabeled data according to the model's learning status. The core of CPL is to flexibly adjust thresholds for different classes at each time step to let informative unlabeled data and their pseudo labels pass. CPL does not introduce additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call our improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performance when the labeled data are extremely limited or when the task is challenging. For example, FlexMatch outperforms FixMatch by 14.32% and 24.55% on the CIFAR-100 and STL-10 datasets respectively, when there are only 4 labels per class. CPL also significantly boosts convergence speed, e.g., FlexMatch can use only 1/5 of the training time of FixMatch to achieve even better performance. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improve their performance. We open-source our code at https://github.com/TorchSSL/TorchSSL.
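A sketch of the curriculum-pseudo-labeling mechanic the abstract describes: scale a fixed base threshold per class by the class's estimated learning status (here, its share of confident predictions so far), so under-learned classes get lower bars. The normalization scheme is an assumption in the spirit of FlexMatch, not its exact formula.

import torch

base_tau, num_classes = 0.95, 10
learning_status = torch.zeros(num_classes)      # confident-prediction counts per class

def select_pseudo_labels(probs):
    conf, pred = probs.max(dim=1)
    # Per-class flexible thresholds: well-learned classes keep ~base_tau,
    # under-learned classes get lower ones. (On the very first batch all
    # thresholds are 0, a crude warm-up.)
    ratio = learning_status / learning_status.max().clamp(min=1.0)
    tau_c = base_tau * ratio[pred]
    mask = conf >= tau_c
    learning_status.index_add_(0, pred[mask], torch.ones(int(mask.sum())))
    return pred[mask], mask

probs = torch.softmax(torch.randn(64, num_classes), dim=1)
labels, mask = select_pseudo_labels(probs)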
【11】 Unsupervised Learned Kalman Filtering
Link: https://arxiv.org/abs/2110.09005
Authors: Guy Revach, Nir Shlezinger, Timur Locher, Xiaoyong Ni, Ruud J. G. van Sloun, Yonina C. Eldar
Affiliations: Ben-Gurion University of the Negev
Note: 5 pages, 5 figures, submitted to ICASSP 2022
Abstract: In this paper we adapt KalmanNet, a recently proposed deep neural network (DNN)-aided system whose architecture follows the operation of the model-based Kalman filter (KF), to learn its mapping in an unsupervised manner, i.e., without requiring ground-truth states. The unsupervised adaptation is achieved by exploiting the hybrid model-based/data-driven architecture of KalmanNet, which, like the KF, internally predicts the next observation. These internal features are then used to compute the loss, rather than the state estimate at the output of the system. With the capability of unsupervised learning, one can use KalmanNet not only to track the hidden state, but also to adapt to variations in the state space (SS) model. We numerically demonstrate that when the noise statistics are unknown, unsupervised KalmanNet achieves a similar performance to KalmanNet with supervised learning. We also show that we can adapt a pre-trained KalmanNet to changing SS models without providing additional data, thanks to the unsupervised capabilities.
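A sketch of the unsupervised training signal just described: instead of a loss on (unavailable) ground-truth states, penalize the error between the filter's internally predicted next observation and the actually observed one. The linear observation matrix H and the GRU cell standing in for KalmanNet's DNN core are illustrative assumptions.

import torch
import torch.nn as nn

state_dim, obs_dim = 4, 2
H = torch.randn(obs_dim, state_dim)              # known observation matrix

filter_net = nn.GRUCell(obs_dim, state_dim)      # stand-in for the learned filter core

def unsupervised_loss(y_seq):
    # y_seq: observations of shape (T, batch, obs_dim).
    x = torch.zeros(y_seq.size(1), state_dim)
    loss = 0.0
    for t in range(y_seq.size(0) - 1):
        x = filter_net(y_seq[t], x)              # update state estimate from y_t
        y_pred = x @ H.t()                       # predicted next observation
        loss = loss + (y_pred - y_seq[t + 1]).pow(2).mean()
    return loss / (y_seq.size(0) - 1)

loss = unsupervised_loss(torch.randn(20, 8, obs_dim))
loss.backward()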
【12】 Self-Supervised U-Net for Segmenting Flat and Sessile Polyps
Link: https://arxiv.org/abs/2110.08776
Authors: Debayan Bhattacharya, Christian Betz, Dennis Eggert, Alexander Schlaefer
Affiliations: Hamburg University of Technology, Hamburg, Germany; Universitätsklinikum Hamburg-Eppendorf, Hamburg, Germany
Abstract: Colorectal Cancer (CRC) poses a great risk to public health. It is the third most common cause of cancer in the US. Development of colorectal polyps is one of the earliest signs of cancer. Early detection and resection of polyps can greatly increase the survival rate to 90%. Manual inspection can cause misdetections because polyps vary in color, shape, size and appearance. To this end, Computer-Aided Diagnosis systems (CADx) have been proposed that detect polyps by processing colonoscopic videos. The system acts as a secondary check to help clinicians reduce misdetections so that polyps may be resected before they transform into cancer. Polyps vary in color, shape, size, texture and appearance. As a result, the miss rate of polyps is between 6% and 27% despite the prominence of CADx solutions. Furthermore, sessile and flat polyps which have a diameter less than 10 mm are more likely to be undetected. Convolutional Neural Networks (CNNs) have shown promising results in polyp segmentation. However, all of these works have a supervised approach and are limited by the size of the dataset. It was observed that smaller datasets reduce the segmentation accuracy of ResUNet++. We train a U-Net to inpaint randomly dropped-out pixels in the image as a proxy task. The dataset we use for pre-training is the Kvasir-SEG dataset. This is followed by supervised training on the limited Kvasir-Sessile dataset. Our experimental results demonstrate that with a limited annotated dataset and a larger unlabeled dataset, the self-supervised approach is a better alternative than the fully supervised approach. Specifically, our self-supervised U-Net performs better than five segmentation models which were trained in a supervised manner on the Kvasir-Sessile dataset.
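A sketch of the inpainting proxy task described above: randomly drop out pixels and train the network to reconstruct them, yielding a self-supervised pretraining signal before fine-tuning on the small annotated set. The dropout rate and the tiny stand-in network are assumptions.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))       # stand-in for the U-Net
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

drop_rate = 0.25
images = torch.rand(4, 3, 64, 64)                          # placeholder colonoscopy tiles

mask = (torch.rand(4, 1, 64, 64) > drop_rate).float()      # 1 = keep, 0 = dropped
corrupted = images * mask
recon = net(corrupted)
loss = ((recon - images).pow(2) * (1 - mask)).mean()       # penalize dropped pixels only
opt.zero_grad(); loss.backward(); opt.step()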
【13】 Nuances in Margin Conditions Determine Gains in Active Learning
Link: https://arxiv.org/abs/2110.08418
Authors: Samory Kpotufe, Gan Yuan, Yunfan Zhao
Affiliations: Columbia University
Abstract: We consider nonparametric classification with smooth regression functions, where it is well known that notions of margin in $E[Y|X]$ determine fast or slow rates in both active and passive learning. Here we elucidate a striking distinction between the two settings. Namely, we show that some seemingly benign nuances in notions of margin (involving the uniqueness of the Bayes classifier, and which have no apparent effect on rates in passive learning) determine whether or not any active learner can outperform passive learning rates. In particular, for Audibert-Tsybakov's margin condition (allowing general situations with non-unique Bayes classifiers), no active learner can gain over passive learning in commonly studied settings where the marginal on $X$ is near uniform. Our results thus negate the usual intuition from past literature that active rates should improve over passive rates in nonparametric settings.
Transfer | Zero/Few/One-Shot | Adaptation (11 papers)
【1】 MEMO: Test Time Robustness via Adaptation and Augmentation
Link: https://arxiv.org/abs/2110.09506
Authors: Marvin Zhang, Sergey Levine, Chelsea Finn
Affiliations: UC Berkeley; Stanford University
Abstract: While deep neural networks can attain good accuracy on in-distribution test points, many applications require robustness even in the face of unexpected perturbations in the input, changes in the domain, or other sources of distribution shift. We study the problem of test time robustification, i.e., using the test input to improve model robustness. Recent prior works have proposed methods for test-time adaptation; however, they each introduce additional assumptions, such as access to multiple test points, that prevent widespread adoption. In this work, we aim to study and devise methods that make no assumptions about the model training process and are broadly applicable at test time. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable: when presented with a test example, perform different data augmentations on the data point, and then adapt (all of) the model parameters by minimizing the entropy of the model's average, or marginal, output distribution across the augmentations. Intuitively, this objective encourages the model to make the same prediction across different augmentations, thus enforcing the invariances encoded in these augmentations, while also maintaining confidence in its predictions. In our experiments, we demonstrate that this approach consistently improves robust ResNet and vision transformer models, achieving accuracy gains of 1-8% over standard model evaluation and also generally outperforming prior augmentation and adaptation strategies. We achieve state-of-the-art results for test shifts caused by image corruptions (ImageNet-C), renditions of common objects (ImageNet-R), and, among ResNet-50 models, adversarially chosen natural examples (ImageNet-A).
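A sketch of the test-time adaptation step just described: augment a single test input, average the model's predictive distributions over the augmentations, and take one gradient step minimizing the entropy of that marginal. The cheap shift-based augmentation, placeholder classifier, and step size are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder classifier
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def augment(x, n=8):
    # Stand-in augmentation: random circular shifts of the image.
    return torch.stack([torch.roll(x, shifts=int(s), dims=-1)
                        for s in torch.randint(-4, 5, (n,))])

x = torch.rand(3, 32, 32)                      # one test point
probs = F.softmax(model(augment(x)), dim=1)    # predictions across augmentations
marginal = probs.mean(dim=0)                   # marginal output distribution
entropy = -(marginal * marginal.log()).sum()
opt.zero_grad(); entropy.backward(); opt.step()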
【2】 Squeezing Backbone Feature Distributions to the Max for Efficient Few-Shot Learning Link: https://arxiv.org/abs/2110.09446
Authors: Yuqing Hu, Vincent Gripon, Stéphane Pateux Affiliations: Orange Labs, Rennes, France; IMT Atlantique, Lab-STICC, UMR CNRS, France Note: Initial commit. arXiv admin note: text overlap with arXiv:2006.03806 Abstract: Few-shot classification is a challenging problem due to the uncertainty caused by using few labelled samples. In the past few years, many methods have been proposed with the common aim of transferring knowledge acquired on a previously solved task, which is often achieved by using a pretrained feature extractor. Following this vein, in this paper we propose a novel transfer-based method which aims at processing the feature vectors so that they become closer to Gaussian-like distributions, resulting in increased accuracy. In the case of transductive few-shot learning where unlabelled test samples are available during training, we also introduce an optimal-transport inspired algorithm to boost even further the achieved performance. Using standardized vision benchmarks, we show the ability of the proposed methodology to achieve state-of-the-art accuracy with various datasets, backbone architectures and few-shot settings.
【3】 Ortho-Shot: Low Displacement Rank Regularization with Data Augmentation for Few-Shot Learning Link: https://arxiv.org/abs/2110.09374
Authors: Uche Osahor, Nasser M. Nasrabadi Affiliations: West Virginia University Abstract: In few-shot classification, the primary goal is to learn representations from a few samples that generalize well for novel classes. In this paper, we propose an efficient low displacement rank (LDR) regularization strategy termed Ortho-Shot; a technique that imposes orthogonal regularization on the convolutional layers of a few-shot classifier, which is based on the doubly-block Toeplitz (DBT) matrix structure. The regularized convolutional layers of the few-shot classifier enhance model generalization and intra-class feature embeddings that are crucial for few-shot learning. Overfitting is a typical issue for few-shot models; the lack of data diversity inhibits proper model inference, which weakens the classification accuracy of few-shot learners on novel classes. In this regard, we broke down the pipeline of the few-shot classifier and established that the support, query and task data augmentation collectively alleviate overfitting in networks. With compelling results, we demonstrated that combining a DBT-based low-rank orthogonal regularizer with data augmentation strategies significantly boosts the performance of a few-shot classifier. We perform our experiments on the miniImagenet, CIFAR-FS and Stanford datasets, with performance gains of about 5% compared to the state of the art.
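As a rough illustration of orthogonal regularization on convolutional layers, the sketch below applies a generic soft-orthogonality penalty ||W Wᵀ − I||²_F to flattened kernels; the paper's DBT-based construction is more specific, so this should be read only as an approximation of the idea, with the coefficient chosen arbitrarily.

```python
import torch

def soft_orthogonality_penalty(conv_weight):
    """||W W^T - I||_F^2 on the flattened kernel matrix (out_channels x rest)."""
    w = conv_weight.reshape(conv_weight.shape[0], -1)
    gram = w @ w.t()
    eye = torch.eye(gram.shape[0], device=w.device)
    return ((gram - eye) ** 2).sum()

def total_orthogonality_penalty(model, coeff=1e-4):
    """Sum the penalty over all Conv2d layers; add the result to the task loss."""
    reg = 0.0
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            reg = reg + soft_orthogonality_penalty(m.weight)
    return coeff * reg
```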
【4】 Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization Link: https://arxiv.org/abs/2110.09057
Authors: Tao Sun, Huaming Ling, Zuoqiang Shi, Dongsheng Li, Bao Wang Affiliations: University of Utah Abstract: Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. Our proposed adaptive heavy ball momentum can improve stochastic gradient descent (SGD) and Adam. SGD and Adam with the newly designed adaptive momentum are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks, including image classification, language modeling, and machine translation. Finally, we provide convergence guarantees for SGD and Adam with the proposed adaptive momentum.
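The quadratic intuition behind the method can be seen in a small NumPy demo: for f(x) = ½xᵀAx with eigenvalues in [μ, L], Polyak's classical heavy-ball parameters α = 4/(√L+√μ)² and β = ((√L−√μ)/(√L+√μ))² are optimal, which is the choice the paper's adaptive scheme seeks to track online. The demo below sketches that fixed-parameter baseline on a synthetic quadratic, not the paper's adaptive estimator.

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x with eigenvalues in [mu, L].
rng = np.random.default_rng(0)
eigs = np.linspace(1.0, 100.0, 20)                # mu = 1, L = 100
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))
A = Q @ np.diag(eigs) @ Q.T

mu, L = eigs[0], eigs[-1]
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2     # Polyak step size
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum

x = rng.normal(size=20)
x_prev = x.copy()
for _ in range(200):
    grad = A @ x
    # Heavy-ball update: x_{k+1} = x_k - alpha * grad + beta * (x_k - x_{k-1})
    x, x_prev = x - alpha * grad + beta * (x - x_prev), x
print("final ||x|| =", np.linalg.norm(x))         # decays far faster than plain GD
```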
【5】 Classical-to-Quantum Transfer Learning for Spoken Command Recognition Based on Quantum Neural Networks Link: https://arxiv.org/abs/2110.08689
Authors: Jun Qi, Javier Tejedor Affiliations: Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA; Escuela Politecnica Superior, Universidad San Pablo-CEU, CEU Universities, Madrid, Spain Note: Submitted to ICASSP'22 Abstract: This work investigates an extension of transfer learning applied in machine learning algorithms to the emerging hybrid end-to-end quantum neural network (QNN) for spoken command recognition (SCR). Our QNN-based SCR system is composed of classical and quantum components: (1) the classical part mainly relies on a 1D convolutional neural network (CNN) to extract speech features; (2) the quantum part is built upon the variational quantum circuit with a few learnable parameters. Since it is inefficient to train the hybrid end-to-end QNN from scratch on a noisy intermediate-scale quantum (NISQ) device, we put forth a hybrid transfer learning algorithm that allows a pre-trained classical network to be transferred to the classical part of the hybrid QNN model. The pre-trained classical network is further modified and augmented through jointly fine-tuning with a variational quantum circuit (VQC). The hybrid transfer learning methodology is particularly attractive for the task of QNN-based SCR because low-dimensional classical features are expected to be encoded into quantum states. We assess the hybrid transfer learning algorithm applied to the hybrid classical-quantum QNN for SCR on the Google speech command dataset, and our classical simulation results suggest that the hybrid transfer learning can boost our baseline performance on the SCR task.
【6】 Dataset Knowledge Transfer for Class-Incremental Learning without Memory Link: https://arxiv.org/abs/2110.08421
Authors: Habib Slim, Eden Belouadah, Adrian Popescu, Darian Onchis Affiliations: Université Paris-Saclay, CEA, List, Palaiseau, France; IMT Atlantique, Lab-STICC, team RAMBO, UMR CNRS, Brest, France; West University of Timisoara, Timisoara, Romania Note: Accepted to WACV 2022 Abstract: Incremental learning enables artificial agents to learn from sequential data. While important progress was made by exploiting deep neural networks, incremental learning remains very challenging. This is particularly the case when no memory of past data is allowed and catastrophic forgetting has a strong negative effect. We tackle class-incremental learning without memory by adapting prediction bias correction, a method which makes predictions of past and new classes more comparable. It was proposed when a memory is allowed and cannot be directly used without memory, since samples of past classes are required. We introduce a two-step learning process which allows the transfer of bias correction parameters between reference and target datasets. Bias correction is first optimized offline on reference datasets which have an associated validation memory. The obtained correction parameters are then transferred to target datasets, for which no memory is available. The second contribution is to introduce a finer modeling of bias correction by learning its parameters per incremental state instead of the usual past vs. new class modeling. The proposed dataset knowledge transfer is applicable to any incremental method which works without memory. We test its effectiveness by applying it to four existing methods. Evaluation with four target datasets and different configurations shows consistent improvement, with practically no computational and memory overhead.
【7】 Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning Link: https://arxiv.org/abs/2110.08394
Authors: Jun Luo, Shandong Wu Affiliations: University of Pittsburgh, Pittsburgh, PA Note: 15 pages Abstract: The goal of conventional federated learning (FL) is to train a global model for a federation of clients with decentralized data, reducing the systemic privacy risk of centralized training. The distribution shift across non-IID datasets, also known as the data heterogeneity, often poses a challenge for this one-global-model-fits-all solution. In this work, we propose APPLE, a personalized cross-silo FL framework that adaptively learns how much each client can benefit from other clients' models. We also introduce a method to flexibly control the focus of training APPLE between global and local objectives. We empirically evaluate our method's convergence and generalization behavior and perform extensive experiments on two benchmark datasets and two medical imaging datasets under two non-IID settings. The results show that the proposed personalized FL framework, APPLE, achieves state-of-the-art performance compared to several other personalized FL approaches in the literature.
【8】 Inconsistent Few-Shot Relation Classification via Cross-Attentional Prototype Networks with Contrastive Learning Link: https://arxiv.org/abs/2110.08254
Authors: Hongru Wang, Zhijing Jin, Jiarun Cao, Gabriel Pui Cheong Fung, Kam-Fai Wong Affiliations: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong; Max Planck Institute for Intelligent Systems & ETH Zürich Abstract: Standard few-shot relation classification (RC) is designed to learn a robust classifier with only few labeled data for each class. However, previous works rarely investigate the effects of a different number of classes (i.e., $N$-way) and number of labeled data per class (i.e., $K$-shot) during training vs. testing. In this work, we define a new task, \textit{inconsistent few-shot RC}, where the model needs to handle the inconsistency of $N$ and $K$ between training and testing. To address this new task, we propose Prototype Network-based cross-attention contrastive learning (ProtoCACL) to capture the rich mutual interactions between the support set and query set. Experimental results demonstrate that our ProtoCACL can outperform the state-of-the-art baseline model under both inconsistent $K$ and inconsistent $N$ settings, owing to its more robust and discriminate representations. Moreover, we identify that in the inconsistent few-shot learning setting, models can achieve better performance with \textit{less data} than the standard few-shot setting with carefully-selected $N$ and $K$. In the end of the paper, we provide further analyses and suggestions to systematically guide the selection of $N$ and $K$ under different scenarios.
【9】 A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer Link: https://arxiv.org/abs/2110.08598
Authors: Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee Affiliations: School of Electrical and Computer Engineering, Georgia Institute of Technology, GA, USA; Computer Engineering School, University of Enna Kore, Italy Note: Submitted to ICASSP 2022 Abstract: We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of carrying out point estimation in conventional maximum a posteriori estimation with a risk of having a curse of dimensionality in estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent variables of DNNs via a VB inference framework. To accomplish model transfer, knowledge learnt from a source domain is encoded in prior distributions of latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Experimental results on device adaptation in acoustic scene classification show that our proposed VB approach can obtain good improvements on target devices, and consistently outperforms 13 state-of-the-art knowledge transfer algorithms.
【10】 A Unified Speaker Adaptation Approach for ASR Link: https://arxiv.org/abs/2110.08545
Authors: Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma Affiliations: Nanyang Technological University, Singapore; Machine Intelligence Technology, Alibaba Group Note: Accepted by EMNLP 2021 Abstract: Transformer models have been used in automatic speech recognition (ASR) successfully and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach for adaptation, but it takes a lot of compute and may cause catastrophic forgetting to the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model which generalizes better to unseen test speakers by making use of speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never been explored in ASR. Specifically, we gradually prune less contributing parameters on the model encoder to a certain sparsity level, and use the pruned parameters for adaptation, while freezing the unpruned parameters to keep the original model performance. We conduct experiments on the Librispeech dataset. Our proposed approach brings relative 2.74-6.52% word error rate (WER) reduction on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction, and surpasses the finetuning method by up to a relative 2.54%. Besides, with extremely low-resource adaptation data (e.g., 1 utterance), our method could improve the WER by a relative 6.53% with only a few epochs of training.
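A minimal sketch of the pruning-based adaptation idea: mark the lowest-magnitude positions as prunable, update only those positions for the target speaker, and freeze the rest at their original values. The sparsity schedule, the choice of encoder layers, and the helper names are illustrative assumptions, not the paper's exact procedure (which increases sparsity gradually).

```python
import torch

def magnitude_mask(weight, sparsity):
    """Boolean mask marking the lowest-magnitude `sparsity` fraction as prunable."""
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() <= threshold

def adapt_pruned_only(param, mask, original, lr):
    """One adaptation step that touches only pruned positions.

    `original` is a frozen copy of the pre-adaptation weights; positions
    outside `mask` are restored after the gradient step, so the base model's
    behavior on existing speakers is preserved.
    """
    with torch.no_grad():
        param -= lr * param.grad
        param[~mask] = original[~mask]
```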
【11】 A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization Link: https://arxiv.org/abs/2110.08531
Authors: Cristian Daniel Alecsa Affiliations: Romanian Institute of Science and Technology; Technical University of Cluj-Napoca, Cluj-Napoca, Romania Note: 36 pages, 5 figures, 6 tables, 35 references Abstract: In the following paper we introduce new adaptive algorithms endowed with momentum terms for stochastic non-convex optimization problems. We investigate the almost sure convergence to stationary points, along with a finite-time horizon analysis with respect to a chosen final iteration, and we also inspect the worst-case iteration complexity. An estimate for the expectation of the squared Euclidean norm of the gradient is given, and the theoretical analysis that we perform is assisted by various computational simulations for neural network training.
Reinforcement Learning (7 papers)
【1】 Provable Hierarchy-Based Meta-Reinforcement Learning Link: https://arxiv.org/abs/2110.09507
Authors: Kurtland Chua, Qi Lei, Jason D. Lee Affiliations: Princeton University, Princeton, NJ, USA Abstract: Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assumes access to expert-constructed hierarchies, or uses hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
【2】 Reinforcement Learning-Based Coverage Path Planning with Implicit Cellular Decomposition Link: https://arxiv.org/abs/2110.09018
Authors: Javad Heydari, Olimpiya Saha, Viswanath Ganapathy Note: 20 pages Abstract: Coverage path planning in a generic known environment is shown to be NP-hard. When the environment is unknown, it becomes more challenging as the robot is required to rely on its online map information built during coverage for planning its path. A significant research effort focuses on designing heuristic or approximate algorithms that achieve reasonable performance. Such algorithms have sub-optimal performance in terms of covering the area or the cost of coverage, e.g., coverage time or energy consumption. In this paper, we provide a systematic analysis of the coverage problem and formulate it as an optimal stopping time problem, where the trade-off between coverage performance and its cost is explicitly accounted for. Next, we demonstrate that reinforcement learning (RL) techniques can be leveraged to solve the problem computationally. To this end, we provide some technical and practical considerations to facilitate the application of the RL algorithms and improve the efficiency of the solutions. Finally, through experiments in grid world environments and the Gazebo simulator, we show that reinforcement learning-based algorithms efficiently cover realistic unknown indoor environments, and outperform the current state of the art.
【3】 Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization Link: https://arxiv.org/abs/2110.08896
Authors: Ke Sun, Yafei Wang, Yi Liu, Yingnan Zhao, Bo Pan, Shangling Jui, Bei Jiang, Linglong Kong Affiliations: University of Alberta, Edmonton, Canada; Harbin Institute of Technology, Harbin, China; Huawei Technologies Ltd. Abstract: Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms for accelerating convergence and improving the sampling efficiency of deep RL. Despite its heuristic improvement of convergence, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The key focus of the analysis roots in the fixed-point iteration nature of RL. We further propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
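For readers unfamiliar with the underlying technique, below is a NumPy sketch of standard (type-II) Anderson acceleration for a generic fixed-point iteration x = g(x); the damping, regularization, and RL-specific operators from the paper are not included, and the history length and mixing parameter are arbitrary defaults.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=50, beta=1.0):
    """Type-II Anderson acceleration for the fixed-point iteration x = g(x)."""
    X, G = [x0], [g(x0)]
    x = G[0]
    for _ in range(iters):
        X.append(x)
        G.append(g(x))
        k = min(m, len(X) - 1)
        # Residuals f_i = g(x_i) - x_i, most recent iterate in column 0.
        F = np.stack([G[-i - 1] - X[-i - 1] for i in range(k + 1)], axis=1)
        # Minimize ||sum_i c_i f_i|| subject to sum(c) = 1 via the standard
        # reparameterization c = (1 - sum(gamma), gamma).
        dF = F[:, 1:] - F[:, :1]
        gamma, *_ = np.linalg.lstsq(dF, -F[:, 0], rcond=None)
        c = np.concatenate([[1.0 - gamma.sum()], gamma])
        Xm = np.stack([X[-i - 1] for i in range(k + 1)], axis=1)
        Gm = np.stack([G[-i - 1] for i in range(k + 1)], axis=1)
        x = (1 - beta) * (Xm @ c) + beta * (Gm @ c)
    return x
```

For example, `anderson_fixed_point(np.cos, np.array([1.0]))` converges to the fixed point of cosine (about 0.739) far faster than the plain iteration.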
【4】 Towards Instance-Optimal Offline Reinforcement Learning with Pessimism Link: https://arxiv.org/abs/2110.08695
Authors: Ming Yin, Yu-Xiang Wang Affiliations: Department of Computer Science, UC Santa Barbara; Department of Statistics and Applied Probability, UC Santa Barbara Note: NeurIPS 2021 Abstract: We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients, which lack the explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive the suboptimality upper bound that nearly matches \[ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). \] In complement, we also prove a per-instance information-theoretical lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ if $d^{\pi^\star}_h(s_h,a_h)>0$. Different from the previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion as it applies to individual instances separately. Here $\pi^\star$ is an optimal policy, $\mu$ is the behavior policy and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $\mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
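The following tabular sketch conveys the flavor of pessimistic value iteration with a variance-based penalty matching the bound above; the constant `c`, the clipping to [0, H], and the input layout are illustrative assumptions rather than the exact APVI specification.

```python
import numpy as np

def pessimistic_value_iteration(P_hat, r, counts, H, c=1.0):
    """Tabular pessimistic VI: subtract an uncertainty penalty ~ sqrt(Var / n).

    P_hat:  (H, S, A, S) estimated transition probabilities from offline data
    r:      (H, S, A)    rewards
    counts: (H, S, A)    state-action visit counts in the offline dataset
    """
    _, S, A, _ = P_hat.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        mean_next = P_hat[h] @ V[h + 1]                        # (S, A)
        var_next = P_hat[h] @ V[h + 1] ** 2 - mean_next ** 2   # (S, A)
        penalty = c * np.sqrt(np.maximum(var_next, 0.0) / np.maximum(counts[h], 1))
        Q = np.clip(r[h] + mean_next - penalty, 0.0, H)        # pessimistic Q
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi
```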
【5】 Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning Link: https://arxiv.org/abs/2110.08642
Authors: Yuchen Xiao, Xueguang Lyu, Christopher Amato Affiliations: Northeastern University Abstract: Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents' choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
【6】 Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework Link: https://arxiv.org/abs/2110.08258
Authors: Khanh Nguyen, Yonatan Bisk, Hal Daumé III Affiliations: University of Maryland; Carnegie Mellon University; Microsoft Research Note: 15 pages, 3 figures, 4 tables Abstract: Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
【7】 Reinforcement Learning for Standards Design Link: https://arxiv.org/abs/2110.06909
Authors: Shahrukh Khan Kasi, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman Affiliations: AI4Networks Center, University of Oklahoma, Tulsa, OK, USA; Next-Generation Systems, CableLabs, Santa Clara, CA, USA and Louisville, CO, USA Abstract: Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby streamline both the standards design process and the ease of extending the standard to support new modulation schemes applicable to new higher-level applications and services. Our scheme involves machine learning, whereby a constructor entity submits proposals to an evaluator entity, which returns a score for the proposal. The constructor employs reinforcement learning to iterate on its submitted proposals until a score is achieved that was previously agreed upon by both constructor and evaluator to be indicative of satisfying the required design criteria (including performance metrics for transmissions over the interface).
Meta-Learning (2 papers)
【1】 Learning Prototype-oriented Set Representations for Meta-Learning Link: https://arxiv.org/abs/2110.09140
Authors: Dandan Guo, Long Tian, Minghe Zhang, Mingyuan Zhou, Hongyuan Zha Affiliations: The Chinese University of Hong Kong, Shenzhen; Xidian University; Georgia Institute of Technology; McCombs School of Business, The University of Texas at Austin; School of Data Science, Shenzhen Research Institute of Big Data Abstract: Learning from set-structured data is a fundamental problem that has recently attracted increasing attention, where a series of summary networks are introduced to deal with the set input. In fact, many meta-learning problems can be treated as set-input tasks. Most existing summary networks aim to design different architectures for the input set in order to enforce permutation invariance. However, scant attention has been paid to the common cases where different sets in a meta-distribution are closely related and share certain statistical properties. Viewing each set as a distribution over a set of global prototypes, this paper provides a novel optimal transport (OT) based way to improve existing summary networks. To learn the distribution over the global prototypes, we minimize its OT distance to the set empirical distribution over data points, providing a natural unsupervised way to improve the summary network. Since our plug-and-play framework can be applied to many meta-learning problems, we further instantiate it to the cases of few-shot classification and implicit meta generative modeling. Extensive experiments demonstrate that our framework significantly improves the existing summary networks on learning more powerful summary statistics from sets and can be successfully integrated into metric-based few-shot classification and generative modeling applications, providing a promising tool for addressing set-input and meta-learning problems.
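Since the framework rests on minimizing an OT distance between the prototype distribution and the empirical distribution over data points, here is a standard entropy-regularized Sinkhorn sketch in NumPy; the cost matrix, histogram weights, and regularization strength are placeholders, and in the paper's setting this computation would be differentiated through to update the prototypes.

```python
import numpy as np

def sinkhorn_cost(cost, a, b, eps=0.05, iters=200):
    """Entropy-regularized OT between histograms a (points) and b (prototypes).

    cost: (n, k) pairwise cost matrix, e.g. squared distances between
    data-point embeddings and global prototypes.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):           # alternating Sinkhorn projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return (plan * cost).sum()       # transport cost <P, C>
```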
【2】 Meta-Learning with Adjoint Methods Link: https://arxiv.org/abs/2110.08432
Authors: Shibo Li, Zheng Wang, Akil Narayan, Robert M. Kirby, Shandian Zhe Affiliations: School of Computing, Scientific Computing and Imaging (SCI) Institute, and Department of Mathematics, University of Utah Abstract: Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very expensive. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t the initialization, we only need to run the standard ODE solver twice -- one is forward in time that evolves a long trajectory of gradient flow for the sampled task; the other is backward and solves the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
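A toy sketch of the core trick: treat inner-loop gradient descent as the ODE dθ/dt = −∇L_train(θ) and obtain dL_val(θ(T))/dθ(0) with an adjoint solve instead of backpropagating through the unrolled trajectory. This assumes the third-party `torchdiffeq` package and substitutes a toy quadratic loss for an actual meta-learning task; it is not the authors' implementation.

```python
import torch
from torchdiffeq import odeint_adjoint

class GradientFlow(torch.nn.Module):
    """d(theta)/dt = -grad L_train(theta) for L_train = 0.5 ||theta - target||^2."""
    def __init__(self, target):
        super().__init__()
        self.target = torch.nn.Parameter(target)

    def forward(self, t, theta):
        return -(theta - self.target)

theta0 = torch.zeros(5, requires_grad=True)          # the meta-initialization
flow = GradientFlow(target=torch.ones(5))
t = torch.tensor([0.0, 1.0])
theta_T = odeint_adjoint(flow, theta0, t)[-1]        # forward pass: inner "training"
val_loss = ((theta_T - 0.5) ** 2).sum()              # toy validation loss
val_loss.backward()                                  # backward pass solves the adjoint ODE
print(theta0.grad)                                   # gradient w.r.t. the initialization
```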
Symbolic | Symbolic Learning (1 paper)
【1】 Neuro-Symbolic Forward Reasoning Link: https://arxiv.org/abs/2110.09383
Authors: Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting Affiliations: AI and Machine Learning Group, Dept. of Computer Science, TU Darmstadt, Germany; Centre for Cognitive Science, TU Darmstadt, Germany; Hessian Center for AI (hessian.AI), Darmstadt, Germany Note: Preprint Abstract: Reasoning is an essential part of human intelligence and thus has been a long-standing goal in artificial intelligence research. With the recent success of deep learning, incorporating reasoning with deep learning systems, i.e., neuro-symbolic AI has become a major field of interest. We propose the Neuro-Symbolic Forward Reasoner (NSFR), a new approach for reasoning tasks taking advantage of differentiable forward-chaining using first-order logic. The key idea is to combine differentiable forward-chaining reasoning with object-centric (deep) learning. Differentiable forward-chaining reasoning computes logical entailments smoothly, i.e., it deduces new facts from given facts and rules in a differentiable manner. The object-centric learning approach factorizes raw inputs into representations in terms of objects. Thus, it allows us to provide a consistent framework to perform the forward-chaining inference from raw inputs. NSFR factorizes the raw inputs into the object-centric representations, converts them into probabilistic ground atoms, and finally performs differentiable forward-chaining inference using weighted rules. Our comprehensive experimental evaluations on object-centric reasoning data sets, 2D Kandinsky patterns and 3D CLEVR-Hans, and a variety of tasks show the effectiveness and advantage of our approach.
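To illustrate differentiable forward chaining in isolation, the sketch below propagates soft truth values over ground atoms, using product as a soft conjunction and probabilistic sum as a soft disjunction; the rule encoding and atom indexing are invented for the example and do not reflect NSFR's actual data structures.

```python
import torch

def soft_or(a, b):
    """Probabilistic sum: a differentiable, monotone OR on [0, 1] values."""
    return a + b - a * b

def forward_chain(valuations, rules, steps=3):
    """valuations: (num_atoms,) tensor of soft truth values in [0, 1].

    rules: list of (head_idx, body_idxs, weight) triples; each rule raises the
    head atom's value by weight * product of its body atoms' values.
    """
    v = valuations
    for _ in range(steps):                       # bounded-depth forward chaining
        updates = {}
        for head, body, w in rules:
            score = w * torch.prod(v[list(body)])             # soft conjunction
            updates[head] = soft_or(updates.get(head, v[head]), score)
        v = torch.stack([updates.get(i, v[i]) for i in range(v.shape[0])])
    return v

# Example: atoms [p, q, r]; rule "r :- p, q" with weight 0.9.
v = forward_chain(torch.tensor([0.8, 0.9, 0.0]), [(2, (0, 1), 0.9)])
print(v)  # r's value rises toward 0.9 * 0.8 * 0.9
```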
Hierarchical Learning (1 paper)
【1】 Semi-asynchronous Hierarchical Federated Learning for Cooperative Intelligent Transportation Systems Link: https://arxiv.org/abs/2110.09073
Authors: Qimei Chen, Zehua You, Hao Jiang Affiliations: Wuhan University Abstract: Cooperative Intelligent Transport System (C-ITS) is a promising network to provide safety, efficiency, sustainability, and comfortable services for automated vehicles and road infrastructures by taking advantage of participants. However, the components of C-ITS usually generate large amounts of data, which makes it difficult to explore data science. Currently, federated learning has been proposed as an appealing approach to allow users to cooperatively reap the benefits from trained participants. Therefore, in this paper, we propose a novel Semi-asynchronous Hierarchical Federated Learning (SHFL) framework for C-ITS that enables elastic edge to cloud model aggregation from data sensing. We further formulate a joint edge node association and resource allocation problem under the proposed SHFL framework to prevent personalities of heterogeneous road vehicles and achieve communication-efficiency. To deal with our proposed Mixed Integer Nonlinear Programming (MINLP) problem, we introduce a distributed Alternating Direction Method of Multipliers (ADMM)-Block Coordinate Update (BCU) algorithm. With this algorithm, a tradeoff between training accuracy and transmission latency has been derived. Numerical results demonstrate the advantages of the proposed algorithm in terms of training overhead and model performance.
Medical-Related (10 papers)
【1】 Early Diagnostic Prediction of Covid-19 using Gradient-Boosting Machine Model Link: https://arxiv.org/abs/2110.09436
Authors: Satvik Tripathi Note: Presented at the Drexel Society of Artificial Intelligence Research Conference, 2021 (arXiv:2110.05263) Abstract: With the huge spike in COVID-19 cases across the globe, the reverse transcriptase-polymerase chain reaction (RT-PCR) test remains a key component for rapid and accurate detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In recent months there has been an acute shortage of medical supplies in developing countries, especially a lack of RT-PCR testing, resulting in delayed patient care and high infection rates. We present a gradient-boosting machine model that predicts the diagnostic result of SARS-CoV-2 in an RT-PCR test by utilizing eight binary features. We used the publicly available nationwide dataset released by the Israeli Ministry of Health.
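A sketch of the modeling setup with scikit-learn, using synthetic binary features as a stand-in for the paper's eight features and for the Israeli Ministry of Health data, which are not reproduced here; the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in: 8 binary features per subject and a binary test result.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(5000, 8))
y = (X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 5000) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```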
【2】 Impact of COVID-19 Policies and Misinformation on Social Unrest Link: https://arxiv.org/abs/2110.09234
Authors: Martha Barnard, Radhika Iyer, Sara Y. Del Valle, Ashlynn R. Daughton Affiliations: Information Systems and Modeling, Los Alamos National Lab, Los Alamos, NM, USA; Department of Political Science and Department of Computing, Data Science, and Society, University of California, Berkeley, CA, USA Note: 21 pages, 9 figures Abstract: The novel coronavirus disease (COVID-19) pandemic has impacted every corner of earth, disrupting governments and leading to socioeconomic instability. This crisis has prompted questions surrounding how different sectors of society interact and influence each other during times of change and stress. Given the unprecedented economic and societal impacts of this pandemic, many new data sources have become available, allowing us to quantitatively explore these associations. Understanding these relationships can help us better prepare for future disasters and mitigate the impacts. Here, we focus on the interplay between social unrest (protests), health outcomes, public health orders, and misinformation in eight countries of Western Europe and four regions of the United States. We created 1-3 week forecasts of both a binary protest metric for identifying times of high protest activity and the overall protest counts over time. We found that for all regions, except Belgium, at least one feature from our various data streams was predictive of protests. However, the accuracy of the protest forecasts varied by country, that is, for roughly half of the countries analyzed, our forecasts outperform a naïve model. These mixed results demonstrate the potential of diverse data streams to predict a topic as volatile as protests as well as the difficulties of predicting a situation that is as rapidly evolving as a pandemic.
【3】 Correlation-based Discovery of Disease Patterns for Syndromic Surveillance Link: https://arxiv.org/abs/2110.09208
Authors: Michael Rapp, Moritz Kulessa, Eneldo Loza Mencía, Johannes Fürnkranz Affiliations: TU Darmstadt, Darmstadt, Germany; JKU Linz, Linz, Austria Abstract: Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. To support epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In an experimental evaluation, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results suggest that the proposed approach is able to find patterns that correlate with the reported infections and often identifies indicators that are related to the respective diseases.
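The key idea reduces to ranking candidate indicators by their (optionally lagged) correlation with reported infections. Below is a small pandas sketch of that ranking step, with the indicator table and case series assumed as inputs; the lag and correlation measure are illustrative choices.

```python
import pandas as pd

def rank_indicators(indicators: pd.DataFrame, cases: pd.Series, lag: int = 1) -> pd.Series:
    """Correlate each lagged indicator time series with reported case counts.

    indicators: weekly counts per candidate indicator (one column each) from a
    health-related data source; cases: reported infections for the same
    region and weeks. Returns indicators sorted by correlation, highest first.
    """
    corr = {col: indicators[col].shift(lag).corr(cases) for col in indicators.columns}
    return pd.Series(corr).sort_values(ascending=False)
```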
【4】 Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions Link: https://arxiv.org/abs/2110.08918
Authors: Batuhan Bardak, Mehmet Tan Affiliations: Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey Note: Published in IEEE CIBCB 2021 Abstract: Drug representations have played an important role in cheminformatics. However, in the healthcare domain, drug representations have been underused relative to the rest of Electronic Health Record (EHR) data, due to the complexity of high dimensional drug representations and the lack of a proper pipeline that will allow converting clinical drugs to their representations. Time-varying vital signs, laboratory measurements, and related time-series signals are commonly used to predict clinical outcomes. In this work, we demonstrate that using clinical drug representations in addition to other clinical features has significant potential to increase the performance of mortality and length of stay (LOS) models. We evaluate two different drug representation methods (Extended-Connectivity Fingerprint-ECFP and SMILES-Transformer embedding) on clinical outcome predictions. The results show that the proposed multimodal approach achieves substantial enhancement on clinical tasks over baseline models. Using clinical drug representations as additional features improves the LOS prediction by around 6% in Area Under the Receiver Operating Characteristic curve (AUROC) and by around 5% in Area Under the Precision-Recall Curve (AUPRC). Furthermore, for the mortality prediction task, there is an improvement of around 2% over the time-series baseline in terms of AUROC and 3.5% in terms of AUPRC. The code for the proposed method is available at https://github.com/tanlab/MIMIC-III-Clinical-Drug-Representations.
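For reference, clinical drugs can be mapped to ECFP vectors with RDKit roughly as follows; the radius and bit length below are common defaults, not necessarily the paper's settings, and the SMILES string is just an example molecule.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def ecfp(smiles: str, radius: int = 2, n_bits: int = 1024) -> np.ndarray:
    """Extended-connectivity fingerprint (ECFP) as a fixed-length binary vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

aspirin = ecfp("CC(=O)OC1=CC=CC=C1C(=O)O")
print(aspirin.shape, int(aspirin.sum()))  # (1024,) and the number of set bits
```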
【5】 Deep forecasting of translational impact in medical research Link: https://arxiv.org/abs/2110.08904
Authors: Amy PK Nelson, Robert J Gray, James K Ruffle, Henry C Watkins, Daniel Herron, Nick Sorros, Danil Mikhailov, M. Jorge Cardoso, Sebastien Ourselin, Nick McNally, Bryan Williams, Geraint E Rees, Parashkev Nachev Affiliations: UCL Queen Square Institute of Neurology, University College London, London, UK; Research & Development, NIHR University College London Hospitals Biomedical Research Centre, London, UK; Wellcome Data Labs, Wellcome Trust, London, UK Note: 28 pages, 6 figures Abstract: The value of biomedical research--a $1.7 trillion annual investment--is ultimately determined by its downstream, real-world impact. Current objective predictors of impact rest on proxy, reductive metrics of dissemination, such as paper citation rates, whose relation to real-world translation remains unquantified. Here we sought to determine the comparative predictability of future real-world translation--as indexed by inclusion in patents, guidelines or policy documents--from complex models of the abstract-level content of biomedical publications versus citations and publication meta-data alone. We develop a suite of representational and discriminative mathematical models of multi-scale publication data, quantifying predictive performance out-of-sample, ahead-of-time, across major biomedical domains, using the entire corpus of biomedical research captured by Microsoft Academic Graph from 1990 to 2019, encompassing 43.3 million papers across all domains. We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents. By contrast, high-dimensional models of publication titles, abstracts and metadata exhibit high fidelity (AUROC > 0.9), generalise across time and thematic domain, and transfer to the task of recognising papers of Nobel Laureates. The translational impact of a paper indexed by inclusion in patents, guidelines, or policy documents can be predicted--out-of-sample and ahead-of-time--with substantially higher fidelity from complex models of its abstract-level content than from models of publication meta-data or citation metrics. We argue that content-based models of impact are superior in performance to conventional, citation-based measures, and sustain a stronger evidence-based claim to the objective measurement of translational potential.
【6】 A Bayesian Approach for Medical Inquiry and Disease Inference in Automated Differential Diagnosis Link: https://arxiv.org/abs/2110.08393
Authors: Hong Guan, Chitta Baral Affiliations: Arizona State University Abstract: We propose a Bayesian approach for both medical inquiry and disease inference, the two major phases in differential diagnosis. Unlike previous work that simulates data from given probabilities and uses ML algorithms on them, we directly use the Quick Medical Reference (QMR) belief network, and apply Bayesian inference in the inference phase and Bayesian experimental design in the inquiry phase. Moreover, we improve the inquiry phase by extending the Bayesian experimental design framework from one-step search to multi-step search. Our approach has some practical advantages as it is interpretable, free of costly training, and able to adapt to new changes without any additional effort. Our experiments show that our approach achieves new state-of-the-art results on two simulated datasets, SymCAT and HPO, and competitive results on two diagnosis dialogue datasets, Muzhi and Dxy.
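The inquiry phase can be illustrated with one-step Bayesian experimental design: ask about the symptom whose answer yields the largest expected reduction in posterior entropy over diseases. The sketch below uses a generic conditional-probability table rather than the QMR noisy-OR network, and omits the paper's multi-step search extension.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def best_question(prior, lik):
    """Pick the symptom index with maximal expected information gain.

    prior: (D,) probabilities over diseases; lik: (S, D) with
    lik[s, d] = P(symptom s present | disease d).
    """
    gains = []
    for s in range(lik.shape[0]):
        p_yes = lik[s] @ prior                                # P(answer = yes)
        post_yes = lik[s] * prior / max(p_yes, 1e-12)         # Bayes update, yes
        post_no = (1 - lik[s]) * prior / max(1 - p_yes, 1e-12)  # Bayes update, no
        expected_h = p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no)
        gains.append(entropy(prior) - expected_h)             # information gain
    return int(np.argmax(gains))
```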
【7】 FedSLD: Federated Learning with Shared Label Distribution for Medical Image Classification Link: https://arxiv.org/abs/2110.08378
Authors: Jun Luo, Shandong Wu Affiliations: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Department of Radiology, Department of Biomedical Informatics, and Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA Note: 10 pages Abstract: Machine learning in medical research, by nature, needs careful attention to obeying the regulations of data privacy, making it difficult to train a machine learning model over gathered data from different medical centers. Failure of leveraging data of the same kind may result in poor generalizability for the trained model. Federated learning (FL) enables collaboratively training a joint model while keeping the data decentralized for multiple medical centers. However, federated optimizations often suffer from the heterogeneity of the data distribution across medical centers. In this work, we propose Federated Learning with Shared Label Distribution (FedSLD) for classification tasks, a method that assumes knowledge of the label distributions for all the participating clients in the federation. FedSLD adjusts the contribution of each data sample to the local objective during optimization given knowledge of the distribution, mitigating the instability brought by data heterogeneity across all clients. We conduct extensive experiments on four publicly available image datasets with different types of non-IID data distributions. Our results show that FedSLD achieves better convergence performance than the compared leading FL optimization algorithms, increasing the test accuracy by up to 5.50 percentage points.
【8】 A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario Link: https://arxiv.org/abs/2110.08331
Authors: Francisco Valente, Jorge Henriques, Simão Paredes, Teresa Rocha, Paulo de Carvalho, João Morais Affiliations: Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, Coimbra, Portugal; Polytechnic of Coimbra, Department of Systems and Computer Engineering, Rua Pedro Nunes - Quinta da Nora, Coimbra, Portugal Abstract: We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models. More specifically, we aim to develop a method that, besides having a good performance, offers a personalized model and outcome for each patient, presents high interpretability, and incorporates an estimation of the prediction reliability which is not usually available. By combining these features in the same approach we expect that it can boost the confidence of physicians to use such a tool in their daily activity. In order to achieve the mentioned goals, a three-step methodology was developed: several rules were created by dichotomizing risk factors; such rules were trained with a machine learning classifier to predict the acceptance degree of each rule (the probability that the rule is correct) for each patient; that information was combined and used to compute the risk of mortality and the reliability of such prediction. The methodology was applied to a dataset of patients admitted with any type of acute coronary syndromes (ACS), to assess the 30-days all-cause mortality risk. The performance was compared with state-of-the-art approaches: logistic regression (LR), artificial neural network (ANN), and clinical risk score model (Global Registry of Acute Coronary Events - GRACE). The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization; it also significantly outperforms the GRACE risk model and the standard ANN model. The calibration curve also suggests a very good generalization ability of the obtained model as it approaches the ideal curve. Finally, the reliability estimation of individual predictions presented a great correlation with the misclassifications rate. Those properties may have a beneficial application in other clinical scenarios as well. [abridged]
【9】 Comparative Analysis of Deep Learning Algorithms for Classification of COVID-19 X-Ray Images 标题:深度学习算法在冠状病毒X射线图像分类中的比较分析 链接:https://arxiv.org/abs/2110.09294
作者:Unsa Maheen,Khawar Iqbal Malik,Gohar Ali 机构:Department of Computer Science, University of Lahore, Pakistan 摘要:冠状病毒于2019年12月首次出现在中国武汉市,并迅速传播至全球。它对全球经济、教育、社会、日常生活和人类的总体健康都有非常有害的影响。要在初期限制疾病的快速扩展,主要困难是尽快找出新冠阳性患者。由于没有可用的自动化工具包,对辅助诊断工具的需求随之上升。先前的研究发现,放射学技术获得的此类图像包含与冠状病毒相关的重要细节。将改进的人工智能(AI)系统与放射影像结合使用,有助于精确、准确地诊断该病毒,也有助于缓解偏远村庄专业医生匮乏的问题。在我们的研究中,我们分析了使用胸部X射线影像检测新冠肺炎的不同技术,检查了不同的预训练CNN模型AlexNet、VGG-16、MobileNet-V2、SqeezeNet、ResNet-34、ResNet-50和COVIDX-Net,以获得新冠肺炎分类系统的准确分析。我们的研究表明,采用ResNet-34的预训练CNN模型取得了更高的指标,即98.33%的准确率、96.77%的精确率和98.36的F1分数,优于其他CNN技术。我们的模型可能有助于研究人员微调CNN模型,以便快速筛查新冠患者。 摘要:The Coronavirus was first emerged in December, in the city of China named Wuhan in 2019 and spread quickly all over the world. It has very harmful effects all over the global economy, education, social, daily living and general health of humans. To restrict the quick expansion of the disease initially, main difficulty is to explore the positive corona patients as quickly as possible. As there are no automatic tool kits accessible the requirement for supplementary diagnostic tools has risen up. Previous studies have findings acquired from radiological techniques proposed that this kind of images have important details related to the coronavirus. The usage of modified Artificial Intelligence (AI) system in combination with radio-graphical images can be fruitful for the precise and exact solution of this virus and can also be helpful to conquer the issue of deficiency of professional physicians in distant villages. In our research, we analyze the different techniques for the detection of COVID-19 using X-Ray radiographic images of the chest, we examined the different pre-trained CNN models AlexNet, VGG-16, MobileNet-V2, SqeezeNet, ResNet-34, ResNet-50 and COVIDX-Net to correct analytics for classification system of COVID-19. Our study shows that the pre trained CNN Model with ResNet-34 technique gives the higher accuracy rate of 98.33%, 96.77% precision, and 98.36 F1-score, which is better than other CNN techniques. Our model may be helpful for the researchers to fine-tune the CNN model for the quick screening of COVID patients.
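文中表现最好的是基于ResNet-34的预训练模型。下面是一个基于torchvision的最小微调示意(类别数与学习率为假设值):

```python
import torch
import torchvision

# 加载在 ImageNet 上预训练的 ResNet-34,并替换分类头
model = torchvision.models.resnet34(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 3)  # 假设三分类:正常/肺炎/新冠

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, labels):
    # 单步微调:前向、求损失、反向传播、更新参数
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```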
【10】 MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation 标题:MedAug:利用患者元数据的对比性学习改善了胸部X光解释的表示 链接:https://arxiv.org/abs/2102.10663
作者:Yen Nhi Truong Vu,Richard Wang,Niranjan Balachandar,Can Liu,Andrew Y. Ng,Pranav Rajpurkar 机构:Equal Contribution, Department of Computer Science, Stanford University, School of Medicine, Stanford University 摘要:同一图像的多视图对之间的自监督对比学习已被证明能够成功地利用未标记数据为自然图像和医学图像生成有意义的视觉表示。然而,在如何为医学图像选择样本对方面的工作仍然有限,而患者元数据的可用性可以被用来改进表示。在这项工作中,我们开发了一种方法,通过使用患者元数据,从可能不同的图像视图中选择阳性样本对。我们比较了针对胸部X射线解释选择阳性样本对的策略,包括要求它们来自同一患者、同一影像学研究或同一偏侧。我们通过在1%的标记数据上微调线性层以进行胸腔积液分类,来评估下游任务性能。我们表现最佳的阳性样本对选择策略(不限偏侧地使用同一患者同一研究中的图像),与ImageNet预训练基线相比,平均AUC提高了14.4%。我们的对照实验表明,提高下游疾病分类性能的关键在于:(1)使用患者元数据从具有相同基本病理的不同图像中适当地创建阳性样本对;(2)最大化查询配对中使用的不同图像的数量。此外,我们还探讨了如何利用患者元数据来选择硬阴性样本对进行对比学习,但没有发现比不使用元数据的基线有所改进。我们的方法广泛适用于医学图像解释,并允许在选择配对进行对比学习时灵活地结合医学见解。 摘要:Self-supervised contrastive learning between pairs of multiple views of the same image has been shown to successfully leverage unlabeled data to produce meaningful visual representations for both natural and medical images. However, there has been limited work on determining how to select pairs for medical images, where availability of patient metadata can be leveraged to improve representations. In this work, we develop a method to select positive pairs coming from views of possibly different images through the use of patient metadata. We compare strategies for selecting positive pairs for chest X-ray interpretation including requiring them to be from the same patient, imaging study or laterality. We evaluate downstream task performance by fine-tuning the linear layer on 1% of the labeled dataset for pleural effusion classification. Our best performing positive pair selection strategy, which involves using images from the same patient from the same study across all lateralities, achieves a performance increase of 14.4% in mean AUC from the ImageNet pretrained baseline. Our controlled experiments show that the keys to improving downstream performance on disease classification are (1) using patient metadata to appropriately create positive pairs from different images with the same underlying pathologies, and (2) maximizing the number of different images used in query pairing. In addition, we explore leveraging patient metadata to select hard negative pairs for contrastive learning, but do not find improvement over baselines that do not use metadata. Our method is broadly applicable to medical image interpretation and allows flexibility for incorporating medical insights in choosing pairs for contrastive learning.
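文中最优策略是不限偏侧地从同一患者的同一研究中选取图像作为正样本对。下面的示意演示了这一选择逻辑(records 的字段名为本示例的假设):

```python
import random

def select_positive_pair(index, records):
    # records:元数据列表,每条假设含 image / patient_id / study_id 字段
    anchor = records[index]
    candidates = [r for r in records
                  if r["patient_id"] == anchor["patient_id"]
                  and r["study_id"] == anchor["study_id"]]
    positive = random.choice(candidates)  # 同一研究中的任意视图(不限偏侧)
    return anchor["image"], positive["image"]
```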
蒸馏|知识提取(3篇)
【1】 Reminding the Incremental Language Model via Data-Free Self-Distillation 标题:通过无数据自蒸馏提醒增量语言模型 链接:https://arxiv.org/abs/2110.08745
作者:Han Wang,Ruiliu Fu,Chengzhang Li,Xuejun Zhang,Jun Zhou,Yonghong Yan 机构: Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China, University of Chinese Academy of Sciences, Beijing, China 备注:8 pages, 5 figures 摘要:使用伪数据的增量语言学习可以缓解神经网络中的灾难性遗忘。然而,为了获得更好的性能,以前的方法对以前任务的伪数据有更高的要求。当使用更少的伪数据时,性能会显著降低。此外,随着不同任务的顺序学习,伪数据的分布逐渐偏离真实数据。学习的任务越多,偏差就越大,这会导致更严重的灾难性遗忘。为了解决这些问题,我们提出了通过无数据自蒸馏(DFSD)提醒增量语言模型的方法,其中包括基于推土机距离(Earth Mover's Distance)的自蒸馏和隐藏数据增强。通过估计GPT-2各层的知识分布,并将其从教师模型迁移到学生模型,基于推土机距离的自蒸馏可以显著减少对伪数据的需求。通过将伪数据的生成建模为一个隐藏数据增强过程(其中每个样本都是所有已训练任务数据的混合),隐藏数据增强可以极大地缓解由偏差引起的灾难性遗忘。实验结果表明,即使伪数据最多减少90%,我们的DFSD也能超过以前的最新方法。 摘要:Incremental language learning with pseudo-data can alleviate catastrophic forgetting in neural networks. However, to obtain better performance, former methods have higher demands for pseudo-data of the previous tasks. The performance dramatically decreases when fewer pseudo-data are employed. In addition, the distribution of pseudo-data gradually deviates from the real data with the sequential learning of different tasks. The deviation will be greater with more tasks learned, which results in more serious catastrophic forgetting. To address these issues, we propose reminding incremental language model via data-free self-distillation (DFSD), which includes self-distillation based on the Earth Mover's Distance and hidden data augmentation. By estimating the knowledge distribution in all layers of GPT-2 and transforming it from teacher model to student model, the Self-distillation based on the Earth Mover's Distance can significantly reduce the demand for pseudo-data. Hidden data augmentation can greatly alleviate the catastrophic forgetting caused by deviations via modeling the generation of pseudo-data as a hidden data augmentation process, where each sample is a mixture of all trained task data. The experimental results demonstrate that our DFSD can exceed the previous state-of-the-art methods even if the maximum decrease in pseudo-data is 90%.
【2】 HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression 标题:HRKD:面向跨域语言模型压缩的层次关系知识抽取 链接:https://arxiv.org/abs/2110.08551
作者:Chenhe Dong,Yaliang Li,Ying Shen,Minghui Qiu 机构: Sun Yat-sen University , Alibaba Group 备注:EMNLP 2021 摘要:在许多自然语言处理任务中,与传统的神经网络方法相比,大型预训练语言模型(PLM)表现出了压倒性的性能。然而,它们庞大的模型尺寸和较低的推理速度在实际应用中阻碍了在资源有限的设备上的部署。本文旨在通过知识蒸馏来压缩PLM,并提出了一种分层关系知识蒸馏(HRKD)方法,用于同时捕获分层关系信息和领域关系信息。具体来说,为了增强模型的能力和可迁移性,我们利用元学习的思想,建立领域关系图来捕获不同领域之间的关系信息。为了动态地为每个领域选择最具代表性的原型,我们提出了一种分层比较-聚合机制来捕获分层关系。在公共多领域数据集上的大量实验证明了我们的HRKD方法的优越性能及其强大的少样本(few-shot)学习能力。为了可再现性,我们在 https://github.com/cheneydon/hrkd 发布了代码。 摘要:On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered the deployment on resource-limited devices in practice. In this paper, we target to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. And to dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
【3】 Sparse Distillation: Speeding Up Text Classification by Using Bigger Models 标题:稀疏精馏:通过使用更大的模型来加速文本分类 链接:https://arxiv.org/abs/2110.08536
作者:Qinyuan Ye,Madian Khabsa,Mike Lewis,Sinong Wang,Xiang Ren,Aaron Jaech 机构:University of Southern California, Facebook AI 摘要:将最先进的Transformer模型蒸馏为轻量级学生模型是降低推理时计算成本的有效方法。然而,对于某些时间敏感的应用,改进后的推理速度可能仍然不能令人满意。在本文中,我们的目标是通过在学生模型的设计空间中探索一个新的区域来进一步突破推理速度的极限。更具体地,我们考虑将基于Transformer的文本分类器蒸馏为具有嵌入平均(embedding-averaging)架构的十亿参数、稀疏激活的学生模型。我们的实验表明,在六个文本分类任务上,学生模型保留了RoBERTa-Large教师模型97%的性能。同时,与教师模型相比,学生模型在GPU和CPU上均实现了高达600倍的加速。进一步的研究表明,我们的管道在隐私保护和领域泛化设置中也是有效的。 摘要:Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. However, the improved inference speed may be still unsatisfactory for certain time-sensitive applications. In this paper, we aim to further push the limit of inference speed by exploring a new area in the design space of the student model. More specifically, we consider distilling a transformer-based text classifier into a billion-parameter, sparsely-activated student model with an embedding-averaging architecture. Our experiments show that the student models retain 97% of the RoBERTa-Large teacher performance on a collection of six text classification tasks. Meanwhile, the student model achieves up to 600x speed-up on both GPUs and CPUs, compared to the teacher models. Further investigation shows that our pipeline is also effective in privacy-preserving and domain generalization settings.
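“嵌入平均”学生模型的骨架可以非常简单:对输入n-gram的嵌入取平均,再接一个线性分类头;每次前向只触及出现过的n-gram对应的行,因此是稀疏激活的。下面是一个最小示意(词表规模与维度为假设值,原文为十亿参数级):

```python
import torch.nn as nn

class EmbeddingAveragingStudent(nn.Module):
    def __init__(self, vocab_size=1_000_000, dim=256, num_classes=2):
        super().__init__()
        # EmbeddingBag 的 mean 模式即"嵌入平均";sparse=True 只更新被激活的行
        self.emb = nn.EmbeddingBag(vocab_size, dim, mode="mean", sparse=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, ngram_ids, offsets):
        # ngram_ids:拼接后的 n-gram id 序列;offsets:各样本的起始偏移
        return self.head(self.emb(ngram_ids, offsets))
```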
推荐(2篇)
【1】 Learning to Learn a Cold-start Sequential Recommender 标题:学会学习冷启动序贯推荐器 链接:https://arxiv.org/abs/2110.09083
作者:Xiaowen Huang,Jitao Sang,Jian Yu,Changsheng Xu 机构:& Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China, Sciences, China, School of Artificial Intelligence, University of Chinese Academy of Sciences, China, and Peng, Cheng Laboratory, China 摘要:冷启动推荐是当代在线应用中一个迫切需要解决的问题。它的目的是为行为稀疏的用户提供尽可能准确的建议。许多数据驱动算法,如广泛使用的矩阵分解,由于数据稀疏性而表现不佳。本文采用元学习的思想来解决用户的冷启动推荐问题。我们提出了一个基于元学习的冷启动顺序推荐框架metaCSR,该框架包括三个主要部分:Diffusion Representer,通过交互图上的信息扩散来学习更好的用户/项目嵌入;用于捕获行为序列的时间依赖性的序列推荐器;元学习器,用于提取和传播先前用户的可转移知识,并为新用户学习良好的初始化。metaCSR具有从常规用户行为中学习常见模式并优化初始化的能力,以便在一次或几次梯度更新后,模型能够快速适应新用户,以实现最佳性能。在三个广泛使用的数据集上进行的大量定量实验表明,metaCSR在处理用户冷启动问题方面具有显著的性能。同时,一系列的定性分析表明,该方法具有良好的泛化能力。 摘要:The cold-start recommendation is an urgent problem in contemporary online applications. It aims to provide users whose behaviors are literally sparse with as accurate recommendations as possible. Many data-driven algorithms, such as the widely used matrix factorization, underperform because of data sparseness. This work adopts the idea of meta-learning to solve the user's cold-start recommendation problem. We propose a meta-learning based cold-start sequential recommendation framework called metaCSR, including three main components: Diffusion Representer for learning better user/item embedding through information diffusion on the interaction graph; Sequential Recommender for capturing temporal dependencies of behavior sequences; Meta Learner for extracting and propagating transferable knowledge of prior users and learning a good initialization for new users. metaCSR holds the ability to learn the common patterns from regular users' behaviors and optimize the initialization so that the model can quickly adapt to new users after one or a few gradient updates to achieve optimal performance. The extensive quantitative experiments on three widely-used datasets show the remarkable performance of metaCSR in dealing with user cold-start problem. Meanwhile, a series of qualitative analysis demonstrates that the proposed metaCSR has good generalization.
【2】 Revisiting Popularity and Demographic Biases in Recommender Evaluation and Effectiveness 标题:再论推荐系统评价与有效性中的流行度偏差和人口统计偏差 链接:https://arxiv.org/abs/2110.08353
作者:Nicola Neophytou,Bhaskar Mitra,Catherine Stinson 机构: The University of Manchester, Oxford Rd, Manchester M,PL, UK, Microsoft, Rue Marconi, Montréal, Quebec, H,S ,J, Canada, School of Computing, Goodwin Hall, Queen's University, Kingston ON, K,L 摘要:推荐算法容易受到流行度偏差的影响:即使流行项目不能满足用户需求,也会倾向于推荐它们。一个相关的问题是,推荐质量可能因人口统计群体而异。与其他群体相比,边缘化群体或在训练数据中代表性不足的群体可能从这些算法中获得相关性较低的推荐。在最近的一项研究中,Ekstrand等人调查了推荐系统的性能如何随流行度和人口统计特征变化,并发现在两个数据集上,二元性别之间的推荐效用在统计学上存在显著差异,在一个数据集上,推荐效用在年龄上存在显著影响。在这里,我们复现了这些结果,并通过额外的分析对其进行扩展。我们发现在年龄和性别方面,推荐系统的性能均存在统计学上的显著差异。我们观察到,老年用户的推荐效用稳步下降,女性的推荐效用低于男性。我们还发现,对于来自数据集中代表性更高的国家的用户,效用更高。此外,我们发现,总使用量和所消费内容的流行度是推荐系统性能的有力预测因素,并且在人口统计群体之间也存在显著差异。 摘要:Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent study, Ekstrand et al. investigate how recommender performance varies according to popularity and demographics, and find statistically significant differences in recommendation utility between binary genders on two datasets, and significant effects based on age on one dataset. Here we reproduce those results and extend them with additional analyses. We find statistically significant differences in recommender performance by both age and gender. We observe that recommendation utility steadily degrades for older users, and is lower for women than men. We also find that the utility is higher for users from countries with more representation in the dataset. In addition, we find that total usage and the popularity of consumed content are strong predictors of recommender performance and also vary significantly across demographic groups.
聚类(5篇)
【1】 Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models 标题:非参数混合模型下基于核的聚类的恢复保证 链接:https://arxiv.org/abs/2110.09476
作者:Leena Chennuru Vankadara,Sebastian Bordt,Ulrike von Luxburg,Debarghya Ghoshdastidar 机构:University of T¨ubingen, Max Planck Institute, for Intelligent Systems, T¨ubingen, Technical University of, Munich 摘要:尽管基于内核的聚类无处不在,但在考虑数据生成过程的强结构假设的设置之外,令人惊讶的是很少有统计保证。在这项工作中,我们通过研究非参数混合模型下基于核的聚类算法的统计性能,朝着缩小这一差距迈出了一步。我们提供了必要和充分的可分性条件,在这些条件下,这些算法可以一致地恢复潜在的真实聚类。我们的分析为内核聚类方法提供了保证,而无需对组件分布的形式进行结构性假设。此外,我们在基于核的数据聚类和基于核密度的聚类之间建立了一个关键等价关系。这使我们能够为非参数混合模型的基于核的估计提供一致性保证。除了理论意义外,这种联系还可能具有实际意义,包括在聚类的背景下系统地选择高斯核的带宽。 摘要:Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.
【2】 Noise-Resilient Ensemble Learning using Evidence Accumulation Clustering 标题:基于证据积累聚类的抗噪集成学习 链接:https://arxiv.org/abs/2110.09212
作者:Gaëlle Candel,David Naccache 机构:Wordline TSS Labs, Paris, &, Département d’informatique de l’ENS, ENS, CNRS, PSL University, Paris 备注:12 pages, submitted and accepted to ANTIC-2021 (International Conference on Advanced Network Technologies and Intelligent Computing) 摘要:集成学习方法将执行同一任务的多个算法结合起来,以建立一个质量更高的组。这些系统很好地适应分布式设置,其中网络的每个对等方或机器承载一个算法,并将其结果传递给其对等方。由于集成冗余,集成学习方法自然能够适应多个节点的缺失。然而,网络可能被破坏,改变对等点的预测精度,这对集合质量有不利影响。在本文中,我们提出了一种抗噪声的集成分类方法,它有助于提高分类精度和纠正随机错误。该方法受证据积累聚类的启发,适用于分类集合。我们将其与四个多类数据集上的朴素选民模型进行了比较。我们的模型显示了更大的弹性,使我们能够在非常高的噪声水平下恢复预测。此外,由于该方法是基于证据积累聚类的,因此我们的方法具有高度的灵活性,因为它可以组合具有不同标签定义的分类器。 摘要:Ensemble Learning methods combine multiple algorithms performing the same task to build a group with superior quality. These systems are well adapted to the distributed setup, where each peer or machine of the network hosts one algorithm and communicate its results to its peers. Ensemble learning methods are naturally resilient to the absence of several peers thanks to the ensemble redundancy. However, the network can be corrupted, altering the prediction accuracy of a peer, which has a deleterious effect on the ensemble quality. In this paper, we propose a noise-resilient ensemble classification method, which helps to improve accuracy and correct random errors. The approach is inspired by Evidence Accumulation Clustering , adapted to classification ensembles. We compared it to the naive voter model over four multi-class datasets. Our model showed a greater resilience, allowing us to recover prediction under a very high noise level. In addition as the method is based on the evidence accumulation clustering, our method is highly flexible as it can combines classifiers with different label definitions.
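证据积累的核心是统计样本对在集成中被判为同类的频率,再据此纠正个别分类器的随机错误。下面是一个高度简化的示意(纠错规则为本示例的假设,并非原文算法):

```python
import numpy as np

def coassociation(predictions):
    # predictions:(分类器数, 样本数) 的整数标签矩阵
    n_clf, n = predictions.shape
    C = np.zeros((n, n))
    for labels in predictions:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / n_clf  # C[i, j]:样本 i、j 被判为同类的频率

def denoised_vote(predictions, C, threshold=0.5):
    # 假设的纠错方式:由与样本高度共现的"邻居"的全部投票做多数决
    final = []
    for i in range(predictions.shape[1]):
        neighbors = np.where(C[i] >= threshold)[0]
        votes = predictions[:, neighbors].ravel()
        final.append(np.bincount(votes).argmax())
    return np.array(final)
```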
【3】 MARTINI: Smart Meter Driven Estimation of HVAC Schedules and Energy Savings Based on WiFi Sensing and Clustering 标题:Martini:基于WiFi传感和聚类的智能仪表驱动的暖通空调计划和节能评估 链接:https://arxiv.org/abs/2110.08927
作者:Kingsley Nweye,Zoltan Nagy 备注:submitted。Highlights:MARTINI derives ramp-up and ramp-down times of HVAC systems based on measured occupancy schedules; savings estimates are provided even if no occupancy data is available; daily and seasonal occupancy and chilled water profiles identified respectively. 摘要:暖通空调系统占建筑能源使用的很大一部分。夜间回退调度是一种节能措施,在空闲期间分别升高制冷设定点和降低制热设定点,以实现节能。然而,需要了解建筑物的实际占用情况,才能最大限度地提高该措施的成功率。此外,还需要一种可扩展的方法来估算节能措施的节能潜力,这种方法不受建筑特定参数和实验或模拟投入的限制。在这里,我们提出了MARTINI,这是一种智能电表驱动的估算方法,利用商业建筑中无处不在的能源智能电表和WiFi基础设施,对由居住者行为导出的HVAC时间表和节能量进行估算。我们通过对WiFi导出的占用曲线进行聚类来估计时间表,并通过移动典型/实测负载曲线(由智能电表能耗曲线聚类得到)中观察到的爬升与回退时刻来估算节能量。我们在七个月内对五栋建筑的案例研究结果表明,当暖通空调系统运行与占用情况保持一致时,平均可节省8.1%-10.8%(夏季)和0.2%-5.9%(秋季)的冷冻水能耗。我们用建筑能耗性能模拟(BEPS)的结果验证了我们的方法,发现MARTINI的平均节省量估计与BEPS预测值的偏差在0.9%-2.4%以内。在没有占用信息的情况下,我们仍然可以通过推迟爬升时间和提前回退开始时间来估计潜在的节约。在51座教学楼中,我们发现节约潜力在1%-5%之间。 摘要:HVAC systems account for a significant portion of building energy use. Nighttime setback scheduling is an energy conservation measure where cooling and heating setpoints are increased and decreased respectively during unoccupied periods with the goal of obtaining energy savings. However, knowledge of a building's real occupancy is required to maximize the success of this measure. In addition, there is the need for a scalable way to estimate energy savings potential from energy conservation measures that is not limited by building specific parameters and experimental or simulation modeling investments. Here, we propose MARTINI, a sMARt meTer drIveN estImation of occupant-derived HVAC schedules and energy savings that leverages the ubiquity of energy smart meters and WiFi infrastructure in commercial buildings. We estimate the schedules by clustering WiFi-derived occupancy profiles and, energy savings by shifting ramp-up and setback times observed in typical/measured load profiles obtained by clustering smart meter energy profiles. Our case-study results with five buildings over seven months show an average of 8.1%-10.8% (summer) and 0.2%-5.9% (fall) chilled water energy savings when HVAC system operation is aligned with occupancy. We validate our method with results from building energy performance simulation (BEPS) and find that estimated average savings of MARTINI are within 0.9%-2.4% of the BEPS predictions. In the absence of occupancy information, we can still estimate potential savings from increasing ramp-up time and decreasing setback start time. In 51 academic buildings, we find savings potentials between 1%-5%.
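该方法的聚类步骤之一,是从WiFi日占用曲线聚出典型时间表,再据此推出HVAC的爬升/回退时刻。下面是一个最小示意(聚类数与占用阈值为假设值):

```python
import numpy as np
from sklearn.cluster import KMeans

def occupancy_schedules(profiles, n_clusters=3, occ_threshold=0.1):
    # profiles:(天数, 24) 的逐小时 WiFi 设备计数曲线
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(profiles)
    schedules = []
    for center in km.cluster_centers_:
        occupied = center > occ_threshold * center.max()
        ramp_up = int(np.argmax(occupied))                            # 首个有人时段
        setback = int(len(occupied) - 1 - np.argmax(occupied[::-1]))  # 末个有人时段
        schedules.append((ramp_up, setback))
    return schedules
```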
【4】 Noise-robust Clustering 标题:抗噪声聚类 链接:https://arxiv.org/abs/2110.08871
作者:Rahmat Adesunkanmi,Ratnesh Kumar 摘要:本文提出了无监督机器学习中的噪声鲁棒聚类技术。噪声、一致性和其他模糊性带来的不确定性可能成为数据分析中的严重障碍。因此,在处理大数据时,数据质量、清理、管理和治理仍然是至关重要的原则。面对这种复杂性,仅像经典设定中那样以确定性方式处理数据已不再足够,考虑噪声分布及其对数据样本值的影响变得有意义。经典聚类方法根据数据在底层空间中的相对距离或相似性将数据分组为"相似类"。本文通过将经典的$K$-均值和$K$-中心点($K$-medoids)聚类扩展到数据分布(而非原始数据)上来解决这个问题。这涉及使用两种度量来测量分布之间的距离:最优质量传输(也称为Wasserstein距离,记为$W_2$)和本文提出的一种新的距离度量,即随机变量距离的期望值(记为ED)。本文提出的基于分布的$K$-means和$K$-medoids算法首先对数据分布进行聚类,然后将每个原始数据点分配给其分布所属的簇。 摘要:This paper presents noise-robust clustering techniques in unsupervised machine learning. The uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addressed this problem via the extension of classical $K$-means and $K$-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data to the cluster of data's distribution.
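基于分布的K-medoids可以直接在成对Wasserstein距离上运行。下面是一个一维情形的示意(scipy 提供的是 $W_1$ 距离,此处作为 $W_2$ 的简化替代,属本示例的假设):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kmedoids_wasserstein(samples, k, n_iter=20, seed=0):
    # samples:长度为 n 的列表,每个元素是某数据点的一组噪声样本(一维数组)
    rng = np.random.default_rng(seed)
    n = len(samples)
    D = np.array([[wasserstein_distance(samples[i], samples[j])
                   for j in range(n)] for i in range(n)])
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        assign = D[:, medoids].argmin(axis=1)   # 将每个分布分配给最近的中心点
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(assign == c)[0]
            if len(members):
                # 中心点更新为簇内总距离最小的成员
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return assign, medoids
```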
【5】 Robust Correlation Clustering with Asymmetric Noise 标题:非对称噪声下的鲁棒相关聚类 链接:https://arxiv.org/abs/2110.08385
作者:Jimit Majmudar,Stephen Vavasis 机构:University of Waterloo 摘要:图聚类问题通常旨在对图节点进行划分,使得两个节点当且仅当相似时才属于同一划分集合。相关聚类是一种图聚类公式,它:(1)以边权重表示节点之间相似性/相异性度量的有符号图作为输入,(2)不需要事先估计输入图中的聚类数。然而,相关聚类背后的组合优化问题是NP难的。在这项工作中,我们提出了一种新的图生成模型,称为节点因子模型(NFM),该模型基于为图节点生成特征向量/嵌入。NFM生成的图包含不对称噪声,即同一簇中可能存在负相关的成对节点。我们利用半定规划技术,提出了一种新的相关聚类算法 $\ell_2$-norm-diag。结合理论和计算结果,我们证明 $\ell_2$-norm-diag 能在NFM生成的图实例中恢复具有足够强簇隶属度的节点,从而在建立所提算法的可证明鲁棒性方面取得进展。 摘要:Graph clustering problems typically aim to partition the graph nodes such that two nodes belong to the same partition set if and only if they are similar. Correlation Clustering is a graph clustering formulation which: (1) takes as input a signed graph with edge weights representing a similarity/dissimilarity measure between the nodes, and (2) requires no prior estimate of the number of clusters in the input graph. However, the combinatorial optimization problem underlying Correlation Clustering is NP-hard. In this work, we propose a novel graph generative model, called the Node Factors Model (NFM), which is based on generating feature vectors/embeddings for the graph nodes. The graphs generated by the NFM contain asymmetric noise in the sense that there may exist pairs of nodes in the same cluster which are negatively correlated. We propose a novel Correlation Clustering algorithm, called $\ell_2$-norm-diag, using techniques from semidefinite programming. Using a combination of theoretical and computational results, we demonstrate that $\texttt{$\ell_2$-norm-diag}$ recovers nodes with sufficiently strong cluster membership in graph instances generated by the NFM, thereby making progress towards establishing the provable robustness of our proposed algorithm.
超分辨率|去噪|去模糊|去雾(1篇)
【1】 Convolutional Deep Denoising Autoencoders for Radio Astronomical Images 标题:射电天文图像卷积深度去噪自动编码器 链接:https://arxiv.org/abs/2110.08618
作者:Claudio Gheller,Franco Vazza 机构: Università di Bologna 备注:21 pages, 14 figures, Accepted for publication by MNRAS 摘要:我们应用一种称为卷积去噪自动编码器的机器学习技术,对最先进的射电望远镜的合成图像进行去噪,目的是检测预测为射电宇宙网特征的微弱、扩散的射电源。在我们的应用中,去噪旨在减少随机仪器噪声,并最小化孔径合成技术产生的附加杂散伪影,如旁瓣。分析了该方法对不同类型的输入图像的有效性和准确性,以及其计算性能。特别注意为训练创建逼真的模拟观测,利用宇宙学数值模拟的结果,生成与150 MHz下LOFAR HBA 8小时观测相对应的图像。我们的自动编码器可以有效地去噪复杂的图像,在仪器灵敏度的极限处识别并提取微弱的天体。该方法可以在大型数据集上高效扩展,利用高性能计算解决方案,以完全自动化的方式(即训练后无需人工监督)。它可以准确地执行图像分割,识别扩散源的低亮度外围,证明是检测隐藏在噪声无线电观测中的具有挑战性的扩展对象的可行解决方案。 摘要:We apply a Machine Learning technique known as Convolutional Denoising Autoencoder to denoise synthetic images of state-of-the-art radio telescopes, with the goal of detecting the faint, diffused radio sources predicted to characterise the radio cosmic web. In our application, denoising is intended to address both the reduction of random instrumental noise and the minimisation of additional spurious artefacts like the sidelobes, resulting from the aperture synthesis technique. The effectiveness and the accuracy of the method are analysed for different kinds of corrupted input images, together with its computational performance. Specific attention has been devoted to create realistic mock observations for the training, exploiting the outcomes of cosmological numerical simulations, to generate images corresponding to LOFAR HBA 8 hours observations at 150 MHz. Our autoencoder can effectively denoise complex images identifying and extracting faint objects at the limits of the instrumental sensitivity. The method can efficiently scale on large datasets, exploiting high performance computing solutions, in a fully automated way (i.e. no human supervision is required after training). It can accurately perform image segmentation, identifying low brightness outskirts of diffused sources, proving to be a viable solution for detecting challenging extended objects hidden in noisy radio observations.
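卷积去噪自动编码器的结构本身很简单:训练时输入加噪图像,以干净图像为目标。下面是一个PyTorch最小示意(层数与通道数为假设值,并非论文网络):

```python
import torch.nn as nn

class ConvDenoisingAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, noisy):
        # 训练时用 MSE 度量重建结果与干净图像的差异
        return self.decoder(self.encoder(noisy))
```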
自动驾驶|车辆|车道检测等(2篇)
【1】 SPAP: Simultaneous Demand Prediction and Planning for Electric Vehicle Chargers in a New City 标题:SPAP:新城市电动汽车充电器同步需求预测与规划 链接:https://arxiv.org/abs/2110.09452
作者:Yizong Wang,Dong Zhao,Yajie Ren,Desheng Zhang,Huadong Ma 机构:∗Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, China, †Rutgers University, USA 摘要:对于一个致力于推广电动汽车(EV)的新城来说,规划充电需求高的公共充电基础设施具有重要意义。然而,由于缺乏运行数据,在实际部署电动汽车充电器之前很难预测充电需求,从而导致死锁。一个直接的想法是利用城市转移学习范式从源城市学习知识,然后利用它预测充电需求,同时确定目标城市充电站慢速/快速充电器的位置和数量。然而,需求预测和充电器规划相互依赖,需要对预测模型进行重新训练,以消除每个不同充电器计划的城市之间的负迁移,从而导致不可接受的时间复杂性。为此,我们提出了同时需求预测和规划(SPAP)的概念和有效解决方案:从多源数据中提取判别特征,并将其输入基于注意的时空城市域自适应网络(AST-CDAN)进行跨城市需求预测;通过迭代利用AST-CDAN和充电器计划微调算法,设计了一种新的用于充电器计划的转移迭代优化(TIO)算法。在中国三个城市收集的真实数据集上进行的大量实验验证了SPAP的有效性和效率。特别是,与实际充电器部署相比,SPAP最多可提高72.5%的收入。 摘要:For a new city that is committed to promoting Electric Vehicles (EVs), it is significant to plan the public charging infrastructure where charging demands are high. However, it is difficult to predict charging demands before the actual deployment of EV chargers for lack of operational data, resulting in a deadlock. A direct idea is to leverage the urban transfer learning paradigm to learn the knowledge from a source city, then exploit it to predict charging demands, and meanwhile determine locations and amounts of slow/fast chargers for charging stations in the target city. However, the demand prediction and charger planning depend on each other, and it is required to re-train the prediction model to eliminate the negative transfer between cities for each varied charger plan, leading to the unacceptable time complexity. To this end, we propose the concept and an effective solution of Simultaneous Demand Prediction And Planning (SPAP): discriminative features are extracted from multi-source data, and fed into an Attention-based Spatial-Temporal City Domain Adaptation Network (AST-CDAN) for cross-city demand prediction; a novel Transfer Iterative Optimization (TIO) algorithm is designed for charger planning by iteratively utilizing AST-CDAN and a charger plan fine-tuning algorithm. Extensive experiments on real-world datasets collected from three cities in China validate the effectiveness and efficiency of SPAP. Specially, SPAP improves at most 72.5% revenue compared with the real-world charger deployment.
【2】 MAAD: A Model and Dataset for "Attended Awareness" in Driving 标题:MAAD:驾驶中"参与意识"的模型与数据集 链接:https://arxiv.org/abs/2110.08610
作者:Deepak Gopinath,Guy Rosman,Simon Stent,Katsuya Terahata,Luke Fletcher,Brenna Argall,John Leonard 备注:25 pages, 13 figures, 14 tables, Accepted at EPIC@ICCV 2021 Workshop. Main paper + Supplementary Material 摘要:我们提出了一个计算模型来估计一个人对环境的感知。我们将参与意识定义为一个人在最近的历史中参与过的潜在动态场景的那些部分,并且他们仍然可能在身体上意识到这些部分。我们的模型以视频和噪声注视估计的形式作为输入场景信息,并输出视觉显著性、精细注视估计和人的注意感知估计。为了测试我们的模型,我们用一个高精度的凝视跟踪器捕获了一个新的数据集,其中包括23名观看驾驶场景视频的受试者24.5小时的凝视序列。该数据集还包含基于扫描路径观察的受试者注意意识的第三方注释。我们的结果表明,我们的模型能够在受控环境下合理估计有人参与的意识,并且在未来可能扩展到真实的以自我为中心的驾驶数据,以帮助在安全系统中实现更有效的提前警告,从而提高驾驶员的驾驶性能。我们还使用我们的数据集和现有的显著性数据集证明了我们的模型在显著性、凝视校准和去噪任务上的有效性。我们的模型和数据集在https://github.com/ToyotaResearchInstitute/att-aware/. 摘要:We propose a computational model to estimate a person's attended awareness of their environment. We define attended awareness to be those parts of a potentially dynamic scene which a person has attended to in recent history and which they are still likely to be physically aware of. Our model takes as input scene information in the form of a video and noisy gaze estimates, and outputs visual saliency, a refined gaze estimate, and an estimate of the person's attended awareness. In order to test our model, we capture a new dataset with a high-precision gaze tracker including 24.5 hours of gaze sequences from 23 subjects attending to videos of driving scenes. The dataset also contains third-party annotations of the subjects' attended awareness based on observations of their scan path. Our results show that our model is able to reasonably estimate attended awareness in a controlled setting, and in the future could potentially be extended to real egocentric driving data to help enable more effective ahead-of-time warnings in safety systems and thereby augment driver performance. We also demonstrate our model's effectiveness on the tasks of saliency, gaze calibration, and denoising, using both our dataset and an existing saliency dataset. We make our model and dataset available at https://github.com/ToyotaResearchInstitute/att-aware/.
联邦学习|隐私保护|加密(3篇)
【1】 Towards Federated Bayesian Network Structure Learning with Continuous Optimization 标题:基于连续优化的联邦贝叶斯网络结构学习 链接:https://arxiv.org/abs/2110.09356
作者:Ignavier Ng,Kun Zhang 机构:Carnegie Mellon University 备注:16 pages; 5 figures 摘要:传统上,贝叶斯网络结构学习通常是在一个中心站点上进行的,在该站点上收集所有数据。然而,在实践中,数据可能分布在不同的参与方(例如,公司、设备),这些参与方打算共同学习贝叶斯网络,但出于隐私或安全考虑不愿意披露与其数据相关的信息。在这项工作中,我们提出了一种跨孤岛(cross-silo)联邦学习方法,以从横向划分在不同参与方的数据中估计贝叶斯网络的结构。我们开发了一种基于连续优化的分布式结构学习方法,使用交替方向乘子法(ADMM),使得在优化过程中只需交换模型参数。我们通过将该方法应用于线性和非线性两种情形,证明了该方法的灵活性。在合成数据集和真实数据集上的实验结果表明,与其他方法相比,该方法取得了更好的性能,尤其是当客户端数量相对较多且每个客户端的样本量有限时。 摘要:Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we present a cross-silo federated learning approach to estimate the structure of Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves an improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size.
【2】 Towards General Deep Leakage in Federated Learning 标题:论联合学习中的普遍深度泄漏 链接:https://arxiv.org/abs/2110.09074
作者:Jiahui Geng,Yongli Mou,Feifei Li,Qing Li,Oya Beyan,Stefan Decker,Chunming Rong 机构: University of Stavanger, RWTH-Aachen University, University of Cologne 摘要:与传统的集中训练不同,联邦学习(FL)通过共享和聚合本地模型而不是本地数据来保护用户的隐私,从而提高了全局模型的性能。尽管这种训练方法看起来很安全,但一些研究表明,攻击者仍然可以基于共享梯度信息恢复私有数据。这种动态重建攻击值得深入研究,因为它可以发生在训练的任何阶段,无论是在模型训练的开始还是结束时;不需要相关数据集,也不需要训练其他模型。我们突破了一些不切实际的假设和限制,将这种重建攻击应用于更广泛的场景中。我们提出了一些方法,分别对应于FedSGD和FedAvg使用场景,从共享梯度或权重重构训练数据。我们提出了一种零样本(zero-shot)方法来恢复标签,即使批次中存在重复的标签。我们研究了标签和图像恢复之间的关系。我们发现,即使批次中只有一个错误推断的标签,图像恢复也会失败;我们还发现,当批内图像具有相同的标签时,相应的图像被恢复为该类图像的融合。我们的方法基于经典图像基准进行评估,包括CIFAR-10和ImageNet。在批量大小、图像质量和对标签分布的适应性方面,我们的方法均超过了最先进的GradInversion方法。 摘要:Unlike traditional central training, federated learning (FL) improves the performance of the global model by sharing and aggregating local models rather than local data to protect the users' privacy. Although this training approach appears secure, some research has demonstrated that an attacker can still recover private data based on the shared gradient information. This on-the-fly reconstruction attack deserves to be studied in depth because it can occur at any stage of training, whether at the beginning or at the end of model training; no relevant dataset is required and no additional models need to be trained. We break through some unrealistic assumptions and limitations to apply this reconstruction attack in a broader range of scenarios. We propose methods that can reconstruct the training data from shared gradients or weights, corresponding to the FedSGD and FedAvg usage scenarios, respectively. We propose a zero-shot approach to restore labels even if there are duplicate labels in the batch. We study the relationship between the label and image restoration. We find that image restoration fails even if there is only one incorrectly inferred label in the batch; we also find that when batch images have the same label, the corresponding image is restored as a fusion of that class of images. Our approaches are evaluated on classic image benchmarks, including CIFAR-10 and ImageNet. The batch size, image quality, and the adaptability of the label distribution of our approach exceed those of GradInversion, the state-of-the-art.
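这类攻击的基本套路可追溯到经典的Deep Leakage:优化一个虚拟输入,使其产生的梯度逼近客户端共享的真实梯度。下面是该思路的简化示意(非本文完整方法;假设标签已先行恢复):

```python
import torch

def gradient_inversion(model, true_grads, input_shape, label,
                       steps=300, lr=0.1):
    # input_shape 含批维,如 (1, 3, 32, 32);label 形如 torch.tensor([y])
    dummy = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([dummy], lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = criterion(model(dummy), label)
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # 让虚拟输入产生的梯度逼近共享的真实梯度
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        match.backward()
        opt.step()
    return dummy.detach()
```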
【3】 Nothing Wasted: Full Contribution Enforcement in Federated Edge Learning 标题:不浪费任何东西:联合边缘学习中的完全贡献强制执行 链接:https://arxiv.org/abs/2110.08330
作者:Qin Hu,Shengling Wang,Zeihui Xiong,Xiuzhen Cheng 机构:School of Artificial Intelligence, Beijing Normal University; Singapore University of Technology and Design 摘要:网络边缘产生的爆炸性数据量使得移动边缘计算成为支持实时应用的关键技术,需要机器学习(ML)技术提供强大的数据处理和分析。特别是,联邦边缘学习(FEL)通过将用于训练ML模型的数据保存在本地,在保护数据所有者的隐私方面变得尤为突出。关于FEL的现有研究要么利用过程内优化,要么提前移除不合格的参与者。在本文中,我们加强了FEL中所有边缘设备的协作,以确保使用所有可用的本地数据来训练ML模型,从而加快学习过程。为此,我们提出了一种不完全信息多人FEL博弈下的集体勒索(CE)策略,该策略被证明能够有效地帮助服务器获取所有设备的全部贡献,而不必担心遭受任何经济损失。从技术上讲,我们提出的CE策略将控制单个对手预期效用比例的经典勒索策略,扩展为对一组参与者的快速同质控制,并进一步体现了公平对待所有参与者的吸引人的特点。此外,CE策略丰富了博弈论的层级体系,拓展了勒索策略的应用范围。理论分析和实验评估都验证了该方案的有效性和公平性。 摘要:The explosive amount of data generated at the network edge makes mobile edge computing an essential technology to support real-time applications, calling for powerful data processing and analysis provided by machine learning (ML) techniques. In particular, federated edge learning (FEL) becomes prominent in securing the privacy of data owners by keeping the data locally used to train ML models. Existing studies on FEL either utilize in-process optimization or remove unqualified participants in advance. In this paper, we enhance the collaboration from all edge devices in FEL to guarantee that the ML model is trained using all available local data to accelerate the learning process. To that aim, we propose a collective extortion (CE) strategy under the imperfect-information multi-player FEL game, which is proved to be effective in helping the server efficiently elicit the full contribution of all devices without worrying about suffering from any economic loss. Technically, our proposed CE strategy extends the classical extortion strategy in controlling the proportionate share of expected utilities for a single opponent to the swiftly homogeneous control over a group of players, which further presents an attractive trait of being impartial for all participants. Moreover, the CE strategy enriches the game theory hierarchy, facilitating a wider application scope of the extortion strategy. Both theoretical analysis and experimental evaluations validate the effectiveness and fairness of our proposed scheme.
推理|分析|理解|解释(9篇)
【1】 PixelPyramids: Exact Inference Models from Lossless Image Pyramids 标题:像素金字塔:无损图像金字塔的精确推理模型 链接:https://arxiv.org/abs/2110.08787
作者:Shweta Mahajan,Stefan Roth 机构:Department of Computer Science, TU Darmstadt, hessian.AI 备注:To appear at ICCV 2021 摘要:自回归模型是一类精确推理方法,具有高度灵活的函数形式,可产生最先进的自然图像密度估计。然而,维度上的顺序依赖使得这些模型的计算成本很高,并将其适用范围限制在低分辨率图像上。在这项工作中,我们提出了像素金字塔(PixelPyramids),这是一种块自回归方法,采用无损金字塔分解和特定尺度表示来编码图像像素的联合分布。关键的是,与完全自回归方法相比,它提供了更稀疏的依赖结构。我们的像素金字塔为各种图像数据集(尤其是高分辨率数据)的密度估计提供了最先进的结果。对于CelebA-HQ 1024 x 1024,我们观察到,密度估计(以比特/维为单位)改善至基线的约44%,而采样速度甚至优于易于并行化的基于流的模型。 摘要:Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images. Yet, the sequential ordering on the dimensions makes these models computationally expensive and limits their applicability to low-resolution imagery. In this work, we propose Pixel-Pyramids, a block-autoregressive approach employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels. Crucially, it affords a sparser dependency structure compared to fully autoregressive approaches. Our PixelPyramids yield state-of-the-art results for density estimation on various image datasets, especially for high-resolution data. For CelebA-HQ 1024 x 1024, we observe that the density estimates (in terms of bits/dim) are improved to ~44% of the baseline despite sampling speeds superior even to easily parallelizable flow-based models.
【2】 On the Statistical Analysis of Complex Tree-shaped 3D Objects 标题:关于复杂树形三维物体的统计分析 链接:https://arxiv.org/abs/2110.08693
作者:Guan Wang,Hamid Laga,Anuj Srivastava 机构:Department of Statistics, Florida State University 摘要:人们如何分析表现出复杂几何和拓扑变化的详细3D生物对象,如神经元和植物树?在本文中,我们开发了一个新的数学框架,用于表示、比较和计算此类树状三维对象形状之间的测地变形。子树的层次结构是这些对象的特征——每个子树都有主分支和一些附加的分支——并且需要跨对象匹配这些结构以进行有意义的比较。我们提出了一种新的表示方法,将最初为欧几里德曲线开发的平方根速度函数(SRVF)扩展到树形3D对象。然后,我们定义了一个新的度量,用于量化将一个树状对象变形为另一个所需的弯曲、拉伸和分支滑动。与当前的度量(如商欧几里德距离(QED)和树编辑距离(TED))相比,所提出的表示和度量捕获了分支的全部弹性(即弯曲和拉伸)以及拓扑变化(即分支死亡/出生和滑动)。它完全避免了QED和TED度量的边折叠和节点拆分操作所导致的收缩。我们演示了该框架在比较、匹配和计算生物对象(如神经元和植物树)之间的测地线方面的实用性。该框架还适用于各种形状分析任务:(i)树形3D对象的对称性分析和对称化,(ii)计算树形3D对象总体的汇总统计(变化的平均值和模式),(iii)将参数概率分布拟合到此类总体,以及(iv)最后,根据估计的概率分布通过随机抽样合成新的树形3D对象。 摘要:How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has the main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.
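对单条参数化曲线,SRVF的定义是 q(t) = f'(t)/√|f'(t)|。下面是一个numpy示意(树形对象需在各分支上分别计算;离散求导方式为本示例的选择):

```python
import numpy as np

def srvf(curve, t):
    # curve:(n, 3) 的三维采样点;t:(n,) 的参数
    deriv = np.gradient(curve, t, axis=0)               # 数值导数 f'(t)
    norms = np.linalg.norm(deriv, axis=1, keepdims=True)
    return deriv / np.sqrt(norms + 1e-12)               # q(t) = f'/sqrt(|f'|)
```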
【3】 GradSign: Model Performance Inference with Theoretical Insights 标题:GradSign:具有理论洞察力的模型性能推断 链接:https://arxiv.org/abs/2110.08616
作者:Zhihao Zhang,Zhihao Jia 机构:Carnegie Mellon University 备注:Preprint. Under review 摘要:神经结构搜索(NAS)的一个关键挑战是快速推断广谱网络的预测性能,以发现统计上准确且计算效率高的网络。我们将此任务称为模型性能推断(MPI)。高效MPI的当前实践是基于梯度的方法,该方法在初始化时利用网络的梯度来推断其性能。然而,现有的基于梯度的方法仅依赖于启发式度量,缺乏必要的理论基础来巩固其设计。我们提出了GradSign,这是一种精确、简单、灵活且具有理论洞察的模型性能推断指标。GradSign背后的关键思想是一个量 $\Psi$,用于在单个训练样本的粒度上分析不同网络的优化景观。理论上,我们证明了在合理假设下,网络的训练损失和真实总体损失都被 $\Psi$ 成比例地上界约束。此外,我们设计了GradSign,它利用在随机初始化状态下评估的网络梯度,对 $\Psi$ 进行精确而简单的近似。对横跨三个训练数据集的七个NAS基准进行的评估表明,GradSign能够很好地推广到现实世界的网络,并且在以Spearman的 $\rho$ 和Kendall的 $\tau$ 评估时,始终优于最先进的基于梯度的MPI方法。此外,我们将GradSign集成到四种现有的NAS算法中,并表明GradSign辅助的NAS算法在三种实际任务中优于普通算法,将最佳发现网络的准确度分别提高0.3%、1.1%和1.0%。 摘要:A key challenge in neural architecture search (NAS) is quickly inferring the predictive performance of a broad spectrum of networks to discover statistically accurate and computationally efficient ones. We refer to this task as model performance inference (MPI). The current practice for efficient MPI is gradient-based methods that leverage the gradients of a network at initialization to infer its performance. However, existing gradient-based methods rely only on heuristic metrics and lack the necessary theoretical foundations to consolidate their designs. We propose GradSign, an accurate, simple, and flexible metric for model performance inference with theoretical insights. The key idea behind GradSign is a quantity $\Psi$ to analyze the optimization landscape of different networks at the granularity of individual training samples. Theoretically, we show that both the network's training and true population losses are proportionally upper-bounded by $\Psi$ under reasonable assumptions. In addition, we design GradSign, an accurate and simple approximation of $\Psi$ using the gradients of a network evaluated at a random initialization state. Evaluation on seven NAS benchmarks across three training datasets shows that GradSign generalizes well to real-world networks and consistently outperforms state-of-the-art gradient-based methods for MPI evaluated by Spearman's $\rho$ and Kendall's Tau. Additionally, we integrate GradSign into four existing NAS algorithms and show that the GradSign-assisted NAS algorithms outperform their vanilla counterparts by improving the accuracies of best-discovered networks by up to 0.3%, 1.1%, and 1.0% on three real-world tasks.
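按摘要的描述,GradSign只需在随机初始化处得到各样本的梯度并统计其符号一致性。下面是这一思路的粗略示意(与原文度量的精确定义可能有出入,仅演示思想):

```python
import torch

def gradsign_score(model, inputs, labels, criterion):
    signs = []
    for x, y in zip(inputs, labels):
        model.zero_grad()
        loss = criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        signs.append(torch.sign(g))
    S = torch.stack(signs)                    # (样本数, 参数数)
    return S.sum(dim=0).abs().sum().item()    # 各样本梯度符号越一致,得分越高
```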
【4】 Case-based Reasoning for Better Generalization in Text-Adventure Games 标题:基于案例推理的文本冒险游戏中更好的泛化 链接:https://arxiv.org/abs/2110.08470
作者:Mattia Atzeni,Shehzaad Dhuliawala,Keerthiram Murugesan,Mrinmaya Sachan 机构:IBM Research, EPFL, ETH Zürich 摘要:基于文本的游戏(TBG)已经成为推动扎根语言理解研究和研究泛化和样本效率等问题的一个很有前景的环境。针对TBG提出了几种不同结构和学习方案的深度强化学习(RL)方法。然而,这些方法不能有效地推广,特别是在分布转移的情况下。与深度RL方法不同,在本文中,我们提出了一种基于案例推理的通用方法来训练agent,并将其推广到训练分布之外。基于案例的推理器从代理过去与世界的交互中收集积极经验的实例,然后重用收集的经验以有效地采取行动。该方法可与文献中关于TBG的任何现有策略神经代理结合使用。我们的实验表明,该方法持续改进了现有方法,获得了良好的分布外泛化,并在广泛使用的环境中取得了新的最新成果。 摘要:Text-based games (TBG) have emerged as promising environments for driving research in grounded language understanding and studying problems like generalization and sample efficiency. Several deep reinforcement learning (RL) methods with varying architectures and learning schemes have been proposed for TBGs. However, these methods fail to generalize efficiently, especially under distributional shifts. In a departure from deep RL approaches, in this paper, we propose a general method inspired by case-based reasoning to train agents and generalize out of the training distribution. The case-based reasoner collects instances of positive experiences from the agent's interaction with the world in the past and later reuses the collected experiences to act efficiently. The method can be applied in conjunction with any existing on-policy neural agent in the literature for TBGs. Our experiments show that the proposed approach consistently improves existing methods, obtains good out-of-distribution generalization, and achieves new state-of-the-art results on widely used environments.
【5】 Efficient Representations for Privacy-Preserving Inference 标题:隐私保护推理的有效表示 链接:https://arxiv.org/abs/2110.08321
作者:Han Xuanyuan,Francisco Vargas,Stephen Cummins 机构:Department of Computer Science and Technology, University of Cambridge 备注:8 pages, 2 figures 摘要:深度神经网络在计算机视觉和医学等多个领域有着广泛的应用。在许多情况下,模型在推理时的输入可能包含敏感的用户数据,这就提出了有关此类服务所保证的隐私和信任级别的问题。许多现有的工作已经利用同态加密(HE)方案,使加密数据的计算能够实现多层感知机和CNN的私有推理。沿着这一方向的早期工作是加密网,一次MNIST推断需要250秒。这种方法的主要局限性在于计算,这是由于构成HE运算的NTT(数论变换)运算代价高昂。其他人建议使用模型修剪和高效的数据表示来减少所需的HE操作数量。在本文中,我们通过对CNN推理过程中的中间张量表示进行修改来改进现有的工作。我们在MNIST和CIFAR-10数据集上构建和评估私有CNN,并将用于推断加密网体系结构的操作数量减少两倍以上。 摘要:Deep neural networks have a wide range of applications across multiple domains such as computer vision and medicine. In many cases, the input of a model at inference time can consist of sensitive user data, which raises questions concerning the levels of privacy and trust guaranteed by such services. Much existing work has leveraged homomorphic encryption (HE) schemes that enable computation on encrypted data to achieve private inference for multi-layer perceptrons and CNNs. An early work along this direction was CryptoNets, which takes 250 seconds for one MNIST inference. The main limitation of such approaches is that of compute, which is due to the costly nature of the NTT (number theoretic transform)operations that constitute HE operations. Others have proposed the use of model pruning and efficient data representations to reduce the number of HE operations required. In this paper, we focus on improving upon existing work by proposing changes to the representations of intermediate tensors during CNN inference. We construct and evaluate private CNNs on the MNIST and CIFAR-10 datasets, and achieve over a two-fold reduction in the number of operations used for inferences of the CryptoNets architecture.
【6】 Tree-based local explanations of machine learning model predictions, AraucanaXAI 标题:机器学习模型预测的基于树的局部解释:AraucanaXAI 链接:https://arxiv.org/abs/2110.08272
作者:Enea Parimbelli,Giovanna Nicora,Szymon Wilk,Wojtek Michalowski,Riccardo Bellazzi 机构:University of Pavia, Italy; Poznan University of Technology, Poland; Telfer School of Management, University of Ottawa 备注:XAI Healthcare workshop 2021, AIME 2021 摘要:越来越复杂的学习方法,如boosting、bagging和deep learning,使ML模型更加准确,但更难理解和解释。性能和可理解性之间的权衡是经常要面对的,尤其是在医学等高风险应用中。在本文中,我们提出了一种新的方法论,在给定某个已做出预测的具体实例的情况下,为通用ML模型的预测生成解释,并且同时适用于分类与回归任务。所提出的XAI方法的优点包括对原始模型更高的保真度、处理非线性决策边界的能力,以及对分类和回归问题的原生支持。 摘要:Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to understand and interpret. A tradeoff between performance and intelligibility is often to be faced, especially in high-stakes applications like medicine. In the present article we propose a novel methodological approach for generating explanations of the predictions of a generic ML model, given a specific instance for which the prediction has been made, that can tackle both classification and regression tasks. Advantages of the proposed XAI approach include improved fidelity to the original model, the ability to deal with non-linear decision boundaries, and native support to both classification and regression problems.
【7】 Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails 标题:用个性化注意解释学生失败原因的可解释学生表现预测 链接:https://arxiv.org/abs/2110.08268
作者:Kun Niu,Xipeng Cao,Yicong Yu 机构:School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 备注:AAAI 2021 Workshop on AI Education/TIPCE 2021 摘要:随着高等教育中学生不及格率的不断增加,预测学生下学期的表现已成为一项重要需求。个性化的学生表现预测有助于教育工作者全面了解学生状况,并提前进行有效干预。然而,现有的工作几乎没有考虑到学生绩效预测的可解释性,这是教育工作者最关心的问题。在本文中,我们提出了一种新的基于个性化注意的可解释学生成绩预测方法(ESPA),该方法利用学生档案中的关系和相关课程的先验知识。所设计的双向长短时记忆(BiLSTM)结构提取特定模式路径中的语义信息。对于利用相似路径的内部关系,提出了一种局部和全局层面的注意机制,以区分不同学生或课程对预测的影响。因此,有效的路径推理可以用来预测学生的表现。ESPA在学生成绩预测方面始终优于其他最先进的模型,其结果可以直观地解释。这项工作可以帮助教育工作者更好地理解行为对学生学习的不同影响。 摘要:As student failure rates continue to increase in higher education, predicting student performance in the following semester has become a significant demand. Personalized student performance prediction helps educators gain a comprehensive view of student status and effectively intervene in advance. However, existing works scarcely consider the explainability of student performance prediction, which educators are most concerned about. In this paper, we propose a novel Explainable Student performance prediction method with Personalized Attention (ESPA) by utilizing relationships in student profiles and prior knowledge of related courses. The designed Bidirectional Long Short-Term Memory (BiLSTM) architecture extracts the semantic information in the paths with specific patterns. As for leveraging similar paths' internal relations, a local and global-level attention mechanism is proposed to distinguish the influence of different students or courses for making predictions. Hence, valid reasoning on paths can be applied to predict the performance of students. The ESPA consistently outperforms the other state-of-the-art models for student performance prediction, and the results are intuitively explainable. This work can help educators better understand the different impacts of behavior on students' studies.
【8】 Valid and Exact Statistical Inference for Multi-dimensional Multiple Change-Points by Selective Inference 标题:基于选择性推理的多维多变点有效准确的统计推断 链接:https://arxiv.org/abs/2110.08989
作者:Ryota Sugiyama,Hiroki Toda,Vo Nguyen Le Duy,Yu Inatsu,Ichiro Takeuchi 机构:Nagoya Institute of Technology, RIKEN 摘要:本文研究多维序列中变化点的统计推断。在从多维序列进行CP检测时,通常不仅需要检测位置,还需要识别发生变化的组件子集。针对这类问题,已经提出了几种算法,但尚未建立有效的精确推理方法来评估检测位置和组件的统计可靠性。在本研究中,我们提出了一种方法,可以保证检测到的变化的位置和分量的统计可靠性。我们将该方法应用于基因组异常识别和人类行为分析问题,证明了该方法的有效性。 摘要:In this paper, we study statistical inference of change-points (CPs) in multi-dimensional sequence. In CP detection from a multi-dimensional sequence, it is often desirable not only to detect the location, but also to identify the subset of the components in which the change occurs. Several algorithms have been proposed for such problems, but no valid exact inference method has been established to evaluate the statistical reliability of the detected locations and components. In this study, we propose a method that can guarantee the statistical reliability of both the location and the components of the detected changes. We demonstrate the effectiveness of the proposed method by applying it to the problems of genomic abnormality identification and human behavior analysis.
【9】 Understanding the network formation pattern for better link prediction 标题:了解网络形成模式,以便更好地预测链路 链接:https://arxiv.org/abs/2110.08850
作者:Jiating Yu,Ling-Yun Wu 机构:IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing , China, School of Mathematical Sciences, University of Chinese Academy of Sciences 备注:21 pages, 3 figures, 18 tables, and 29 references 摘要:链路预测作为复杂网络领域中的一个经典问题,已经引起了研究者的广泛关注,它对于理解网络的演化和动态发展机制具有重要意义。虽然已经提出了各种特定于网络类型的算法来解决链路预测问题,但大多数算法都假设网络结构由三元闭包原理控制。我们仍然缺乏对预测潜在链路的网络形成模式的适应性和全面的理解。此外,研究如何更好地利用网络本地信息也很有价值。为此,我们提出了一种新方法,称为使用多阶局部信息(MOLI)的链路预测,该方法利用来自不同距离邻居的局部信息,参数可以是基于先验知识的先验驱动,或通过解决观测网络上的优化问题的数据驱动。MOLI通过图上的随机游动定义了局部网络扩散过程,从而更好地利用了网络信息。在11种不同类型的模拟和真实网络上,我们证明了MOLI优于其他11种广泛使用的链路预测算法。我们还得出结论,对于不同的网络,包括社会网络、通信网络、生物网络等,存在不同的局部信息利用模式。特别是,经典的基于公共邻居的算法并不像人们认为的那样适用于所有社会网络;相反,一些社交网络遵循四边形闭合原则,优先连接长度为3的路径。 摘要:As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, which is of great significance to help us understand the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is dominated by the Triadic Closure Principle. We still lack an adaptive and comprehensive understanding of network formation patterns for predicting potential links. In addition, it is valuable to investigate how network local information can be better utilized. To this end, we proposed a novel method named Link prediction using Multiple Order Local Information (MOLI) that exploits the local information from the neighbors of different distances, with parameters that can be a prior-driven based on prior knowledge, or data-driven by solving an optimization problem on observed networks. MOLI defined a local network diffusion process via random walks on the graph, resulting in better use of network information. We show that MOLI outperforms the other 11 widely used link prediction algorithms on 11 different types of simulated and real-world networks. We also conclude that there are different patterns of local information utilization for different networks, including social networks, communication networks, biological networks, etc. In particular, the classical common neighbor-based algorithm is not as adaptable to all social networks as it is perceived to be; instead, some of the social networks obey the Quadrilateral Closure Principle which preferentially connects paths of length three.
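“多阶局部信息”最直接的读法是对不同长度路径的计数加权求和。下面用邻接矩阵幂给出一个示意(权重取值为假设;原文权重可由先验给定,或在观测网络上通过优化得到):

```python
import numpy as np

def multi_order_scores(A, weights=(1.0, 0.5, 0.25)):
    # A:邻接矩阵;得分 S = sum_k w_k * A^k,k = 1..len(weights)
    S = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for w in weights:
        Ak = Ak @ A            # 依次得到 A^1, A^2, A^3, ...
        S += w * Ak
    np.fill_diagonal(S, 0.0)   # 不为自环打分
    return S
```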
检测相关(4篇)
【1】 Natural Attribute-based Shift Detection 标题:基于自然属性的移位检测 链接:https://arxiv.org/abs/2110.09276
作者:Jeonghoon Park,Jimin Hong,Radhika Dua,Daehoon Gwak,Yixuan Li,Jaegul Choo,Edward Choi 机构:KAIST, University of Wisconsin-Madison 摘要:尽管deep networks在视觉、语言和医疗保健方面的表现令人印象深刻,但来自不同于训练分布的样本的不可预测行为会在部署中造成严重问题。为了提高基于神经网络的分类器的可靠性,我们定义了一个新的任务,即基于自然属性的移位(NAS)检测,来检测被试年龄或图像亮度等自然属性从训练分布中移位的样本。利用现有数据集中存在的自然属性,我们介绍了用于NAS检测的视觉、语言和医学方面的基准数据集。此外,我们对NAS数据集上先前的代表性分布外(OOD)检测方法进行了广泛评估,并观察到它们的性能不一致。为了理解这一点,我们分析了NAS样本在特征空间中的位置与基于距离和置信度的OOD检测方法性能之间的关系。基于分析,我们将NAS样本分为三类,并进一步建议对训练目标进行简单修改,以获得能够检测所有NAS类别样本的改进的OOD检测方法。 摘要:Despite the impressive performance of deep networks in vision, language, and healthcare, unpredictable behaviors on samples from the distribution different than the training distribution cause severe problems in deployment. For better reliability of neural-network-based classifiers, we define a new task, natural attribute-based shift (NAS) detection, to detect the samples shifted from the training distribution by some natural attribute such as age of subjects or brightness of images. Using the natural attributes present in existing datasets, we introduce benchmark datasets in vision, language, and medical for NAS detection. Further, we conduct an extensive evaluation of prior representative out-of-distribution (OOD) detection methods on NAS datasets and observe an inconsistency in their performance. To understand this, we provide an analysis on the relationship between the location of NAS samples in the feature space and the performance of distance- and confidence-based OOD detection methods. Based on the analysis, we split NAS samples into three categories and further suggest a simple modification to the training objective to obtain an improved OOD detection method that is capable of detecting samples from all NAS categories.
【2】 Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection 标题:单层预测归一化最大似然失配检测算法 链接:https://arxiv.org/abs/2110.09246
作者:Koby Bibas,Meir Feder,Tal Hassner 机构:School of Electrical Engineering, Tel Aviv University, Facebook AI 备注:NeurIPS 2021 摘要:检测分布外(OOD)样本对于开发基于机器学习的关键安全系统模型至关重要。常用的OOD检测方法假设在训练期间可以访问一些在现实场景中可能不可用的OOD样本。相反,我们使用预测归一化最大似然(pNML)学习器,在该学习器中,对测试输入不做任何假设。我们推导了单层神经网络(NN)的pNML及其泛化误差的显式表达式,后者称为遗憾(regret)。我们证明,当(i)测试向量位于与训练数据经验相关矩阵的大特征值相关的特征向量所张成的子空间中,或者(ii)测试样本远离决策边界时,该学习器具有良好的泛化能力。此外,我们描述了如何通过在最后一层使用显式pNML,然后使用softmax函数,将导出的pNML遗憾有效地应用于任何预训练的深度NN。将导出的遗憾应用于深度神经网络既不需要额外的可调参数,也不需要额外的数据。我们使用经CIFAR-100、CIFAR-10、SVHN和ImageNet-30训练的DenseNet-100、ResNet-34和WideResNet-40模型,在74个OOD检测基准上对我们的方法进行了广泛评估,结果表明,与最近领先的方法相比,我们的方法有高达15.6%的显著改进。 摘要:Detecting out-of-distribution (OOD) samples is vital for developing machine learning based models for critical safety systems. Common approaches for OOD detection assume access to some OOD samples during training which may not be available in a real-life scenario. Instead, we utilize the predictive normalized maximum likelihood (pNML) learner, in which no assumptions are made on the tested input. We derive an explicit expression of the pNML and its generalization error, denoted as the regret, for a single layer neural network (NN). We show that this learner generalizes well when (i) the test vector resides in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, or (ii) the test sample is far from the decision boundary. Furthermore, we describe how to efficiently apply the derived pNML regret to any pretrained deep NN, by employing the explicit pNML for the last layer, followed by the softmax function. Applying the derived regret to deep NN requires neither additional tunable parameters nor extra data. We extensively evaluate our approach on 74 OOD detection benchmarks using DenseNet-100, ResNet-34, and WideResNet-40 models trained with CIFAR-100, CIFAR-10, SVHN, and ImageNet-30 showing a significant improvement of up to 15.6% over recent leading methods.
【3】 An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes 标题:基于LSTM的注意机制抄袭检测和基于群体的不平衡类预训练参数方法 链接:https://arxiv.org/abs/2110.08771
作者:Seyed Vahid Moravvej,Seyed Jalaleddin Mousavirad,Mahshid Helali Moghadam,Mehrdad Saadatmand 机构:Department of Computer Engineering, Isfahan University of Technology, Isfahan, Iran, Department of Computer Engineering, Hakim Sabzevari University, Sabzevar, Iran, RISE Research Institutes of Sweden, Sweden, Mälardalen University, Västerås, Sweden 备注:12 pages, The 28th International Conference on Neural Information Processing (ICONIP2021), BALI, Indonesia 摘要:剽窃是学术和工业环境中的主要问题之一,剽窃检测的目标是在给定文档或源代码中找到相似的内容。本文提出了一种基于长短时记忆(LSTM)和注意机制的结构,称为LSTM-AM-ABC,该结构由基于种群的参数初始化方法加以增强。基于梯度的优化算法,如反向传播(BP),在LSTM、注意机制和前馈神经网络的学习过程中得到了广泛的应用,但也存在一些问题,如陷入局部最优。为了解决这个问题,可以使用基于种群的元启发式(PBMH)算法。为此,本文采用一种PBMH算法,即人工蜂群(ABC)算法,来缓解这一问题。我们提出的算法可以同时为LSTM、注意机制和前馈神经网络找到模型学习的初始值。换句话说,ABC算法为启动BP算法找到了一个有希望的起点。为了评估,我们将我们提出的算法与传统方法和基于种群的方法进行了比较。结果清楚地表明,所提出的方法可以提供有竞争力的性能。 摘要:Plagiarism is one of the leading problems in academic and industrial environments; plagiarism detection aims to find similar items in a given document or source code. This paper proposes an architecture based on a Long Short-Term Memory (LSTM) and attention mechanism called LSTM-AM-ABC boosted by a population-based approach for parameter initialization. Gradient-based optimization algorithms such as back-propagation (BP) are widely used in the literature for learning process in LSTM, attention mechanism, and feed-forward neural network, while they suffer from some problems such as getting stuck in local optima. To tackle this problem, population-based metaheuristic (PBMH) algorithms can be used. To this end, this paper employs a PBMH algorithm, artificial bee colony (ABC), to moderate the problem. Our proposed algorithm can find the initial values for model learning in all LSTM, attention mechanism, and feed-forward neural network, simultaneously. In other words, ABC algorithm finds a promising point for starting BP algorithm. For evaluation, we compare our proposed algorithm with both conventional and population-based methods. The results clearly show that the proposed method can provide competitive performance.
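其核心流程(先用基于种群的搜索选出一个有希望的初始点,再交给梯度法)可以脱离完整的ABC算子来演示。下面是一个NumPy示意,用"在最优个体附近随机扰动"的极简种群更新代替人工蜂群的雇佣蜂/观察蜂/侦查蜂机制,损失函数与网络结构均为此处假设:

```python
import numpy as np

def loss(w, X, y):
    """单层tanh网络的均方误差,仅作演示。"""
    pred = np.tanh(X @ w)
    return np.mean((pred - y) ** 2)

def population_init(X, y, dim, pop=30, iters=50, rng=np.random.default_rng(0)):
    """种群式搜索初始权重:保留最优个体并在其附近扰动(ABC的极简替代)。"""
    colony = rng.normal(0, 1, size=(pop, dim))
    for _ in range(iters):
        fitness = np.array([loss(w, X, y) for w in colony])
        best = colony[fitness.argmin()]
        colony = best + rng.normal(0, 0.1, size=(pop, dim))  # 围绕当前最优采样
        colony[0] = best                                     # 精英保留
    return best

# w0 = population_init(X, y, dim=X.shape[1])  # 之后用BP从w0继续训练
```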
【4】 Deep learning-based detection of intravenous contrast in computed tomography scans 标题:基于深度学习的CT扫描静脉造影剂检测 链接:https://arxiv.org/abs/2110.08424
作者:Zezhong Ye,Jack M. Qian,Ahmed Hosny,Roman Zeleznik,Deborah Plana,Jirapat Likitlersuang,Zhongyi Zhang,Raymond H. Mak,Hugo J. W. L. Aerts,Benjamin H. Kann 机构:Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School; Department of Radiation Oncology, Dana-Farber Cancer Institute and Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA 摘要:目的:识别CT扫描中静脉注射(IV)造影剂的使用情况是模型开发和测试中数据整理的关键组成部分。目前,IV造影剂在成像元数据中的记录很差,需要临床专家手动校正和注释,这是成像分析和算法部署的一个主要障碍。我们试图开发和验证基于卷积神经网络(CNN)的深度学习(DL)平台,用于识别CT扫描中的IV造影剂。方法:在模型开发和评估中,我们使用了头颈部(HN)和肺癌患者CT扫描的独立数据集,共计来自1,979次CT扫描的133,480张轴向2D切片,由临床专家手动注释造影剂是否存在。我们采用五种不同的DL模型,并在HN训练数据集中训练用于切片级造影剂检测。在保留集和来自另一机构的独立验证集上评估了模型性能。随后,DL模型在胸部CT数据上进行了微调,并在单独的胸部CT数据集上进行了外部验证。结果:1,496次扫描(75.6%)的IV造影剂初始DICOM元数据标签缺失或错误。基于EfficientNetB4的模型显示了最佳的整体检测性能。对于HN扫描,内部验证集的AUC为0.996(n=216),外部验证集的AUC为1.0(n=595)。在胸部CT上微调的模型在内部验证集上的AUC为1.0(n=53),在外部验证集上的AUC为0.980(n=402)。结论:DL模型能够以近乎完美的性能准确检测HN和胸部CT扫描中的IV造影剂。 摘要:Purpose: Identifying intravenous (IV) contrast use within CT scans is a key component of data curation for model development and testing. Currently, IV contrast is poorly documented in imaging metadata and necessitates manual correction and annotation by clinician experts, presenting a major barrier to imaging analyses and algorithm deployment. We sought to develop and validate a convolutional neural network (CNN)-based deep learning (DL) platform to identify IV contrast within CT scans. Methods: For model development and evaluation, we used independent datasets of CT scans of head, neck (HN) and lung cancer patients, totaling 133,480 axial 2D scan slices from 1,979 CT scans manually annotated for contrast presence by clinical experts. Five different DL models were adopted and trained in HN training datasets for slice-level contrast detection. Model performances were evaluated on a hold-out set and on an independent validation set from another institution. DL models were then fine-tuned on chest CT data and externally validated on a separate chest CT dataset. Results: Initial DICOM metadata tags for IV contrast were missing or erroneous in 1,496 scans (75.6%). The EfficientNetB4-based model showed the best overall detection performance. For HN scans, AUC was 0.996 in the internal validation set (n = 216) and 1.0 in the external validation set (n = 595). The fine-tuned model on chest CTs yielded an AUC: 1.0 for the internal validation set (n = 53), and AUC: 0.980 for the external validation set (n = 402). Conclusion: The DL model could accurately detect IV contrast in both HN and chest CT scans with near-perfect performance.
分类|识别(9篇)
【1】 Learning Optimal Conformal Classifiers 标题:学习最优共形分类器 链接:https://arxiv.org/abs/2110.09192
作者:David Stutz,Krishnamurthy Dvijotham,Ali Taylan Cemgil,Arnaud Doucet 机构: DeepMind, Max Planck Institute for Informatics, Saarland Informatics Campus 摘要:现代基于深度学习的分类器在测试数据上显示出非常高的准确性,但这并不能为安全部署提供足够的保证,特别是在医疗诊断等高风险AI应用中。通常,预测是在没有可靠的不确定性估计或正式保证的情况下获得的。共形预测(CP)通过使用分类器的概率估计以用户指定的概率预测包含真实类的置信集来解决这些问题。然而,在训练后使用CP作为单独的处理步骤会阻止基础模型适应置信集的预测。因此,本文探讨了在训练过程中对CP进行微分的策略,目标是带着共形包装层端到端地训练模型。在我们的共形训练(ConfTr)方法中,我们在训练期间专门在小批量上“模拟”共形化。我们表明,ConfTr通过减小平均置信集大小(即无效率,inefficiency)优于最先进的CP分类方法。此外,它允许“塑造”在测试时预测的置信集,这对于标准CP来说是困难的。在多个数据集的实验中,我们表明,在保留CP提供的保证的同时,ConfTr可以影响无效率在类之间的分布,或根据所包含的类指导置信集的组成。 摘要:Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stake AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's probability estimates to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training the model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. We show that ConfTr outperforms state-of-the-art CP methods for classification by reducing the average confidence set size (inefficiency). Moreover, it allows to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.
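作为参照,标准的"训练后"分裂共形预测只需几行代码:在校准集上取非一致性分数的有限样本修正分位数作为阈值,再据此构造置信集。ConfTr试图把这一步(共形化)放进训练循环并对其求导;下面只是训练后CP的示意(分数取1减softmax概率,变量均为假设):

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """分裂共形校准:非一致性分数取 1 - p(真实类),阈值取其有限样本修正分位数。"""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_probs, q):
    """对每个测试样本返回置信集:所有满足 1 - p(y) <= q 的类别。"""
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# q = conformal_quantile(cal_probs, cal_labels, alpha=0.1)
# sets = prediction_sets(test_probs, q)   # 以 >=90% 的概率覆盖真实类
```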
【2】 Domain Generalisation for Apparent Emotional Facial Expression Recognition across Age-Groups 标题:跨年龄组的表观情绪面部表情识别的领域泛化 链接:https://arxiv.org/abs/2110.09168
作者:Rafael Poyiadzi,Jie Shen,Stavros Petridis,Yujiang Wang,Maja Pantic 机构:University of Bristol, UK, Facebook AI Applied Research, UK, Imperial College London, UK 摘要:表观情绪面部表情识别近年来引起了广泛的研究关注。然而,大多数方法忽略了年龄差异,并针对所有年龄段训练通用模型。在这项工作中,我们研究了使用不同年龄组训练表观情绪面部表情识别模型的效果。为此,我们在跨不同年龄组面部图像的表观情绪面部表情识别背景下研究领域泛化。我们首先基于域外泛化性能比较了几种领域泛化算法,发现类条件域对抗神经网络(CDANN)算法性能最佳。然后,我们研究了训练中所用年龄组的种类和数量对泛化到不可见年龄组的影响,并观察到训练年龄组数量的增加往往会提高在不可见年龄组上的表观情绪面部表情识别性能。我们还表明,在训练过程中排除某一年龄组往往对邻近年龄组的性能影响更大。 摘要:Apparent emotional facial expression recognition has attracted a lot of research attention recently. However, the majority of approaches ignore age differences and train a generic model for all ages. In this work, we study the effect of using different age-groups for training apparent emotional facial expression recognition models. To this end, we study Domain Generalisation in the context of apparent emotional facial expression recognition from facial imagery across different age groups. We first compare several domain generalisation algorithms on the basis of out-of-domain-generalisation, and observe that the Class-Conditional Domain-Adversarial Neural Networks (CDANN) algorithm has the best performance. We then study the effect of variety and number of age-groups used during training on generalisation to unseen age-groups and observe that an increase in the number of training age-groups tends to increase the apparent emotional facial expression recognition performance on unseen age-groups. We also show that exclusion of an age-group during training tends to affect more the performance of the neighbouring age groups.
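域对抗类方法(包括CDANN)的一个关键组件是梯度反转层:特征提取器在对抗域(此处为年龄组)判别器的同时学习域不变特征。下面是它在PyTorch中的常见写法(CDANN还会把判别器输入按类别条件化,此处从略,仅为示意):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """梯度反转层:前向恒等,反向把梯度乘以 -lambda(DANN/CDANN的核心组件)。"""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# 用法示意:features -> grad_reverse(features) -> 域(年龄组)判别器;
# 判别器学习区分年龄组,特征提取器因梯度反转而学习与年龄组无关的表示。
```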
【3】 Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits 标题:在线标牌识别:最小化阈值分割中的错误数目 链接:https://arxiv.org/abs/2110.09133
作者:Reda Ouhamma,Rémy Degenne,Pierre Gaillard,Vianney Perchet 机构:Univ. Lille, Inria, CNRS, Centrale Lille, UMR , CRIStAL, F-, Lille, France, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France, Crest, Ensae & Criteo AI Lab 备注:10+15 pages. To be published in the proceedings of NeurIPS 2021 摘要:在固定预算阈值bandit问题中,算法顺序地将预算数量的样本分配给不同的分布,然后预测每个分布的平均值是大于还是小于给定阈值。我们介绍了一大系列算法(包含大多数现有的相关算法),其灵感来自Frank-Wolfe算法,并对其性能进行了全面而通用的分析。这使我们能够为一大类问题构造新的显式算法,其损失在非自适应oracle损失的一个小常数因子以内。非常有趣的是,我们观察到自适应方法在经验上大大优于非自适应oracle,这在后悔最小化等标准在线学习环境中是一种不常见的行为。我们用一个有启发性的玩具问题来解释这个令人惊讶的现象。 摘要:In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions. It then predicts whether the mean of each distribution is larger or lower than a given threshold. We introduce a large family of algorithms (containing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allowed us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of the non-adaptive oracle ones. Quite interestingly, we observed that adaptive methods empirically greatly out-perform non-adaptive oracles, an uncommon behavior in standard online learning settings, such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.
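便于对照,固定预算阈值bandit的一个经典自适应基线是Locatelli等人(2016)的APT:每轮拉动"经验均值离阈值近且被采样次数少"的臂。下面给出示意实现(臂以可调用的采样函数表示,接口与参数均为此处假设):

```python
import numpy as np

def apt(arms, tau, budget, eps=0.0, rng=np.random.default_rng(0)):
    """APT:每轮选 B_i = sqrt(n_i) * (|mu_i - tau| + eps) 最小的臂。"""
    K = len(arms)
    n = np.zeros(K)
    s = np.zeros(K)
    for i in range(K):                      # 每个臂先各拉一次
        s[i] += arms[i](rng); n[i] += 1
    for _ in range(budget - K):
        mu = s / n
        i = int(np.argmin(np.sqrt(n) * (np.abs(mu - tau) + eps)))
        s[i] += arms[i](rng); n[i] += 1
    return (s / n) > tau                    # 预测各臂均值是否高于阈值

# 用法:arms = [lambda r, m=m: r.normal(m, 1) for m in [0.2, 0.45, 0.8]]
# pred = apt(arms, tau=0.5, budget=2000)
```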
【4】 Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism 标题:基于时间卷积和注意机制的手势识别 链接:https://arxiv.org/abs/2110.08717
作者:Elahe Rahimian,Soheil Zabihi,Amir Asif,Dario Farina,S. Farokh Atashzar,Arash Mohammadi 机构:†Concordia Institute for Information System Engineering, Concordia University, Montreal, QC, Canada, ‡Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada, ††Department of Bioengineering, Imperial College London, London, UK 摘要:生物信号处理和机器学习的进步,特别是深度神经网络(DNN),为开发用于解码人类意图和控制假肢的创新人机界面铺平了道路。相对于其他解码肌肉电活动的算法,DNN模型显示出了有希望的结果,尤其是在手势识别方面。然而,此类数据驱动模型因其对大量可训练参数的需求及其结构复杂性而受到挑战。在这里,我们提出了新的基于时间卷积的手势识别体系结构(TC-HGR),以减少这种计算负担。通过这种方法,我们通过注意机制和时间卷积,通过表面肌电图(sEMG)信号对17种手势进行分类。该方法对300ms和200ms窗口的分类准确率分别为81.65%和80.72%。训练建议的TC-HGR体系结构的参数数量比其最先进的对应结构少11.9倍。 摘要:Advances in biosignal signal processing and machine learning, in particular Deep Neural Networks (DNNs), have paved the way for the development of innovative Human-Machine Interfaces for decoding the human intent and controlling artificial limbs. DNN models have shown promising results with respect to other algorithms for decoding muscle electrical activity, especially for recognition of hand gestures. Such data-driven models, however, have been challenged by their need for a large number of trainable parameters and their structural complexity. Here we propose the novel Temporal Convolutions-based Hand Gesture Recognition architecture (TC-HGR) to reduce this computational burden. With this approach, we classified 17 hand gestures via surface Electromyogram (sEMG) signals by the adoption of attention mechanisms and temporal convolutions. The proposed method led to 81.65% and 80.72% classification accuracy for window sizes of 300ms and 200ms, respectively. The number of parameters to train the proposed TC-HGR architecture is 11.9 times less than that of its state-of-the-art counterpart.
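时间卷积加注意力的手势分类器骨架大致如下(PyTorch;通道数、膨胀率、窗口长度等超参数均为示意,并非论文TC-HGR的原始配置):

```python
import torch
import torch.nn as nn

class TinyTCAttn(nn.Module):
    """sEMG窗口 (batch, 通道, 时间) -> 手势类别;一维膨胀卷积提特征,自注意力聚合时间维。"""
    def __init__(self, in_ch=12, n_classes=17, d=64):
        super().__init__()
        self.tconv = nn.Sequential(
            nn.Conv1d(in_ch, d, kernel_size=5, padding=2, dilation=1), nn.ReLU(),
            nn.Conv1d(d, d, kernel_size=5, padding=4, dilation=2), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
        self.head = nn.Linear(d, n_classes)

    def forward(self, x):                   # x: (B, C, T)
        h = self.tconv(x).transpose(1, 2)   # (B, T, d)
        h, _ = self.attn(h, h, h)           # 时间步之间的自注意力
        return self.head(h.mean(dim=1))     # 时间平均池化后分类

# logits = TinyTCAttn()(torch.randn(8, 12, 300))  # 300个时间步的示例窗口
```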
【5】 On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits 标题:随机土匪中遗憾最小化的Pareto前沿与最佳臂识别 链接:https://arxiv.org/abs/2110.08627
作者:Zixin Zhong,Wang Chi Cheung,Vincent Y. F. Tan 机构: 1Department of Mathematics, National University of Singapore, Singapore 2Department of Electrical and Computer Engineering, Singapore 3Department of Industrial Systems and Management, National University of Singapore 备注:27 pages, 8 figures 摘要:我们研究了随机bandit中两个原型目标的帕累托边界,即具有固定视界的后悔最小化(RM)和最佳臂识别(BAI)。众所周知,利用与探索之间的平衡对RM和BAI都至关重要,但探索对于实现后一目标的最优性能更为关键。为了使这一点更加精确,我们首先设计并分析了BoBW-lil'UCB$({\gamma})$算法,该算法在${\gamma}$的不同取值下实现RM或BAI的阶意义上最优性能。作为补充,我们证明,没有任何算法可以同时在RM和BAI两个目标上实现最优性能。更准确地说,我们建立了具有给定BAI失效概率的任何算法可实现的遗憾的非平凡下界。这一分析表明,在某些情形下BoBW-lil'UCB$({\gamma})$达到了帕累托最优,至多相差常数或小项。数值实验进一步证明,当应用于困难实例时,BoBW-lil'UCB的表现优于其有力竞争者UCB$_{\alpha}$(Degenne et al., 2019),后者是为固定置信度下的RM和BAI设计的。 摘要:We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$({\gamma})$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of ${\gamma}$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$({\gamma})$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.
【6】 Mapping illegal waste dumping sites with neural-network classification of satellite imagery 标题:基于卫星影像神经网络分类的非法垃圾场制图 链接:https://arxiv.org/abs/2110.08599
作者:Devesa,Maria Roberta,Vazquez Brust,H. Antonio 机构:Dymaxion Labs, Buenos Aires, Argentina, Fundación Bunge y Born 备注:5 pages, 3 figures, KDD Workshop on Data-driven Humanitarian Mapping held with the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 14, 2021 摘要:公共卫生和人居质量是城市规划的关键目标。近年来,非法垃圾倾倒场的严重社会和环境影响使其成为全球南方城市面临的最严重问题之一,因为决策所需的信息非常匮乏。为了帮助确定倾倒地点的位置并跟踪其随时间的演变,我们采用了来自机器学习领域的数据驱动模型,分析卫星图像。这使我们能够利用地理空间开放数据、高分辨率卫星图像和开放源代码工具的日益增加的可用性,利用布宜诺斯艾利斯的一小部分已知垃圾倾倒场来训练机器学习算法,然后以高速和低成本预测其他地点在广大地区的位置。本案例研究展示了Dymaxion实验室与Fundación Bunge y Born合作利用这一技术的结果,以创建该地区非法垃圾倾倒场潜在位置的综合地图。 摘要:Public health and habitat quality are crucial goals of urban planning. In recent years, the severe social and environmental impact of illegal waste dumping sites has made them one of the most serious problems faced by cities in the Global South, in a context of scarce information available for decision making. To help identify the location of dumping sites and track their evolution over time we adopt a data-driven model from the machine learning domain, analyzing satellite images. This allows us to take advantage of the increasing availability of geo-spatial open-data, high-resolution satellite imagery, and open source tools to train machine learning algorithms with a small set of known waste dumping sites in Buenos Aires, and then predict the location of other sites over vast areas at high speed and low cost. This case study shows the results of a collaboration between Dymaxion Labs and Fundación Bunge y Born to harness this technique in order to create a comprehensive map of potential locations of illegal waste dumping sites in the region.
【7】 Comparing Human and Machine Bias in Face Recognition 标题:人脸识别中人与机器偏差的比较 链接:https://arxiv.org/abs/2110.08396
作者:Samuel Dooley,Ryan Downing,George Wei,Nathan Shankar,Bradon Thymes,Gudrun Thorkelsdottir,Tiye Kurtz-Miott,Rachel Mattson,Olufemi Obiwumi,Valeriia Cherepanova,Micah Goldblum,John P Dickerson,Tom Goldstein 机构:University of Maryland, University of Massachusetts, Amherst, Pomona College, Howard University, University of California, San Diego, University of Georgia, Haverford College 摘要:最近的许多研究发现并讨论了面部分析技术中存在的严重偏差问题,发现了基于感知性别、皮肤类型、照明条件等因素的人群之间的表现差异。这些审计在测量算法偏差方面非常重要且成功,但面临两个主要挑战:(1)审计使用缺乏高质量元数据的面部识别数据集,如LFW和CelebA;(2)未将观察到的算法偏差与人类替代者的偏差进行比较。在本文中,我们发布了对LFW和CelebA数据集的改进,这将使未来的研究人员能够获得不受数据集重大缺陷(例如,同一图像同时出现在底库和测试集中)影响的算法偏差测量值。我们还利用这些新数据开发了一系列具有挑战性的面部识别和验证问题,并将其交给各种算法和大量均衡的人类评审者样本作答。我们发现,计算机模型和人类调查参与者在验证任务中的表现均显著更好;在这两项任务中,两者在深肤色或女性受试者上的准确率通常较低,而当自身人口统计特征与问题匹配时准确率较高。在这两项任务上,计算机模型的准确率高于调查参与者,且表现出与人类调查参与者程度相似的偏差。 摘要:Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias but have two major challenges: the audits (1) use facial recognition datasets which lack quality metadata, like LFW and CelebA, and (2) do not compare their observed algorithmic bias to the biases of their human alternatives. In this paper, we release improvements to the LFW and CelebA datasets which will enable future researchers to obtain measurements of algorithmic bias that are not tainted by major flaws in the dataset (e.g. identical images appearing in both the gallery and test set). We also use these new data to develop a series of challenging facial identification and verification questions that we administered to various algorithms and a large, balanced sample of human reviewers. We find that both computer models and human survey participants perform significantly better at the verification task, generally obtain lower accuracy rates on dark-skinned or female subjects for both tasks, and obtain higher accuracy rates when their demographics match that of the question. Computer models are observed to achieve a higher level of accuracy than the survey participants on both tasks and exhibit bias to similar degrees as the human survey participants.
【8】 A Neural Network Ensemble Approach to System Identification 标题:一种用于系统辨识的神经网络集成方法 链接:https://arxiv.org/abs/2110.08382
作者:Elisa Negrini,Giovanna Citti,Luca Capogna 机构: Worcester Polytechnic Institute, 100 Institute Road, USA 2Department of Mathematics, University of Bologna 摘要:我们提出了一种利用神经网络集成从轨迹数据中学习未知控制方程的新算法。给定未知动力系统$\dot{x}(t)=f(t,x(t))$的解样本$x(t)$,我们使用神经网络集成来近似函数$f$。我们以积分形式表示方程,并使用Euler方法预测每个连续时间步的解,在每次迭代中使用不同的神经网络作为$f$的先验。这一过程产生M-1个与时间无关的网络,其中M是观测到$x(t)$的时间步数。最后,我们通过神经网络插值得到单一函数$f(t,x(t))$。与我们以前的工作(对数据进行数值求导,并将导数作为Lipschitz正则化神经网络逼近$f$的目标)不同,我们的新方法避免了在噪声存在时不稳定的数值微分。我们在数据中有噪声和无噪声的多个例子上测试了新算法。我们的经验表明,在损失函数中添加Lipschitz正则项可以改进控制方程的泛化和恢复,并且该方法改进了我们以前的方法,特别是在存在噪声、数值微分只能提供低质量目标数据的情况下。最后,我们将结果与Raissi等人(arXiv:1801.01236, 2018)提出的方法以及SINDy进行了比较。 摘要:We present a new algorithm for learning unknown governing equations from trajectory data, using an ensemble of neural networks. Given samples of solutions $x(t)$ to an unknown dynamical system $\dot{x}(t)=f(t,x(t))$, we approximate the function $f$ using an ensemble of neural networks. We express the equation in integral form and use Euler method to predict the solution at every successive time step using at each iteration a different neural network as a prior for $f$. This procedure yields M-1 time-independent networks, where M is the number of time steps at which $x(t)$ is observed. Finally, we obtain a single function $f(t,x(t))$ by neural network interpolation. Unlike our earlier work, where we numerically computed the derivatives of data, and used them as target in a Lipschitz regularized neural network to approximate $f$, our new method avoids numerical differentiations, which are unstable in presence of noise. We test the new algorithm on multiple examples both with and without noise in the data. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves our previous one especially in presence of noise, when numerical differentiation provides low quality target data. Finally, we compare our results with the method proposed by Raissi, et al. arXiv:1801.01236 (2018) and with SINDy.
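文中"每个时间步一个网络 + Euler积分"的训练方式可以这样示意:第m个网络以一步Euler预测 x_m + Δt·f_m(t_m, x_m) ≈ x_{m+1} 为训练目标,从而避免显式数值微分(此处每步只用单条轨迹的一个样本,纯属演示;ts、xs假定为torch张量):

```python
import torch
import torch.nn as nn

def fit_euler_ensemble(ts, xs, hidden=32, steps=500, lr=1e-2):
    """ts: (M,) 时间点;xs: (M, d) 轨迹观测。
    为每个时间步训练一个网络 f_m,使 x_m + dt * f_m(t_m, x_m) ≈ x_{m+1}。"""
    d = xs.shape[1]
    nets = []
    for m in range(len(ts) - 1):
        net = nn.Sequential(nn.Linear(d + 1, hidden), nn.Tanh(), nn.Linear(hidden, d))
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        inp = torch.cat([ts[m].view(1, 1), xs[m].view(1, -1)], dim=1)
        dt = ts[m + 1] - ts[m]
        for _ in range(steps):
            opt.zero_grad()
            pred = xs[m].view(1, -1) + dt * net(inp)          # 一步Euler预测
            loss = ((pred - xs[m + 1].view(1, -1)) ** 2).mean()
            loss.backward()
            opt.step()
        nets.append(net)
    return nets   # 随后可如文中所述,对各 f_m 做神经网络插值得到单一的 f(t, x)
```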
【9】 Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features 标题:基于深度学习的MEL谱图和时间图特征在EDM亚类分类中的应用 链接:https://arxiv.org/abs/2110.08862
作者:Wei-Han Hsu,Bo-Yu Chen,Yi-Hsuan Yang 机构:Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan 摘要:随着音乐技术的发展,近年来出现了大量的电子舞蹈音乐(EDM)风格或“亚流派”。虽然区分EDM和非EDM的分类任务经常在音乐体裁分类的背景下进行研究,但对更具挑战性的EDM子体裁分类的研究却很少。此前最先进的模型基于极端随机树(extremely randomized trees),可以通过深度学习方法加以改进。在本文中,我们将最先进的音乐自动标记模型“短块CNN+Resnet”扩展到EDM子体裁分类,并添加了两种中层节奏相关特征表示,称为傅立叶节奏图和自相关节奏图。并且,我们探索了两种融合策略,早期融合和晚期融合,以聚合这两种类型的节奏图。我们使用一个包含75,000首歌曲、覆盖30种不同EDM子体裁的大型数据集对所提出的模型进行了评估,并表明采用深度学习模型和节奏特征确实可以提高分类精度。 摘要:Along with the evolution of music technology, a large number of styles, or "subgenres," of Electronic Dance Music(EDM) have emerged in recent years. While the classification task of distinguishing between EDM and non-EDM has been often studied in the context of music genre classification, little work has been done on the more challenging EDM subgenre classification. The state-of-the-art model is based on extremely randomized trees and could be improved by deep learning methods. In this paper, we extend the state-of-the-art music auto-tagging model "short-chunkCNN+Resnet" to EDM subgenre classification, with the addition of two mid-level tempo-related feature representations, called the Fourier tempogram and autocorrelation tempogram. And, we explore two fusion strategies, early fusion and late fusion, to aggregate the two types of tempograms. We evaluate the proposed models using a large dataset consisting of 75,000 songs for 30 different EDM subgenres, and show that the adoption of deep learning models and tempo features indeed leads to higher classification accuracy.
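文中用作中层节奏特征的两种节奏图在librosa中可直接计算(音频文件名为此处假设;这些特征可与mel谱图一起输入标签模型):

```python
import librosa
import numpy as np

y, sr = librosa.load("track.mp3", sr=22050)          # 假设的音频文件
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

mel = librosa.feature.melspectrogram(y=y, sr=sr)     # mel谱图
tgram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)  # 自相关节奏图
ftgram = np.abs(librosa.feature.fourier_tempogram(onset_envelope=onset_env, sr=sr))  # 傅立叶节奏图

print(mel.shape, tgram.shape, ftgram.shape)  # 三种特征可分别(晚期融合)或拼接(早期融合)后输入CNN
```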
表征(2篇)
【1】 Growing Representation Learning 标题:成长型表征学习 链接:https://arxiv.org/abs/2110.08857
作者:Ryan King,Bobak Mortazavi 备注:8 pages, 5 figures 摘要:由于机器学习能够学习越来越复杂的任务,因此它继续受到欢迎。然而,对于许多监督模型,数据分布的变化或新事件的出现可能会导致模型性能的严重下降。根据对组织或系统施加的约束,使用更新的数据从头开始重新训练模型可能需要大量资源,也可能不可能。持续学习方法试图使模型适应新课程,而不是再训练。然而,这些方法中的许多都没有针对新类的检测方法,也没有对类的分布进行假设。在本文中,我们开发了一种基于注意的高斯混合,称为GMAT,它可以学习有或没有标签的数据的可解释表示。我们将该方法与现有的神经结构搜索技术结合起来,开发了一种算法,通过训练一个不断增长的神经网络的迭代过程来检测新事件,以获得最佳数量的表示。我们证明了我们的方法能够在没有标签或标签分布假设的情况下学习数据的新表示。此外,我们还开发了一种方法,使我们的模型能够利用标签更准确地开发表示。最后,我们证明了我们的方法可以通过重放学习表示的样本来避免灾难性遗忘。 摘要:Machine learning continues to grow in popularity due to its ability to learn increasingly complex tasks. However, for many supervised models, the shift in a data distribution or the appearance of a new event can result in a severe decrease in model performance. Retraining a model from scratch with updated data can be resource intensive or impossible depending on the constraints placed on an organization or system. Continual learning methods attempt to adapt models to new classes instead of retraining. However, many of these methods do not have a detection method for new classes or make assumptions about the distribution of classes. In this paper, we develop an attention based Gaussian Mixture, called GMAT, that learns interpretable representations of data with or without labels. We incorporate this method with existing Neural Architecture Search techniques to develop an algorithm for detecting new events with an optimal number of representations through an iterative process of training a growing network. We show that our method is capable of learning new representations of data without labels or assumptions about the distributions of labels. We additionally develop a method that allows our model to utilize labels to more accurately develop representations. Lastly, we show that our method can avoid catastrophic forgetting by replaying samples from learned representations.
【2】 Virtual Augmentation Supported Contrastive Learning of Sentence Representations 标题:虚拟增强支持的句子表征对比学习 链接:https://arxiv.org/abs/2110.08552
作者:Dejiao Zhang,Wei Xiao,Henghui Zhu,Xiaofei Ma,Andrew O. Arnold 机构:AWS AI Labs, New York 备注:8 pages, 3 figures, 3 tables 摘要:尽管取得了巨大的成功,对比表征学习仍然依赖于使用特定领域知识精心设计的数据扩充。在自然语言处理中,由于自然语言的离散性,不存在数据扩充的一般规则,这一挑战被放大。我们通过提出一个虚拟增强支持的句子表征对比学习(VaSCL)来应对这一挑战。基于数据增强本质上构建每个训练实例的邻域的解释,我们反过来利用邻域生成有效的数据增强。利用对比学习的大训练批量,我们通过实例在表示空间中的批内K-最近邻来近似实例的邻域。然后,我们在该邻域内定义实例识别任务,并以对抗性训练方式生成虚拟增强。我们评估了VaSCL在一系列下游任务上的表现,并为无监督的句子表征学习开创了新的技术水平。 摘要:Despite profound successes, contrastive representation learning relies on carefully designed data augmentations using domain specific knowledge. This challenge is magnified in natural language processing where no general rules exist for data augmentation due to the discrete nature of natural language. We tackle this challenge by presenting a Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Originating from the interpretation that data augmentation essentially constructs the neighborhoods of each training instance, we in turn utilize the neighborhood to generate effective data augmentations. Leveraging the large training batch size of contrastive learning, we approximate the neighborhood of an instance via its K-nearest in-batch neighbors in the representation space. We then define an instance discrimination task within this neighborhood, and generate the virtual augmentation in an adversarial training manner. We assess the performance of VaSCL on a wide range of downstream tasks, and set a new state-of-the-art for unsupervised sentence representation learning.
编码器(1篇)
【1】 Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes 标题:适用于不同大小半规则网格的网格卷积自动编码器 链接:https://arxiv.org/abs/2110.09401
作者:Sara Hahner,Jochen Garcke 机构:Fraunhofer Center for Machine Learning and SCAI, University of Bonn 摘要:由于低维嵌入可用于可视化底层动力学,因此自动编码器可加速变形三维曲面网格的分析。但是,最先进的网格卷积自动编码器要求自动编码器处理的所有输入网格具有固定的连接性。这是由于使用了频谱卷积层或依赖于网格的池操作。因此,可研究的数据集类型受限,所学知识也无法迁移到表现出类似行为的其他数据集。为了解决这个问题,我们将曲面的离散化转化为具有局部规则连通性且网格划分为层次结构的半规则网格。这允许我们将相同的空间卷积滤波器应用于局部邻域,并定义可应用于每个半规则网格的池算子。我们将相同的网格自动编码器应用于不同的数据集,我们的重建误差比必须针对每个网格分别训练的最先进模型的误差低50%以上。此外,我们通过在不同类别的网格上训练的自动编码器来可视化看不见的网格序列的基本动力学。 摘要:The analysis of deforming 3D surface meshes is accelerated by autoencoders since the low-dimensional embeddings can be used to visualize underlying dynamics. But, state-of-the-art mesh convolutional autoencoders require a fixed connectivity of all input meshes handled by the autoencoder. This is due to either the use of spectral convolutional layers or mesh dependent pooling operations. Therefore, the types of datasets that one can study are limited and the learned knowledge cannot be transferred to other datasets that exhibit similar behavior. To address this, we transform the discretization of the surfaces to semi-regular meshes that have a locally regular connectivity and whose meshing is hierarchical. This allows us to apply the same spatial convolutional filters to the local neighborhoods and to define a pooling operator that can be applied to every semi-regular mesh. We apply the same mesh autoencoder to different datasets and our reconstruction error is more than 50% lower than the error from state-of-the-art models, which have to be trained for every mesh separately. Additionally, we visualize the underlying dynamics of unseen mesh sequences with an autoencoder trained on different classes of meshes.
优化|敛散性(8篇)
【1】 Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs 标题:乐观策略优化在非平稳MDP中被证明是有效的 链接:https://arxiv.org/abs/2110.08984
作者:Han Zhong,Zhuoran Yang,Zhaoran Wang,Csaba Szepesvári 机构:Zhaoran Wang, Csaba Szepesvári 摘要:我们研究了非平稳线性核马尔可夫决策过程(MDP)中的情景强化学习(RL)。在该设置中,奖励函数和过渡内核相对于给定的特征映射都是线性的,并且允许随时间变化,只要它们各自的参数变化不超过特定的变化预算。我们提出了周期性重启乐观策略优化算法(periodically restarted optimistic policy optimization, PROPO),这是一种具有线性函数近似的乐观策略优化算法。PROPO具有两种机制:基于滑动窗口的策略评估和基于定期重启的策略改进,这两种机制是为非平稳环境中的策略优化而定制的。此外,我们仅利用滑动窗口技术,提出了一种值迭代算法。我们为所提方法建立了动态上界和匹配的极大极小下界,该下界表明了所提方法的(近)最优性。据我们所知,PROPO是第一个可证明有效的处理非平稳性的策略优化算法。 摘要:We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are allowed to vary over time, as long as their respective parameter variations do not exceed certain variation budgets. We propose the $\underline{\text{p}}$eriodically $\underline{\text{r}}$estarted $\underline{\text{o}}$ptimistic $\underline{\text{p}}$olicy $\underline{\text{o}}$ptimization algorithm (PROPO), which is an optimistic policy optimization algorithm with linear function approximation. PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement, which are tailored for policy optimization in a non-stationary environment. In addition, only utilizing the technique of sliding window, we propose a value-iteration algorithm. We establish dynamic upper bounds for the proposed methods and a matching minimax lower bound which shows the (near-) optimality of the proposed methods. To our best knowledge, PROPO is the first provably efficient policy optimization algorithm that handles non-stationarity.
【2】 Distributed Optimization using Heterogeneous Compute Systems 标题:使用异构计算系统的分布式优化 链接:https://arxiv.org/abs/2110.08941
作者:Vineeth S 机构:Division of Electrical, Electronics, and Computer Sciences, Indian Institute of Science, Bangalore, India 摘要:近年来,硬件计算能力以前所未有的速度增长。在学术界和工业界,利用这些进步在以较少的时间产生更好的结果方面发挥着关键作用。然而,将现有硬件与同一生态系统中的最新硬件合并是一项具有挑战性的任务。在这种情况下,关键挑战之一是计算能力的变化。在本文中,我们考虑在具有不同计算能力的工人的分布式系统上训练深度神经网络。同步分布式训练的简单实现将导致更快的工作人员等待最慢的工作人员完成处理。为了缓解这个问题,我们建议在训练期间动态调整分配给每个工人的数据。我们为每个工作者分配一个与其计算能力成比例的总数据分区。我们的实验表明,动态调整数据分区有助于提高系统的利用率,并显著减少训练所需的时间。代码可在存储库中找到:\url{https://github.com/vineeths96/Heterogeneous-Systems}. 摘要:Hardware compute power has been growing at an unprecedented rate in recent years. The utilization of such advancements plays a key role in producing better results in less time -- both in academia and industry. However, merging the existing hardware with the latest hardware within the same ecosystem poses a challenging task. One of the key challenges, in this case, is varying compute power. In this paper, we consider the training of deep neural networks on a distributed system of workers with varying compute power. A naive implementation of synchronous distributed training will result in the faster workers waiting for the slowest worker to complete processing. To mitigate this issue, we propose to dynamically adjust the data assigned for each worker during the training. We assign each worker a partition of total data proportional to its computing power. Our experiments show that dynamically adjusting the data partition helps to improve the utilization of the system and significantly reduces the time taken for training. Code is available at the repository: \url{https://github.com/vineeths96/Heterogeneous-Systems}.
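按算力成比例划分训练数据这一步本身很直接,下面给出一个示意函数(以"每秒处理样本数"作为算力度量,属于此处的假设):

```python
import numpy as np

def partition_by_power(n_samples, throughputs):
    """按各worker的吞吐量(样本/秒)成比例分配样本数,取整余数补给最快的worker。"""
    t = np.asarray(throughputs, dtype=float)
    sizes = np.floor(n_samples * t / t.sum()).astype(int)
    sizes[np.argmax(t)] += n_samples - sizes.sum()   # 处理取整余数
    return sizes

# partition_by_power(10000, [1.0, 1.0, 2.5]) -> array([2222, 2222, 5556])
# 训练中可周期性地重新测量吞吐量并据此动态调整分区
```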
【3】 Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs 标题:基于反向经验回放的在线目标Q学习:线性MDP最优策略的有效求解 链接:https://arxiv.org/abs/2110.08440
作者:Naman Agarwal,Syomantak Chaudhuri,Prateek Jain,Dheeraj Nagaraj,Praneeth Netrapalli 机构:Google Research, University of California, Berkeley, MIT 备注:Under Review 摘要:Q-学习是一种流行的强化学习(RL)算法,在函数逼近的实践中得到广泛应用\citep{mnih2015human}。相比之下,现有的理论结果对Q学习持悲观态度。例如,\citep{baird1995residual}表明,对于线性MDP,Q-学习即使在线性函数近似下也不会收敛。此外,即使对于具有同步更新的表格MDP,Q-学习也被证明具有次优样本复杂度\citep{li2021q,azar2013minimax}。这项工作的目标是弥合Q学习的实际成功与相对悲观的理论结果之间的差距。我们工作的出发点是观察到,在实践中,Q-learning的使用有两个重要的修改:(i)同时使用两个网络(称为在线网络和目标网络)进行训练(在线目标学习,OTL),以及(ii)经验重放(ER)\citep{mnih2015human}。虽然已经观察到它们在Q-learning的实际成功中发挥了重要作用,但文献中对这两种修改如何改善Q-learning的收敛行为缺乏全面的理论理解。通过将Q-learning与OTL和逆序经验重放(RER,经验重放的一种形式)相结合,我们提出了新的方法Q-Rex和Q-RexDaRe(Q-Rex+数据重用)。我们证明了Q-Rex有效地找到了线性MDP(或者更一般地说,具有零固有Bellman误差的线性近似MDP(ZIBEL))的最优策略,并提供了样本复杂度的非渐近界——这是标准假设下这类MDP的Q-学习方法的第一个此类结果。此外,我们还证明了Q-RexDaRe实际上在表格设置中实现了接近最优的样本复杂度,从而改进了普通Q-learning的现有结果。 摘要:Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation \citep{mnih2015human}. In contrast, existing theoretical results are pessimistic about Q-learning. For example, \citep{baird1995residual} shows that Q-learning does not converge even with linear function approximation for linear MDPs. Furthermore, even for tabular MDPs with synchronous updates, Q-learning was shown to have sub-optimal sample complexity \citep{li2021q,azar2013minimax}. The goal of this work is to bridge the gap between practical success of Q-learning and the relatively pessimistic theoretical results. The starting point of our work is the observation that in practice, Q-learning is used with two important modifications: (i) training with two networks, called online network and target network simultaneously (online target learning, or OTL) , and (ii) experience replay (ER) \citep{mnih2015human}. While they have been observed to play a significant role in the practical success of Q-learning, a thorough theoretical understanding of how these two modifications improve the convergence behavior of Q-learning has been missing in literature. By carefully combining Q-learning with OTL and \emph{reverse} experience replay (RER) (a form of experience replay), we present novel methods Q-Rex and Q-RexDaRe (Q-Rex + data reuse). We show that Q-Rex efficiently finds the optimal policy for linear MDPs (or more generally for MDPs with zero inherent Bellman error with linear approximation (ZIBEL)) and provide non-asymptotic bounds on sample complexity -- the first such result for a Q-learning method for this class of MDPs under standard assumptions. Furthermore, we demonstrate that Q-RexDaRe in fact achieves near optimal sample complexity in the tabular setting, improving upon the existing results for vanilla Q-learning.
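逆序经验重放(RER)与目标网络的配合可以用表格型Q-learning几行代码说明:按与收集相反的顺序回放转移,TD目标由更新期间保持冻结的目标表给出(这只是概念示意,并非论文在线性MDP设定下的Q-Rex原始算法):

```python
import numpy as np

def q_rex_style_update(Q, buffer, lr=0.5, gamma=0.99):
    """Q: (状态数, 动作数) 的表;buffer: 按时间顺序收集的 (s, a, r, s') 列表。"""
    Q_tgt = Q.copy()                          # 目标网络:本轮更新期间保持冻结
    for s, a, r, s2 in reversed(buffer):      # 关键:逆序遍历经验
        td_target = r + gamma * Q_tgt[s2].max()
        Q[s, a] += lr * (td_target - Q[s, a])
    return Q

# 外层循环示意:每收集一段经验就调用一次,相当于周期性地同步目标表
# Q = np.zeros((n_states, n_actions)); Q = q_rex_style_update(Q, episode_buffer)
```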
【4】 Training Neural Networks for Solving 1-D Optimal Piecewise Linear Approximation 标题:训练神经网络求解一维最优分段线性逼近 链接:https://arxiv.org/abs/2110.08259
作者:Hangcheng Dong,Jingxiao Liao,Yan Wang,Yixin Chen,Bingguo Liu,Dong Ye,Guodong Liu 机构:Liu a, School of Instrumentation Science and Engineering, Harbin Institute of Techonoloy, Harbin, China, BIOMIND, A R T I C L E I N F O 摘要:近年来,深度学习的可解释性引起了人们的广泛关注。大量的方法试图通过特征可视化、显著性映射、模型蒸馏等来解释神经网络。然而,这些方法很难揭示神经网络的内在特性。在这项工作中,我们研究了1-D最优分段线性逼近(PWLA)问题,并将其与一个专门设计的神经网络,即晶格神经网络(LNN)相关联。我们提出了以下四个基本问题:(1)PWLA问题最优解的特征是什么?(2) LNN能收敛到全局最优吗?(3) LNN能收敛到局部最优吗?(4) LNN能解决PWLA问题吗?我们的主要贡献是,我们提出了描述PWLA问题最优解的定理,并给出了求解该问题的LNN方法。我们在近似任务上对提出的LNN进行了评估,建立了一种经验方法来提高LNN的性能。实验证明,我们的LNN方法与最先进的方法相比具有竞争力。 摘要:Recently, the interpretability of deep learning has attracted a lot of attention. A plethora of methods have attempted to explain neural networks by feature visualization, saliency maps, model distillation, and so on. However, it is hard for these methods to reveal the intrinsic properties of neural networks. In this work, we studied the 1-D optimal piecewise linear approximation (PWLA) problem, and associated it with a designed neural network, named lattice neural network (LNN). We asked four essential questions as following: (1) What are the characters of the optimal solution of the PWLA problem? (2) Can an LNN converge to the global optimum? (3) Can an LNN converge to the local optimum? (4) Can an LNN solve the PWLA problem? Our main contributions are that we propose the theorems to characterize the optimal solution of the PWLA problem and present the LNN method for solving it. We evaluated the proposed LNNs on approximation tasks, forged an empirical method to improve the performance of LNNs. The experiments verified that our LNN method is competitive with the state-of-the-art method.
【5】 Efficient Exploration in Binary and Preferential Bayesian Optimization 标题:二元优先贝叶斯优化的高效探索 链接:https://arxiv.org/abs/2110.09361
作者:Tristan Fauvel,Matthew Chalk 机构:Sorbonne Université, INSERM, CNRS, Institut de la Vision, F-, Paris, France 摘要:贝叶斯优化(BO)是一种优化昂贵的黑箱函数的有效方法,它寻求在利用(选择可能达到最大值的参数)和探索(选择目标函数不确定的参数)之间进行权衡。在许多实际情况下,不可能直接测量目标函数,只能使用二元测量,如成功/失败或成对比较。为了在这种环境下进行有效的探索,我们证明了BO算法区分不同类型的不确定性是很重要的:关于未知目标函数的认知(epistemic)不确定性,以及来自噪声观测且无法消减的偶然(aleatoric)不确定性。事实上,只有前者对有效探索很重要。基于此,我们提出了几种新的采集函数,它们在二元和偏好BO中的性能优于最先进的启发式算法,同时计算速度快,易于实现。然后,我们将这些采集规则推广到批量学习,即同时执行多个查询。 摘要:Bayesian optimization (BO) is an effective approach to optimize expensive black-box functions, that seeks to trade-off between exploitation (selecting parameters where the maximum is likely) and exploration (selecting parameters where we are uncertain about the objective function). In many real-world situations, direct measurements of the objective function are not possible, and only binary measurements such as success/failure or pairwise comparisons are available. To perform efficient exploration in this setting, we show that it is important for BO algorithms to distinguish between different types of uncertainty: epistemic uncertainty, about the unknown objective function, and aleatoric uncertainty, which comes from noisy observations and cannot be reduced. In effect, only the former is important for efficient exploration. Based on this, we propose several new acquisition functions that outperform state-of-the-art heuristics in binary and preferential BO, while being fast to compute and easy to implement. We then generalize these acquisition rules to batch learning, where multiple queries are performed simultaneously.
【6】 Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules 标题:再答复:学习最优分布稳健的个体化治疗规则 链接:https://arxiv.org/abs/2110.08936
作者:Weibin Mo,Zhengling Qi,Yufeng Liu 备注:None 摘要:我们感谢编辑们为本次讨论提供的机会,感谢讨论者们深刻的评论和深思熟虑的贡献。我们还要祝贺Kallus(2020)在通过重定目标(retargeting)提高策略学习效率方面所做的鼓舞人心的工作。受Dukes和Vansteelandt(2020)中讨论的启发,我们在第1节中首先指出了我们的工作与Kallus(2020)之间有趣的联系和区别。特别是,这两篇论文中考虑的假设和变差来源导致了范围和重点各不相同的研究问题。在第2节中,继Li等人(2020)以及Liang和Zhao(2020)的讨论之后,我们还考虑了当训练阶段可获得部分来自测试分布的数据时的有效策略评估问题。我们表明,在训练和测试样本量以相同阶增长的假设下,有效的价值函数估计可以提供有竞争力的性能。我们进一步展示了这些估计与现有文献的一些联系。然而,当可用于训练的测试样本量以更慢的阶增长时,有效的价值函数估计可能不再表现良好。相比之下,DRITR对测试样本量的要求不像使用合并数据进行有效策略评估那样苛刻。最后,我们在第3节中强调了DRITR的普遍适用性和有用性。 摘要:We thank the opportunity offered by editors for this discussion and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated from the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation for consideration in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020); Liang and Zhao (2020), we also consider the efficient policy evaluation problem when we have some data from the testing distribution available at the training stage. We show that under the assumption that the sample sizes from training and testing are growing in the same order, efficient value function estimates can deliver competitive performance. We further show some connections of these estimates with existing literature. However, when the growth of testing sample size available for training is in a slower order, efficient value function estimates may not perform well anymore. In contrast, the requirement of the testing sample size for DRITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DRITR in Section 3.
【7】 Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set 标题:Pareto导航梯度下降:Pareto集中的一阶优化算法 链接:https://arxiv.org/abs/2110.08713
作者:Mao Ye,Qiang Liu 机构:University of Texas at Austin 摘要:许多现代机器学习应用,如多任务学习,需要找到最佳的模型参数来权衡可能相互冲突的多个目标函数。帕累托集的概念使我们能够专注于无法严格改进的模型集(通常是无限数量的)。但它并没有提供一个可操作的程序来选择一个或几个特殊的模型返回给实际用户。在本文中,我们考虑帕累托集内的优化(OPT-in-Pareto),即在帕累托集内寻找使一个额外参考准则函数最优的帕累托模型的问题。该函数可以对用户的特定偏好进行编码,或者表示通用多样性度量,以获得代表整个帕累托集的一组多样化帕累托模型。不幸的是,尽管这是一个非常有用的框架,高效的OPT-in-Pareto算法在很大程度上仍然缺失,特别是对于深度学习中的大规模、非凸和非线性目标。一种简单的方法是在Pareto集上应用黎曼流形梯度下降,但由于需要对Hessian矩阵进行特征分解计算,因此会产生较高的计算成本。我们提出了一种仅利用梯度信息近似求解OPT-in-Pareto的一阶算法,具有较高的实用效率和理论上保证的收敛性。经验上,我们证明了我们的方法对于各种具有挑战性的多任务相关问题是有效的。 摘要:Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinite number of) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider \emph{optimization in Pareto set (OPT-in-Pareto)}, the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which yields a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and theoretically guaranteed convergence property. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.
【8】 Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization 标题:NYS曲线:随机优化的Nyström近似曲率 链接:https://arxiv.org/abs/2110.08577
作者:Hardik Tankaria,Dinesh Singh,Makoto Yamada 机构:RIKEN AIP,Kyoto University 摘要:拟牛顿方法通常通过割线方程近似Hessian来提供曲率信息。然而,由于只利用一阶导数,割线方程在近似牛顿步时表现欠佳。在这项研究中,我们提出了一种基于近似牛顿步的随机优化算法,用于凸函数的大规模经验风险最小化,并具有线性收敛速度。具体地说,我们使用$k\ll d$个随机选择的变量计算大小为$d\times k$的部分列Hessian,然后使用Nyström方法更好地近似完整的Hessian矩阵。为了进一步降低每次迭代的计算复杂度,我们直接计算更新步骤$\Delta\boldsymbol{w}$,而无需计算和存储完整的Hessian或其逆。此外,为了应对即使计算部分Hessian也可能需要大量时间的大规模场景,我们使用了分布保持(DP)子采样来计算部分Hessian。DP子采样生成具有相似一阶和二阶分布统计量的$p$个子样本,并在每个历元以轮询方式选择单个子样本来计算部分Hessian。我们将近似Hessian与随机梯度下降和随机方差缩减梯度相结合,用于求解逻辑回归问题。数值实验表明,所提出的方法能够更好地逼近牛顿法,其性能与最先进的一阶方法和随机拟牛顿方法相当。 摘要:The quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes insipid in approximating the Newton step owing to its use of the first-order derivatives. In this study, we propose an approximate Newton step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size ($d\times k$) with $k\ll d$ randomly selected variables, then use the Nyström method to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we used distribution-preserving (DP) sub-sampling to compute a partial Hessian. The DP sub-sampling generates $p$ sub-samples with similar first and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach was able to obtain a better approximation of Newton's method with performance competitive with the state-of-the-art first-order and the stochastic quasi-Newton methods.
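Nyström近似的关键一步是只用k个随机列重构整个对称矩阵:H ≈ C W⁺ Cᵀ,其中C是被选出的部分列、W是对应的k×k主子阵。NumPy示意如下(仅演示该代数步骤,并非论文的完整优化器):

```python
import numpy as np

def nystrom_approx(H, k, rng=np.random.default_rng(0)):
    """用随机选出的k列对对称矩阵H做Nyström近似:H ≈ C @ pinv(W) @ C.T。"""
    idx = rng.choice(H.shape[0], size=k, replace=False)
    C = H[:, idx]                 # (d, k) 部分列
    W = H[np.ix_(idx, idx)]       # (k, k) 主子阵
    return C @ np.linalg.pinv(W) @ C.T

# 示例:近似低秩的对称矩阵时误差很小
# A = np.random.randn(500, 20); H = A @ A.T + 1e-3 * np.eye(500)
# err = np.linalg.norm(H - nystrom_approx(H, 40)) / np.linalg.norm(H)
```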
预测|估计(13篇)
【1】 On Predictive Explanation of Data Anomalies 标题:关于数据异常的预测性解释 链接:https://arxiv.org/abs/2110.09467
作者:Nikolaos Myrtakis,Ioannis Tsamardinos,Vassilis Christophides 机构:University of Crete, Heraklion, Greece, ENSEA, Cergy, France 备注:12 pages 摘要:Numerous algorithms have been proposed for detecting anomalies (outliers, novelties) in an unsupervised manner. Unfortunately, it is not trivial, in general, to understand why a given sample (record) is labelled as an anomaly and thus diagnose its root causes. We propose the following reduced-dimensionality, surrogate model approach to explain detector decisions: approximate the detection model with another one that employs only a small subset of features. Subsequently, samples can be visualized in this low-dimensionality space for human understanding. To this end, we develop PROTEUS, an AutoML pipeline to produce the surrogate model, specifically designed for feature selection on imbalanced datasets. The PROTEUS surrogate model can not only explain the training data, but also the out-of-sample (unseen) data. In other words, PROTEUS produces predictive explanations by approximating the decision surface of an unsupervised detector. PROTEUS is designed to return an accurate estimate of out-of-sample predictive performance to serve as a metric of the quality of the approximation. Computational experiments confirm the efficacy of PROTEUS to produce predictive explanations for different families of detectors and to reliably estimate their predictive performance in unseen data. Unlike several ad-hoc feature importance methods, PROTEUS is robust to high-dimensional data.
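代理模型的思路可以这样示意:先用任意无监督检测器产生异常伪标签,再在少量选出的特征上训练一个监督代理去逼近检测器的决策面,并用留出数据估计其样本外预测性能(此处的检测器与特征选择方式仅为示例,并非PROTEUS的AutoML流水线):

```python
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# X: 原始特征矩阵(假设输入)
det = IsolationForest(random_state=0).fit(X)
pseudo = (det.predict(X) == -1).astype(int)        # 检测器给出的异常伪标签

Xtr, Xte, ytr, yte = train_test_split(X, pseudo, test_size=0.3, random_state=0)
sel = SelectKBest(mutual_info_classif, k=2).fit(Xtr, ytr)   # 选出低维代理特征
clf = LogisticRegression(max_iter=1000).fit(sel.transform(Xtr), ytr)

# 样本外逼近质量,作为代理可信度的度量;低维特征空间亦可直接可视化
print(roc_auc_score(yte, clf.predict_proba(sel.transform(Xte))[:, 1]))
```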
【2】 Prediction of Occurrence of Extreme Events using Machine Learning 标题:基于机器学习的极端事件发生预测 链接:https://arxiv.org/abs/2110.09304
作者:J. Meiyazhagan,S. Sudharsan,A. Venkatasen,M. Senthilvelan 机构:Department of Nonlinear Dynamics, Bharathidasan University, Tiruchirappalli - , Tamilnadu, India, PG and Research Department of Physics, Nehru Memorial College (Autonomous), Puthanampatti, Tiruchirappalli , Tamil Nadu, India. 摘要:Machine learning models play a vital role in the prediction task in several fields of study. In this work, we utilize the ability of machine learning algorithms for the prediction of occurrence of extreme events in a nonlinear mechanical system. Extreme events are rare events which occur ubiquitously in nature. We consider four machine learning models, namely Logistic Regression, Support Vector Machine, Random Forest and Multi-Layer Perceptron in our prediction task. We train these four machine learning models using training set data and compute the performance of each model using the test set data. We show that Multi-Layer Perceptron model performs better among the four models in the prediction of extreme events in the considered system. The persistent behaviour of the considered machine learning models are cross-checked with randomly shuffled training set and test set data.
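文中四种标准分类器的比较流程用scikit-learn即可搭出骨架(X、y为假设输入;极端事件通常高度不平衡,故示例以F1而非准确率评估):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}

# X: 特征(如时间窗统计量);y: 是否发生极端事件(0/1)。均为假设输入
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
for name, m in models.items():
    m.fit(Xtr, ytr)
    print(name, f1_score(yte, m.predict(Xte)))
```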
【3】 Real-time Mortality Prediction Using MIMIC-IV ICU Data Via Boosted Nonparametric Hazards 标题:通过增强的非参数风险使用MIMIC-IV ICU数据进行实时死亡率预测 链接:https://arxiv.org/abs/2110.08949
作者:Zhale Nowroozilarki,Arash Pakbin,James Royalty,Donald K. K. Lee,Bobak J. Mortazavi 机构:∗Department of Computer Science & Engineering, Texas A&M University 摘要:Electronic Health Record (EHR) systems provide critical, rich and valuable information at high frequency. One of the most exciting applications of EHR data is in developing a real-time mortality warning system with tools from survival analysis. However, most of the survival analysis methods used recently are based on (semi)parametric models using static covariates. These models do not take advantage of the information conveyed by the time-varying EHR data. In this work, we present an application of a highly scalable survival analysis method, BoXHED 2.0 to develop a real-time in-ICU mortality warning indicator based on the MIMIC IV data set. Importantly, BoXHED can incorporate time-dependent covariates in a fully nonparametric manner and is backed by theory. Our in-ICU mortality model achieves an AUC-PRC of 0.41 and AUC-ROC of 0.83 out of sample, demonstrating the benefit of real-time monitoring.
【4】 Predicting the Performance of Multilingual NLP Models 标题:多语言自然语言处理模型的性能预测 链接:https://arxiv.org/abs/2110.08875
作者:Anirudh Srinivasan,Sunayana Sitaram,Tanuja Ganu,Sandipan Dandapat,Kalika Bali,Monojit Choudhury 机构: The University of Texas at Austin 摘要:Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages. The languages that these models are evaluated on, however, are very few in number, and it is unlikely that evaluation datasets will cover all the languages that these models support. Potential solutions to the costly problem of dataset creation are to translate datasets to new languages or use template-filling based techniques for creation. This paper proposes an alternate solution for evaluating a model across languages which make use of the existing performance scores of the model on languages that a particular task has test sets for. We train a predictor on these performance scores and use this predictor to predict the model's performance in different evaluation settings. Our results show that our method is effective in filling the gaps in the evaluation for an existing set of languages, but might require additional improvements if we want it to generalize to unseen languages.
【5】 TIP: Task-Informed Motion Prediction for Intelligent Systems 标题:TIP:面向智能系统的任务感知运动预测 链接:https://arxiv.org/abs/2110.08750
作者:Xin Huang,Guy Rosman,Ashkan Jasour,Stephen G. McGill,John J. Leonard,Brian C. Williams 机构:Massachusetts Institute of Technology, Toyota Research Institute 备注:8 pages, 6 figures, 2 tables 摘要:Motion prediction is important for intelligent driving systems, providing the future distributions of road agent behaviors and supporting various decision making tasks. Existing motion predictors are often optimized and evaluated via task-agnostic measures based on prediction accuracy. Such measures fail to account for the use of prediction in downstream tasks, and could result in sub-optimal task performance. We propose a task-informed motion prediction framework that jointly reasons about prediction accuracy and task utility, to better support downstream tasks through its predictions. The task utility function does not require the full task information, but rather a specification of the utility of the task, resulting in predictors that serve a wide range of downstream tasks. We demonstrate our framework on two use cases of task utilities, in the context of autonomous driving and parallel autonomy, and show the advantage of task-informed predictors over task-agnostic ones on the Waymo Open Motion dataset.
【6】 DFW-PP: Dynamic Feature Weighting based Popularity Prediction for Social Media Content 标题:DFW-PP:基于动态特征权重的社交媒体内容热度预测 链接:https://arxiv.org/abs/2110.08510
作者:Viswanatha Reddy G,Chaitanya B S N V,Prathyush P,Sumanth M,Mrinalini C,Dileep Kumar P,Snehasis Mukherjee 机构:IIIT Sri City, Shiv Nadar University 摘要:The increasing popularity of social media platforms makes it important to study user engagement, which is a crucial aspect of any marketing strategy or business model. The over-saturation of content on social media platforms has persuaded us to identify the important factors that affect content popularity. This comes from the fact that only an iota of the humongous content available online receives the attention of the target audience. Comprehensive research has been done in the area of popularity prediction using several Machine Learning techniques. However, we observe that there is still significant scope for improvement in analyzing the social importance of media content. We propose the DFW-PP framework, to learn the importance of different features that vary over time. Further, the proposed method controls the skewness of the distribution of the features by applying a log-log normalization. The proposed method is experimented with a benchmark dataset, to show promising results. The code will be made publicly available at https://github.com/chaitnayabasava/DFW-PP.
【7】 PG^2Net: Personalized and Group Preferences Guided Network for Next Place Prediction 标题:PG^2Net:用于下一地点预测的个性化和群体偏好引导网络 链接:https://arxiv.org/abs/2110.08266
作者:Huifeng Li,Bin Wang,Fan Xia,Xi Zhai,Sulei Zhu,Yanyan Xu 机构:Shandong University 摘要:Predicting the next place to visit is a key in human mobility behavior modeling, which plays a significant role in various fields, such as epidemic control, urban planning, traffic management, and travel recommendation. To achieve this, one typical solution is designing modules based on RNN to capture their preferences to various locations. Although these RNN-based methods can effectively learn individual's hidden personalized preferences to her visited places, the interactions among users can only be weakly learned through the representations of locations. Targeting this, we propose an end-to-end framework named personalized and group preference guided network (PG$^2$Net), considering the users' preferences to various places at both individual and collective levels. Specifically, PG$^2$Net concatenates Bi-LSTM and attention mechanism to capture each user's long-term mobility tendency. To learn population's group preferences, we utilize spatial and temporal information of the visitations to construct a spatio-temporal dependency module. We adopt a graph embedding method to map users' trajectory into a hidden space, capturing their sequential relation. In addition, we devise an auxiliary loss to learn the vectorial representation of her next location. Experiment results on two Foursquare check-in datasets and one mobile phone dataset indicate the advantages of our model compared to the state-of-the-art baselines. Source codes are available at https://github.com/urbanmobility/PG2Net.
【8】 Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks 标题:基于GARCH模型和人工神经网络的行业波动性预测性能 链接:https://arxiv.org/abs/2110.09489
作者:Curtis Nybo 机构:University of London 备注:26 pages 摘要:Recently artificial neural networks (ANNs) have seen success in volatility prediction, but the literature is divided on where an ANN should be used rather than the common GARCH model. The purpose of this study is to compare the volatility prediction performance of ANN and GARCH models when applied to stocks with low, medium, and high volatility profiles. This approach intends to identify which model should be used for each case. The volatility profiles comprise of five sectors that cover all stocks in the U.S stock market from 2005 to 2020. Three GARCH specifications and three ANN architectures are examined for each sector, where the most adequate model is chosen to move on to forecasting. The results indicate that the ANN model should be used for predicting volatility of assets with low volatility profiles, and GARCH models should be used when predicting volatility of medium and high volatility assets.
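GARCH(1,1)基线可用Python的arch包拟合并做波动率预测(returns为假设的百分比收益率序列;ANN一侧可用任意回归器以已实现波动率为目标训练,此处从略):

```python
from arch import arch_model

# returns: 百分比收益率序列(pandas Series),假设的输入
am = arch_model(returns, vol="Garch", p=1, q=1, mean="Constant", dist="normal")
res = am.fit(disp="off")
print(res.summary())

fc = res.forecast(horizon=5)            # 向前5步的条件方差预测
print(fc.variance.iloc[-1] ** 0.5)      # 开方转为波动率(标准差)
```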
【9】 Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning 标题:基于高斯过程和概率条件产生式学习的机器学习模型预测液体燃料性质 链接:https://arxiv.org/abs/2110.09360
作者:Rodolfo S. M. Freitas,Ágatha P. F. Lima,Cheng Chen,Fernando A. Rochinha,Daniel Mira,Xi Jiang 机构:Dept. Mechanical Engineering, COPPE Federal University of Rio de Janeiro, RJ ,-, Rio de Janeiro, Brazil, School of Engineering and Materials Science, Queen Mary University of London, Mile End Road, London E,NS, UK, Barcelona Supercomputing Center, Barcelona, Spain, ∗ 备注:22 pages, 13 figures 摘要:Accurate determination of fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels. Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach. Here, Gaussian Process (GP) and probabilistic generative models are adopted. GP is a popular non-parametric Bayesian approach to build surrogate models mainly due to its capacity to handle the aleatory and epistemic uncertainties. Generative models have shown the ability of deep neural networks employed with the same intent. In this work, ML analysis is focused on a particular property, the fuel density, but it can also be extended to other physicochemical properties. This study explores the versatility of the ML models to handle multi-fidelity data. The results show that ML models can predict accurately the fuel properties of a wide range of pressure and temperature conditions.
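以压力、温度为输入预测燃料密度的GP替代模型最小示意如下(核函数与数据均为此处假设;GP的预测方差可用来刻画认知不确定性,白噪声核对应偶然不确定性):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# X: (n, 2) 压力/温度;y: 对应的燃料密度(来自MD模拟或实验)。均为假设数据
kernel = 1.0 * RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=1e-2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mu, std = gp.predict(X_new, return_std=True)   # 预测均值与标准差(不确定性)
```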
【10】 Neural message passing for predicting abnormal grain growth in Monte Carlo simulations of microstructural evolution 标题:预测显微组织演变Monte Carlo模拟中异常晶粒长大的神经信息传递 链接:https://arxiv.org/abs/2110.09326
作者:Ryan Cohn,Elizabeth Holm 机构:Department of Materials Science, Carnegie Mellon University, Pittsburgh, PA, USA 备注:17 pages, 11 figures 摘要:Abnormal grain growth can significantly alter the properties of materials during processing. This can cause significant variation in the properties and performance of in-spec feedstock components subjected to identical processing paths. Understanding and controlling abnormal grain growth has proved elusive due to the stochastic nature of this phenomenon. However, recent advances in deep learning provide a promising alternative to traditional experimental and physics-based methods for understanding it. Neural message passing allows deep learning to be applied to irregular inputs, including graph representations of the grain structure of a material. In this study, we generate a large database of Monte Carlo simulations of abnormal grain growth in an idealized system. We apply message passing neural networks to predict the occurrence of abnormal grain growth in these simulations using only the initial state of the system as input. A computer vision model is also trained on the same task for comparison. The preliminary results indicate that the message passing approach outperforms the computer vision method, achieving 75% prediction accuracy, significantly better than random guessing. Analysis of the uncertainty in the Monte Carlo simulations provides a road map for ongoing work on this project.
【11】 Predicting Status of Pre and Post M&A Deals Using Machine Learning and Deep Learning Techniques 标题:基于机器学习和深度学习技术的并购前后状态预测 链接:https://arxiv.org/abs/2110.09315
作者:Tugce Karatas,Ali Hirsa 机构:Department of IEOR, Columbia University 备注:21 pages 摘要:Risk arbitrage or merger arbitrage is a well-known investment strategy that speculates on the success of M&A deals. Predicting the deal status in advance is of great importance to risk arbitrageurs. If a deal is mistakenly classified as a completed deal, an enormous cost can be incurred as a result of investing in the target company's shares. Conversely, risk arbitrageurs may lose the opportunity to make a profit. In this paper, we present an ML- and DL-based methodology for the takeover success prediction problem. We initially apply various ML techniques for data preprocessing, such as kNN for data imputation, PCA for lower-dimensional representation of numerical variables, MCA for categorical variables, and an LSTM autoencoder for sentiment scores. We experiment with different cost functions, different evaluation metrics, and oversampling techniques to address class imbalance in our dataset. We then implement feedforward neural networks to predict the success of the deal status. Our preliminary results indicate that our methodology outperforms benchmark models such as the logit and weighted logit models. We also integrate sentiment scores into our methodology using different model architectures, but our preliminary results show that the performance does not change much compared to the simple FFNN framework. We will explore different architectures and employ thorough hyperparameter tuning for sentiment scores as future work.
【12】 Fast Strain Estimation and Frame Selection in Ultrasound Elastography using Machine Learning 标题:超声弹性成像中基于机器学习的快速应变估计和帧选择 链接:https://arxiv.org/abs/2110.08668
作者:Abdelrahman Zayed,Hassan Rivaz 机构:Department of Electrical and Computer Engineering and PERFORM Centre, Concordia University 备注:None 摘要:Ultrasound elastography aims to determine the mechanical properties of tissue by monitoring tissue deformation due to internal or external forces. Tissue deformations are estimated from ultrasound radio frequency (RF) signals, a process often referred to as time delay estimation (TDE). Given two RF frames I1 and I2, we can compute a displacement image which shows the change in the position of each sample in I1 to a new position in I2. Two important challenges in TDE are its high computational complexity and the difficulty of choosing suitable RF frames. Selecting suitable frames is of high importance because many pairs of RF frames either do not have acceptable deformation for extracting informative strain images or are decorrelated, so that deformation cannot be reliably estimated. Herein, we introduce a method that learns 12 displacement modes in quasi-static elastography by performing Principal Component Analysis (PCA) on the displacement fields of a large training database. In the inference stage, we use dynamic programming (DP) to compute an initial displacement estimate for around 1% of the samples, and then decompose this sparse displacement into a linear combination of the 12 displacement modes. Our method assumes that the displacement of the whole image can also be described by this linear combination of principal components. We then use the GLobal Ultrasound Elastography (GLUE) method to fine-tune the result, yielding the exact displacement image. Our method, which we call PCA-GLUE, is more than 10 times faster than DP in calculating the initial displacement map while giving the same result. Our second contribution in this paper is determining the suitability of the frame pair I1 and I2 for strain estimation, which we achieve by using the weight vector calculated for PCA-GLUE as the input to a multi-layer perceptron (MLP) classifier.
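The core decomposition step of PCA-GLUE can be sketched in a few lines (our reading of the abstract; variable names are ours): given principal displacement modes learned offline and a sparse initial estimate from dynamic programming, solve a least-squares problem for the mode weights and reconstruct the dense displacement field.

import numpy as np

n_pixels, n_modes = 10000, 12
modes = np.random.randn(n_pixels, n_modes)  # stand-in for learned PCA modes
sparse_idx = np.random.choice(n_pixels, 100, replace=False)  # ~1% of samples from DP
d_sparse = np.random.randn(100)  # stand-in for DP displacement estimates

# weights w minimize ||modes[sparse_idx] @ w - d_sparse||^2
w, *_ = np.linalg.lstsq(modes[sparse_idx], d_sparse, rcond=None)
d_dense = modes @ w  # dense displacement, later refined by GLUE

The same weight vector w is what the paper feeds to the MLP classifier to judge whether a frame pair is suitable for strain estimation.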
【13】 Mode and Ridge Estimation in Euclidean and Directional Product Spaces: A Mean Shift Approach 标题:欧氏空间和方向积空间中的模和岭估计:均值漂移方法 链接:https://arxiv.org/abs/2110.08505
作者:Yikun Zhang,Yen-Chi Chen 机构:Department of Statistics, University of Washington, Seattle, WA , USA 备注:51 pages, 10 figures 摘要:The set of local modes and the ridge lines estimated from a dataset are important summary characteristics of the data-generating distribution. In this work, we consider estimating the local modes and ridges from point cloud data in a product space with two or more Euclidean/directional metric spaces. Specifically, we generalize the well-known (subspace constrained) mean shift algorithm to the product space setting and illuminate some pitfalls in such generalization. We derive the algorithmic convergence of the proposed method, provide practical guidelines on the implementation, and demonstrate its effectiveness on both simulated and real datasets.
其他神经网络|深度学习|模型|建模(45篇)
【1】 Discovering and Achieving Goals via World Models 标题:通过世界模型发现和实现目标 链接:https://arxiv.org/abs/2110.09514
作者:Russell Mendonca,Oleh Rybkin,Kostas Daniilidis,Danijar Hafner,Deepak Pathak 机构:Carnegie Mellon University, University of Pennsylvania, University of Toronto 备注:NeurIPS 2021. First two authors contributed equally. Website at this https URL 摘要:How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
【2】 Learning in High Dimension Always Amounts to Extrapolation 标题:高维学习总是等同于外推 链接:https://arxiv.org/abs/2110.09485
作者:Randall Balestriero,Jerome Pesenti,Yann LeCun 机构:Facebook AI Research,NYU 摘要:The notions of interpolation and extrapolation are fundamental in various fields, from deep learning to function approximation. Interpolation occurs for a sample $x$ whenever this sample falls inside or on the boundary of the given dataset's convex hull; extrapolation occurs when $x$ falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets; in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performance.
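The paper's definition of interpolation can be tested directly: a point x interpolates the dataset iff there exist lambda_i >= 0 with sum lambda_i = 1 and sum lambda_i p_i = x, a feasibility problem solvable by linear programming. A small sketch (ours) that typically reproduces the paper's observation in 100 dimensions:

import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones(n)])  # d rows for x, one row for sum(lambda)=1
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.success  # feasible <=> x lies inside the convex hull

d = 100
pts = np.random.randn(1000, d)
print(in_convex_hull(np.random.randn(d), pts))  # almost surely False in high dimension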
【3】 TLDR: Twin Learning for Dimensionality Reduction 标题:TLDR:用于降维的孪生学习 链接:https://arxiv.org/abs/2110.09455
作者:Yannis Kalantidis,Carlos Lassance,Jon Almazan,Diane Larlus 机构:NAVER LABS Europe 备注:Code available at: this https URL 摘要:Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved. They are a crucial component of diverse tasks like visualization, compression, indexing, and retrieval. Aiming for a totally different goal, self-supervised visual representation learning has been shown to produce transferable representation functions by learning models that encode invariance to artificially created distortions, e.g., a set of hand-crafted image transformations. Unlike manifold learning methods, which usually require propagation on large k-NN graphs or complicated optimization solvers, self-supervised learning approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that ports the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand. We propose to use nearest neighbors to build pairs from a training set, and a redundancy reduction loss borrowed from the self-supervised literature to learn an encoder that produces representations invariant across such pairs. TLDR is a method that is simple, easy to implement and train, and of broad applicability; it consists of an offline nearest neighbor computation step that can be highly approximated, and a straightforward learning process that does not require mining negative samples to contrast, eigendecompositions, or cumbersome optimization solvers. By replacing PCA with TLDR, we are able to increase the performance of GeM-AP by 4% mAP for 128 dimensions, and to retain its performance with 16x fewer dimensions.
【4】 Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph World Models 标题:超图世界模型中基于最大似然路径的目标不可知规划 链接:https://arxiv.org/abs/2110.09442
作者:Christopher Robinson 机构:Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY, USA 备注:58 pages, 27 figures 摘要:In this paper, we present a hypergraph-based machine learning algorithm, a data-structure-driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm. Together, these form a goal-agnostic automated planning engine for an autonomous learning agent which incorporates beneficial properties of both classical Machine Learning and traditional Artificial Intelligence. We prove that the algorithm determines optimal solutions within the problem space, mathematically bound its learning performance, and supply a mathematical model analyzing system state progression through time, yielding explicit predictions for learning curves, goal achievement rates, and response to abstractions and uncertainty. To validate performance, we exhibit results from applying the agent to three archetypal planning problems, including composite hierarchical domains, and highlight empirical findings which illustrate properties elucidated in the analysis.
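One standard way to apply Dijkstra's algorithm probabilistically (our illustration; the paper's exact formulation may differ) is to weight each edge by the negative log of its transition probability, so the shortest path is the maximum-likelihood path:

import math
import networkx as nx

G = nx.DiGraph()
for u, v, p in [("s", "a", 0.9), ("a", "g", 0.5), ("s", "b", 0.6), ("b", "g", 0.95)]:
    G.add_edge(u, v, weight=-math.log(p))  # additive costs <=> multiplicative probabilities

path = nx.dijkstra_path(G, "s", "g", weight="weight")
prob = math.exp(-nx.dijkstra_path_length(G, "s", "g", weight="weight"))
print(path, prob)  # ['s', 'b', 'g'] with success probability 0.57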
【5】 Fair Tree Learning 标题:公平树学习 链接:https://arxiv.org/abs/2110.09295
作者:António Pereira Barata,Cor J. Veenman 机构:Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands 摘要:When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label whilst minimising discrimination with respect to some sensitive attribute, such as gender or race, induced by biased data. Various hybrid optimisation criteria exist which combine classification performance with a fairness metric. However, while the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair decision tree methods only optimise for a fixed threshold on both the classification task and the fairness metric. Moreover, current tree learning frameworks do not allow for fair treatment with respect to multiple categories or multiple sensitive attributes. Lastly, the end-users of a fair model should be able to balance fairness and classification performance according to their specific ethical, legal, and societal needs. In this paper we address these shortcomings by proposing a threshold-independent fairness metric termed uniform demographic parity, and a derived splitting criterion entitled SCAFF -- Splitting Criterion AUC for Fairness -- for fair decision tree learning, which extends to bagged and boosted frameworks. Compared to the state of the art, our method provides three main advantages: (1) classifier performance and fairness are defined continuously instead of relying upon an often arbitrary decision threshold; (2) it leverages multiple sensitive attributes simultaneously, whose values may be multicategorical; and (3) the unavoidable performance-fairness trade-off is tunable during learning. In our experiments, we demonstrate how SCAFF attains high predictive performance towards the class label and low discrimination with respect to binary, multicategorical, and multiple sensitive attributes, further substantiating our claims.
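A threshold-independent fairness measurement in the spirit of the paper can be sketched as follows (our simplification, not the SCAFF criterion itself): compute the ROC-AUC of the model's scores with respect to the sensitive attribute; an AUC near 0.5 means the scores carry little information about the attribute.

import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.random.rand(1000)              # stand-in model scores
sensitive = np.random.randint(0, 2, 1000)  # stand-in binary sensitive attribute
fairness_auc = roc_auc_score(sensitive, scores)
print(abs(fairness_auc - 0.5))             # deviation from demographic parity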
【6】 Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory 标题:DNN中的加速反向传播:带记忆的近似外积 链接:https://arxiv.org/abs/2110.09164
作者:Eduin E. Hernandez,Stefano Rini,Tolga M. Duman 机构:NYCU, Taiwan; Bilkent University, Turkey 备注:5 pages, 3 figures 摘要:In this paper, an algorithm for the approximate evaluation of back-propagation in DNN training is considered, which we term Approximate Outer Product Gradient Descent with Memory (Mem-AOP-GD). The Mem-AOP-GD algorithm approximates stochastic gradient descent by considering only a subset of the outer products involved in the matrix multiplications that encompass backpropagation. In order to correct for the inherent bias in this approximation, the algorithm retains in memory an accumulation of the outer products that are not used in the approximation. We investigate the performance of the proposed algorithm in terms of DNN training loss under two design parameters: (i) the number of outer products used for the approximation, and (ii) the policy used to select those outer products. We show experimentally that significant improvements in computational complexity as well as accuracy can indeed be obtained through Mem-AOP-GD.
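As we read it, the core of Mem-AOP-GD is that the gradient of a linear layer over a batch is a sum of per-sample outer products; only k of them are used per step and the remainder accumulates in a memory term. A simplified sketch (the selection policy and the schedule for applying the memory are our assumptions):

import numpy as np

def mem_aop_grad(deltas, xs, k, memory):
    # deltas: (B, n_out) backprop errors, xs: (B, n_in) layer inputs.
    norms = np.linalg.norm(deltas, axis=1) * np.linalg.norm(xs, axis=1)
    order = np.argsort(norms)
    top, rest = order[-k:], order[:-k]
    grad = deltas[top].T @ xs[top]               # approximate gradient from k outer products
    memory = memory + deltas[rest].T @ xs[rest]  # accumulate the unused outer products
    return grad, memory                          # caller adds memory back per its policy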
【7】 A Dimensionality Reduction Approach for Convolutional Neural Networks 标题:卷积神经网络的一种降维方法 链接:https://arxiv.org/abs/2110.09163
作者:Laura Meneghetti,Nicola Demo,Gianluigi Rozza 机构:Mathematics Area, mathLab, SISSA, Trieste, Italy 摘要:The focus of this paper is the application of classical model order reduction techniques, such as Active Subspaces and Proper Orthogonal Decomposition, to Deep Neural Networks. We propose a generic methodology to reduce the number of layers of a pre-trained network by combining the aforementioned dimensionality reduction techniques with input-output mappings such as Polynomial Chaos Expansion and Feedforward Neural Networks. The need to compress the architecture of an existing Convolutional Neural Network is motivated by its application in embedded systems with specific storage constraints. Our experiments show that the reduced networks obtained can achieve a level of accuracy similar to the original Convolutional Neural Network under examination, while saving memory.
【8】 EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks 标题:EmbRace:加速稀疏通信以实现NLP神经网络的分布式训练 链接:https://arxiv.org/abs/2110.09132
作者:Shengwei Li,Zhiquan Lai,Dongsheng Li,Xiangyu Ye,Yabo Duan 机构:National Key Laboratory of Parallel and Distributed Processing, Computer College, National University of Defense Technology, China 摘要:Distributed data-parallel training has been widely used for natural language processing (NLP) neural network models. However, the embedding tables in NLP models, which hold a large portion of the parameters and bring dramatic sparsity to communication, make it challenging to scale distributed training efficiently. Current distributed training frameworks mainly concentrate on dense models and neglect the sparsity of NLP models, resulting in significant communication overhead and relatively poor scalability. In this paper, we propose EmbRace, an efficient communication framework designed to accelerate the sparse communication of distributed NLP model training. EmbRace introduces Sparsity-aware Hybrid Communication, which combines AlltoAll and AllReduce to optimize the communication overhead for sparse and dense data in NLP models. EmbRace further introduces a 2D Communication Scheduling approach to thoroughly overlap communication with computation by optimizing the model computation procedure, relaxing the dependency of embeddings, and scheduling communication with a priority queue. We implement EmbRace based on PyTorch and Horovod, and conduct comprehensive evaluations with four representative NLP models on two high-performance GPU clusters. Experimental results show that EmbRace achieves up to a 30.66x speedup over four popular distributed training baselines on 16-GPU clusters.
【9】 Analyzing Wikipedia Membership Dataset and Predicting Unconnected Nodes in the Signed Networks 标题:分析维基百科成员数据集并预测签名网络中的未连接节点 链接:https://arxiv.org/abs/2110.09111
作者:Zhihao Wu,Taoran Li,Ray Roman 机构:University of California, Los Angeles 备注:The work was done in UCLA CS249, Spring 2017 摘要:In the age of digital interaction, person-to-person relationships existing on social media may be different from the very same interactions that exist offline. Examining potential or spurious relationships between members of a social network is a fertile area of research for computer scientists -- here we examine how relationships can be predicted between two unconnected people in a social network, evaluated using the area under the Precision-Recall and ROC curves. Modeling the social network as a signed graph, we compare the Triadic model, the Latent Information model, and the Sentiment model, and use them to predict peer-to-peer interactions, first using a plain signed network, and second using a signed network with comments as context. We find that our models are much better than a random model and could complement each other in different cases.
【10】 Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient 标题:边缘重新布线走向神经:通过策略梯度提高网络弹性 链接:https://arxiv.org/abs/2110.09035
作者:Shanchao Yang,Kaili Ma,Baoxiang Wang,Hongyuan Zha 机构:School of Data Science, The Chinese University of Hong Kong, Shenzhen, Department of Computer Science and Engineering, Shenzhen Institute of Artificial Intelligence and Robotics for Society 摘要:Improving the resilience of a network protects the system from natural disasters and malicious attacks. This is typically achieved by introducing new edges, which, however, may exceed the maximum number of connections a node can sustain. Many studies therefore resort to the degree-preserving operation of rewiring, which swaps existing edges $AC, BD$ for new edges $AB, CD$ (see the sketch below). A significant line of studies focuses on this technique for theoretical and practical results while leaving three limitations: network utility loss, local optimality, and transductivity. In this paper, we propose ResiNet, a reinforcement learning (RL)-based framework to discover resilient network topologies against various disasters and attacks. ResiNet is objective-agnostic, which allows the utility to be balanced by incorporating it into the objective function. The local optimality, typically seen in greedy algorithms, is addressed by casting the cumulative resilience gain into a sequential decision process of step-wise rewiring. The transductivity, which refers to the necessity of running a computationally intensive optimization for each input graph, is lifted by our variant of RL with an auto-regressive permutation-invariant variable action space. ResiNet is armed with our technical innovation, Filtration enhanced GNN (FireGNN), which distinguishes graphs with minor differences. It is thus possible for ResiNet to capture local structure changes and adapt its decisions among consecutive graphs, which is known to be infeasible for GNNs. Extensive experiments demonstrate that with a small number of rewiring operations, ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.
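The degree-preserving rewiring primitive at the heart of this line of work is simple to state in code (a sketch of the classical operation, not of ResiNet's learned policy):

import networkx as nx

def rewire(G, a, c, b, d):
    # Swap existing edges (a,c),(b,d) for (a,b),(c,d); every node degree is unchanged.
    if not (G.has_edge(a, c) and G.has_edge(b, d)):
        return False
    if G.has_edge(a, b) or G.has_edge(c, d) or len({a, b, c, d}) < 4:
        return False  # avoid multi-edges and self-loops
    G.remove_edges_from([(a, c), (b, d)])
    G.add_edges_from([(a, b), (c, d)])
    return True

G = nx.gnm_random_graph(20, 40, seed=1)
before = dict(G.degree())
(a, c), (b, d) = list(G.edges())[:2]
if rewire(G, a, c, b, d):
    assert dict(G.degree()) == before  # degrees preserved

networkx ships essentially the same primitive as nx.double_edge_swap; ResiNet's contribution is learning which swaps to make.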
【11】 Deep Learning-Based Power Control for Uplink Cell-Free Massive MIMO Systems 标题:基于深度学习的上行无小区大规模MIMO系统功率控制 链接:https://arxiv.org/abs/2110.09001
作者:Yongshun Zhang,Jiayi Zhang,Yu Jin,Stefano Buzzi,Bo Ai 机构:Beijing Jiaotong University; Department of Electrical and Information Engineering, University of Cassino and Lazio Meridionale 备注:6 pages, 6 figures, accepted by IEEE Globecom 2021 摘要:In this paper, a general framework for deep learning-based power control methods for max-min, max-product and max-sum-rate optimization in uplink cell-free massive multiple-input multiple-output (CF mMIMO) systems is proposed. Instead of using supervised learning, the proposed method relies on unsupervised learning, in which optimal power allocations are not required to be known, and thus has low training complexity. More specifically, a deep neural network (DNN) is trained to learn the map between fading coefficients and power coefficients in a short time and with low computational complexity. It is interesting to note that the spectral efficiency of CF mMIMO systems with the proposed method outperforms that of previous optimization methods for max-min optimization, and the method also fits well for both max-sum-rate and max-product optimizations.
【12】 Finding Everything within Random Binary Networks 标题:查找随机二进制网络中的所有内容 链接:https://arxiv.org/abs/2110.08996
作者:Kartik Sreenivasan,Shashank Rajput,Jy-yong Sohn,Dimitris Papailiopoulos 机构:University of Wisconsin-Madison 摘要:A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used continuous-valued random initializations can indeed be pruned to approximate any target network. In this work, we show that the amplitude of those random weights does not even matter. We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\{\pm1\}$ weights that is only a polylogarithmic factor wider and deeper than the target network.
【13】 Explaining generalization in deep learning: progress and fundamental limits 标题:解释深度学习中的泛化:进展与基本局限 链接:https://arxiv.org/abs/2110.08922
作者:Vaishnavh Nagarajan 机构:School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 备注:arXiv admin note: text overlap with arXiv:1902.04742 摘要:This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in the deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, any uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an empirical technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
【14】 Network Augmentation for Tiny Deep Learning 标题:用于微深度学习的网络增强 链接:https://arxiv.org/abs/2110.08890
作者:Han Cai,Chuang Gan,Ji Lin,Song Han 机构:Massachusetts Institute of Technology, MIT-IBM Watson AI Lab 摘要:We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks. Existing regularization techniques (e.g., data augmentation, dropout) have shown much success on large neural networks (e.g., ResNet50) by adding noise to overcome over-fitting. However, we found that these techniques hurt the performance of tiny neural networks. We argue that training tiny models is different from training large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from under-fitting rather than over-fitting due to their limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It puts the tiny model into larger models and encourages it to work as a sub-model of the larger models to get extra supervision, in addition to functioning as an independent model. At test time, only the tiny model is used for inference, incurring zero inference overhead. We demonstrate the effectiveness of NetAug on image classification and object detection. NetAug consistently improves the performance of tiny models, achieving up to 2.1% accuracy improvement on ImageNet and 4.3% on Cars. On Pascal VOC, NetAug provides a 2.96% mAP improvement with the same computational cost.
【15】 Contrastive Learning of Visual-Semantic Embeddings 标题:视觉语义嵌入的对比学习 链接:https://arxiv.org/abs/2110.08872
作者:Anurag Jain,Yashaswi Verma 机构:Department of Computer Science and Engineering, Indian Institute of Technology 摘要:Contrastive learning is a powerful technique to learn representations that are semantically distinctive and geometrically invariant. While most of the earlier approaches have demonstrated its effectiveness on single-modality learning tasks such as image classification, recently there have been a few attempts to extend this idea to multi-modal data. In this paper, we propose two loss functions based on normalized cross-entropy to perform the task of learning a joint visual-semantic embedding using batch contrastive training. In a batch, for a given anchor point from one modality, we consider its negatives only from the other modality, and define our first contrastive loss based on the expected violations incurred by all the negatives. Next, we update this loss and define the second contrastive loss based on the violation incurred only by the hardest negative. We compare our results with existing visual-semantic embedding methods on cross-modal image-to-text and text-to-image retrieval tasks using the MS-COCO and Flickr30K datasets, where we outperform the state-of-the-art on the MS-COCO dataset and achieve comparable results on the Flickr30K dataset.
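A minimal sketch of the two losses described above (our naming and smoothing choices, not the paper's exact formulas), for a batch of image and text embeddings where positive pairs lie on the diagonal:

import torch
import torch.nn.functional as F

def contrastive_losses(img, txt, tau=0.07):
    img, txt = F.normalize(img, dim=1), F.normalize(txt, dim=1)
    sim = img @ txt.t() / tau                     # (B, B) cross-modal similarities
    targets = torch.arange(sim.size(0))
    loss_all = F.cross_entropy(sim, targets)      # first loss: all cross-modal negatives
    pos = sim.diag()
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    hardest = sim.masked_fill(mask, float("-inf")).max(dim=1).values
    loss_hard = F.softplus(hardest - pos).mean()  # second loss: hardest negative only
    return loss_all, loss_hard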
【16】 Online Continual Learning Via Candidates Voting 标题:通过候选投票实现在线持续学习 链接:https://arxiv.org/abs/2110.08855
作者:Jiangpeng He,Fengqing Zhu 机构:School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana USA 备注:Accepted paper at Winter Conference on Applications of Computer Vision (WACV 2022) 摘要:Continual learning in the online scenario aims to learn a sequence of new tasks from a data stream, using each datum only once for training; this is more realistic than the offline mode, which assumes that all data from the new task are available. However, this problem is still under-explored in the challenging class-incremental setting, in which the model classifies all classes seen so far during inference. In particular, performance struggles with an increased number of tasks or additional classes to learn for each task. In addition, most existing methods require storing original data as exemplars for knowledge replay, which may not be feasible for certain applications with a limited memory budget or privacy concerns. In this work, we introduce an effective and memory-efficient method for online continual learning under the class-incremental setting, through candidate selection from each learned task together with prior incorporation using stored feature embeddings instead of original data as exemplars. Our proposed method, implemented for the image classification task, achieves the best results on different benchmark datasets for online continual learning, including CIFAR-10, CIFAR-100 and CORe50, while requiring much less memory than existing works.
【17】 Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations 标题:卷积神经网络激活的贪婪降维压缩感知投影 链接:https://arxiv.org/abs/2110.08828
作者:Yu-Shan Tai,Chieh-Fang Teng,Cheng-Yang Chang,An-Yeu Wu 机构:Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan 备注:5 pages, 5 figures, submitted to 2022 ICASSP 摘要:Convolutional neural networks (CNNs) achieve remarkable performance in a wide range of fields. However, intensive memory access for activations introduces considerable energy consumption, impeding the deployment of CNNs on resource-constrained edge devices. Existing works on activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. Nevertheless, in the case of aggressive dimension reduction, these methods lead to a severe accuracy drop. To improve the trade-off between classification accuracy and compression ratio, we propose a compression-aware projection system, which employs a learnable projection to compensate for the reconstruction loss. In addition, a greedy selection metric is introduced to optimize the layer-wise compression ratio allocation by considering both accuracy and #bits reduction simultaneously. Our test results show that the proposed methods effectively reduce memory access by 2.91x~5.97x with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
【18】 Exploring Deep Neural Networks on Edge TPU 标题:基于边缘TPU的深度神经网络研究 链接:https://arxiv.org/abs/2110.08826
作者:Seyedehfaezeh Hosseininoorbin,Siamak Layeghy,Brano Kusy,Raja Jurdak,Marius Portmann 机构:Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia; School of Information Technology and Electrical Engineering, The University of Queensland, Australia; Queensland University of Technology, Australia 备注:12 pages, 16 figures 摘要:This paper explores the performance of Google's Edge TPU on feed-forward neural networks. We consider the Edge TPU as a hardware platform and explore different architectures of deep neural network classifiers, which have traditionally been a challenge to run on resource-constrained edge devices. Based on the use of a joint time-frequency data representation, also known as a spectrogram, we explore the trade-off between classification performance and the energy consumed for inference. The energy efficiency of the Edge TPU is compared with that of the widely-used embedded CPU ARM Cortex-A53. Our results quantify the impact of neural network architectural specifications on the Edge TPU's performance, guiding decisions on the TPU's optimal operating point, where it can provide high classification accuracy with minimal energy consumption. Also, our evaluations highlight the crossover in performance between the Edge TPU and the Cortex-A53, depending on the neural network specifications. Based on our analysis, we provide a decision chart to guide decisions on platform selection based on the model parameters and context.
【19】 S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks 标题:S-Cyc:一种基于ReLU的网络迭代剪枝学习速率调度 链接:https://arxiv.org/abs/2110.08764
作者:Shiyu Liu,Chong Min John Tan,Mehul Motani 机构:Department of Electrical and Computer Engineering, School of Engineering, National University of Singapore 备注:7 pages main paper with 5 pages appendix 摘要:We explore a new perspective on adapting the learning rate (LR) schedule to improve the performance of a ReLU-based network as it is iteratively pruned. Our work and contribution consist of four parts: (i) We find that, as the ReLU-based network is iteratively pruned, the distribution of weight gradients tends to become narrower. This leads to the finding that, as the network becomes more sparse, a larger value of LR should be used to train the pruned network. (ii) Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule that applies to many iterative pruning methods. (iii) We evaluate the performance of the proposed S-Cyc and compare it to four LR schedule benchmarks. Our experimental results on three state-of-the-art networks (e.g., VGG-19, ResNet-20, ResNet-50) and two popular datasets (e.g., CIFAR-10, ImageNet-200) demonstrate that S-Cyc consistently outperforms the best performing benchmark with an improvement of 2.1% - 3.4%, without a substantial increase in complexity. (iv) We evaluate S-Cyc against an oracle and show that S-Cyc achieves comparable performance to the oracle, which carefully tunes max_lr via grid search.
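Our reading of S-Cyc in code: a conventional triangular cyclical LR whose upper bound max_lr grows along an S-shaped (logistic) curve as pruning proceeds; all constants here are illustrative, not the paper's.

import math

def s_cyc_max_lr(prune_iter, n_iters, lr_low=0.01, lr_high=0.1, steep=8.0):
    t = prune_iter / max(n_iters - 1, 1)             # 0 -> 1 over pruning iterations
    s = 1.0 / (1.0 + math.exp(-steep * (t - 0.5)))   # logistic S-shape
    return lr_low + (lr_high - lr_low) * s           # sparser network -> larger max_lr

def triangular_lr(step, cycle_len, base_lr, max_lr):
    x = abs(step % (2 * cycle_len) - cycle_len) / cycle_len
    return max_lr - (max_lr - base_lr) * x           # base_lr -> max_lr -> base_lr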
【20】 A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization 标题:批次归一化双层神经网络的黎曼平均场公式 链接:https://arxiv.org/abs/2110.08725
作者:Chao Ma,Lexing Ying 机构:Department of Mathematics, Stanford University 摘要:The training dynamics of two-layer neural networks with batch normalization (BN) is studied. It is written as the training dynamics of a neural network without BN on a Riemannian manifold; in this way, we identify BN's effect as changing the metric of the parameter space. Next, the infinite-width limit of two-layer neural networks with BN is considered, and a mean-field formulation is derived for the training dynamics. The training dynamics of the mean-field formulation is shown to be a Wasserstein gradient flow on the manifold. Theoretical analyses of the well-posedness and convergence of this Wasserstein gradient flow are provided.
【21】 A Q-Learning-based Approach for Distributed Beam Scheduling in mmWave Networks 标题:一种基于Q学习的毫米波网络分布式波束调度方法 链接:https://arxiv.org/abs/2110.08704
作者:Xiang Zhang,Shamik Sarkar,Arupjyoti Bhuyan,Sneha Kumar Kasera,Mingyue Ji 机构:Department of Electrical and Computer Engineering, University of Utah∗, School of Computing, University of Utah †, Idaho National Laboratory‡ 备注:10 pages 摘要:We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-Wave (mmWave) cellular networks where multiple base stations (BSs) belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them. Our goal is to design efficient distributed beam scheduling and power allocation algorithms such that the network-level payoff, defined as the weighted sum of the total throughput and a power penalization term, can be maximized. To this end, we propose a distributed scheduling approach to power allocation and adaptation for efficient interference management over the shared spectrum by modeling each BS as an independent Q-learning agent. As a baseline, we compare the proposed approach to the state-of-the-art non-cooperative game-based approach which was previously developed for the same problem. We conduct extensive experiments under various scenarios to verify the effect of multiple factors on the performance of both approaches. Experiment results show that the proposed approach adapts well to different interference situations by learning from experience and can achieve higher payoff than the game-based approach. The proposed approach can also be integrated into our previously developed Lyapunov stochastic optimization framework for the purpose of network utility maximization with optimality guarantee. As a result, the weights in the payoff function can be automatically and optimally determined by the virtual queue values from the sub-problems derived from the Lyapunov optimization framework.
【22】 A Learning-based Approach Towards Automated Tuning of SSD Configurations 标题:一种基于学习的固态硬盘配置自动调整方法 链接:https://arxiv.org/abs/2110.08685
作者:Daixuan Li,Jian Huang 机构:Systems Platform Research Group, University of Illinois at Urbana-Champaign 摘要:Thanks to mature manufacturing techniques, solid-state drives (SSDs) are highly customizable for today's applications, which brings opportunities to further improve their storage performance and resource utilization. However, SSD efficiency is usually determined by many hardware parameters, making it hard for developers to tune them manually and determine the optimal SSD configurations. In this paper, we present an automated learning-based framework, named LearnedSSD, that utilizes both supervised and unsupervised machine learning (ML) techniques to drive the tuning of hardware configurations for SSDs. LearnedSSD automatically extracts the unique access patterns of a new workload using its block I/O traces, maps the workload to previously seen workloads to utilize the learned experience, and recommends an optimal SSD configuration based on the validated storage performance. LearnedSSD accelerates the development of new SSD devices by automating the hardware parameter configuration and reducing manual effort. We develop LearnedSSD with simple yet effective learning algorithms that can run efficiently on multi-core CPUs. Given a target storage workload, our evaluation shows that LearnedSSD can always deliver an optimal SSD configuration for the target workload, and this configuration will not hurt the performance of non-target workloads.
【23】 Towards Robust Waveform-Based Acoustic Models 标题:走向稳健的基于波形的声学模型 链接:https://arxiv.org/abs/2110.08634
作者:Dino Oglic,Zoran Cvetkovic,Peter Sollich,Steve Renals,Bin Yu 机构: Sollich is with the Department of Mathematics 摘要:We propose an approach for learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions. This problem is of paramount importance for the deployment of speech recognition systems that need to perform well in unseen environments. Our approach is an instance of vicinal risk minimization, which aims to improve risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples. More specifically, we assume that local neighborhoods centered at training samples can be approximated using a mixture of Gaussians, and demonstrate theoretically that this can incorporate robust inductive bias into the learning process. We characterize the individual mixture components implicitly via data augmentation schemes, designed to address common sources of spurious correlations in acoustic models. To avoid potential confounding effects on robustness due to information loss, which has been associated with standard feature extraction techniques (e.g., FBANK and MFCC features), we focus our evaluation on the waveform-based setting. Our empirical results show that the proposed approach can generalize to unseen noise conditions, with 150% relative improvement in out-of-distribution generalization compared to training using the standard risk minimization principle. Moreover, the results demonstrate competitive performance relative to models learned using a training sample designed to match the acoustic conditions characteristic of test utterances (i.e., optimal vicinal densities).
【24】 Hydra: A System for Large Multi-Model Deep Learning 标题:Hydra:一个大型多模型深度学习系统 链接:https://arxiv.org/abs/2110.08633
作者:Kabir Nagrecha,Arun Kumar 机构:Department of Computer Science and Engineering, University of California San Diego 备注:12 pages including references. Under submission at Conference on Systems and Machine Learning Foundation 摘要:Training deep learning (DL) models that do not fit into the memory of a single GPU is a vexed process, forcing users to procure multiple GPUs to adopt model-parallel execution. Unfortunately, sequential dependencies in neural architectures often block efficient multi-device training, leading to suboptimal performance. We present 'model spilling', a technique aimed at models such as Transformers and CNNs to move groups of layers, or shards, between DRAM and GPU memory, thus enabling arbitrarily large models to be trained even on just one GPU. We then present a set of novel techniques leveraging spilling to raise efficiency for multi-model training workloads such as model selection: a new hybrid of task- and model-parallelism, a new shard scheduling heuristic, and 'double buffering' to hide latency. We prototype our ideas into a system we call HYDRA to support seamless single-model and multi-model training of large DL models. Experiments with real benchmark workloads show that HYDRA is over 7x faster than regular model parallelism and over 50% faster than state-of-the-art industrial tools for pipeline parallelism.
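A toy version of model spilling (ours; HYDRA's scheduler adds double buffering and task/model parallelism on top of this basic pattern): keep shards in CPU DRAM and page one shard at a time onto the GPU for its part of the forward pass.

import torch
import torch.nn as nn

shards = [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)]
device = "cuda" if torch.cuda.is_available() else "cpu"

def spilled_forward(x):
    x = x.to(device)
    for shard in shards:
        shard.to(device)   # page the shard into GPU memory
        x = shard(x)
        shard.to("cpu")    # spill it back to DRAM, freeing GPU memory
    return x

out = spilled_forward(torch.randn(4, 1024))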
【25】 Learning velocity model for complex media with deep convolutional neural networks 标题:基于深卷积神经网络的复杂介质学习速度模型 链接:https://arxiv.org/abs/2110.08626
作者:A. Stankevich,I. Nechepurenko,A. Shevchenko,L. Gremyachikh,A. Ustyuzhanin,A. Vasyukov 机构:Moscow Institute of Physics and Technology, Russia; HSE University 备注:14 pages, 6 figures, 6 tables 摘要:The paper considers the problem of velocity model acquisition for complex media based on boundary measurements. The acoustic model is used to describe the media. We used an open-source dataset of velocity distributions to compare the presented results directly with previous works. Forward modeling is performed using the grid-characteristic numerical method. The inverse problem is solved using deep convolutional neural networks. Modifications to a baseline UNet architecture are proposed to improve both the structural similarity index measure and the quantitative correspondence of the velocity profiles with the ground truth. We evaluate our enhancements and demonstrate the statistical significance of the results.
【26】 BNAS v2: Learning Architectures for Binary Networks with Empirical Improvements 标题:BNAs v2:具有经验改进的二进制网络学习体系结构 链接:https://arxiv.org/abs/2110.08562
作者:Dahyun Kim,Kunal Pratap Singh,Jonghyun Choi 机构:GIST, South Korea, Allen Institute for AI 备注:arXiv admin note: text overlap with arXiv:2002.06963 摘要:Backbone architectures of most binary networks are well-known floating point (FP) architectures such as the ResNet family. Questioning whether architectures designed for FP networks are also the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based on the cell-based search method, we define a new search space of binary layer types, design a new cell template, and rediscover the utility of the Zeroise layer, proposing to use it as an actual layer rather than a placeholder. The novel search objective diversifies early search to learn better-performing binary architectures. We show that our method searches architectures with stable training curves despite the quantization error inherent in binary networks. Quantitative analyses demonstrate that our searched architectures outperform the architectures used in state-of-the-art binary networks, and outperform or perform on par with state-of-the-art binary networks that employ various techniques other than architectural changes. In addition, we further propose improvements to the training scheme of our searched architectures. With the new training scheme, we achieve state-of-the-art performance among binary networks, outperforming all previous methods by non-trivial margins.
【27】 Sharpness-Aware Minimization Improves Language Model Generalization 标题:清晰度感知最小化改进了语言模型泛化 链接:https://arxiv.org/abs/2110.08529
作者:Dara Bahri,Hossein Mobahi,Yi Tay 机构:Google Research, Mountain View, CA, USA 摘要:The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size. Comparatively little work has been done to improve the generalization of these models through better optimization. In this work, we show that Sharpness-Aware Minimization (SAM), a recently proposed optimization procedure that encourages convergence to flatter minima, can substantially improve the generalization of language models without much computational overhead. We show that SAM is able to boost performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA, with particularly large gains when training data for these tasks is limited.
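A bare-bones SAM step (a sketch of the published procedure, assuming every parameter receives a gradient): perturb the weights to the approximate worst case within an L2 ball of radius rho, take the gradient there, and apply it at the original weights.

import torch

def sam_step(model, loss_fn, batch, opt, rho=0.05):
    loss_fn(model, batch).backward()                   # first pass: gradient at w
    with torch.no_grad():
        grads = [p.grad.clone() for p in model.parameters()]
        norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        eps = [rho * g / norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)                                  # climb to the local worst case
    model.zero_grad()
    loss_fn(model, batch).backward()                   # second pass: gradient at w + eps
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                                  # return to the original weights
    opt.step()                                         # update with the SAM gradient
    model.zero_grad()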
【28】 An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models 标题:预训练语言模型去偏技术有效性的实证研究 链接:https://arxiv.org/abs/2110.08527
作者:Nicholas Meade,Elinor Poole-Dayan,Siva Reddy 机构:Mila, McGill University; Facebook CIFAR AI Chair 摘要:Recent work has shown that pre-trained language models capture social biases from the text corpora they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed debiasing techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three different bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) CDA and Self-Debias are the strongest of the debiasing techniques, obtaining improved scores on most of the bias benchmarks; (2) current debiasing techniques do not generalize well beyond gender bias; and (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs obtained by using debiasing strategies are usually accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation is effective.
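Counterfactual data augmentation, one of the two strongest techniques in the survey, is simple in its basic word-swap form (the word list here is a tiny illustration, not the curated lists used in practice):

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def cda(sentence):
    return " ".join(SWAPS.get(w, w) for w in sentence.lower().split())

print(cda("he lost his keys"))  # -> "she lost her keys"

Training proceeds on the union of original and swapped sentences, pushing the model toward gender-invariant representations.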
【29】 What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression 标题:压缩的大型语言模型会忘记什么?模型压缩中的健壮性挑战 链接:https://arxiv.org/abs/2110.08419
作者:Mengnan Du,Subhabrata Mukherjee,Yu Cheng,Milad Shokouhi,Xia Hu,Ahmed Hassan Awadallah 机构:Texas A&M University, Microsoft Research, Rice University 摘要:Recent works have focused on compressing pre-trained language models (PLMs) like BERT, where the major focus has been to improve the compressed model's performance on downstream tasks. However, there has been no study analyzing the impact of compression on the generalizability and robustness of these models. Towards this end, we study two popular model compression techniques, knowledge distillation and pruning, and show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets, although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the easy samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our mitigation framework improves both the adversarial generalization and the in-distribution task performance of the compressed models.
【30】 Invariant Language Modeling 标题:不变量语言建模 链接:https://arxiv.org/abs/2110.08413
作者:Maxime Peyrard,Sarvjeet Singh Ghotra,Martin Josifoski,Vidhan Agarwal,Barun Patra,Dean Carignan,Emre Kiciman,Robert West 机构:♢EPFL, ♠Microsoft Corporation 摘要:Modern pretrained language models are critical components of NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic implementation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. In a series of controlled experiments, we demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. These benefits come with a negligible computational overhead compared to standard training, do not require changing the local loss, and can be applied to any language model architecture. We believe this framework is promising to help mitigate spurious correlations and biases in language models.
【31】 Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science 标题:用于科学中数据稀缺应用的代理和不变性增强的对比学习 链接:https://arxiv.org/abs/2110.08406
作者:Charlotte Loh,Thomas Christensen,Rumen Dangovski,Samuel Kim,Marin Soljacic 机构:Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 备注:21 pages, 10 figures 摘要:Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three "inexpensive" and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of symmetries or invariances, and 3) surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders-of-magnitude reductions in the number of labels needed to achieve the same network accuracies.
【32】 Differentiable Network Pruning for Microcontrollers 标题:微控制器的可微网络剪枝 链接:https://arxiv.org/abs/2110.08350
作者:Edgar Liberis,Nicholas D. Lane 机构:Department of Computer Science and Technology, University of Cambridge 摘要:Embedded and personal IoT devices are powered by microcontroller units (MCUs), whose extreme resource scarcity is a major obstacle for applications relying on on-device deep learning inference. Orders of magnitude less storage, memory and computational capacity, compared to what is typically required to execute neural networks, impose strict structural constraints on the network architecture and call for specialist model compression methodology. In this work, we present a differentiable structured network pruning method for convolutional neural networks, which integrates a model's MCU-specific resource usage and parameter importance feedback to obtain highly compressed yet accurate classification models. Our methodology (a) improves key resource usage of models by up to 80x; (b) prunes iteratively while a model is trained, resulting in little to no overhead or even improved training time; (c) produces compressed models with matching or improved resource usage, up to 1.7x better, in less time compared to prior MCU-specific methods. Compressed models are available for download.
【33】 Exploratory Lagrangian-Based Particle Tracing Using Deep Learning 标题:基于深度学习的探索性拉格朗日粒子跟踪 链接:https://arxiv.org/abs/2110.08338
作者:Mengjiao Han,Sudhanshu Sane,Chris R. Johnson 机构:Scientific Computing and Imaging Institute, University of Utah 摘要:Time-varying vector fields produced by computational fluid dynamics simulations are often prohibitively large and pose challenges for accurate interactive analysis and exploration. To address these challenges, reduced Lagrangian representations have been increasingly researched as a means to improve scientific time-varying vector field exploration capabilities. This paper presents a novel deep neural network-based particle tracing method to explore time-varying vector fields represented by Lagrangian flow maps. In our workflow, in situ processing is first utilized to extract Lagrangian flow maps, and deep neural networks then use the extracted data to learn flow field behavior. Using a trained model to predict new particle trajectories offers a fixed small memory footprint and fast inference. To demonstrate and evaluate the proposed method, we perform an in-depth study of performance using a well-known analytical data set, the Double Gyre. Our study considers two flow map extraction strategies as well as the impact of the number of training samples and integration durations on efficacy, evaluates multiple sampling options for training and testing and informs hyperparameter settings. Overall, we find our method requires a fixed memory footprint of 10.5 MB to encode a Lagrangian representation of a time-varying vector field while maintaining accuracy. For post hoc analysis, loading the trained model costs only two seconds, significantly reducing the burden of I/O when reading data for visualization. Moreover, our parallel implementation can infer one hundred locations for each of two thousand new pathlines across the entire temporal resolution in 1.3 seconds using one NVIDIA Titan RTX GPU.
【34】 Solving Image PDEs with a Shallow Network 标题:用浅层网络求解图像偏微分方程 链接:https://arxiv.org/abs/2110.08327
作者:Pascal Tom Getreuer,Peyman Milanfar,Xiyang Luo 备注:21 pages, 22 figures, references arXiv:1802.06130, arXiv:1711.10700, arXiv:1606.01299 摘要:Partial differential equations (PDEs) are typically used as models of physical processes but are also of great interest in PDE-based image processing. However, when it comes to their use in imaging, conventional numerical methods for solving PDEs tend to require very fine grid resolution for stability, and as a result have impractically high computational cost. This work applies BLADE (Best Linear Adaptive Enhancement), a shallow learnable filtering framework, to PDE solving, and shows that the resulting approach is efficient and accurate, operating more reliably at coarse grid resolutions than classical methods. As such, the model can be flexibly used for a wide variety of problems in imaging.
【35】 Robustness of different loss functions and their impact on networks learning capability 标题:不同损失函数的鲁棒性及其对网络学习能力的影响 链接:https://arxiv.org/abs/2110.08322
作者:Vishal Rajput 机构:Computer Science Department, KU Leuven, Belgium 摘要:Recent developments in AI have made it ubiquitous; every industry is trying to adopt some form of intelligent processing of their data. Despite so many advances in the field, AI's full capability is yet to be exploited by the industry. Industries that involve some risk factors still remain cautious about the usage of AI due to the lack of trust in such autonomous systems. Present-day AI might be very good at a lot of things, but it is very bad at reasoning, and this behavior can lead to catastrophic results. Autonomous cars crashing into a person or a drone getting stuck in a tree are a few examples where AI decisions lead to catastrophic results. To develop insight and generate an explanation about the learning capability of AI, we will try to analyze the working of loss functions. For our case, we will use two sets of loss functions: generalized loss functions like binary cross-entropy (BCE), and specialized loss functions like Dice loss or focal loss. Through a series of experiments, we will establish whether combining different loss functions is better than using a single loss function, and if yes, what the reason behind it is. In order to establish the difference between generalized and specialized losses, we will train several models using the above-mentioned losses and then compare their robustness on adversarial examples. In particular, we will look at how fast the accuracy of different models decreases when we change the pixels corresponding to the most salient gradients.
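For concreteness, minimal PyTorch versions of the losses under comparison follow: the generalized BCE, the specialized Dice and focal losses, and a simple weighted combination. The weighting and hyperparameters are illustrative defaults, not the study's settings.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, targets, eps=1.0):
    """Soft Dice loss for binary segmentation (a specialized loss)."""
    probs = torch.sigmoid(logits).flatten(1)
    targets = targets.flatten(1)
    inter = (probs * targets).sum(dim=1)
    union = probs.sum(dim=1) + targets.sum(dim=1)
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: BCE down-weighted on easy, well-classified pixels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = torch.exp(-bce)                           # probability of the true class
    return ((1 - p_t) ** gamma * bce).mean()

def combined_loss(logits, targets, w=0.5):
    """Generalized + specialized loss, the combination the study compares."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return w * bce + (1 - w) * dice_loss(logits, targets)

logits = torch.randn(2, 1, 64, 64)
masks = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(combined_loss(logits, masks).item())
```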
【36】 Reduced Order Dynamical Models For Complex Dynamics in Manufacturing and Natural Systems Using Machine Learning 标题:基于机器学习的制造和自然系统复杂动力学降阶模型 链接:https://arxiv.org/abs/2110.08313
作者:William Farlessyost,Shweta Singh 机构:Agricultural & Biological Engineering, Purdue University, West Lafayette, IN, USA, Environmental & Ecological Engineering 备注:16 pages, 11 figures 摘要:Dynamical analysis of manufacturing and natural systems provides critical information about production of manufactured and natural resources respectively, thus playing an important role in assessing sustainability of these systems. However, current dynamic models for these systems exist as mechanistic models, simulation of which is computationally intensive and does not provide a simplified understanding of the mechanisms driving the overall dynamics. For such systems, lower-order models can prove useful to enable sustainability analysis through coupled dynamical analysis. There have been few attempts at finding low-order models of manufacturing and natural systems, with existing work focused on model development at the individual-mechanism level. This work seeks to fill the current gap in the literature by developing simplified, reduced-order dynamical models for these systems using a machine learning (ML) approach. The approach is demonstrated on an entire soybean-oil to soybean-diesel process plant and a lake system. We use a grey-box ML method with a standard nonlinear optimization approach to identify relevant models of the governing dynamics as ODEs, using data simulated from the mechanistic models. Results show that the method identifies high-accuracy linear ODE models for the process plant, reflective of the underlying linear stoichiometric mechanisms and mass balance driving the dynamics. For the natural system, we modify the ML approach to include the effect of past dynamics, which yields non-linear ODEs. While the modified approach provides a better match to the dynamics of stream flow, it falls short of completely recreating them. We conclude that the proposed ML approach works well for systems with smooth dynamics, such as the manufacturing plant, but does not work as well for chaotic dynamics such as water stream flow.
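A minimal grey-box illustration of the linear case: simulate trajectories from a known linear mechanistic model, then recover the ODE coefficients by least squares on derivative estimates, to which standard nonlinear-optimization setups reduce when the postulated model is linear. The system matrix and data are synthetic placeholders, not the soybean-plant or lake models.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Ground-truth "mechanistic" linear system (stand-in for the process plant).
A_true = np.array([[-0.5, 0.2], [0.1, -0.3]])
sol = solve_ivp(lambda t, x: A_true @ x, (0, 20), [1.0, -1.0],
                t_eval=np.linspace(0, 20, 400))
t, X = sol.t, sol.y.T                               # X: (T, 2) simulated data

# Grey-box step: postulate dx/dt = A x and fit A by least squares on
# finite-difference derivative estimates.
dX = np.gradient(X, t, axis=0)                      # derivative estimates
A_fit, *_ = np.linalg.lstsq(X, dX, rcond=None)      # solves X @ A.T ~= dX
print("recovered A:\n", A_fit.T)
```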
【37】 GrowSpace: Learning How to Shape Plants 标题:GrowSpace:学习如何塑造植物 链接:https://arxiv.org/abs/2110.08307
作者:Yasmeen Hitti,Ionelia Buzatu,Manuel Del Verme,Mark Lefsrud,Florian Golemo,Audrey Durand 机构:McGill University, Mila, Johannes Kepler Universität Linz, Université de Montréal, Mila, Element AI, Université Laval, Mila 摘要:Plants are dynamic systems that are integral to our existence and survival. Plants face environment changes and adapt over time to their surrounding conditions. We argue that plant responses to an environmental stimulus are a good example of a real-world problem that can be approached within a reinforcement learning (RL) framework. With the objective of controlling a plant by moving the light source, we propose GrowSpace as a new RL benchmark. The back-end of the simulator is implemented using the Space Colonisation Algorithm, a plant growing model based on competition for space. Compared to video game RL environments, this simulator addresses a real-world problem and serves as a test bed to visualize plant growth and movement in a faster way than physical experiments. GrowSpace is composed of a suite of challenges that tackle several problems such as control, multi-stage learning, fairness and multi-objective learning. We provide agent baselines alongside case studies to demonstrate the difficulty of the proposed benchmark.
【38】 Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations 标题:基于联合量化和权值修剪的深度神经网络训练 链接:https://arxiv.org/abs/2110.08271
作者:Xinyu Zhang,Ian Colbert,Ken Kreutz-Delgado,Srinjoy Das 机构:Department of Electrical and Computer Engineering; School of Mathematical and Data Sciences, West Virginia University 摘要:Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often applied to only the weights of the network. In this work, we jointly apply novel uniform quantization and unstructured pruning methods to both the weights and activations of deep neural networks during training. Using our methods, we empirically evaluate the currently accepted prune-then-quantize paradigm across a wide range of computer vision tasks and observe a non-commutative nature when applied to both the weights and activations of deep neural networks. Informed by these observations, we articulate the non-commutativity hypothesis: for a given deep neural network being trained for a specific task, there exists an exact training schedule in which quantization and pruning can be introduced to optimize network performance. We identify that this optimal ordering not only exists, but also varies across discriminative and generative tasks. Using the optimal training schedule within our training framework, we demonstrate increased performance per memory footprint over existing solutions.
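The sketch below shows one common way to apply quantization and pruning jointly during training, via a straight-through estimator; the 4-bit uniform quantizer and magnitude-pruning mask are standard components standing in for the paper's specific methods, and the scheduling question the paper studies is not modeled here.

```python
import torch

class QuantPrune(torch.autograd.Function):
    """Uniform quantization plus unstructured magnitude pruning with a
    straight-through estimator, applicable to weights (and, analogously,
    activations) during training."""
    @staticmethod
    def forward(ctx, w, n_bits, sparsity):
        # Unstructured pruning: zero the smallest-magnitude entries.
        k = max(int(sparsity * w.numel()), 1)
        thresh = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > thresh).float()
        # Uniform quantization of the surviving weights.
        scale = w.abs().max() / (2 ** (n_bits - 1) - 1)
        w_q = torch.round(w / scale) * scale
        ctx.save_for_backward(mask)
        return w_q * mask

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # Straight-through: pass gradients through the quantizer,
        # blocked only on pruned entries.
        return grad_out * mask, None, None

w = torch.randn(64, 64, requires_grad=True)
y = QuantPrune.apply(w, 4, 0.5).sum()
y.backward()
print(w.grad.abs().mean().item())
```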
【39】 Effective Certification of Monotone Deep Equilibrium Models 标题:单调深度平衡模型的有效证明 链接:https://arxiv.org/abs/2110.08260
作者:Mark Niklas Müller,Robin Staab,Marc Fischer,Martin Vechev 机构:Department of Computer Science, ETH Zurich, Switzerland 摘要:Monotone Operator Equilibrium Models (monDEQs) represent a class of models combining the powerful deep equilibrium paradigm with convergence guarantees. Further, their inherent robustness to adversarial perturbations makes investigating their certifiability a promising research direction. Unfortunately, existing approaches are either imprecise or severely limited in scalability. In this work, we propose the first scalable and precise monDEQ verifier, based on two key ideas: (i) a novel convex relaxation enabling efficient inclusion checks, and (ii) non-trivial mathematical insights characterizing the fixpoint operations at the heart of monDEQs on sets rather than concrete inputs. An extensive evaluation of our verifier on the challenging $\ell_\infty$ perturbations demonstrates that it exceeds state-of-the-art performance in terms of speed (two orders of magnitude) and scalability (an order of magnitude) while yielding 25% higher certified accuracies on the same networks.
【40】 A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research 标题:科学XAI实地调查指南:生物信息学研究的透明和可解释的深度学习 链接:https://arxiv.org/abs/2110.08253
作者:Thomas P Quinn,Sunil Gupta,Svetha Venkatesh,Vuong Le 机构:Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia 摘要:Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limits their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
【41】 A Rate-Distortion Framework for Explaining Black-box Model Decisions 标题:解释黑盒模型决策的率失真框架 链接:https://arxiv.org/abs/2110.08252
作者:Stefan Kolek,Duc Anh Nguyen,Ron Levie,Joan Bruna,Gitta Kutyniok 机构:Department of Mathematics, Ludwig Maximilian University, Munich, Courant Institute of Mathematical Sciences, New York University, New York 摘要:We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.
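A small sketch of the rate-distortion idea: optimize a relaxed mask on the input so that a sparse retained region (low rate) preserves the model output (low distortion). The sigmoid mask relaxation, the squared-error distortion, and the toy linear model are assumptions for illustration, not the exact RDE formulation.

```python
import torch
import torch.nn as nn

def rde_mask(model, x, baseline, lam=0.1, steps=200, lr=0.05):
    """Sketch of a rate-distortion explanation: find a sparse mask m so that
    keeping x where m is near 1 (and a baseline perturbation elsewhere)
    preserves the model output; lam trades distortion against rate."""
    m = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([m], lr=lr)
    with torch.no_grad():
        target = model(x)
    for _ in range(steps):
        mask = torch.sigmoid(m)
        x_masked = mask * x + (1 - mask) * baseline
        distortion = (model(x_masked) - target).pow(2).mean()
        rate = mask.mean()                          # fraction of retained input
        loss = distortion + lam * rate
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(m).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(16, 4))  # stand-in differentiable model
x = torch.randn(1, 1, 4, 4)
mask = rde_mask(model, x, baseline=torch.zeros_like(x))
print(mask.squeeze().round())
```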
【42】 A Bayesian approach to multi-task learning with network lasso 标题:基于网络套索的贝叶斯多任务学习方法 链接:https://arxiv.org/abs/2110.09040
作者:Kaito Shimamura,Shuichi Kawano 摘要:Network lasso is a method for solving a multi-task learning problem through the regularized maximum likelihood method. A characteristic of network lasso is that it fits a different model to each sample. The relationships among the models are represented by relational coefficients. A crucial issue in network lasso is to provide appropriate values for these relational coefficients. In this paper, we propose a Bayesian approach to solve multi-task learning problems by network lasso. This approach allows us to objectively determine the relational coefficients by Bayesian estimation. The effectiveness of the proposed method is shown in a simulation study and a real data analysis.
【43】 Gravitational wave surrogates through automated machine learning 标题:通过自动机器学习实现引力波代理 链接:https://arxiv.org/abs/2110.08901
作者:Damián Barsotti,Franco Cerino,Manuel Tiglio,Aarón Villanueva 机构:Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Córdoba, Argentina 备注:15 pages, 7 figures 摘要:We analyze a prospect for predicting gravitational waveforms from compact binaries based on automated machine learning (AutoML) from around a hundred different possible regression models, without having to resort to tedious and manual case-by-case analyses and fine-tuning. The particular study of this article is within the context of the gravitational waves emitted by the collision of two spinless black holes in initial quasi-circular orbit. We find, for example, that approaches such as Gaussian process regression with radial bases as kernels do provide a sufficiently accurate solution, an approach which is generalizable to multiple dimensions with low computational evaluation cost. The results here presented suggest that AutoML might provide a framework for regression in the field of surrogates for gravitational waveforms. Our study is within the context of surrogates of numerical relativity simulations based on Reduced Basis and the Empirical Interpolation Method, where we find that for the particular case analyzed AutoML can produce surrogates which are essentially indistinguishable from the NR simulations themselves.
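A minimal example of the kind of regressor such an AutoML search might select: Gaussian process regression with an RBF kernel via scikit-learn. The 1-D parameter-to-quantity map here is synthetic, standing in for fitting reduced-basis projection coefficients of NR waveforms against binary parameters.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy surrogate: map a 1-D parameter (e.g. mass ratio) to a waveform quantity.
rng = np.random.default_rng(0)
q = np.sort(rng.uniform(1.0, 8.0, 40))[:, None]     # "mass ratio" samples
y = np.sin(q).ravel() + 0.01 * rng.standard_normal(40)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=1.0),
                               normalize_y=True)
gpr.fit(q, y)

q_test = np.linspace(1.0, 8.0, 200)[:, None]
mean, std = gpr.predict(q_test, return_std=True)    # surrogate + uncertainty
print(f"max predictive std: {std.max():.4f}")
```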
【44】 ASR4REAL: An extended benchmark for speech models 标题:ASR4REAL:一种扩展的语音模型基准 链接:https://arxiv.org/abs/2110.08583
作者:Morgane Riviere,Jade Copet,Gabriel Synnaeve 机构:Facebook AI Research 备注:Submitted to ICASSP 2022 摘要:Popular ASR benchmarks such as Librispeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We have found that even though recent models do not seem to exhibit a gender bias, they usually show important performance discrepancies by accent, and even more important ones depending on the socio-economic status of the speakers. Finally, all tested models show a strong performance drop when tested on conversational speech, and in this precise context even a language model trained on a dataset as big as Common Crawl does not seem to have a significant positive effect, which reiterates the importance of developing conversational language models.
【45】 Dropping diversity of products of large US firms: Models and measures 标题:降低美国大公司产品多样性的模式与措施 链接:https://arxiv.org/abs/2110.08367
作者:Ananthan Nambiar,Tobias Rubel,James McCaull,Jon deVries,Mark Bedau 机构: Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Carl R. Woese Institute for Genomic Biology, University of Illinois at, Urbana-Champaign, Urbana, Illinois, USA, Department of Philosophy, Reed College, Portland, Oregon, USA 摘要:It is widely assumed that in our lifetimes the products available in the global economy have become more diverse. This assumption is difficult to investigate directly, however, because it is difficult to collect the necessary data about every product in an economy each year. We solve this problem by mining publicly available textual descriptions of the products of every large US firm each year from 1997 to 2017. Although many aspects of economic productivity have been steadily rising during this period, our text-based measurements show that the diversity of the products of large US firms, at least, has steadily declined. This downward trend is visible using a variety of product diversity metrics, including some that depend on a measurement of the similarity of the products of every single pair of firms. The current state of the art in comprehensive and detailed firm-similarity measurements is a Boolean word vector model due to Hoberg and Phillips. We measure diversity using firm-similarities from this Boolean model and two more sophisticated variants, and we consistently observe a significant dropping trend in product diversity. These results make it possible to frame and start to test specific hypotheses for explaining the dropping product diversity trend.
其他(45篇)
【1】 Improving Robustness using Generated Data 标题:使用生成的数据提高稳健性 链接:https://arxiv.org/abs/2110.09468
作者:Sven Gowal,Sylvestre-Alvise Rebuffi,Olivia Wiles,Florian Stimberg,Dan Andrei Calian,Timothy Mann 机构:DeepMind, London 备注:Accepted at NeurIPS 2021 摘要:Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on data from the original training set and those trained with additional data extracted from the "80 Million Tiny Images" dataset (TI-80M). In this paper, we explore how generative models trained solely on the original training set can be leveraged to artificially increase the size of the original training set and improve adversarial robustness to $\ell_p$ norm-bounded perturbations. We identify the sufficient conditions under which incorporating additional generated data can improve robustness, and demonstrate that it is possible to significantly reduce the robust-accuracy gap to models trained with additional real data. Surprisingly, we show that even the addition of non-realistic random data (generated by Gaussian sampling) can improve robustness. We evaluate our approach on CIFAR-10, CIFAR-100, SVHN and TinyImageNet against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements in robust accuracy compared to previous state-of-the-art methods. Against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our models achieve 66.10% and 33.49% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state-of-the-art by +8.96% and +3.29%). Against $\ell_2$ norm-bounded perturbations of size $\epsilon = 128/255$, our model achieves 78.31% on CIFAR-10 (+3.81%). These results beat most prior works that use external data.
【2】 In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications 标题:简而言之,人类提出了这样的要求:以下时间规范的潜在目标 链接:https://arxiv.org/abs/2110.09461
作者:Borja G. León,Murray Shanahan,Francesco Belardinelli 机构:Department of Computing, Imperial College London, London, United Kingdom 摘要:We address the problem of building agents whose goal is to satisfy out-of-distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the deep learning architecture is a key feature when teaching a DRL agent to solve OOD tasks in TL. Yet, the studies on their performance are still limited. In this work, we analyse various state-of-the-art (SOTA) architectures that include generalisation mechanisms such as relational layers, the soft-attention mechanism, or hierarchical configurations, when generalising safety-aware tasks expressed in TL. Most importantly, we present a novel deep learning architecture that induces agents to generate latent representations of their current goal given both the human instruction and the current observation from the environment. We find that applying our proposed configuration to SOTA architectures yields significantly stronger performance when executing new tasks in OOD environments.
【3】 Comparing Deep Neural Nets with UMAP Tour 标题:深度神经网络与UMAP Tour的比较 链接:https://arxiv.org/abs/2110.09431
作者:Mingwei Li,Carlos Scheidegger 机构:Department of Computer Science, The University of Arizona, Tucson, AZ 摘要:Neural networks should be interpretable to humans. In particular, there is a growing interest in concepts learned in a layer and similarity between layers. In this work, a tool, UMAP Tour, is built to visually inspect and compare internal behavior of real-world neural network models using well-aligned, instance-level representations. The method used in the visualization also implies a new similarity measure between neural network layers. Using the visual tool and the similarity measure, we find concepts learned in state-of-the-art models and dissimilarities between them, such as GoogLeNet and ResNet.
【4】 Distinguishing Natural and Computer-Generated Images using Multi-Colorspace fused EfficientNet 标题:利用多色空间融合高效网区分自然图像和计算机生成图像 链接:https://arxiv.org/abs/2110.09428
作者:Manjary P Gangan,Anoop K,Lajish V L 备注:13 pages 摘要:The problem of distinguishing natural images from photo-realistic computer-generated ones either addresses natural images versus computer graphics or natural images versus GAN images, at a time. But in a real-world image forensic scenario, it is highly essential to consider all categories of image generation, since in most cases image generation is unknown. We, for the first time, to our best knowledge, approach the problem of distinguishing natural images from photo-realistic computer-generated images as a three-class classification task classifying natural, computer graphics, and GAN images. For the task, we propose a Multi-Colorspace fused EfficientNet model by parallelly fusing three EfficientNet networks that follow transfer learning methodology where each network operates in different colorspaces, RGB, LCH, and HSV, chosen after analyzing the efficacy of various colorspace transformations in this image forensics problem. Our model outperforms the baselines in terms of accuracy, robustness towards post-processing, and generalizability towards other datasets. We conduct psychophysics experiments to understand how accurately humans can distinguish natural, computer graphics, and GAN images, and observe that humans have difficulty classifying these images, particularly the computer-generated ones, indicating the necessity of computational algorithms for the task. We also analyze the behavior of our model through visual explanations to understand the salient regions that contribute to its decisions, and compare them with manual explanations provided by human participants in the form of region markings; the similarity between the two indicates that our model makes its decisions meaningfully.
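A sketch of the parallel-fusion architecture, assuming the data pipeline supplies colorspace-converted tensors; torchvision's efficientnet_b0 (torchvision >= 0.13) stands in for the paper's backbones, and the 1280-d pooled feature size follows torchvision's implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class MultiColorspaceNet(nn.Module):
    """Three parallel EfficientNet-B0 backbones, one per colorspace input
    (RGB / LCH / HSV tensors prepared by the data pipeline), fused by
    concatenating pooled features before a 3-way classifier head."""
    def __init__(self, n_classes=3):
        super().__init__()
        def backbone():
            net = efficientnet_b0(weights=None)     # transfer learning would load weights
            net.classifier = nn.Identity()          # keep 1280-d pooled features
            return net
        self.branches = nn.ModuleList([backbone() for _ in range(3)])
        self.head = nn.Linear(3 * 1280, n_classes)  # natural / CG / GAN

    def forward(self, x_rgb, x_lch, x_hsv):
        feats = [b(x) for b, x in zip(self.branches, (x_rgb, x_lch, x_hsv))]
        return self.head(torch.cat(feats, dim=1))

model = MultiColorspaceNet()
x = torch.randn(2, 3, 224, 224)
print(model(x, x, x).shape)                         # torch.Size([2, 3])
```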
【5】 Compositional Attention: Disentangling Search and Retrieval 标题:作文注意:拆解、搜索和检索 链接:https://arxiv.org/abs/2110.09419
作者:Sarthak Mittal,Sharath Chandra Raparthy,Irina Rish,Yoshua Bengio,Guillaume Lajoie 机构:Mila, Universit´e de Montr´eal 摘要:Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.
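A simplified sketch of the disentangling: S search heads produce attention patterns, R retrieval heads produce value projections, and a learned soft selection composes them per query. The selection here is computed directly from the input, a simplification of the paper's query-value competition stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompositionalAttention(nn.Module):
    """Sketch: S search heads (query-key patterns) and R retrieval heads
    (value projections) composed dynamically, instead of the rigid 1:1
    search-retrieval pairing of standard multi-head attention."""
    def __init__(self, dim, n_search=4, n_retrieve=4):
        super().__init__()
        self.S, self.R, self.dh = n_search, n_retrieve, dim // n_search
        self.q = nn.Linear(dim, n_search * self.dh)
        self.k = nn.Linear(dim, n_search * self.dh)
        self.v = nn.Linear(dim, n_retrieve * self.dh)
        self.select = nn.Linear(dim, n_search * n_retrieve)  # soft competition
        self.out = nn.Linear(n_search * self.dh, dim)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.S, self.dh).transpose(1, 2)  # (B,S,T,dh)
        k = self.k(x).view(B, T, self.S, self.dh).transpose(1, 2)
        v = self.v(x).view(B, T, self.R, self.dh).transpose(1, 2)  # (B,R,T,dh)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.dh ** 0.5, dim=-1)
        # Every search pattern retrieves with every value matrix: (B,S,R,T,dh).
        retrieved = torch.einsum('bstu,brud->bsrtd', attn, v)
        # Context-dependent soft selection of one retrieval per search head.
        sel = F.softmax(self.select(x).view(B, T, self.S, self.R), dim=-1)
        out = torch.einsum('btsr,bsrtd->btsd', sel, retrieved)
        return self.out(out.reshape(B, T, self.S * self.dh))

layer = CompositionalAttention(dim=64)
print(layer(torch.randn(2, 10, 64)).shape)          # torch.Size([2, 10, 64])
```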
【6】 Exploiting Domain-Specific Features to Enhance Domain Generalization 标题:利用领域特性增强领域泛化能力 链接:https://arxiv.org/abs/2110.09410
作者:Manh-Ha Bui,Toan Tran,Anh Tuan Tran,Dinh Phung 机构: VinAI Research, Vietnam, Monash University, Australia 备注:25 pages, 6 tables, 11 figures, published at Advances in Neural Information Processing Systems (NeurIPS), 2021 摘要:Domain Generalization (DG) aims to train a model, from multiple observed source domains, in order to perform well on unseen target domains. To obtain the generalization capability, prior DG approaches have focused on extracting domain-invariant information across sources to generalize on target domains, while useful domain-specific information which strongly correlates with labels in individual domains and the generalization to target domains is usually ignored. In this paper, we propose meta-Domain Specific-Domain Invariant (mDSDI) - a novel theoretically sound framework that extends beyond the invariance view to further capture the usefulness of domain-specific information. Our key insight is to disentangle features in the latent space while jointly learning both domain-invariant and domain-specific features in a unified framework. The domain-specific representation is optimized through the meta-learning framework to adapt from source domains, targeting a robust generalization on unseen domains. We empirically show that mDSDI provides competitive results with state-of-the-art techniques in DG. A further ablation study with our generated dataset, Background-Colored-MNIST, confirms the hypothesis that domain-specific is essential, leading to better results when compared with only using domain-invariant.
【7】 Result Diversification by Multi-objective Evolutionary Algorithms with Theoretical Guarantees 标题:具有理论保障的多目标进化算法结果多样化 链接:https://arxiv.org/abs/2110.09332
作者:Chao Qian,Dan-Xuan Liu,Zhi-Hua Zhou 机构:State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing , China 备注:46 pages, 2 figures 摘要:Given a ground set of items, the result diversification problem aims to select a subset with high "quality" and "diversity" while satisfying some constraints. It arises in various real-world artificial intelligence applications, such as web-based search, document summarization and feature selection, and also has applications in other areas, e.g., computational geometry, databases, finance and operations research. Previous algorithms are mainly based on greedy or local search. In this paper, we propose to reformulate the result diversification problem as a bi-objective maximization problem, and solve it by a multi-objective evolutionary algorithm (EA), i.e., the GSEMO. We theoretically prove that the GSEMO can achieve the (asymptotically) optimal theoretical guarantees under both static and dynamic environments. For cardinality constraints, the GSEMO can achieve the optimal polynomial-time approximation ratio, $1/2$. For more general matroid constraints, the GSEMO can achieve the asymptotically optimal polynomial-time approximation ratio, $1/2-\epsilon/(4n)$. Furthermore, when the objective function (i.e., a linear combination of quality and diversity) changes dynamically, the GSEMO can maintain this approximation ratio in polynomial running time, addressing the open question proposed by Borodin et al. This also theoretically shows the superiority of EAs over local search for solving dynamic optimization problems for the first time, and discloses the robustness of the mutation operator of EAs against dynamic changes. Experiments on the applications of web-based search, multi-label feature selection and document summarization show the superior performance of the GSEMO over the state-of-the-art algorithms (i.e., the greedy algorithm and local search) under both static and dynamic environments.
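A compact GSEMO sketch on a toy result-diversification instance: a Pareto archive over the (quality, diversity) bi-objective, standard bit-wise mutation, and the cardinality constraint enforced by mapping infeasible subsets to dominated objective values. The data and the final quality/diversity weighting are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 5
quality = rng.uniform(0, 1, n)                      # per-item "quality"
dist = rng.uniform(0, 1, (n, n))
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0)                           # pairwise "diversity" distances

def objectives(x):
    """Bi-objective (quality, diversity) of subset x; infeasible subsets
    (|x| > k) map to -inf so they are always dominated."""
    if x.sum() > k:
        return (-np.inf, -np.inf)
    idx = np.flatnonzero(x)
    div = dist[np.ix_(idx, idx)].sum() / 2 if len(idx) > 1 else 0.0
    return (float(quality[idx].sum()), float(div))

def weakly_dominates(a, b):
    return a[0] >= b[0] and a[1] >= b[1]

# GSEMO: keep a Pareto archive, mutate a random archive member by bit flips.
x0 = np.zeros(n, dtype=int)
archive = [(x0, objectives(x0))]
for _ in range(5000):
    x, _ = archive[rng.integers(len(archive))]
    y = x.copy()
    y[rng.random(n) < 1.0 / n] ^= 1                 # standard bit-wise mutation
    fy = objectives(y)
    if not any(weakly_dominates(fx, fy) for _, fx in archive):
        archive = [(z, fz) for z, fz in archive
                   if not weakly_dominates(fy, fz)] + [(y, fy)]

best = max(archive, key=lambda zf: zf[1][0] + zf[1][1])
print("selected items:", np.flatnonzero(best[0]))
```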
【8】 A Sociotechnical View of Algorithmic Fairness 标题:算法公平性的社会技术观 链接:https://arxiv.org/abs/2110.09253
作者:Mateusz Dolata,Stefan Feuerriegel,Gerhard Schwabe 机构:Please refer to the journal publication: Dolata, M., Feuerriegel, S., & Schwabe, G. (to appear). A Sociotechnical View of Algorithmic Fairness. Information Systems Journal. 备注:Accepted at Information Systems Journal 摘要:Algorithmic fairness has been framed as a newly emerging technology that mitigates systemic discrimination in automated decision-making, providing opportunities to improve fairness in information systems (IS). However, based on a state-of-the-art literature review, we argue that fairness is an inherently social concept and that technologies for algorithmic fairness should therefore be approached through a sociotechnical lens. We advance the discourse on algorithmic fairness as a sociotechnical phenomenon. Our research objective is to embed AF in the sociotechnical view of IS. Specifically, we elaborate on why outcomes of a system that uses algorithmic means to assure fairness depends on mutual influences between technical and social structures. This perspective can generate new insights that integrate knowledge from both technical fields and social studies. Further, it spurs new directions for IS debates. We contribute as follows: First, we problematize fundamental assumptions in the current discourse on algorithmic fairness based on a systematic analysis of 310 articles. Second, we respond to these assumptions by theorizing algorithmic fairness as a sociotechnical construct. Third, we propose directions for IS researchers to enhance their impacts by pursuing a unique understanding of sociotechnical algorithmic fairness. We call for and undertake a holistic approach to AF. A sociotechnical perspective on algorithmic fairness can yield holistic solutions to systemic biases and discrimination.
【9】 MDP Abstraction with Successor Features 标题:具有后续功能的MDP抽象 链接:https://arxiv.org/abs/2110.09196
作者:Dongge Han,Michael Wooldridge,Sebastian Tschiatschek 机构:Department of Computer Sciences, University of Oxford, United Kingdom, University of Vienna, Austria 摘要:Abstraction plays an important role for generalisation of knowledge and skills, and is key to sample efficient learning and planning. For many complex problems an abstract plan can be formed first, which is then instantiated by filling in the necessary low-level details. Often, such abstract plans generalize well to related new problems. We study abstraction in the context of reinforcement learning, in which agents may perform state or temporal abstractions. Temporal abstractions aka options represent temporally-extended actions in the form of option policies. However, typically acquired option policies cannot be directly transferred to new environments due to changes in the state space or transition dynamics. Furthermore, many existing state abstraction schemes ignore the correlation between state and temporal abstraction. In this work, we propose successor abstraction, a novel abstraction scheme building on successor features. This includes an algorithm for encoding and instantiation of abstract options across different environments, and a state abstraction mechanism based on the abstract options. Our successor abstraction allows us to learn abstract environment models with semantics that are transferable across different environments through encoding and instantiation of abstract options. Empirically, we achieve better transfer and improved performance on a set of benchmark tasks as compared to relevant state of the art baselines.
【10】 Topologically Regularized Data Embeddings 标题:拓扑正则化数据嵌入 链接:https://arxiv.org/abs/2110.09193
作者:Robin Vandaele,Bo Kang,Jefrey Lijffijt,Tijl De Bie,Yvan Saeys 机构:Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium, Data mining and Modelling for Biomedicine, VIB Inflammation Research Center, Ghent, Belgium 摘要:Unsupervised feature learning often finds low-dimensional embeddings that capture the structure of complex data. For tasks for which expert prior topological knowledge is available, incorporating this into the learned representation may lead to higher quality embeddings. For example, this may help one to embed the data into a given number of clusters, or to accommodate for noise that prevents one from deriving the distribution of the data over the model directly, which can then be learned more effectively. However, a general tool for integrating different prior topological knowledge into embeddings is lacking. Although differentiable topology layers have been recently developed that can (re)shape embeddings into prespecified topological models, they have two important limitations for representation learning, which we address in this paper. First, the currently suggested topological losses fail to represent simple models such as clusters and flares in a natural manner. Second, these losses neglect all original structural (such as neighborhood) information in the data that is useful for learning. We overcome these limitations by introducing a new set of topological losses, and proposing their usage as a way for topologically regularizing data embeddings to naturally represent a prespecified model. We include thorough experiments on synthetic and real data that highlight the usefulness and versatility of this approach, with applications ranging from modeling high-dimensional single cell data, to graph embedding.
【11】 State-Space Constraints Improve the Generalization of the Differentiable Neural Computer in some Algorithmic Tasks 标题:状态空间约束提高了微分神经计算机在某些算法任务中的泛化能力 链接:https://arxiv.org/abs/2110.09138
作者:Patrick Ofner,Roman Kern 机构:University of Freiburg and Graz University of Technology 摘要:Memory-augmented neural networks (MANNs) can solve algorithmic tasks like sorting. However, they often do not generalize to lengths of input sequences not seen in the training phase. Therefore, we introduce two approaches constraining the state-space of the network controller to improve the generalization to out-of-distribution-sized input sequences: state compression and state regularization. We show that both approaches can improve the generalization capability of a particular type of MANN, the differentiable neural computer (DNC), and compare our approaches to a stateful and a stateless controller on a set of algorithmic tasks. Furthermore, we show that especially the combination of both approaches can enable a pre-trained DNC to be extended post hoc with a larger memory. Thus, our introduced approaches allow to train a DNC using shorter input sequences and thus save computational resources. Moreover, we observed that the capability for generalization is often accompanied by loop structures in the state-space, which could correspond to looping constructs in algorithms.
【12】 Real Additive Margin Softmax for Speaker Verification 标题:用于说话人确认的实加性余量Softmax 链接:https://arxiv.org/abs/2110.09116
作者:Lantian Li,Ruiqian Nai,Dong Wang 机构:Center for Speech and Language Technologies, BNRist, Tsinghua University, China 备注:Submitted to ICASSP 2022 摘要:The additive margin softmax (AM-Softmax) loss has delivered remarkable performance in speaker verification. A supposed behavior of AM-Softmax is that it can shrink within-class variation by putting emphasis on target logits, which in turn improves margin between target and non-target classes. In this paper, we conduct a careful analysis on the behavior of AM-Softmax loss, and show that this loss does not implement real max-margin training. Based on this observation, we present a Real AM-Softmax loss which involves a true margin function in the softmax training. Experiments conducted on VoxCeleb1, SITW and CNCeleb demonstrated that the corrected AM-Softmax loss consistently outperforms the original one. The code has been released at https://gitlab.com/csltstu/sunine.
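For reference, a standard AM-Softmax implementation as analyzed in the paper; the corrected "Real" variant replaces the constant additive shift below with a true margin function, whose exact form is given in the paper and not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmax(nn.Module):
    """Standard additive margin softmax: cosine logits with a margin m
    subtracted from the target class, scaled by s."""
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.25):
        super().__init__()
        self.W = nn.Parameter(torch.randn(feat_dim, n_classes))
        self.s, self.m = s, m

    def forward(self, x, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cos = F.normalize(x, dim=1) @ F.normalize(self.W, dim=0)  # (B, C)
        target = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (cos - self.m * target)   # margin only on target logit
        return F.cross_entropy(logits, labels)

loss_fn = AMSoftmax(feat_dim=192, n_classes=100)
emb = torch.randn(8, 192)                           # speaker embeddings
labels = torch.randint(0, 100, (8,))
print(loss_fn(emb, labels).item())
```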
【13】 Differentiable Rendering with Perturbed Optimizers 标题:带扰动优化器的可微渲染 链接:https://arxiv.org/abs/2110.09107
作者:Quentin Le Lidec,Ivan Laptev,Cordelia Schmid,Justin Carpentier 机构:PSL Research University 摘要:Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision. Solutions to this inverse and ill-posed problem typically involve a search for models that best explain observed image data. Notably, images depend both on the properties of observed scenes and on the process of image formation. Hence, if optimization techniques should be used to explain images, it is crucial to design differentiable functions for the projection of 3D scenes into images, also known as differentiable rendering. Previous approaches to differentiable rendering typically replace non-differentiable operations by smooth approximations, impacting the subsequent 3D estimation. In this paper, we take a more general approach and study differentiable renderers through the prism of randomized optimization and the related notion of perturbed optimizers. In particular, our work highlights the link between some well-known differentiable renderer formulations and randomly smoothed optimizers, and introduces differentiable perturbed renderers. We also propose a variance reduction mechanism to alleviate the computational burden inherent to perturbed optimizers and introduce an adaptive scheme to automatically adjust the smoothing parameters of the rendering process. We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction. By providing informative gradients that can be used as a strong supervisory signal, we demonstrate the benefits of perturbed renderers to obtain more accurate solutions when compared to the state-of-the-art alternatives using smooth gradient approximations.
【14】 Data Driven and Visualization based Strategization for University Rank Improvement using Decision Trees 标题:基于决策树的基于数据驱动和可视化的大学排名提升策略 链接:https://arxiv.org/abs/2110.09050
作者:Nishi Doshi,Samhitha Gundam,Bhaskar Chaudhury 机构:Group in Computational Science and HPC, DA-IICT, Gandhinagar, India 备注:29 pages 摘要:Annual ranking of higher educational institutes (HEIs) is a global phenomenon and past research shows that they have significant impact on the higher education landscape. In spite of criticisms regarding the goals, methodologies and outcomes of such ranking systems, previous studies reveal that most of the universities pay close attention to ranking results and look forward to improving their ranks. Generally, each ranking framework uses its own set of parameters and the data for individual metrics are condensed into a single final score for determining the rank, thereby making it a complex multivariate problem. Maintaining a good rank and ascending in the rankings is a difficult task because it requires considerable resources, efforts and accurate planning. In this work, we show how exploratory data analysis (EDA) using correlation heatmaps and box plots can aid in understanding the broad trends in the ranking data; however, it is challenging to make institutional decisions for rank improvements based completely on EDA. We present a novel idea of classifying the rankings data using Decision Tree (DT) based algorithms and retrieve decision paths for rank improvement using data visualization techniques. Using the Laplace correction to the probability estimate, we quantify the amount of certainty attached with different decision paths obtained from interpretable DT models. The proposed methodology can aid HEIs to quantitatively assess the scope of improvement, adumbrate a fine-grained long-term action plan and prepare a suitable road-map.
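A small scikit-learn sketch of the pipeline's core: train a decision tree on hypothetical ranking features, read off a decision path, and attach a Laplace-corrected certainty to its leaf. The feature semantics and the binary "rank improved" target are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical ranking features (e.g. citations, faculty ratio, ...) and a
# binary class: 1 = "rank improved", 0 = "rank dropped".
X = rng.uniform(0, 100, (300, 4))
y = (0.4 * X[:, 0] + 0.6 * X[:, 2] + rng.normal(0, 10, 300) > 50).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Laplace-corrected certainty per leaf: (n_c + 1) / (n + K), K = #classes.
leaves = tree.apply(X)                              # leaf id per training sample
K = tree.n_classes_
cert = {lf: (np.bincount(y[leaves == lf], minlength=K) + 1)
             / ((leaves == lf).sum() + K)
        for lf in np.unique(leaves)}

# Decision path for one institution's feature vector.
sample_leaf = tree.apply(X[:1])[0]
path = tree.decision_path(X[:1]).indices
print("decision path node ids:", path)
print("Laplace-corrected P(improve):", cert[sample_leaf][1].round(3))
```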
【15】 When Are Linear Stochastic Bandits Attackable? 标题:线性随机土匪什么时候是可攻击的? 链接:https://arxiv.org/abs/2110.09008
作者:Huazheng Wang,Haifeng Xu,Hongning Wang 机构:Princeton University, University of Virginia 摘要:We study adversarial attacks on linear stochastic bandits, a sequential decision making problem with many important applications in recommender systems, online advertising, medical treatment, etc. By manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this observation, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a full necessity and sufficiency characterization of attackability based on the geometry of the context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first asserts whether the current environment is attackable, and if so, modifies the rewards to force the algorithm to pull a target arm a linear number of times using only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed method.
【16】 Dimensionality Reduction for Wasserstein Barycenter 标题:Wasserstein重心的降维 链接:https://arxiv.org/abs/2110.08991
作者:Zachary Izzo,Sandeep Silwal,Samson Zhou 机构:Stanford University ∗, MIT†, Carnegie Mellon University‡ 备注:Published as a conference paper in NeurIPS 2021 摘要:The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension $d$ of the underlying space of the distributions. In order to cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size $n$, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension $O(\log n)$ independent of both $d$ and $k$, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrary small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors. We also provide a coreset construction for the Wasserstein barycenter problem that significantly decreases the number of input distributions. The coresets can be used in conjunction with random projections and thus further improve computation time. Lastly, our experimental results validate the speedup provided by dimensionality reduction while maintaining solution quality.
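A minimal sketch of the reduction step using a Johnson-Lindenstrauss random projection from scikit-learn; the point clouds are synthetic, and the barycenter itself would then be computed in the reduced space (e.g., with the POT library, not shown), with its cost preserved per the paper's guarantee.

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

rng = np.random.default_rng(0)
d, n_points = 1000, 200
# k input distributions, each a point cloud in R^d with uniform weights.
clouds = [rng.normal(loc=i, scale=1.0, size=(n_points, d)) for i in range(5)]

# Reduced dimension depends on the support size n (log n), not on d or k.
m = johnson_lindenstrauss_min_dim(n_samples=n_points, eps=0.3)
proj = GaussianRandomProjection(n_components=min(m, d), random_state=0)
proj.fit(clouds[0])
low_dim = [proj.transform(c) for c in clouds]       # solve the barycenter here

print(f"projected from d={d} to m={proj.n_components_}")
```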
【17】 Developing a novel fair-loan-predictor through a multi-sensitive debiasing pipeline: DualFair 标题:通过多敏感去偏管道开发一种新的公平贷款预测器:DualFair 链接:https://arxiv.org/abs/2110.08944
作者:Arashdeep Singh,Jashandeep Singh,Ariba Khan,Amar Gupta 机构:Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Fresno, USA, Cambridge, USA 备注:10 pages, 2 figures, 3 tables, 1 pseudocode 摘要:Machine learning (ML) models are increasingly used for high-stakes applications that can greatly impact people's lives. Despite their use, these models have the potential to be biased towards certain social groups on the basis of race, gender, or ethnicity. Many prior works have attempted to mitigate this "model discrimination" by updating the training data (pre-processing), altering the model learning process (in-processing), or manipulating model output (post-processing). However, these works have not yet been extended to the realm of multi-sensitive parameters and sensitive options (MSPSO), where sensitive parameters are attributes that can be discriminated against (e.g. race) and sensitive options are options within sensitive parameters (e.g. black or white), thus giving them limited real-world usability. Prior work in fairness has also suffered from an accuracy-fairness tradeoff that prevents both the accuracy and fairness from being high. Moreover, previous literature has failed to provide holistic fairness metrics that work with MSPSO. In this paper, we solve all three of these problems by (a) creating a novel bias mitigation technique called DualFair and (b) developing a new fairness metric (i.e. AWI) that can handle MSPSO. Lastly, we test our novel mitigation method using a comprehensive U.S. mortgage lending dataset and show that our classifier, or fair loan predictor, obtains better fairness and accuracy metrics than current state-of-the-art models.
【18】 A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization 标题:熵正则化约束马尔可夫决策过程的对偶方法 链接:https://arxiv.org/abs/2110.08923
作者:Donghao Ying,Yuhao Ding,Javad Lavaei 机构:University of California, Berkeley 备注:24 pages 摘要:We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided.
【19】 Green Simulation Assisted Policy Gradient to Accelerate Stochastic Process Control 标题:绿色仿真辅助政策梯度加速随机过程控制 链接:https://arxiv.org/abs/2110.08902
作者:Hua Zheng,Wei Xie,M. Ben Feng 机构:Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON Canada 备注:36 pages, 7 figures 摘要:This study is motivated by the critical challenges in biopharmaceutical manufacturing, including high complexity, high uncertainty, and very limited process data. Each experiment run is often very expensive. To support optimal and robust process control, we propose a general green simulation assisted policy gradient (GS-PG) framework for both online and offline learning settings. Basically, to address the key limitations of state-of-the-art reinforcement learning (RL), such as sample inefficiency and low reliability, we create a mixture likelihood ratio based policy gradient estimation that can leverage the information from historical experiments conducted under different inputs, including process model coefficients and decision policy parameters. Then, to accelerate the learning of optimal and robust policy, we further propose a variance reduction based sample selection method that allows GS-PG to intelligently select and reuse the most relevant historical trajectories. The selection rule automatically updates the samples to be reused during the learning of process mechanisms and the search for optimal policy. Our theoretical and empirical studies demonstrate that the proposed framework can perform better than the state-of-the-art policy gradient approach and accelerate the optimal robust process control for complex stochastic systems under high uncertainty.
【20】 Provable RL with Exogenous Distractors via Multistep Inverse Dynamics 标题:用多步逆动力学方法证明外源性牵引物的可证RL 链接:https://arxiv.org/abs/2110.08847
作者:Yonathan Efroni,Dipendra Misra,Akshay Krishnamurthy,Alekh Agarwal,John Langford 机构:Microsoft Research, New York, NY, Google 摘要:Many real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations and subsequently plan efficiently. However, such approaches can fail in the presence of temporally correlated noise in the observations, a phenomenon that is common in practice. We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL. We start by establishing several negative results, by highlighting failure cases of prior representation learning based approaches. Then, we introduce the Predictive Path Elimination (PPE) algorithm, that learns a generalization of inverse dynamics and is provably sample and computationally efficient in EX-BMDPs when the endogenous state dynamics are near deterministic. The sample complexity of PPE depends polynomially on the size of the latent endogenous state space while not directly depending on the size of the observation space, nor the exogenous state space. We provide experiments on challenging exploration problems which show that our approach works empirically.
【21】 Localization with Sampling-Argmax 标题:带采样的定位-Argmax 链接:https://arxiv.org/abs/2110.08825
作者:Jiefeng Li,Tong Chen,Ruiqi Shi,Yujing Lou,Yong-Lu Li,Cewu Lu 机构:Shanghai Jiao Tong University 备注:NeurIPS 2021 摘要:Soft-argmax operation is commonly adopted in detection-based methods to localize the target position in a differentiable manner. However, training the neural network with soft-argmax makes the shape of the probability map unconstrained. Consequently, the model lacks pixel-wise supervision through the map during training, leading to performance degradation. In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints to the shape of the probability map by minimizing the expectation of the localization error. To approximate the expectation, we introduce a continuous formulation of the output distribution and develop a differentiable sampling process. The expectation can be approximated by calculating the average error of all samples drawn from the output distribution. We show that sampling-argmax can seamlessly replace the conventional soft-argmax operation on various localization tasks. Comprehensive experiments demonstrate the effectiveness and flexibility of the proposed method. Code is available at https://github.com/Jeff-sjtu/sampling-argmax
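A minimal sketch of the quantity sampling-argmax targets: the expected localization error under the output distribution. On a small 1-D grid the expectation can be computed exactly, as below, with a Monte-Carlo sampling check; the paper's continuous relaxation of the sampling process is not reproduced.

```python
import torch

def expected_localization_error(heatmap, coords, target):
    """Differentiable expectation of the localization error under the
    output distribution, the objective sampling-argmax minimizes. On a
    1-D grid the expectation is computed exactly; the paper's sampling
    step approximates it when exact summation is impractical."""
    p = torch.softmax(heatmap, dim=-1)                       # (B, L) distribution
    err = (coords.unsqueeze(0) - target.unsqueeze(1)).abs()  # (B, L) per-cell error
    return (p * err).sum(dim=-1).mean()

L = 64
coords = torch.linspace(0, 1, L)
heatmap = torch.randn(8, L, requires_grad=True)
target = torch.rand(8)

loss = expected_localization_error(heatmap, coords, target)
loss.backward()                                     # pixel-wise supervision on the map

# Monte-Carlo check of the same expectation via sampling.
with torch.no_grad():
    p = torch.softmax(heatmap, dim=-1)
    idx = torch.multinomial(p, 256, replacement=True)
    mc = (coords[idx] - target.unsqueeze(1)).abs().mean()
print(loss.item(), mc.item())
```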
【22】 On-board Fault Diagnosis of a Laboratory Mini SR-30 Gas Turbine Engine 标题:实验室小型SR-30型燃气轮机车载故障诊断 链接:https://arxiv.org/abs/2110.08820
作者:Richa Singh 机构:Department of Aerospace Engineering, Indian Institute of Technology Bombay, Mumbai, India 摘要:Inspired by recent progress in machine learning, a data-driven fault diagnosis and isolation (FDI) scheme is explicitly developed for failure in the fuel supply system and sensor measurements of the laboratory gas turbine system. A passive approach of fault diagnosis is implemented where a model is trained using machine learning classifiers to detect a given set of fault scenarios in real-time on which it is trained. Towards the end, a comparative study is presented for well-known classification techniques, namely Support vector classifier, linear discriminant analysis, K-neighbor, and decision trees. Several simulation studies were carried out to demonstrate and illustrate the proposed fault diagnosis scheme's advantages, capabilities, and performance.
【23】 Centroid Approximation for Bootstrap 标题:Bootstrap的质心逼近 链接:https://arxiv.org/abs/2110.08720
作者:Mao Ye,Qiang Liu 机构:University of Texas at Austin 摘要:Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need of drawing a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly \emph{optimize} a small set of high quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
【24】 NeuralArTS: Structuring Neural Architecture Search with Type Theory 标题:NeuralArts:用类型理论构建神经结构搜索 链接:https://arxiv.org/abs/2110.08710
作者:Robert Wu,Nayan Saxena,Rohan Jain 机构:University of Toronto, King’s College Cir, Toronto, Ontario M,S 摘要:Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Developing these search spaces is usually a manual affair with pre-optimized search spaces being more efficient, rather than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.
【25】 Terminal Embeddings in Sublinear Time 标题:次线性时间中的终端嵌入 链接:https://arxiv.org/abs/2110.08691
作者:Yeshwanth Cherapanamjeri,Jelani Nelson 备注:Accepted to FOCS 2021 摘要:Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) . \end{equation*} In the case that $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
【26】 Tackling the Imbalance for GNNs 标题:解决GNN的不平衡问题 链接:https://arxiv.org/abs/2110.08690
作者:Rui Wang,Weixuan Xiong,Qinghu Hou,Ou Wu 机构:National Center for Applied Mathematics, Tianjin, University, Tianjin, China 摘要:Different from deep neural networks for non-graph data classification, graph neural networks (GNNs) leverage the information exchange between nodes (or samples) when representing nodes. The category distribution shows an imbalance or even a highly-skewed trend on nearly all existing benchmark GNN data sets. The imbalanced distribution will cause misclassification of nodes in the minority classes, and even cause the classification performance on the entire data set to decrease. This study explores the effects of the imbalance problem on the performances of GNNs and proposes new methodologies to solve it. First, a node-level index, namely, the label difference index ($LDI$), is defined to quantitatively analyze the relationship between imbalance and misclassification. The fewer samples in a class, the higher the value of its average $LDI$; the higher the $LDI$ of a sample, the more likely the sample will be misclassified. We define a new loss and propose four new methods based on $LDI$. Experimental results indicate that three of our four proposed new methods achieve better classification accuracy in both transductive and inductive settings. The $LDI$ can be applied to other GNNs.
【27】 MG-GCN: Scalable Multi-GPU GCN Training Framework 标题:MG-GCN:可扩展的多GPU GCN训练框架 链接:https://arxiv.org/abs/2110.08688
作者:Muhammed Fatih Balın,Kaan Sancak,Ümit V. Çatalyürek 备注:12 pages, 13 figures, Under Review 摘要:Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning community, communication becomes a bottleneck and scaling is blocked outside of the single machine regime. Thus, we propose MG-GCN, a multi-GPU GCN training framework taking advantage of the high-speed communication links between the GPUs present in multi-GPU systems. MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers to reduce the memory footprint of training GNN models, as well as communication and computation overlap. These optimizations enable execution on larger datasets, that generally do not fit into memory of a single GPU in state-of-the-art implementations. Furthermore, they contribute to achieve superior speedup compared to the state-of-the-art. For example, MG-GCN achieves super-linear speedup with respect to DGL, on the Reddit graph on both DGX-1 (V100) and DGX-A100.
【28】 Equivariant Discrete Normalizing Flows Link: https://arxiv.org/abs/2110.08649
Authors: Avishek Joey Bose, Ivan Kobyzev Affiliations: McGill University and Mila; Huawei Noah's Ark Lab Note: Preprint Abstract: At its core, generative modeling seeks to uncover the underlying factors that give rise to observed data, which can often be modelled as natural symmetries that manifest themselves through invariances and equivariances to certain transformation laws. However, current approaches are couched in the formalism of continuous normalizing flows that require the construction of equivariant vector fields -- inhibiting their simple application to conventional higher-dimensional generative modelling domains like natural images. In this paper we focus on building equivariant normalizing flows using discrete layers. We first theoretically prove the existence of an equivariant map for compact groups whose actions are on compact spaces. We further introduce two new equivariant flows: $G$-coupling Flows and $G$-Residual Flows, which elevate classical Coupling and Residual Flows with equivariant maps to a prescribed group $G$. Our construction of $G$-Residual Flows is also universal, in the sense that we prove a $G$-equivariant diffeomorphism can be exactly mapped by a $G$-residual flow. Finally, we complement our theoretical insights with experiments -- for the first time -- on image datasets like CIFAR-10 and show that $G$-Equivariant Discrete Normalizing Flows lead to increased data efficiency, faster convergence, and improved likelihood estimates.
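For reference, a discrete flow layer $f$ built this way must satisfy the standard equivariance constraint, and densities follow the usual normalizing-flow change of variables:

```latex
% f is G-equivariant: transforming the input by g and then applying f
% equals applying f first and then transforming the output.
\forall g \in G:\quad f(g \cdot x) = g \cdot f(x),
\qquad
% standard change of variables for a flow f mapping data x to base variable z
p_X(x) = p_Z\bigl(f(x)\bigr)\,\bigl|\det J_f(x)\bigr|.
```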
【29】 DPNAS: Neural Architecture Search for Deep Learning with Differential Privacy Link: https://arxiv.org/abs/2110.08557
Authors: Anda Cheng, Jiaxing Wang, Xi Sheryl Zhang, Qiang Chen, Peisong Wang, Jian Cheng Affiliations: Institute of Automation Abstract: Training deep neural networks (DNNs) for meaningful differential privacy (DP) guarantees severely degrades model utility. In this paper, we demonstrate that the architecture of DNNs has a significant impact on model utility in the context of private deep learning, whereas its effect was largely unexplored in previous studies. In light of this gap, we propose the first framework that employs neural architecture search to automate model design for private deep learning, dubbed DPNAS. To integrate private learning with architecture search, we delicately design a novel search space and propose a DP-aware method for training candidate models. We empirically verify the effectiveness of the proposed framework. The searched model DPNASNet achieves state-of-the-art privacy/utility trade-offs, e.g., for the privacy budget of $(\epsilon, \delta)=(3, 1\times10^{-5})$, our model obtains test accuracy of $98.57\%$ on MNIST, $88.09\%$ on FashionMNIST, and $68.33\%$ on CIFAR-10. Furthermore, by studying the generated architectures, we provide several intriguing findings on designing private-learning-friendly DNNs, which can shed new light on model design for deep learning with differential privacy.
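The abstract does not detail the DP-aware training method, so the sketch below shows only the standard DP-SGD recipe (per-example gradient clipping plus Gaussian noise) that private training of candidate models typically builds on; the hyperparameters and loop structure are illustrative assumptions, not the paper's procedure.

```python
# Hedged sketch of one standard DP-SGD step (clip per-example gradients,
# add Gaussian noise, average, apply). Not the paper's exact method.
import torch

def dp_sgd_step(model, loss_fn, batch, lr=0.1, clip=1.0, noise_mult=1.1):
    grads = []
    for x, y in batch:                       # per-example gradient computation
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        grads.append(g * min(1.0, clip / (g.norm().item() + 1e-12)))  # clip
    g = torch.stack(grads).sum(0)
    g += torch.randn_like(g) * noise_mult * clip   # Gaussian mechanism
    g /= len(batch)
    offset = 0
    with torch.no_grad():                    # plain SGD update with noisy gradient
        for p in model.parameters():
            n = p.numel()
            p -= lr * g[offset:offset + n].view_as(p)
            offset += n
```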
【30】 Lifelong Topological Visual Navigation Link: https://arxiv.org/abs/2110.08488
Authors: Rey Reza Wiyatno, Anqi Xu, Liam Paull Affiliations: Rey Reza Wiyatno and Liam Paull are with the Montréal Robotics and Embodied AI Lab (REAL) and DIRO at the University of Montréal Note: Project page: this https URL Abstract: The ability for a robot to navigate with only the use of vision is appealing due to its simplicity. Traditional vision-based navigation approaches required a prior map-building step that was arduous and prone to failure, or could only exactly follow previously executed trajectories. Newer learning-based visual navigation techniques reduce the reliance on a map and instead directly learn policies from image inputs for navigation. There are currently two prevalent paradigms: end-to-end approaches forego the explicit map representation entirely, and topological approaches still preserve some loose connectivity of the space. However, while end-to-end methods tend to struggle in long-distance navigation tasks, topological map-based solutions are prone to failure due to spurious edges in the graph. In this work, we propose a learning-based topological visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods. Also, unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed. We further assess the performance of our system in real-world deployments.
【31】 Streaming Decision Trees and Forests Link: https://arxiv.org/abs/2110.08483
Authors: Haoyin Xu, Jayanta Dey, Sambit Panda, Joshua T. Vogelstein Affiliations: Johns Hopkins University Abstract: Machine learning has successfully leveraged modern data and provided computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Currently, estimators can handle both scenarios where all samples are available at once and settings that require continuous updates. However, there is still room for improvement in streaming algorithms based on batch decision trees and random forests, which are the leading methods in batch data tasks. In this paper, we explore the simplest partial fitting algorithm to extend batch trees, and test our models, stream decision tree (SDT) and stream decision forest (SDF), on three classification tasks of varying complexities. For reference, both existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators are included in the experiments. In all three tasks, SDF consistently produces high accuracy, whereas existing estimators encounter space constraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvements, making them good candidates for solving problems like distribution drift and transfer learning.
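As a hedged sketch of what a "simplest partial fitting" interface looks like: scikit-learn's DecisionTreeClassifier has no partial_fit, so this hypothetical wrapper approximates streaming by refitting on a growing buffer of all data seen so far; the paper's actual SDT extends the fitted tree incrementally rather than refitting.

```python
# Hedged sketch: a stream-style wrapper around a batch decision tree.
# Refits on buffered data each batch; NOT the paper's incremental SDT.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class StreamTree:
    def __init__(self):
        self.X, self.y = [], []
        self.tree = DecisionTreeClassifier()

    def partial_fit(self, X_batch, y_batch):
        self.X.append(np.asarray(X_batch))
        self.y.append(np.asarray(y_batch))
        self.tree.fit(np.vstack(self.X), np.concatenate(self.y))
        return self

    def predict(self, X):
        return self.tree.predict(X)
```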
【32】 Metadata Shaping: Natural Language Annotations for the Tail Link: https://arxiv.org/abs/2110.08430
Authors: Simran Arora, Sen Wu, Enci Liu, Christopher Re Affiliations: Stanford University Abstract: Language models (LMs) have made remarkable progress, but still struggle to generalize beyond the training data to rare linguistic patterns. Since rare entities and facts are prevalent in the queries users submit to popular applications such as search and personal assistant systems, improving the ability of LMs to reliably capture knowledge over rare entities is a pressing challenge studied in significant prior work. Noticing that existing approaches primarily modify the LM architecture or introduce auxiliary objectives to inject useful entity knowledge, we ask to what extent we can match the quality of these architectures while keeping a base LM architecture and only changing the data. We propose metadata shaping, a method in which readily available metadata, such as entity descriptions and categorical tags, are appended to examples based on information-theoretic metrics. Intuitively, if metadata corresponding to popular entities overlap with metadata for rare entities, the LM may be able to better reason about the rare entities using patterns learned from similar popular entities. On standard entity-rich tasks (TACRED, FewRel, OpenEntity), with no changes to the LM whatsoever, metadata shaping exceeds the BERT baseline by up to 5.3 F1 points, and achieves or competes with state-of-the-art results. We further show the improvements are up to 10x larger on examples containing tail versus popular entities.
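Since the intervention is data-only, it is easy to picture. Below is a hedged sketch of the append step; the function name, the "[META]" separator, and the dictionaries are illustrative assumptions, and the paper's information-theoretic selection of which metadata to keep is omitted.

```python
# Hedged sketch: append readily available metadata (entity descriptions,
# categorical tags) to an example before feeding the LM. Selection of
# metadata by information-theoretic metrics is omitted.
def shape_example(text, entities, descriptions, tags):
    meta = []
    for e in entities:
        meta.append(descriptions.get(e, ""))
        meta.extend(tags.get(e, []))
    return text + " [META] " + " ; ".join(m for m in meta if m)

print(shape_example(
    "Marie Curie won the Nobel Prize.",
    ["Marie Curie"],
    {"Marie Curie": "Polish-French physicist and chemist"},
    {"Marie Curie": ["person", "scientist"]},
))
```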
【33】 Information-Theoretic Measures of Dataset Difficulty Link: https://arxiv.org/abs/2110.08420
Authors: Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta Affiliations: Stanford University; Allen Institute for Artificial Intelligence; Paul G. Allen School of Computer Science, University of Washington Abstract: Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. Not only is this framework informal, but it also provides little understanding of how difficult each instance is, or what attributes make it difficult for a given model. To address these problems, we propose an information-theoretic perspective, framing dataset difficulty as the absence of usable information. Measuring usable information is as easy as measuring performance, but has certain theoretical advantages. While the latter only allows us to compare different models w.r.t. the same dataset, the former also allows us to compare different datasets w.r.t. the same model. We then introduce pointwise $\mathcal{V}$-information (PVI) for measuring the difficulty of individual instances, where instances with higher PVI are easier for model $\mathcal{V}$. By manipulating the input before measuring usable information, we can understand why a dataset is easy or difficult for a given model, which we use to discover annotation artefacts in widely-used benchmarks.
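The abstract does not state the PVI formula; in the paper's framework it is the gain in log-likelihood of the gold label when the model family is given the input versus a null input. A hedged sketch, where g_with and g_null are hypothetical wrappers around models finetuned with and without inputs, each returning a probability for the gold label:

```python
# Hedged sketch of pointwise V-information: higher PVI means input x makes
# the gold label y easier for the model family V. g_with / g_null are
# hypothetical probability oracles (finetuned with / without inputs).
import math

def pvi(x, y, g_with, g_null):
    # PVI(x -> y) = -log2 P_null(y) + log2 P_with(y | x)
    return -math.log2(g_null(y)) + math.log2(g_with(x, y))
```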
【34】 Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining Link: https://arxiv.org/abs/2110.08412
Authors: Andreas Madsen, Nicholas Meade, Vaibhav Adlakha, Siva Reddy Affiliations: Mila – Quebec AI Institute; Polytechnique Montréal; McGill; Facebook CIFAR AI Chair Abstract: To explain NLP models, many methods inform us which input tokens are important for a prediction. However, an open question is whether these methods accurately reflect the model's logic, a property often called faithfulness. In this work, we adapt and improve a recently proposed faithfulness benchmark from computer vision called ROAR (RemOve And Retrain), by Hooker et al. (2019). We improve ROAR by recursively removing dataset redundancies, which otherwise interfere with ROAR. We adapt and apply ROAR to popular NLP importance measures, namely attention, gradient, and integrated gradients. Additionally, we use mutual information as an additional baseline. Evaluation is done on a suite of classification tasks often used in the attention-faithfulness literature. Finally, we propose a scalar faithfulness metric, which makes it easy to compare results across papers. We find that importance measures considered unfaithful for computer vision tasks perform favorably on NLP tasks, that the faithfulness of an importance measure is task-dependent, and that the computational overhead of integrated gradients is rarely justified.
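A minimal sketch of the remove-and-retrain loop this benchmark is built around, with "recursive" meaning importances are recomputed after each masking round. The callables train_fn, importance_fn, and evaluate_fn are hypothetical user-supplied hooks, and the masking fraction is an illustrative assumption.

```python
# Hedged sketch of a recursive remove-and-retrain (ROAR-style) loop:
# mask the allegedly most important tokens, retrain, repeat. Faithful
# importance measures should degrade accuracy fastest.
def recursive_roar(train_fn, importance_fn, evaluate_fn, texts, labels,
                   steps=5, frac=0.1, mask="[MASK]"):
    accs = []
    for _ in range(steps):
        model = train_fn(texts, labels)               # retrain on masked data
        accs.append(evaluate_fn(model, texts, labels))
        masked = []
        for tokens in (t.split() for t in texts):
            scores = importance_fn(model, tokens)     # recompute importances
            k = max(1, int(frac * len(tokens)))
            top = set(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
            masked.append(" ".join(mask if i in top else tok
                                   for i, tok in enumerate(tokens)))
        texts = masked
    return accs
```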
【35】 Return migration of German-affiliated researchers: Analyzing departure and return by gender, cohort, and discipline using Scopus bibliometric data 1996-2020 Link: https://arxiv.org/abs/2110.08340
Authors: Xinyi Zhao, Samin Aref, Emilio Zagheni, Guy Stecklov Affiliations: Lab of Digital and Computational Demography, Max Planck Institute for Demographic Research, Rostock, Germany; Department of Mechanical and Industrial Engineering, University of Toronto Note: 21 pages, 6 figures Abstract: The international migration of researchers is a highly prized dimension of scientific mobility and motivates considerable policy debate. However, tracking the migration life courses of researchers is challenging due to data limitations. In this study, we use Scopus bibliometric data on 8 million publications from 1.1 million researchers who have published at least once with an affiliation address from Germany in 1996-2020. We describe several key steps and algorithms we develop that enable us to construct the partial life histories of published researchers in this period. These tools allow us to explore both the out-migration of researchers with German affiliations as well as the subsequent return of a share of this group - the returnees. Our analyses shed light on important career stages and gender disparities between researchers who remain in Germany, those who migrate out, and those who eventually return. Return migration streams are even more gender imbalanced and point to the importance of additional efforts to attract female researchers back to Germany. We document a slightly declining trend in return migration across cohorts, which, for most disciplines, is associated with decreasing German collaboration ties among cohorts of researchers who leave Germany. Also, gender disparities for the most gender-imbalanced disciplines are unlikely to be mitigated by return migration given the gender compositions in cohorts of researchers who leave Germany and those who return. This analysis reveals new dimensions of scholarly migration by investigating the return migration of published researchers, which is critical for science policy development.
【36】 C-AllOut: Catching & Calling Outliers by Type Link: https://arxiv.org/abs/2110.08257
Authors: Guilherme D. F. Silva, Leman Akoglu, Robson L. F. Cordeiro Affiliations: Carnegie Mellon University; University of São Paulo Note: 9+4 pages, 3 figures, 11 tables Abstract: Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolated cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetition, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers by type. It is parameter-free and scalable, and it works with only pairwise similarities (or distances) when needed. We show that C-AllOut achieves performance on par with or significantly better than state-of-the-art detectors when spotting outliers regardless of their type. It is also highly effective at annotating outliers of particular types, a task that none of the baselines can perform.
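Not C-AllOut itself (which is parameter-free and similarity-based), but a hedged illustration of the three outlier types using off-the-shelf detectors: an isolation forest for global outliers, local outlier factor for local outliers, and tiny DBSCAN clusters as a rough proxy for collective outliers. All thresholds are illustrative.

```python
# Hedged illustration of global / local / collective outlier types with
# standard sklearn tools; NOT the paper's C-AllOut algorithm.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.cluster import DBSCAN

X = np.random.default_rng(0).normal(size=(300, 2))

global_flags = IsolationForest(random_state=0).fit_predict(X) == -1
local_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == -1
labels = DBSCAN(eps=0.2, min_samples=3).fit_predict(X)
small = [l for l in set(labels) - {-1} if (labels == l).sum() <= 5]
collective_flags = np.isin(labels, small)   # isolated micro-clusters
print(global_flags.sum(), local_flags.sum(), collective_flags.sum())
```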
【37】 Minimum $\ell_1$-norm interpolators: Precise asymptotics and multiple descent Link: https://arxiv.org/abs/2110.09502
Authors: Yue Li, Yuting Wei Abstract: An evolving line of machine learning work observes empirical evidence suggesting that interpolating estimators -- the ones that achieve zero training error -- may not necessarily be harmful. This paper pursues a theoretical understanding of an important type of interpolator: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size). We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon; that is, the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the over-parameterized ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
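For concreteness, the estimator under study, $\min \|\beta\|_1$ subject to $X\beta = y$, can be computed as a linear program via the standard split $\beta = u - v$ with $u, v \ge 0$. A minimal sketch with illustrative dimensions:

```python
# Minimum l1-norm interpolator as a linear program:
# min sum(u) + sum(v)  s.t.  X(u - v) = y,  u, v >= 0,  beta = u - v.
import numpy as np
from scipy.optimize import linprog

def min_l1_interpolator(X, y):
    n, p = X.shape
    c = np.ones(2 * p)                      # objective = ||beta||_1
    A_eq = np.hstack([X, -X])               # interpolation constraint X beta = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(1)
X, y = rng.normal(size=(20, 50)), rng.normal(size=20)   # over-parameterized
beta = min_l1_interpolator(X, y)
print(np.allclose(X @ beta, y, atol=1e-6), np.abs(beta).sum())
```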
【38】 DBSegment: Fast and robust segmentation of deep brain structures -- Evaluation of transportability across acquisition domains Link: https://arxiv.org/abs/2110.09473
Authors: Mehri Baniasadi, Mikkel V. Petersen, Jorge Goncalves, Andreas Horn, Vanja Vlasov, Frank Hertel, Andreas Husch Affiliations: Luxembourg Center for Systems Biomedicine, University of Luxembourg; National Department of Neurosurgery, Centre Hospitalier de Luxembourg; Department of Clinical Medicine, Center of Functionally Integrative Neuroscience, University of Aarhus Abstract: Segmenting deep brain structures from magnetic resonance images is important for patient diagnosis, surgical planning, and research. Most current state-of-the-art solutions follow a segmentation-by-registration approach, where subject MRIs are mapped to a template with well-defined segmentations. However, registration-based pipelines are time-consuming, thus limiting their clinical use. This paper uses deep learning to provide a robust and efficient deep brain segmentation solution. The method consists of a pre-processing step to conform all MRI images to the same orientation, followed by a convolutional neural network using the nnU-Net framework. We use a total of 14 datasets from both research and clinical collections. Of these, seven were used for training and validation and seven were retained for independent testing. We trained the network to segment 30 deep brain structures, as well as a brain mask, using labels generated from a registration-based approach. We evaluated the generalizability of the network by performing a leave-one-dataset-out cross-validation, and extensive testing on external datasets. Furthermore, we assessed cross-domain transportability by evaluating the results separately on different domains. We achieved an average DSC of 0.89 $\pm$ 0.04 on the independent testing datasets when compared to the registration-based gold standard. On our test system, the computation time decreased from 42 minutes for a reference registration-based pipeline to 1 minute. Our proposed method is fast, robust, and generalizes with high reliability. It can be extended to the segmentation of other brain structures. The method is publicly available on GitHub, as well as via a pip package for convenient usage.
【39】 RKHS-SHAP: Shapley Values for Kernel Methods Link: https://arxiv.org/abs/2110.09167
Authors: Siu Lun Chau, Javier Gonzalez, Dino Sejdinovic Affiliations: Department of Statistics, University of Oxford, United Kingdom; Microsoft Research Cambridge, United Kingdom Note: 11 pages, 4 figures Abstract: Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing Shapley values from a functional perspective, we propose RKHS-SHAP, an attribution method for kernel machines that can efficiently compute both Interventional and Observational Shapley values using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for interpretability. Further, we propose a Shapley regulariser, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of specific features' contributions to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the Shapley values of sensitive features.
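For reference, the classical Shapley value that RKHS-SHAP estimates for a feature $i$, with $N$ the full feature set and $v(S)$ the value of a feature coalition $S$ (which the method evaluates via kernel mean embeddings):

```latex
% Classical Shapley value of feature i: the average marginal contribution
% of i over all coalitions S that exclude it.
\phi_i = \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
\Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```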
【40】 An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem Link: https://arxiv.org/abs/2110.09076
Authors: Marta Monaci, Valerio Agasucci, Giorgio Grani Affiliations: Sapienza University of Rome, Dep. of Computer Science, Control and Management Engineering, Rome, Italy; OptRail, Rome, Italy; SINTEF Digital, Dep. of Mathematics and Cybernetics, Oslo, Norway Abstract: There is a growing interest in integrating machine learning techniques and optimization to solve challenging optimization problems. In this work, we propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP). The aim is to build a greedy-like heuristic able to learn on some distribution of JSSP instances that differ in the number of jobs and machines. The need for fast scheduling methods is well known, and it arises in many areas, from transportation to healthcare. We model the JSSP as a Markov Decision Process and then exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme, where the action taken by the agent is influenced by policy considerations on the state-value function. The procedures are adapted to take into account the challenging nature of the JSSP, where the state and the action space change not only for every instance but also after each decision. To tackle the variability in the number of jobs and operations in the input, we model the agent using two incident LSTM models, a special type of deep neural network. Experiments show the algorithm reaches good solutions in a short time, proving that it is possible to generate new greedy heuristics just from learning-based methodologies. Benchmarks have been generated in comparison with the commercial solver CPLEX. As expected, the model can generalize, to some extent, to larger problems or instances originating from a different distribution than the one used in training.
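A hedged sketch of the generic actor-critic update such a scheme builds on; the paper's LSTM encoders and JSSP-specific state/action handling are omitted, and the loss weighting is an illustrative assumption.

```python
# Hedged sketch of a one-step advantage actor-critic loss: the critic's
# TD error serves as the advantage for the policy-gradient term.
import torch

def actor_critic_loss(log_prob, value, reward, next_value, gamma=0.99):
    advantage = reward + gamma * next_value.detach() - value
    actor_loss = -log_prob * advantage.detach()   # policy gradient term
    critic_loss = advantage.pow(2)                # value regression term
    return (actor_loss + 0.5 * critic_loss).mean()
```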
【41】 Persuasion by Dimension Reduction Link: https://arxiv.org/abs/2110.08884
Authors: Semyon Malamud, Andreas Schrimpf Note: arXiv admin note: text overlap with arXiv:2102.10909 Abstract: How should an agent (the sender) observing multi-dimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (non-linear) dimension reduction by projecting the state vector onto a lower-dimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. The optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multi-dimensional Bayesian persuasion problems.
【42】 Rheumatoid Arthritis: Automated Scoring of Radiographic Joint Damage Link: https://arxiv.org/abs/2110.08812
Authors: Yan Ming Tan, Raphael Quek Hao Chong, Carol Anne Hargreaves Affiliations: Department of Statistics and Data Science, National University of Singapore; Department of Electrical & Computer Engineering, National University of Singapore Abstract: Rheumatoid arthritis is an autoimmune disease that causes joint damage due to inflammation in the soft tissue lining the joints known as the synovium. It is vital to identify joint damage as soon as possible to provide necessary treatment early and prevent further damage to the bone structures. Radiographs are often used to assess the extent of the joint damage. Currently, the scoring of joint damage from the radiograph takes expertise, effort, and time. Joint damage associated with rheumatoid arthritis is also not quantitated in clinical practice, and subjective descriptors are used. In this work, we describe a pipeline of deep learning models to automatically identify and score rheumatoid arthritic joint damage from a radiographic image. Our automatic tool was shown to produce scores with extremely high balanced accuracy within a couple of minutes, and utilizing it would remove the subjectivity of scores between human reviewers.
【43】 Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling Link: https://arxiv.org/abs/2110.08676
Authors: Yinan Li, Fang Liu Affiliations: Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA Abstract: We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM) that solves ERM with differential privacy guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization with the employment of an l2 term to achieve strong convexity on top of the target regularization. NAPP-ERM improves over the current approaches and mitigates over-regularization by iteratively realizing target regularization through appropriately designed augmented data and delivering strong convexity via a single adaptively weighted dual-purpose l2 regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves both privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve privacy budget when the strong convexity requirement is met, which can be returned to users such that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or be recycled to the ERM optimization procedure to reduce the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed objective function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation effect of the over-regularization and private budget retrieval by NAPP-ERM on variable selection and prediction.
【44】 Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics Link: https://arxiv.org/abs/2110.08471
Authors: Andersen Ang, Jianzhu Ma, Nianjun Liu, Kun Huang, Yijie Wang Affiliations: Dept. of Combinatorics and Optimization, University of Waterloo; Institute for Artificial Intelligence, Peking University; Dept. of Epidemiology and Biostatistics, Indiana University Bloomington; Dept. of Biostatistics and Health Data Science; Dept. of Computer Science Note: 12 pages, 5 figures Abstract: We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hyper-cube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we find that a simple algorithm based on Newton's method is able to solve the projection problem to high precision with complexity roughly O(n), which has a much lower computational cost than the existing sorting-based methods proposed in the literature. We provide a theory for partial explanation and justification of the method. We demonstrate that the proposed algorithm can produce a solution of the projection problem with high precision on large-scale datasets, and the algorithm is able to significantly outperform the state-of-the-art methods in terms of runtime (about 6-8 times faster than a commercial software with respect to CPU time for input vectors with 1 million variables or more). We further illustrate the effectiveness of the proposed algorithm on solving sparse regression in a bioinformatics problem. Empirical results on the GWAS dataset (with 1,500,000 single-nucleotide polymorphisms) show that, when using the proposed method to accelerate the Projected Quasi-Newton (PQN) method, the accelerated PQN algorithm is able to handle huge-scale regression problems and is more efficient (about 3-6 times faster) than the current state-of-the-art methods.
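The projection itself reduces to a one-dimensional root-finding problem: find the shift $t$ with $\sum_i \mathrm{clip}(v_i - t, 0, 1) = k$, then clip. The sketch below uses bisection for simplicity, where the paper applies Newton's method to the same scalar equation for its roughly O(n) behavior.

```python
# Projection onto the k-capped simplex {x : 0 <= x <= 1, sum(x) = k}.
# Bisection on the shift t is used here for simplicity; the paper's
# algorithm applies Newton's method to the same monotone scalar equation.
import numpy as np

def project_capped_simplex(v, k, iters=60):
    lo, hi = v.min() - 1.0, v.max()          # root of sum(clip(v - t, 0, 1)) = k
    for _ in range(iters):                   # f(t) is non-increasing in t
        t = 0.5 * (lo + hi)
        if np.clip(v - t, 0.0, 1.0).sum() > k:
            lo = t                           # too much mass: shift threshold up
        else:
            hi = t
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

v = np.random.default_rng(0).normal(size=10)
x = project_capped_simplex(v, k=3)
print(x.sum())   # ~3.0, with 0 <= x_i <= 1
```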
【45】 Nonlinear proper orthogonal decomposition for convection-dominated flows Link: https://arxiv.org/abs/2110.08295
Authors: Shady E. Ahmed, Omer San, Adil Rasheed, Traian Iliescu Affiliations: School of Mechanical & Aerospace Engineering, Oklahoma State University, Stillwater, OK, USA; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway; Department of Mathematics and Cybernetics, SINTEF Digital Abstract: Autoencoder techniques find increasingly common use in reduced order modeling as a means to create a latent space. This reduced order representation offers a modular data-driven modeling approach for nonlinear dynamical systems when integrated with a time series predictive model. In this letter, we put forth a nonlinear proper orthogonal decomposition (POD) framework, which is an end-to-end Galerkin-free model combining autoencoders with long short-term memory networks for dynamics. By eliminating the projection error due to the truncation of Galerkin models, a key enabler of the proposed nonintrusive approach is the kinematic construction of a nonlinear mapping between the full-rank expansion of the POD coefficients and the latent space where the dynamics evolve. We test our framework for model reduction of a convection-dominated system, which is generally challenging for reduced order models. Our approach not only improves the accuracy, but also significantly reduces the computational cost of training and testing.
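A hedged sketch of the autoencoder-plus-LSTM architecture described above: the encoder supplies the latent space and the LSTM propagates the latent dynamics. The layer sizes and latent dimension are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: autoencoder latent space + LSTM latent dynamics, as in the
# nonlinear-POD idea above. Sizes are illustrative, not the paper's setup.
import torch.nn as nn

class NonlinearPOD(nn.Module):
    def __init__(self, n_full=1024, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_full, 128), nn.ReLU(),
                                     nn.Linear(128, n_latent))
        self.dynamics = nn.LSTM(n_latent, n_latent, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                     nn.Linear(128, n_full))

    def forward(self, snapshots):            # snapshots: (batch, time, n_full)
        z = self.encoder(snapshots)          # compress to latent trajectory
        z_next, _ = self.dynamics(z)         # advance dynamics in latent space
        return self.decoder(z_next)          # reconstruct full-order fields
```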
Machine translation, for reference only.