cs.LG: 105 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)
【1】 TC-GNN: Accelerating Sparse Graph Neural Network Computation Via Dense Tensor Core on GPUs Link: https://arxiv.org/abs/2112.02052
Authors: Yuke Wang, Boyuan Feng, Yufei Ding Abstract: Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GPU Tensor Core Unit (TCU) based GNN acceleration framework. The core idea is to reconcile the "Sparse" GNN computation with "Dense" TCU. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of sparse GNN workloads. We also implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We fully integrate TC-GNN with the PyTorch framework for ease of programming. Rigorous experiments show an average of 1.70X speedup over the state-of-the-art Deep Graph Library framework across various GNN models and dataset settings.
【2】 Structure-Aware Multi-Hop Graph Convolution for Graph Neural Networks Link: https://arxiv.org/abs/2112.01714
Authors: Yang Li, Yuichi Tanaka Abstract: In this paper, we propose a spatial graph convolution (GC) to classify signals on a graph. Existing GC methods make only limited use of the structural information in the feature space. Additionally, a single GC step only uses features on the one-hop neighboring nodes of the target node. We propose two methods to improve the performance of GCs: 1) utilizing structural information in the feature space, and 2) exploiting multi-hop information in one GC step. In the first method, we define three structural features in the feature space: feature angle, feature distance, and relational embedding. The second method aggregates the node-wise features of multi-hop neighbors in a GC. Both methods can be used simultaneously. We also propose graph neural networks (GNNs) integrating the proposed GC for classifying nodes in 3D point clouds and citation networks. In experiments, the proposed GNNs exhibited a higher classification accuracy than existing methods.
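As a rough illustration of the second idea in the abstract above (aggregating node features from multi-hop neighbors in one GC step), the sketch below collects every node within k hops of a target node by breadth-first search and averages their feature vectors. The function names `k_hop_neighbors` and `aggregate_multi_hop`, the mean aggregator, and the toy graph are all assumptions made for illustration; the paper's actual aggregation may well differ.

```python
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Return the set of nodes within k hops of `source` (excluding it)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n in dist if n != source}


def aggregate_multi_hop(adj, features, source, k):
    """Mean of the feature vectors of all nodes within k hops of `source`.

    The mean is an illustrative choice, not the paper's aggregator.
    """
    neighbors = sorted(k_hop_neighbors(adj, source, k))
    if not neighbors:
        return list(features[source])  # isolated node: fall back to itself
    dim = len(features[source])
    return [
        sum(features[n][d] for n in neighbors) / len(neighbors)
        for d in range(dim)
    ]


# Toy path graph 0 - 1 - 2 - 3 with 2-dimensional node features (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: [0.0, 1.0], 1: [2.0, 3.0], 2: [4.0, 5.0], 3: [6.0, 7.0]}

one_hop = aggregate_multi_hop(adj, features, 0, 1)  # uses node 1 only
two_hop = aggregate_multi_hop(adj, features, 0, 2)  # uses nodes 1 and 2
```

With k = 1 this reduces to ordinary one-hop aggregation, which is the limitation the abstract attributes to existing GC methods.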
【3】 SparRL: Graph Sparsification via Deep Reinforcement Learning Link: https://arxiv.org/abs/2112.01565
Authors: Ryan Wickman, Xiaofei Zhang, Weizi Li Comments: This article introduces the first general and effective graph sparsification framework enabled by deep reinforcement learning Abstract: Graph sparsification concerns data reduction where an edge-reduced graph of a similar structure is preferred. Existing methods are mostly sampling-based, which generally introduces high computational complexity and lacks flexibility for different reduction objectives. We present SparRL, the first general and effective reinforcement learning-based framework for graph sparsification. SparRL can easily adapt to different reduction goals and promises graph-size-independent complexity. Extensive experiments show that SparRL outperforms all prevailing sparsification methods in producing high-quality sparsified graphs concerning a variety of objectives.
【4】 Graph Neural Networks for Charged Particle Tracking on FPGAs Link: https://arxiv.org/abs/2112.02048
Authors: Abdelrahman Elabd, Vesal Razavimaleki, Shi-Yu Huang, Javier Duarte, Markus Atkinson, Gage DeZoort, Peter Elmer, Jin-Xuan Hu, Shih-Chieh Hsu, Bo-Cheng Lai, Mark Neubauer, Isobel Ojalvo, Savannah Thais Comments: 26 pages, 17 figures, 1 table Abstract: The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by embedding tracker data as a graph (nodes represent hits, while edges represent possible track segments) and classifying the edges as true or fake track segments. However, their study in hardware- or software-based trigger applications has been limited due to their large computational cost. In this paper, we introduce an automated translation workflow, integrated into a broader tool called hls4ml, for converting GNNs into firmware for field-programmable gate arrays (FPGAs). We use this translation tool to implement GNNs for charged particle tracking, trained using the TrackML challenge dataset, on FPGAs with designs targeting different graph sizes, task complexities, and latency/throughput requirements. This work could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments.
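The graph formulation described in the abstract above (hits as nodes, candidate track segments as edges, each edge classified as true or fake) can be pictured with the minimal sketch below. Everything here is hypothetical: the hit fields `layer` and `phi`, the adjacent-layer rule for building candidate edges, and in particular `toy_edge_score`, which merely stands in for a trained GNN edge classifier and is not taken from the paper or from hls4ml.

```python
from itertools import product


def build_candidate_edges(hits):
    """Connect every pair of hits that sit on adjacent detector layers."""
    edges = []
    for (i, a), (j, b) in product(enumerate(hits), repeat=2):
        if b["layer"] == a["layer"] + 1:
            edges.append((i, j))
    return edges


def toy_edge_score(hit_a, hit_b):
    """Hypothetical stand-in for a trained GNN edge classifier.

    The score is high when the two hits have a similar azimuthal angle phi.
    """
    return 1.0 / (1.0 + 10.0 * abs(hit_a["phi"] - hit_b["phi"]))


def classify_edges(hits, threshold=0.5):
    """Label each candidate edge True (track segment) or False (fake)."""
    return {
        (i, j): toy_edge_score(hits[i], hits[j]) >= threshold
        for i, j in build_candidate_edges(hits)
    }


# Four made-up hits: two on layer 0 and two on layer 1.
hits = [
    {"layer": 0, "phi": 0.10},
    {"layer": 0, "phi": 1.00},
    {"layer": 1, "phi": 0.12},
    {"layer": 1, "phi": 1.05},
]
labels = classify_edges(hits)
```

In this toy setup the edges joining hits of similar phi are kept and the two crossing edges are rejected; in the real task that decision would come from the trained network rather than a hand-written score.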
Transformer (5 papers)
【1】 Linear algebra with transformers Link: https://arxiv.org/abs/2112.01898
Authors: François Charton Abstract: Most applications of transformers to mathematics, from integration to theorem proving, focus on symbolic computation. In this paper, we show that transformers can be trained to perform numerical calculations with high accuracy. We consider problems of linear algebra: matrix transposition, addition, multiplication, eigenvalues and vectors, singular value decomposition, and inversion. Training small transformers (up to six layers) over datasets of random matrices, we achieve high accuracies (over 90%) on all problems. We also show that trained models can generalize out of their training distribution, and that out-of-domain accuracy can be greatly improved by working from more diverse datasets (in particular, by training from matrices with non-independent and identically distributed coefficients). Finally, we show that few-shot learning can be leveraged to re-train models to solve larger problems.
【2】 Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer Link: https://arxiv.org/abs/2112.01838
Authors: Frederic Z. Zhang, Dylan Campbell, Stephen Gould Comments: 14 pages, 14 figures and 5 tables Abstract: Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.
【3】 Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents Link: https://arxiv.org/abs/2112.01718
Authors: Vithya Yogarajan, Bernhard Pfahringer, Tony Smith, Jacob Montiel Abstract: Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classifications of medical text enables the potential to understand patients better and improve care. The knowledge gained by one or more infrequent labels can impact the cause of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.
【4】 LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences Link: https://arxiv.org/abs/2112.01697
Authors: Ziwang Fu, Feng Liu, Hanyang Wang, Siyuan Shen, Jiahao Zhang, Jiayin Qi, Xiangling Fu, Aimin Zhou Comments: 9 pages, Figure 2, Table 5 Abstract: Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse language, visual, and audio modalities. However, those approaches introduce information redundancy when fusing features and are inefficient without considering the complementarity of modalities. In this paper, we propose an efficient neural network to learn modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for the three modalities respectively to obtain the local structure of the sequences. Then, we design a novel transformer with cross-modal blocks (CB-Transformer) that enables complementary learning of different modalities, mainly divided into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we splice the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets, IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with the mainstream methods, our approach reaches the state-of-the-art with a minimum number of parameters.
【5】 MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification Link: https://arxiv.org/abs/2112.01767
Authors: Jingye Chen, Jieneng Chen, Zongwei Zhou, Bin Li, Alan Yuille, Yongyi Lu Comments: A technical report. Code will be released Abstract: Recent advances in automated skin cancer diagnosis have yielded performance on par with board-certified dermatologists. However, these approaches formulated skin cancer diagnosis as a simple classification task, dismissing the potential benefit from lesion segmentation. We argue that an accurate lesion segmentation can supplement the classification task with additive lesion information, such as asymmetry, border, intensity, and physical size; in turn, a faithful lesion classification can support the segmentation task with discriminant lesion features. To this end, this paper proposes a new multi-task framework, named MT-TransUNet, which is capable of segmenting and classifying skin lesions collaboratively by mediating multi-task tokens in Transformers. Furthermore, we have introduced dual-task and attended region consistency losses to take advantage of those images without pixel-level annotation, ensuring the model's robustness when it encounters the same image with an account of augmentation. Our MT-TransUNet exceeds the previous state of the art for lesion segmentation and classification tasks in ISIC-2017 and PH2; more importantly, it preserves compelling computational efficiency regarding model parameters (48M vs. 130M) and inference speed (0.17s vs. 2.02s per image). Code will be available at https://github.com/JingyeChen/MT-TransUNet.
GAN | adversarial | attacks | generation-related (8 papers)
【1】 Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study Link: https://arxiv.org/abs/2112.01925
Authors: Claire Little, Mark Elliot, Richard Allmendinger, Sahel Shariati Samani Abstract: Generative Adversarial Networks (GANs) are gaining increasing attention as a means for synthesising data. So far much of this work has been applied to use cases outside of the data confidentiality domain, with a common application being the production of artificial images. Here we consider the potential application of GANs for the purpose of generating synthetic census microdata. We employ a battery of utility metrics and a disclosure risk metric (the Targeted Correct Attribution Probability) to compare the data produced by tabular GANs with those produced using orthodox data synthesis methods.
【2】 Attack-Centric Approach for Evaluating Transferability of Adversarial Samples in Machine Learning Models Link: https://arxiv.org/abs/2112.01777
Authors: Tochukwu Idika, Ismail Akturk Abstract: Transferability of adversarial samples became a serious concern due to their impact on the reliability of machine learning system deployments, as they find their way into many critical applications. Knowing factors that influence transferability of adversarial samples can assist experts to make informed decisions on how to build robust and reliable machine learning systems. The goal of this study is to provide insights on the mechanisms behind the transferability of adversarial samples through an attack-centric approach. This attack-centric perspective interprets how adversarial samples would transfer by assessing the impact of machine learning attacks (that generated them) on a given input dataset. To achieve this goal, we generated adversarial samples using attacker models and transferred these samples to victim models. We analyzed the behavior of adversarial samples on victim models and outlined four factors that can influence the transferability of adversarial samples. Although these factors are not necessarily exhaustive, they provide useful insights to researchers and practitioners of machine learning systems.
【3】 On the Existence of the Adversarial Bayes Classifier (Extended Version) Link: https://arxiv.org/abs/2112.01694
Authors: Pranjal Awasthi, Natalie S. Frank, Mehryar Mohri Comments: 49 pages, 8 figures. Extended version of the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS Abstract: Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results can provide a useful tool for a subsequent study of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended version of the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS. The results of the original paper did not apply to some non-strictly convex norms. Here we extend our results to all possible norms.
【4】 Sample-Efficient Generation of Novel Photo-acid Generator Molecules using a Deep Generative Model Link: https://arxiv.org/abs/2112.01625
Authors: Samuel C. Hoffman, Vijil Chenthamarakshan, Dmitry Yu. Zubarev, Daniel P. Sanders, Payel Das Abstract: Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light. These compounds are critical components of the photolithography processes that are used in the manufacture of semiconductor logic and memory chips. The exponential increase in the demand for semiconductors has highlighted the need for discovering novel photo-acid generators. While de novo molecule design using deep generative models has been widely employed for drug discovery and material design, its application to the creation of novel photo-acid generators poses several unique challenges, such as lack of property labels. In this paper, we highlight these challenges and propose a generative modeling approach that utilizes conditional generation from a pre-trained deep autoencoder and expert-in-the-loop techniques. The validity of the proposed approach was evaluated with the help of subject matter experts, indicating the promise of such an approach for applications beyond the creation of novel photo-acid generators.
【5】 PLSUM: Generating PT-BR Wikipedia by Summarizing Multiple Websites Link: https://arxiv.org/abs/2112.01591
Authors: André Seidel Oliveira, Anna Helena Reali Costa Comments: Published at the Encontro Nacional de Inteligência Artificial e Computacional (ENIAC) 2021 conference Abstract: Wikipedia is an important free source of intelligible knowledge. Despite that, Brazilian Portuguese Wikipedia still lacks descriptions for many subjects. In an effort to expand the Brazilian Wikipedia, we contribute PLSum, a framework for generating wiki-like abstractive summaries from multiple descriptive websites. The framework has an extractive stage followed by an abstractive one. In particular, for the abstractive stage, we fine-tune and compare two recent variations of the Transformer neural network, PTT5 and Longformer. To fine-tune and evaluate the model, we created a dataset with thousands of examples, linking reference websites to Wikipedia. Our results show that it is possible to generate meaningful abstractive summaries from Brazilian Portuguese web content.
【6】 FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization Link: https://arxiv.org/abs/2112.01573
Authors: Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu Abstract: Generating images from natural language instructions is an intriguing yet highly challenging task. We approach text-to-image generation by combining the power of the pre-trained CLIP representation with an off-the-shelf image generator (GANs), optimizing in the latent space of the GAN to find images that achieve maximum CLIP score with the given input text. Compared to traditional methods that train generative models from text to image starting from scratch, the CLIP+GAN approach is training-free, zero-shot, and can be easily customized with different generators. However, optimizing the CLIP score in the GAN space casts a highly challenging optimization problem, and off-the-shelf optimizers such as Adam fail to yield satisfying results. In this work, we propose a FuseDream pipeline, which improves the CLIP+GAN approach with three key techniques: 1) an AugCLIP score, which robustifies the CLIP objective by introducing random augmentation on the image; 2) a novel initialization and over-parameterization strategy for optimization, which allows us to efficiently navigate the non-convex landscape in GAN space; 3) a composed generation technique which, by leveraging a novel bi-level optimization formulation, can compose multiple images to extend the GAN space and overcome the data bias. When prompted by different input text, FuseDream can generate high-quality images with varying objects, backgrounds, artistic styles, and even novel counterfactual concepts that do not appear in the training data of the GAN we use. Quantitatively, the images generated by FuseDream yield top-level Inception score and FID score on the MS COCO dataset, without additional architecture design or training. Our code is publicly available at https://github.com/gnobitab/FuseDream.
【7】 Is Approximation Universally Defensive Against Adversarial Attacks in Deep Neural Networks? Link: https://arxiv.org/abs/2112.01555
Authors: Ayesha Siddique, Khaza Anuarul Hoque Comments: Accepted for publication in DATE 2022 Abstract: Approximate computing is known for its effectiveness in improving the energy efficiency of deep neural network (DNN) accelerators at the cost of slight accuracy loss. Very recently, the inexact nature of approximate components, such as approximate multipliers, has also been reported successful in defending against adversarial attacks on DNN models. Since the approximation errors traverse the DNN layers as masked or unmasked, this raises a key research question: can approximate computing always offer a defense against adversarial attacks in DNNs, i.e., is it universally defensive? Towards this, we present an extensive adversarial robustness analysis of different approximate DNN accelerators (AxDNNs) using state-of-the-art approximate multipliers. In particular, we evaluate the impact of ten adversarial attacks on different AxDNNs using the MNIST and CIFAR-10 datasets. Our results demonstrate that adversarial attacks on AxDNNs can cause 53% accuracy loss, whereas the same attack may lead to almost no accuracy loss (as low as 0.06%) in the accurate DNN. Thus, approximate computing cannot be referred to as a universal defense strategy against adversarial attacks.
【8】 Dynamic fracture of a bicontinuously nanostructured copolymer: A deep learning analysis of big-data-generating experiment Link: https://arxiv.org/abs/2112.01971
Authors: Hanxun Jin, Rodney J. Clifton, Kyung-Suk Kim Comments: Submitted for Review in Journal of Mechanics and Physics of Solids (JMPS) Abstract: Here, we report the dynamic fracture toughness as well as the cohesive parameters of a bicontinuously nanostructured copolymer, polyurea, under an extremely high crack-tip loading rate, from a deep-learning analysis of a dynamic big-data-generating experiment. We first invented a novel Dynamic Line-Image Shearing Interferometer (DL-ISI), which can generate the displacement-gradient - time profiles along a line on a sample's back surface projectively covering the crack initiation and growth process in a single plate impact experiment. Then, we proposed a convolutional neural network (CNN) based deep-learning framework that can inversely determine the accurate cohesive parameters from DL-ISI fringe images. Plate-impact experiments on a polyurea sample with a mid-plane crack have been performed, and the generated DL-ISI fringe image has been inpainted by a conditional generative adversarial network (cGAN). For the first time, the dynamic cohesive parameters of polyurea have been successfully obtained by the pre-trained CNN architecture with the computational dataset, which is consistent with the correlation method and the linear fracture mechanics estimation. Apparent dynamic toughening is found in polyurea, where the cohesive strength is found to be nearly three times higher than the spall strength under the symmetric impact with the same impact speed. These experimental results fill the gap in the current understanding of the copolymer's cooperative-failure strength under extreme local loading conditions near the crack tip. This experiment also demonstrates the advantages of big-data-generating experiments, which combine innovative high-throughput experimental techniques with state-of-the-art machine learning algorithms.
Semi-/weakly/un-/supervised | uncertainty | active learning (9 papers)
【1】 Hierarchical Optimal Transport for Unsupervised Domain Adaptation Link: https://arxiv.org/abs/2112.02073
Authors: Mourad El Hamri, Younès Bennani, Issam Falih, Hamid Ahaggach Abstract: In this paper, we propose a novel approach for unsupervised domain adaptation that relates notions of optimal transport, learning probability measures, and unsupervised learning. The proposed approach, HOT-DA, is based on a hierarchical formulation of optimal transport that leverages, beyond the geometrical information captured by the ground metric, richer structural information in the source and target domains. The additional information in the labeled source domain is formed instinctively by grouping samples into structures according to their class labels. Meanwhile, exploring hidden structures in the unlabeled target domain is reduced to the problem of learning probability measures through Wasserstein barycenter, which we prove to be equivalent to spectral clustering. Experiments on a toy dataset with controllable complexity and two challenging visual adaptation datasets show the superiority of the proposed approach over the state-of-the-art.
【2】 Boosting Unsupervised Domain Adaptation with Soft Pseudo-label and Curriculum Learning Link: https://arxiv.org/abs/2112.01948
Authors: Shengjia Zhang, Tiancheng Lin, Yi Xu Comments: 28 pages Abstract: By leveraging data from a fully labeled source domain, unsupervised domain adaptation (UDA) improves classification performance on an unlabeled target domain through explicit discrepancy minimization of data distribution or adversarial learning. As an enhancement, category alignment is involved during adaptation to reinforce target feature discrimination by utilizing model prediction. However, there remain unexplored problems about pseudo-label inaccuracy incurred by wrong category predictions on the target domain, and distribution deviation caused by overfitting on the source domain. In this paper, we propose a model-agnostic two-stage learning framework, which greatly reduces flawed model predictions using a soft pseudo-label strategy and avoids overfitting on the source domain with a curriculum learning strategy. Theoretically, it successfully decreases the combined risk in the upper bound of expected error on the target domain. At the first stage, we train a model with a distribution alignment-based UDA method to obtain soft semantic labels on the target domain with rather high confidence. To avoid overfitting on the source domain, at the second stage, we propose a curriculum learning strategy to adaptively control the weighting between losses from the two domains, so that the focus of the training stage gradually shifts from the source distribution to the target distribution, with prediction confidence boosted on the target domain. Extensive experiments on two well-known benchmark datasets validate the universal effectiveness of our proposed framework in promoting the performance of the top-ranked UDA algorithms and demonstrate its consistently superior performance.
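The second-stage weighting described in this abstract can be illustrated with a minimal sketch. It assumes a fixed linear schedule and a plain cross-entropy against soft pseudo-labels; the abstract only says the weighting is controlled adaptively, so the schedule, the function names, and the loss form here are hypothetical and are not the authors' method.

```python
import math

def soft_cross_entropy(probs, soft_label):
    """Cross-entropy between predicted class probabilities and a soft (probabilistic) label."""
    eps = 1e-12  # avoids log(0)
    return -sum(q * math.log(p + eps) for p, q in zip(probs, soft_label))

def curriculum_weight(step, total_steps):
    """Hypothetical linear schedule: 0 at the start (source only), 1 at the end (target only)."""
    return min(1.0, max(0.0, step / total_steps))

def combined_loss(source_loss, target_loss, step, total_steps):
    """Blend the two domain losses so the focus shifts from source to target over training."""
    lam = curriculum_weight(step, total_steps)
    return (1.0 - lam) * source_loss + lam * target_loss
```

Halfway through training the two losses are weighted equally; an adaptive rule would replace `curriculum_weight` with something driven by prediction confidence on the target domain.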
【3】 Mind Your Clever Neighbours: Unsupervised Person Re-identification via Adaptive Clustering Relationship Modeling Link: https://arxiv.org/abs/2112.01839
Authors: Lianjie Jia, Chenyang Yu, Xiehao Ye, Tianyu Yan, Yinjie Lei, Pingping Zhang Comments: This work has been accepted by AAAI-2022. Some modifications may be performed for the final version Abstract: Unsupervised person re-identification (Re-ID) attracts increasing attention due to its potential to resolve the scalability problem of supervised Re-ID models. Most existing unsupervised methods adopt an iterative clustering mechanism, where the network is trained based on pseudo-labels generated by unsupervised clustering. However, clustering errors are inevitable. To generate high-quality pseudo-labels and mitigate the impact of clustering errors, we propose a novel clustering relationship modeling framework for unsupervised person Re-ID. Specifically, before clustering, the relation between unlabeled images is explored based on a graph correlation learning (GCL) module, and the refined features are then used for clustering to generate high-quality pseudo-labels. Thus, GCL adaptively mines the relationship between samples in a mini-batch to reduce the impact of abnormal clustering during training. To train the network more effectively, we further propose a selective contrastive learning (SCL) method with a selective memory bank update policy. Extensive experiments demonstrate that our method shows much better results than most state-of-the-art unsupervised methods on the Market1501, DukeMTMC-reID and MSMT17 datasets. We will release the code for model reproduction.
【4】 SSDL: Self-Supervised Dictionary Learning Link: https://arxiv.org/abs/2112.01790
Authors: Shuai Shao, Lei Xing, Wei Yu, Rui Xu, Yanjiang Wang, Baodi Liu Comments: Accepted by the 22nd IEEE International Conference on Multimedia and Expo (ICME) as an Oral Abstract: Label-embedded dictionary learning (DL) algorithms generate influential dictionaries by introducing discriminative information. However, there exists a limitation: all label-embedded DL methods rely on labels, so they achieve ideal performance only in supervised learning, while in semi-supervised and unsupervised learning they are no longer sufficiently effective. Inspired by the concept of self-supervised learning (e.g., setting a pretext task to generate a universal model for the downstream task), we propose a Self-Supervised Dictionary Learning (SSDL) framework to address this challenge. Specifically, we first design a $p$-Laplacian Attention Hypergraph Learning (pAHL) block as the pretext task to generate pseudo soft labels for DL. Then, we adopt the pseudo-labels to train a dictionary from a primary label-embedded DL method. We evaluate our SSDL on two human activity recognition datasets. The comparison results with other state-of-the-art methods demonstrate the efficiency of SSDL.
【5】 Probabilistic Contrastive Loss for Self-Supervised Learning Link: https://arxiv.org/abs/2112.01642
Authors: Shen Li, Jianqing Xu, Bryan Hooi Abstract: This paper proposes a probabilistic contrastive loss function for self-supervised learning. The well-known contrastive loss is deterministic and involves a temperature hyperparameter that scales the inner product between two normed feature embeddings. By reinterpreting the temperature hyperparameter as a quantity related to the radius of the hypersphere, we derive a new loss function that involves a confidence measure which quantifies uncertainty in a mathematically grounded manner. Some intriguing properties of the proposed loss function are empirically demonstrated, which agree with human-like predictions. We believe the present work brings a new perspective to the area of contrastive learning.
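For context, the deterministic, temperature-scaled contrastive loss that this abstract takes as its starting point can be sketched as follows. This is the common InfoNCE-style baseline under assumed names and an assumed default temperature of 0.1; it is not the probabilistic loss proposed in the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so inner products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Negative log-softmax of the positive pair among all temperature-scaled similarities."""
    a = l2_normalize(anchor)
    candidates = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in candidates]
    m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

A smaller `temperature` sharpens the softmax over similarities, which is the quantity the abstract reinterprets in terms of a hypersphere radius.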
【6】 Scheduling to Learn In An Unsupervised Online Streaming Model Link: https://arxiv.org/abs/2112.01576
Authors: R. Vaze, Santanu Rathod Abstract: An unsupervised online streaming model is considered where samples arrive in an online fashion over $T$ slots. There are $M$ classifiers, whose confusion matrices are unknown a priori. In each slot, at most one sample can be labeled by any classifier. The accuracy of a sample is a function of the set of labels obtained for it from various classifiers. The utility of a sample is a scalar multiple of its accuracy minus the response time (the difference between the departure slot and the arrival slot), where the departure slot is also decided by the algorithm. Since each classifier can label at most one sample per slot, there is a tradeoff between obtaining a larger set of labels for a particular sample to improve its accuracy, and its response time. The problem of maximizing the sum of the utilities of all samples is considered, where learning the confusion matrices, sample-classifier matching assignment, and sample departure slot decisions depend on each other. The proposed algorithm first learns the confusion matrices, and then uses a greedy algorithm for sample-classifier matching. A sample departs once its incremental utility turns non-positive. We show that the competitive ratio of the proposed algorithm is $\frac{1}{2}-{\mathcal O}\left(\frac{\log T}{T}\right)$.
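The departure rule in this abstract (a sample leaves once its incremental utility turns non-positive) can be illustrated with a toy sketch. It reads the utility as `scale * accuracy - response_time`, and it assumes a single sample that receives exactly one new label per slot, with a made-up accuracy model. The accuracy function, the names, and the one-label-per-slot simplification are all assumptions; the sketch ignores confusion-matrix learning and the greedy sample-classifier matching.

```python
def accuracy(num_labels):
    """Hypothetical accuracy model: each extra label halves the remaining error."""
    return 1.0 - 0.5 ** num_labels

def utility(num_labels, response_time, scale):
    """Utility read as a scalar multiple of accuracy, minus the response time in slots."""
    return scale * accuracy(num_labels) - response_time

def departure_slot(scale, max_slots):
    """Number of slots a sample waits when it gains exactly one label per slot."""
    slots = 0
    while slots < max_slots:
        # Incremental utility of staying one more slot and collecting one more label.
        gain = utility(slots + 1, slots + 1, scale) - utility(slots, slots, scale)
        if gain <= 0:
            break  # depart once the incremental utility is non-positive
        slots += 1
    return slots
```

With `scale = 8`, the third label would add exactly as much accuracy value as the extra slot costs, so the sample departs after two slots.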
【7】 Divergent representations of ethological visual inputs emerge from supervised, unsupervised, and reinforcement learning Link: https://arxiv.org/abs/2112.02027
Authors: Grace W. Lindsay, Josh Merel, Tom Mrsic-Flogel, Maneesh Sahani Comments: 19 total pages, 8 main figures, 5 Supplementary figures Abstract: Artificial neural systems trained using reinforcement, supervised, and unsupervised learning all acquire internal representations of high-dimensional input. To what extent these representations depend on the different learning objectives is largely unknown. Here we compare the representations learned by eight different convolutional neural networks, each with identical ResNet architectures and trained on the same family of egocentric images, but embedded within different learning systems. Specifically, the representations are trained to guide action in a compound reinforcement learning task; to predict one or a combination of three task-related targets with supervision; or using one of three different unsupervised objectives. Using representational similarity analysis, we find that the network trained with reinforcement learning differs most from the other networks. Through further analysis using metrics inspired by the neuroscience literature, we find that the model trained with reinforcement learning has a sparse and high-dimensional representation wherein individual images are represented with very different patterns of neural activity. Further analysis suggests these representations may arise in order to guide long-term behavior and goal-seeking in the RL agent. Our results provide insights into how the properties of neural representations are influenced by objective functions and can inform transfer learning approaches.
【8】 Bayes in Wonderland! Predictive supervised classification inference hits unpredictability Link: https://arxiv.org/abs/2112.01880
Authors: Ali Amiryousefi, Ville Kinnula, Jing Tang Comments: arXiv admin note: text overlap with arXiv:2101.10950 Abstract: The marginal Bayesian predictive classifiers (mBpc), as opposed to the simultaneous Bayesian predictive classifiers (sBpc), handle each data point separately and hence tacitly assume the independence of the observations. However, due to saturation in learning of generative model parameters, the adverse effect of this false assumption on the accuracy of mBpc tends to wear out in the face of an increasing amount of training data, guaranteeing the convergence of these two classifiers under de Finetti type of exchangeability. This result, however, is far from trivial for sequences generated under partition exchangeability (PE), where even an umpteen amount of training data does not rule out the possibility of an unobserved outcome (Wonderland!). We provide a computational scheme that allows the generation of sequences under PE. Based on that, with a controlled increase of the training data, we show the convergence of the sBpc and mBpc. This underlies the use of simpler yet computationally more efficient marginal classifiers instead of simultaneous ones. We also provide a parameter estimation of the generative model giving rise to the partition exchangeable sequence, as well as a testing paradigm for the equality of this parameter across different samples. The package for Bayesian predictive supervised classification, parameter estimation and hypothesis testing of the Ewens sampling formula generative model is deposited on CRAN as the PEkit package and is freely available from https://github.com/AmiryousefiLab/PEkit.
【9】 Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting Link: https://arxiv.org/abs/2112.01653
Authors: Ryo Karakida, Shotaro Akaho Comments: 31 pages, 6 figures Abstract: Sequential training from task to task is becoming one of the major objects in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a solvable case of continual learning. We consider neural networks in the neural tangent kernel (NTK) regime that continually learn target functions from task to task, and investigate the generalization by using an established statistical mechanical analysis of kernel ridge-less regression. We first show characteristic transitions from positive to negative transfer. More similar targets above a specific critical value can achieve positive knowledge transfer for the subsequent task, while catastrophic forgetting occurs even with very similar targets. Next, we investigate a variant of continual learning where the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task. We can guarantee that the generalization error monotonically decreases from task to task for equal sample sizes, while unbalanced sample sizes deteriorate the generalization. We refer to this improvement and deterioration as self-knowledge transfer and forgetting, respectively, and empirically confirm them in realistic training of deep neural networks as well.
Transfer | Zero/Few/One-Shot | Adaptation (3 papers)
【1】 Adaptive Poincaré Point to Set Distance for Few-Shot Classification Link: https://arxiv.org/abs/2112.01719
Authors: Rongkai Ma, Pengfei Fang, Tom Drummond, Mehrtash Harandi Comments: Accepted at AAAI2022 Abstract: Learning and generalizing from limited examples, i.e., few-shot learning, is of core importance to many real-world vision applications. A principal way of achieving few-shot learning is to realize an embedding where samples from different classes are distinctive. Recent studies suggest that embedding via hyperbolic geometry enjoys low distortion for hierarchical and structured data, making it suitable for few-shot learning. In this paper, we propose to learn a context-aware hyperbolic metric to characterize the distance between a point and a set associated with a learned set-to-set distance. To this end, we formulate the metric as a weighted sum on the tangent bundle of the hyperbolic space and develop a mechanism to obtain the weights adaptively, based on the constellation of the points. This not only makes the metric local but also dependent on the task at hand, meaning that the metric will adapt depending on the samples that it compares. We empirically show that such a metric yields robustness in the presence of outliers and achieves a tangible improvement over baseline models. This includes state-of-the-art results on five popular few-shot classification benchmarks, namely mini-ImageNet, tiered-ImageNet, Caltech-UCSD Birds-200-2011 (CUB), CIFAR-FS, and FC100.
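As background for the hyperbolic metric in this abstract, the standard geodesic distance on the Poincaré ball (curvature -1) can be sketched as below. The `point_to_set_distance` helper is a deliberately simplified stand-in: it averages geodesic distances with caller-supplied weights, whereas the paper forms a weighted sum on the tangent bundle with learned, adaptive weights. The helper name and its weighting scheme are assumptions.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    x_norm = sum(a * a for a in x)
    y_norm = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - x_norm) * (1.0 - y_norm)))

def point_to_set_distance(point, support, weights):
    """Simplified stand-in: weighted average of distances from a point to each set member."""
    total = sum(weights)
    return sum(w * poincare_distance(point, s) for w, s in zip(weights, support)) / total
```

Distances grow rapidly near the boundary of the ball, which is why tree-like hierarchies embed with low distortion.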
【2】 AdaSplit: Adaptive Trade-offs for Resource-constrained Distributed Deep Learning Link: https://arxiv.org/abs/2112.01637
Authors: Ayush Chopra, Surya Kant Sahu, Abhishek Singh, Abhinav Java, Praneeth Vepakomma, Vivek Sharma, Ramesh Raskar Abstract: Distributed deep learning frameworks like federated learning (FL) and its variants are enabling personalized experiences across a wide range of web clients and mobile/IoT devices. However, FL-based frameworks are constrained by computational resources at clients due to the exploding growth of model parameters (e.g., billion-parameter models). Split learning (SL), a recent framework, reduces client compute load by splitting the model training between client and server. This flexibility is extremely useful for low-compute setups but is often achieved at the cost of increased bandwidth consumption and may result in sub-optimal convergence, especially when client data is heterogeneous. In this work, we introduce AdaSplit, which enables efficiently scaling SL to low-resource scenarios by reducing bandwidth consumption and improving performance across heterogeneous clients. To capture and benchmark this multi-dimensional nature of distributed deep learning, we also introduce C3-Score, a metric to evaluate performance under resource budgets. We validate the effectiveness of AdaSplit under limited resources through extensive experimental comparison with strong federated and split learning baselines. We also present a sensitivity analysis of key design choices in AdaSplit, which validates the ability of AdaSplit to provide adaptive trade-offs across variable resource budgets.
【3】 Residual-Based Adaptive Coefficient and Noise-Immunity ZNN for Perturbed Time-Dependent Quadratic Minimization Link: https://arxiv.org/abs/2112.01773
Authors: Chengze Jiang, Long Jin, Xiuchun Xiao Comments: 9 pages, 25 figures Abstract: The time-dependent quadratic minimization (TDQM) problem appears in many applications and research projects. It has been reported that zeroing neural network (ZNN) models can effectively solve the TDQM problem. However, the convergence and robustness of existing ZNN models are restricted by the lack of a joint-action mechanism combining an adaptive coefficient and an integration-enhanced term. Consequently, the residual-based adaptive coefficient zeroing neural network (RACZNN) model with an integration term is proposed in this paper for solving the TDQM problem. The adaptive coefficient is proposed to improve convergence performance, and the integration term is embedded to ensure that the RACZNN model maintains reliable robustness when perturbed by various measurement noises. Compared with state-of-the-art models, the proposed RACZNN model achieves faster convergence and more reliable robustness. Then, theorems are provided to prove the convergence of the RACZNN model. Finally, corresponding quantitative numerical experiments are designed and performed in this paper to verify the performance of the proposed RACZNN model.
Reinforcement Learning (4 papers)
【1】 Reinforcement Learning-Based Automatic Berthing System Link: https://arxiv.org/abs/2112.01879
Authors: Daesoo Lee Abstract: Previous studies on automatic berthing systems based on artificial neural networks (ANNs) showed great berthing performance by training the ANN on ship berthing data. However, because the ANN requires a large amount of training data to yield robust performance, the ANN-based automatic berthing system is somewhat limited by the difficulty of obtaining berthing data. In this study, to overcome this difficulty, an automatic berthing system based on one of the reinforcement learning (RL) algorithms, proximal policy optimization (PPO), is proposed, because RL algorithms can learn an optimal control policy through trial and error by interacting with a given environment and do not require any pre-obtained training data. The control policy in the proposed PPO-based automatic berthing system controls the revolutions per second (RPS) and the rudder angle of a ship. Finally, it is shown that the proposed PPO-based automatic berthing system eliminates the need for obtaining a training dataset and shows great potential for actual berthing applications.
【2】 Differentially Private Exploration in Reinforcement Learning with Linear Representation Link: https://arxiv.org/abs/2112.01585
Authors: Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta Abstract: This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration. Through this framework, we prove a $\widetilde{O}(K^{3/4}/\sqrt{\epsilon})$ regret bound for $(\epsilon,\delta)$-local DP exploration and a $\widetilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-joint DP. We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. model-free setting), where we provide a $\widetilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-joint DP, with a novel algorithm based on low-switching. Finally, we provide insights into the issues of designing local DP algorithms in this model-free setting.
【3】 Towards Intrinsic Interactive Reinforcement Learning: A Survey Link: https://arxiv.org/abs/2112.01575
Authors: Benjamin Poole, Minwoo Lee Abstract: Reinforcement learning (RL) and brain-computer interfaces (BCI) are two fields that have been growing over the past decade. Until recently, these fields have operated independently of one another. With the rising interest in human-in-the-loop (HITL) applications, RL algorithms have been adapted to account for human guidance, giving rise to the sub-field of interactive reinforcement learning (IRL). Adjacently, BCI applications have long been interested in extracting intrinsic feedback from neural activity during human-computer interactions. These two ideas have set RL and BCI on a collision course through the integration of BCI into the IRL framework, where intrinsic feedback can be utilized to help train an agent. This intersection has been denoted as intrinsic IRL. To help facilitate deeper integration of BCI and IRL, we provide a review of intrinsic IRL with an emphasis on its parent field of feedback-driven IRL, while also providing discussions concerning the validity, challenges, and future research directions.
【4】 Reinforcement learning for options on target volatility funds Link: https://arxiv.org/abs/2112.01841
Authors: Roberto Daluiso, Emanuele Nastasi, Andrea Pallavicini, Stefano Polo Abstract: In this work we deal with the funding costs arising from hedging the risky securities underlying a target volatility strategy (TVS), a portfolio of risky assets and a risk-free one dynamically rebalanced in order to keep the realized volatility of the portfolio at a certain level. The uncertainty in the TVS risky portfolio composition, along with the difference in hedging costs for each component, requires solving a control problem to evaluate the option prices. We derive an analytical solution of the problem in the Black and Scholes (BS) scenario. Then we use reinforcement learning (RL) techniques to determine the fund composition leading to the most conservative price under the local volatility (LV) model, for which an a priori solution is not available. We show that the performance of the RL agents is compatible with that obtained by applying the BS analytical strategy path-wise to the TVS dynamics, which therefore appears competitive also in the LV scenario.
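The rebalancing idea behind a target volatility strategy can be shown with a small sketch. It uses the common textbook rule of setting the risky weight to target volatility over realized volatility, capped by a leverage limit; the cap, the annualization factor of 252, and the function names are assumptions, and this is not the control problem or the RL agent from the paper.

```python
import math

def realized_volatility(returns, periods_per_year=252):
    """Annualized sample standard deviation of a list of periodic returns."""
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance * periods_per_year)

def risky_weight(target_vol, realized_vol, max_leverage=1.0):
    """Fraction of the fund held in risky assets; the remainder sits in the risk-free asset."""
    return min(max_leverage, target_vol / realized_vol)
```

When realized volatility is twice the target, half the fund moves to the risk-free asset; when markets are calm, the cap prevents unbounded leverage.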
Symbolic | Symbolic Learning (1 paper)
【1】 Combining Sub-Symbolic and Symbolic Methods for Explainability Link: https://arxiv.org/abs/2112.01844
Authors: Anna Himmelhuber, Stephan Grimm, Sonja Zillner, Mitchell Joblin, Martin Ringsquandl, Thomas Runkler Comments: RuleML+RR 2021 Abstract: Similarly to other connectionist models, Graph Neural Networks (GNNs) lack transparency in their decision-making. A number of sub-symbolic approaches have been developed to provide insights into the GNN decision-making process. These are important first steps on the way to explainability, but the generated explanations are often hard to understand for users who are not AI experts. To overcome this problem, we introduce a conceptual approach combining sub-symbolic and symbolic methods for human-centric explanations that incorporate domain knowledge and causality. We furthermore introduce the notion of fidelity as a metric for evaluating how close the explanation is to the GNN's internal decision-making process. The evaluation with a chemical dataset and ontology shows the explanatory value and reliability of our method.
Medical (1 paper)
【1】 Modelling and optimization of nanovector synthesis for applications in drug delivery systems Link: https://arxiv.org/abs/2112.02002
Authors: Felipe J. Villaseñor-Cavazos, Daniel Torres-Valladares, Omar Lozano Comments: 19 pages, 8 figures, 7 tables Abstract: Nanovectors (NVs), based on nanostructured matter such as nanoparticles (NPs), have proven to perform as excellent drug delivery systems. However, due to the great variety of potential NVs, including NP materials and their functionalization, in addition to the plethora of molecules that could be transported, this field presents a great challenge in terms of the resources needed to find NVs with optimal physicochemical properties such as particle size and drug loading, where most efforts rely on trial-and-error experimentation. In this regard, artificial intelligence (AI) and metaheuristic algorithms offer efficient, state-of-the-art modelling and optimization, respectively. This review focuses, through a systematic search, on the use of artificial intelligence and metaheuristic algorithms for nanoparticle synthesis in drug delivery systems. The main findings are: neural networks are better at modelling NV properties than linear regression algorithms and response surface methodology, there is a very limited number of studies comparing AI or metaheuristic algorithms, and there is no information regarding the appropriateness of sample size calculations. Based on these findings, a multilayer perceptron artificial neural network and an adaptive neuro-fuzzy inference system were tested for their modelling performance with an NV dataset, with the latter found to be the better algorithm. For metaheuristic algorithms, benchmark functions were optimized with cuckoo search, firefly algorithm, genetic algorithm and symbiotic organism search, with cuckoo search and symbiotic organism search showing the best performance. Finally, methods to estimate an appropriate sample size for AI algorithms are discussed.
Distillation | Knowledge Extraction (1 paper)
【1】 Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition Link: https://arxiv.org/abs/2112.01849
Authors: Jianyuan Ni, Raunak Sarbajna, Yang Liu, Anne H. H. Ngu, Yan Yan Comments: 5 pages, 2 figures, submitted to ICASSP2022 Abstract: Human activity recognition (HAR) based on multi-modal approaches has recently been shown to improve the accuracy of HAR. However, the restricted computational resources of wearable devices, e.g., smartwatches, cannot directly support such advanced methods. To tackle this issue, this study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework. In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase. Therefore, this framework will not only reduce the computational demands on edge devices, but also produce a learning model that closely matches the performance of the computationally expensive multi-modal approach. In order to retain the local temporal relationship and facilitate visual deep learning models, we first convert time-series data to two-dimensional images by applying the Gramian Angular Field (GAF) based encoding method. We adopted ResNet18 and multi-scale TRN with BN-Inception as the teacher and student networks in this study, respectively. A novel loss function, named Distance and Angle-wised Semantic Knowledge loss (DASK), is proposed to mitigate the modality variations between the vision and the sensor domain. Extensive experimental results on the UTD-MHAD, MMAct, and Berkeley-MHAD datasets demonstrate the effectiveness and competitiveness of the proposed VSKD model, which can be deployed on wearable sensors.
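The Gramian Angular Field encoding mentioned in this abstract turns a one-dimensional series into a square image. A minimal sketch of the summation variant follows; the abstract does not say whether the summation or difference variant is used, so that choice and the function name are assumptions. The series must not be constant, since the rescaling divides by its range.

```python
import math

def gramian_angular_field(series):
    """Summation GAF: rescale to [-1, 1], map to angles, then take cos(phi_i + phi_j)."""
    lo, hi = min(series), max(series)
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Clamp against floating-point overshoot so acos stays within its domain.
    scaled = [min(1.0, max(-1.0, x)) for x in scaled]
    angles = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]
```

A series of length n yields an n-by-n symmetric matrix whose entries depend on pairs of time steps, which is what lets an image model such as a ResNet consume accelerometer data.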
Clustering (1 paper)
【1】 Trajectory Clustering Performance Evaluation: If we know the answer, it's not clustering Link: https://arxiv.org/abs/2112.01570
Authors: Mohsen Rezaie, Nicolas Saunier Abstract: Advancements in Intelligent Traffic Systems (ITS) have made huge amounts of traffic data available through automatic data collection. A large part of this data is stored as trajectories of moving vehicles and road users. Automatic analysis of this data with minimal human supervision would both lower the costs and eliminate the subjectivity of the analysis. Trajectory clustering is an unsupervised task. In this paper, we perform a comprehensive comparison of similarity measures, clustering algorithms and evaluation measures using trajectory data from seven intersections. We also propose a method to automatically generate trajectory reference clusters based on their origin and destination points, to be used for label-based evaluation measures. Therefore, the entire procedure remains unsupervised at both the clustering and evaluation levels. Finally, we use a combination of evaluation measures to find the top-performing similarity measures and clustering algorithms for each intersection. The results show that there is no single combination of distance and clustering algorithm that is always among the top ten clustering setups.
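One simple way to build reference clusters from origin and destination points, as this abstract describes at a high level, is to snap each trajectory's first and last point to the nearest of a few known entry/exit zones and group by the resulting pair. The zone list, the nearest-zone rule, and the function names below are hypothetical; the abstract does not specify how the authors derive the groups.

```python
import math
from collections import defaultdict

def nearest_zone(point, zones):
    """Index of the zone centre closest to the point (Euclidean distance)."""
    return min(range(len(zones)), key=lambda i: math.dist(point, zones[i]))

def reference_clusters(trajectories, zones):
    """Group trajectory indices by their (origin zone, destination zone) pair."""
    clusters = defaultdict(list)
    for index, trajectory in enumerate(trajectories):
        key = (nearest_zone(trajectory[0], zones), nearest_zone(trajectory[-1], zones))
        clusters[key].append(index)
    return dict(clusters)
```

The resulting groups can serve as reference labels for label-based evaluation measures without any manual annotation.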
自动驾驶|车辆|车道检测等(1篇)
【1】 Causal-based Time Series Domain Generalization for Vehicle Intention Prediction 标题:基于因果关系的车辆意图预测时间序列领域泛化 链接:https://arxiv.org/abs/2112.02093
作者:Yeping Hu,Xiaogang Jia,Masayoshi Tomizuka,Wei Zhan 备注:Accepted by NeurIPS 2021 Workshop on Distribution Shifts 摘要:准确预测交通参与者可能的行为是自动驾驶车辆的基本能力。由于自动驾驶车辆需要在动态变化的环境中导航,因此无论在何处以及遇到什么样的驾驶环境,都需要进行准确的预测。因此,当自动驾驶车辆部署在现实世界中时,对未知领域的泛化能力对于预测模型至关重要。本文针对车辆意图预测任务的领域泛化问题,提出了一种基于因果关系的时间序列领域泛化(CTSDG)模型。我们构建了一个车辆意图预测任务的结构因果模型,以学习输入驾驶数据的不变表示,用于领域泛化。我们进一步将递归潜在变量模型集成到我们的结构因果模型中,以更好地从时间序列输入数据中捕获时间潜在依赖性。我们的方法的有效性通过真实驾驶数据进行评估。我们证明,与其他最先进的领域泛化和行为预测方法相比,我们提出的方法在预测精度上有一致的提高。 摘要:Accurately predicting possible behaviors of traffic participants is an essential capability for autonomous vehicles. Since autonomous vehicles need to navigate in dynamically changing environments, they are expected to make accurate predictions regardless of where they are and what driving circumstances they encountered. Therefore, generalization capability to unseen domains is crucial for prediction models when autonomous vehicles are deployed in the real world. In this paper, we aim to address the domain generalization problem for vehicle intention prediction tasks and a causal-based time series domain generalization (CTSDG) model is proposed. We construct a structural causal model for vehicle intention prediction tasks to learn an invariant representation of input driving data for domain generalization. We further integrate a recurrent latent variable model into our structural causal model to better capture temporal latent dependencies from time-series input data. The effectiveness of our approach is evaluated via real-world driving data. We demonstrate that our proposed method has consistent improvement on prediction accuracy compared to other state-of-the-art domain generalization and behavior prediction methods.
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】 Bridging the Gap: Point Clouds for Merging Neurons in Connectomics 标题:跨越鸿沟:连接组学中用于合并神经元的点云 链接:https://arxiv.org/abs/2112.02039
作者:Jules Berman,Dmitri B. Chklovskii,Jingpeng Wu 备注:10 pages, 6 figures, MIDL 2022 摘要:在连接组学领域,一个主要问题是三维神经元分割。尽管基于深度学习的方法已经取得了显著的精度,但仍然存在误差,特别是在图像缺陷区域。一种常见的缺陷类型是连续缺失图像部分。在这里,数据沿着某个轴丢失,由此产生的神经元分割被分割开。为了解决这个问题,我们提出了一种基于神经元点云表示的新方法。我们将其表述为一个分类问题,并训练CurveNet(一种最先进的点云分类模型)来确定应该合并哪些神经元。我们表明,我们的方法不仅表现出很强的性能,而且可以合理地扩展到远远超出其他方法试图解决的差距。此外,我们的点云表示在数据方面是高效的,能够在大量数据的情况下保持高性能,这对于其他方法来说是不可行的。我们认为,这是一个指标的可行性使用点云表示的其他校对任务。 摘要:In the field of Connectomics, a primary problem is that of 3D neuron segmentation. Although Deep Learning based methods have achieved remarkable accuracy, errors still exist, especially in regions with image defects. One common type of defect is that of consecutive missing image sections. Here data is lost along some axis, and the resulting neuron segmentations are split across the gap. To address this problem, we propose a novel method based on point cloud representations of neurons. We formulate this as a classification problem and train CurveNet, a state-of-the-art point cloud classification model, to identify which neurons should be merged. We show that our method not only performs strongly but scales reasonably to gaps well beyond what other methods have attempted to address. Additionally, our point cloud representations are highly efficient in terms of data, maintaining high performance with an amount of data that would be unfeasible for other methods. We believe that this is an indicator of the viability of using point clouds representations for other proofreading tasks.
【2】 In situ process quality monitoring and defect detection for direct metal laser melting 标题:金属直接激光熔化工艺质量在线监测与缺陷检测 链接:https://arxiv.org/abs/2112.01921
作者:Sarah Felix,Saikat Ray Majumder,H. Kirk Mathews,Michael Lexa,Gabriel Lipsa,Xiaohu Ping,Subhrajit Roychowdhury,Thomas Spears 备注:16 pages, 4 figures 摘要:质量控制和质量保证是直接金属激光熔化(DMLM)面临的挑战。间歇性机器诊断和下游零件检查在处理有缺陷的零件产生不适当的成本后发现问题。在本文中,我们演示了两种用于过程中故障检测和零件质量预测的方法,这两种方法可以很容易地部署在现有的商用DMLM系统上,只需对硬件进行最小的修改。新的特征来自普通光电二极管传感器的时间序列以及标准机器控制信号。贝叶斯方法将测量归因于多个过程状态之一,最小二乘回归模型预测某些材料缺陷的严重程度。 摘要:Quality control and quality assurance are challenges in Direct Metal Laser Melting (DMLM). Intermittent machine diagnostics and downstream part inspections catch problems after undue cost has been incurred processing defective parts. In this paper we demonstrate two methodologies for in-process fault detection and part quality prediction that can be readily deployed on existing commercial DMLM systems with minimal hardware modification. Novel features were derived from the time series of common photodiode sensors along with standard machine control signals. A Bayesian approach attributes measurements to one of multiple process states and a least squares regression model predicts severity of certain material defects.
推理|分析|理解|解释(6篇)
【1】 Application of Machine Learning in understanding plant virus pathogenesis: Trends and perspectives on emergence, diagnosis, host-virus interplay and management 标题:机器学习在植物病毒致病机理研究中的应用:出现、诊断、宿主-病毒相互作用和管理方面的趋势和展望 链接:https://arxiv.org/abs/2112.01998
作者:Dibyendu Ghosh,Srija Chakraborty,Hariprasad Kodamana,Supriya Chakraborty 摘要:近年来,高通量技术在生物学领域的应用产生了大量的生物学数据。现在,将这些海量数据转化为知识是计算生物学的主要挑战。传统的数据分析方法无法完成这项任务。因此,研究人员正在转向基于机器学习的方法来分析高维大数据。在机器学习中,一旦使用训练数据集对模型进行训练,就可以将其应用于独立的测试数据集。当前,深度学习算法进一步推动了机器学习在包括植物病毒学在内的生物学领域的应用。考虑到机器学习在理解植物病毒学方面的应用取得了重大进展,本文着重介绍了机器学习,并全面讨论了机器学习在诊断病毒疾病、理解宿主病毒相互作用和植物病毒出现方面的趋势和前景。 摘要:Inclusion of high throughput technologies in the field of biology has generated massive amounts of biological data in recent years. Now, transforming these huge volumes of data into knowledge is the primary challenge in computational biology. The traditional methods of data analysis have failed to carry out the task. Hence, researchers are turning to machine learning based approaches for the analysis of high-dimensional big data. In machine learning, once a model is trained with a training dataset, it can be applied on a testing dataset which is independent. In current times, deep learning algorithms further promote the application of machine learning in several fields of biology including plant virology. Considering the significant progress in the application of machine learning in understanding plant virology, this review highlights an introductory note on machine learning and comprehensively discusses the trends and prospects of machine learning in diagnosis of viral diseases, understanding host-virus interplay and emergence of plant viruses.
【2】 Active Inference in Robotics and Artificial Agents: Survey and Challenges 标题:机器人和人工智能体中的主动推理:综述和挑战 链接:https://arxiv.org/abs/2112.01871
作者:Pablo Lanillos,Cristian Meo,Corrado Pezzato,Ajith Anil Meera,Mohamed Baioumy,Wataru Ohata,Alexander Tschantz,Beren Millidge,Martijn Wisse,Christopher L. Buckley,Jun Tani 备注:This manuscript is under review in a IEEE journal 摘要:主动推理是一种数学框架,起源于计算神经科学,是一种关于大脑如何执行动作、感知和学习的理论。最近,它被证明是一种很有前途的方法,在不确定状态下的状态估计和控制问题,以及基础建设的目标驱动行为在机器人和人工智能体一般。在这里,我们回顾了用于状态估计、控制、规划和学习的主动推理的最新理论和实现;描述当前的成就,特别关注机器人技术。我们展示了相关的实验,展示了它在适应性、泛化和鲁棒性方面的潜力。此外,我们将此方法与其他框架联系起来,并讨论其预期的好处和挑战:使用变分贝叶斯推理的具有功能生物学合理性的统一框架。 摘要:Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning. Recently, it has been shown to be a promising approach to the problems of state-estimation and control under uncertainty, as well as a foundation for the construction of goal-driven behaviours in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state-estimation, control, planning and learning; describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.
【3】 Table2Vec: Automated Universal Representation Learning to Encode All-round Data DNA for Benchmarkable and Explainable Enterprise Data Science 标题:Table2Vec:自动化通用表示学习为可基准和可解释的企业数据科学编码全面的数据DNA 链接:https://arxiv.org/abs/2112.01830
作者:Longbing Cao,Chengzhang Zhu 备注:24 pages, 16 figures, 1 table 摘要:企业数据通常涉及多个异构数据源和外部数据,分别记录业务活动、交易、客户统计、状态、行为、与企业的交互和通信,以及其产品、服务、生产、营销和运营的消费和反馈,企业数据科学面临的一个关键挑战是,如何在全方位的企业DNA上实现有效的整体企业数据理解、数据驱动的发现和决策。我们介绍了一种神经编码器Table2Vec,用于自动通用表示学习实体,如来自全方位企业DNA的客户,并具有自动数据特征分析和数据质量增强功能。学习到的通用表示可以作为代表性和基准企业数据基因组,并可用于企业范围和特定领域的学习任务。表2VEC集成了低质量企业数据和下游学习任务的自动化通用表示学习。我们举例说明Table2Vec在复杂异构多关系大表上描述企业中全面的客户数据DNA,以构建通用客户向量表示。每个客户学习到的通用表示法是全面的、有代表性的和基准的,能够支持企业数据科学中企业范围和特定领域的学习目标和任务。Table2Vec显著优于企业分析中常用的现有浅层、推进和深度学习方法。我们进一步讨论了自动化通用企业表示和学习的研究机会、方向和应用,以及用于自动化、通用、全企业和道德机器学习和数据科学的企业数据DNA。 摘要:Enterprise data typically involves multiple heterogeneous data sources and external data that respectively record business activities, transactions, customer demographics, status, behaviors, interactions and communications with the enterprise, and the consumption and feedback of its products, services, production, marketing, operations, and management, etc. A critical challenge in enterprise data science is to enable an effective whole-of-enterprise data understanding and data-driven discovery and decision-making on all-round enterprise DNA. We introduce a neural encoder Table2Vec for automated universal representation learning of entities such as customers from all-round enterprise DNA with automated data characteristics analysis and data quality augmentation. The learned universal representations serve as representative and benchmarkable enterprise data genomes and can be used for enterprise-wide and domain-specific learning tasks. Table2Vec integrates automated universal representation learning on low-quality enterprise data and downstream learning tasks. We illustrate Table2Vec in characterizing all-round customer data DNA in an enterprise on complex heterogeneous multi-relational big tables to build universal customer vector representations. 
The learned universal representation of each customer is all-round, representative and benchmarkable to support both enterprise-wide and domain-specific learning goals and tasks in enterprise data science. Table2Vec significantly outperforms the existing shallow, boosting and deep learning methods typically used for enterprise analytics. We further discuss the research opportunities, directions and applications of automated universal enterprise representation and learning and the learned enterprise data DNA for automated, all-purpose, whole-of-enterprise and ethical machine learning and data science.
【4】 Probing Linguistic Information For Logical Inference In Pre-trained Language Models 标题:在预先训练的语言模型中探索逻辑推理的语言信息 链接:https://arxiv.org/abs/2112.01753
作者:Zeming Chen,Qiyue Gao 备注:Accepted in AAAI 2022 摘要:在预先训练的语言模型方面取得的进展已经在自然语言理解的下游任务上取得了令人印象深刻的成果。最近关于探索预先训练的语言模型的工作揭示了在其语境化表达中编码的广泛的语言特性。然而,目前尚不清楚它们是否编码了对符号推理方法至关重要的语义知识。我们提出了一种在预先训练的语言模型表示中探测逻辑推理的语言信息的方法。我们的探测数据集涵盖了主要符号推理系统所需的语言现象列表。我们发现(i)预先训练的语言模型确实编码了几种类型的语言信息用于推理,但也有一些类型的信息是弱编码的,(ii)语言模型可以通过微调有效地学习缺失的语言信息。总的来说,我们的研究结果提供了语言模型及其训练前程序捕捉逻辑推理的语言信息的哪些方面的见解。此外,我们还展示了语言模型作为支持符号推理方法的语义和背景知识库的潜力。 摘要:Progress in pre-trained language models has led to a surge of impressive results on downstream tasks for natural language understanding. Recent work on probing pre-trained language models uncovered a wide range of linguistic properties encoded in their contextualized representations. However, it is unclear whether they encode semantic knowledge that is crucial to symbolic inference methods. We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference do language models and their pre-training procedures capture. Moreover, we have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
【5】 Theoretical Analysis of an XGBoost Framework for Product Cannibalization 标题:面向产品同类相食的XGBoost框架的理论分析 链接:https://arxiv.org/abs/2112.01566
作者:Gautham Bekal,Mohammad Bari 备注:To better understand this paper please go through the previous paper, An XGBoost-Based Forecasting Framework for Product Cannibalization. This paper is an extension of the previous work 摘要:本文是我们工作的一个扩展,我们提出了一个三阶段XGBoost算法,用于预测产品同类相食情形下的销售额。之前,我们根据直觉开发了该模型,并提供了其性能的经验证据。在本研究中,我们将简要回顾该算法,然后提供其工作原理背后的数学推理。 摘要:This paper is an extension of our work where we presented a three-stage XGBoost algorithm for forecasting sales under a product cannibalization scenario. Previously we developed the model based on our intuition and provided empirical evidence on its performance. In this study we briefly go over the algorithm and then provide the mathematical reasoning behind its working.
【6】 Dimension-Free Average Treatment Effect Inference with Deep Neural Networks 标题:基于深度神经网络的维度无关平均处理效应推断 链接:https://arxiv.org/abs/2112.01574
作者:Xinze Du,Yingying Fan,Jinchi Lv,Tianshu Sun,Patrick Vossler 备注:56 pages, 22 figures 摘要:本文研究了在潜在结果框架下,利用深度神经网络(DNN)对平均治疗效果(ATE)的估计和推断。在某些规律性条件下,观察到的反应可以表示为以混杂变量和治疗指标为自变量的均值回归问题的反应。利用这种公式,我们研究了两种基于DNN回归估计平均回归函数的ATE估计和推断方法,使用特定的网络结构。我们证明了在真实均值回归模型的某些假设下,ATE的两个DNN估计与无量纲一致性率是一致的。我们的模型假设适应了观察到的反应对协变量的潜在复杂依赖结构,包括潜在因素以及治疗指标和混杂变量之间的非线性相互作用。我们还基于样本分割的思想建立了估计量的渐近正态性,确保了精确的推断和不确定性量化。仿真研究和实际数据应用证明了我们的理论发现,并支持我们的DNN估计和推断方法。 摘要:This paper investigates the estimation and inference of the average treatment effect (ATE) using deep neural networks (DNNs) in the potential outcomes framework. Under some regularity conditions, the observed response can be formulated as the response of a mean regression problem with both the confounding variables and the treatment indicator as the independent variables. Using such formulation, we investigate two methods for ATE estimation and inference based on the estimated mean regression function via DNN regression using a specific network architecture. We show that both DNN estimates of ATE are consistent with dimension-free consistency rates under some assumptions on the underlying true mean regression model. Our model assumptions accommodate the potentially complicated dependence structure of the observed response on the covariates, including latent factors and nonlinear interactions between the treatment indicator and confounding variables. We also establish the asymptotic normality of our estimators based on the idea of sample splitting, ensuring precise inference and uncertainty quantification. Simulation studies and real data application justify our theoretical findings and support our DNN estimation and inference methods.
检测相关(5篇)
【1】 Improving the Reliability of Network Intrusion Detection Systems through Dataset Integration 标题:通过数据集整合提高网络入侵检测系统的可靠性 链接:https://arxiv.org/abs/2112.02080
作者:Roberto Magán-Carrión,Daniel Urda,Ignacio Díaz-Cano,Bernabé Dorronsoro 备注:Submitted to the IEEE Transactions on Emerging Topics in Computing journal 摘要:这项工作提出了可靠的网络入侵检测系统(R-NIDS),这是一种基于机器学习(ML)的网络入侵检测系统(NIDS)的新方法,允许ML模型在集成数据集上工作,使学习过程能够从不同的数据集获得不同的信息。因此,R-NIDS的目标是设计比传统方法更健壮的模型。我们还提出了一个新的数据集,称为UNK21。它由三个最著名的网络数据集(UGR'16、UNSW-NB15和NSL-KDD)构建而成,每个数据集都是从自己的网络环境中收集的,具有不同的特性和类别,使用R-NIDS中提供的数据聚合方法。在R-NIDS之后,在这项工作中,我们建议根据文献中用于NIDS评估的三个最常见数据集的信息构建两个著名的ML模型(线性和非线性模型),这些数据集集成在UNK21中。所提出的方法提供的结果表明,作为NIDS解决方案训练的这两个ML模型可以从这种方法中受益,在新提出的UNK21数据集上训练时能够更好地推广。此外,使用统计工具仔细分析了这些结果,为我们的结论提供了很高的可信度。 摘要:This work presents Reliable-NIDS (R-NIDS), a novel methodology for Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) that allows ML models to work on integrated datasets, empowering the learning process with diverse information from different datasets. Therefore, R-NIDS targets the design of more robust models that generalize better than traditional approaches. We also propose a new dataset, called UNK21. It is built from three of the most well-known network datasets (UGR'16, UNSW-NB15 and NSL-KDD), each one gathered from its own network environment, with different features and classes, by using a data aggregation approach present in R-NIDS. Following R-NIDS, in this work we propose to build two well-known ML models (a linear and a non-linear one) based on the information of three of the most common datasets in the literature for NIDS evaluation, those integrated in UNK21. The results that the proposed methodology offers show how these two ML models trained as a NIDS solution could benefit from this approach, being able to generalize better when training on the newly proposed UNK21 dataset. Furthermore, these results are carefully analyzed with statistical tools that provide high confidence in our conclusions.
【2】 Practitioner-Centric Approach for Early Incident Detection Using Crowdsourced Data for Emergency Services 标题:使用众包数据进行应急服务的以从业者为中心的早期事件检测方法 链接:https://arxiv.org/abs/2112.02012
作者:Yasas Senarath,Ayan Mukhopadhyay,Sayyed Mohsen Vazirizade,Hemant Purohit,Saideep Nannapaneni,Abhishek Dubey 备注:Accepted at IEEE International Conference on Data Mining (ICDM) 2021 摘要:应急响应在很大程度上取决于事件报告的时间。不幸的是,接收事故报告的传统方法(如在美国拨打911)存在时间延迟。Waze等众包平台为事件的早期识别提供了机会。然而,由于与众包数据流相关的噪声和不确定性的挑战,从众包数据流中检测事件是困难的。此外,简单地优化过检测精度可能会影响推理的时空定位,从而使此类方法不适用于实际部署。本文以应急响应管理为例,提出了一种新的基于众包数据的以从业者为中心的事件检测问题公式和解决方法。提议的方法CROME(众包多目标事件检测)量化了事件分类的性能指标(如F1分数)与模型从业者的要求(如事件检测的1 km半径)之间的关系。首先,我们展示了如何在卷积神经网络(CNN)架构中,将众包报告、地面实况历史数据和其他相关决定因素(如交通和天气)结合使用,以早期检测紧急事件。然后,我们使用基于帕累托优化的方法来优化CNN的输出,并结合以从业者为中心的参数来平衡检测精度和时空定位。最后,我们使用来自Waze的众包数据和来自美国田纳西州纳什维尔的交通事故报告证明了该方法的适用性。我们的实验表明,该方法在事件检测方面优于现有方法,同时优化了对真实世界部署和可用性的需求。 摘要:Emergency response is highly dependent on the time of incident reporting. Unfortunately, the traditional approach to receiving incident reports (e.g., calling 911 in the USA) has time delays. Crowdsourcing platforms such as Waze provide an opportunity for early identification of incidents. However, detecting incidents from crowdsourced data streams is difficult due to the challenges of noise and uncertainty associated with such data. Further, simply optimizing over detection accuracy can compromise spatial-temporal localization of the inference, thereby making such approaches infeasible for real-world deployment. This paper presents a novel problem formulation and solution approach for practitioner-centered incident detection using crowdsourced data by using emergency response management as a case-study. The proposed approach CROME (Crowdsourced Multi-objective Event Detection) quantifies the relationship between the performance metrics of incident classification (e.g., F1 score) and the requirements of model practitioners (e.g., 1 km. radius for incident detection). 
First, we show how crowdsourced reports, ground-truth historical data, and other relevant determinants such as traffic and weather can be used together in a Convolutional Neural Network (CNN) architecture for early detection of emergency incidents. Then, we use a Pareto optimization-based approach to optimize the output of the CNN in tandem with practitioner-centric parameters to balance detection accuracy and spatial-temporal localization. Finally, we demonstrate the applicability of this approach using crowdsourced data from Waze and traffic accident reports from Nashville, TN, USA. Our experiments demonstrate that the proposed approach outperforms existing approaches in incident detection while simultaneously optimizing the needs for real-world deployment and usability.
【3】 Label noise detection under the Noise at Random model with ensemble filters 标题:基于集成滤波的随机模型下标签噪声检测 链接:https://arxiv.org/abs/2112.01617
作者:Kecia G. Moura,Ricardo B. C. Prudêncio,George D. C. Cavalcanti 备注:Accepted for publication in IOS Press Intelligent Data Analysis. This paper will appear in Volume 26(5) of the IDA journal. The publication date for this issue is September 2022 摘要:由于标签噪声检测在提高训练数据质量方面的重要性,它在机器学习中得到了广泛的研究。通过采用分类器集成,实现了满意的噪声检测。在这种方法中,如果池中有很大比例的成员对实例进行了错误分类,则会将实例指定为错误标记。以前的作者对这种方法进行了经验评估;然而,他们大多假设标签噪声是在数据集中完全随机产生的。这是一个强有力的假设,因为其他类型的标签噪声在实践中是可行的,并且会影响噪声检测结果。本文研究了两种不同噪声模型下的集成噪声检测性能:随机噪声模型(NAR),其中标签噪声的概率取决于实例类,与完全随机噪声模型(其中标签噪声的概率完全独立)相比。在这种情况下,我们研究了类别分布对噪声检测性能的影响,因为在NAR假设下,类别分布改变了数据集中观察到的总噪声水平。此外,对集合投票阈值进行评估,以与文献中最常用的方法进行对比。在许多已执行的实验中,当考虑到诸如类不平衡和不同类之间的噪声级比率等方面时,选择一个噪声产生模型而不是另一个模型会导致不同的结果。 摘要:Label noise detection has been widely studied in Machine Learning because of its importance in improving training data quality. Satisfactory noise detection has been achieved by adopting ensembles of classifiers. In this approach, an instance is assigned as mislabeled if a high proportion of members in the pool misclassifies it. Previous authors have empirically evaluated this approach; nevertheless, they mostly assumed that label noise is generated completely at random in a dataset. This is a strong assumption since other types of label noise are feasible in practice and can influence noise detection results. This work investigates the performance of ensemble noise detection under two different noise models: the Noisy at Random (NAR), in which the probability of label noise depends on the instance class, in comparison to the Noisy Completely at Random model, in which the probability of label noise is entirely independent. In this setting, we investigate the effect of class distribution on noise detection performance since it changes the total noise level observed in a dataset under the NAR assumption. Further, an evaluation of the ensemble vote threshold is conducted to contrast with the most common approaches in the literature. 
In many performed experiments, choosing a noise generation model over another can lead to different results when considering aspects such as class imbalance and noise level ratio among different classes.
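The two noise models and the voting rule described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' experimental code: the helper names `inject_label_noise` and `ensemble_filter` are hypothetical, and the per-member predictions are assumed to be given (for example from cross-validated classifiers).

```python
import random


def inject_label_noise(labels, flip_prob, classes, rng):
    """Flip each label with a probability that depends on its class.

    flip_prob maps class -> flip probability. Equal probabilities for all
    classes give noise completely at random; class-dependent probabilities
    give the NAR setting described in the abstract.
    """
    noisy = []
    for y in labels:
        if rng.random() < flip_prob[y]:
            # Replace the label with a different class chosen uniformly.
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def ensemble_filter(predictions, labels, threshold=0.5):
    """Flag an instance as mislabeled when the fraction of ensemble
    members that misclassify it exceeds `threshold`.

    predictions: one list of predicted labels per ensemble member.
    """
    n_members = len(predictions)
    flagged = []
    for i, y in enumerate(labels):
        errors = sum(1 for member in predictions if member[i] != y)
        flagged.append(errors / n_members > threshold)
    return flagged


# NAR example: only class 1 is ever corrupted, so the overall noise level
# depends on how many class-1 instances the dataset contains.
rng = random.Random(0)
noisy_labels = inject_label_noise([0, 1, 0, 1], {0: 0.0, 1: 1.0}, [0, 1], rng)
```

Raising `threshold` toward 1 makes the filter more conservative, which is the kind of vote-threshold trade-off the abstract says it evaluates.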
【4】 Detection of Large Vessel Occlusions using Deep Learning by Deforming Vessel Tree Segmentations 标题:基于变形血管树分割的深度学习检测大血管闭塞 链接:https://arxiv.org/abs/2112.01797
作者:Florian Thamm,Oliver Taubmann,Markus Jürgens,Hendrik Ditt,Andreas Maier 备注:6 pages, preprint 摘要:计算机断层摄影血管造影是一种关键的检查方法,它能深入了解脑血管树,这对于缺血性中风的诊断和治疗至关重要,尤其是在大血管闭塞(LVO)的情况下。因此,LVOs患者的自动检测大大有利于临床工作流程。这项工作使用卷积神经网络进行案例级分类,通过血管树分割模板的弹性变形进行训练,以人为地增加训练数据。仅使用遮罩作为模型的输入,使我们能够在保持样本真实性的同时,比传统图像体积更积极地应用此类变形。神经网络对LVO的存在和受影响的半球进行分类。在一项5倍交叉验证消融研究中,我们证明,使用建议的增强器使我们能够从很少的数据集训练稳健的模型。在100个数据集上训练EfficientNetB1架构,建议的增强方案能够在不使用增强的情况下将ROC AUC从基线值0.57提高到0.85。使用3D DenseNet获得最佳性能,AUC为0.88。增强对受累半球的分类也有积极影响,其中3D DenseNet两侧的AUC为0.93。 摘要:Computed Tomography Angiography is a key modality providing insights into the cerebrovascular vessel tree that are crucial for the diagnosis and treatment of ischemic strokes, in particular in cases of large vessel occlusions (LVO). Thus, the clinical workflow greatly benefits from an automated detection of patients suffering from LVOs. This work uses convolutional neural networks for case-level classification trained with elastic deformation of the vessel tree segmentation masks to artificially augment training data. Using only masks as the input to our model uniquely allows us to apply such deformations much more aggressively than one could with conventional image volumes while retaining sample realism. The neural network classifies the presence of an LVO and the affected hemisphere. In a 5-fold cross validated ablation study, we demonstrate that the use of the suggested augmentation enables us to train robust models even from few data sets. Training the EfficientNetB1 architecture on 100 data sets, the proposed augmentation scheme was able to raise the ROC AUC to 0.85 from a baseline value of 0.57 using no augmentation. The best performance was achieved using a 3D-DenseNet yielding an AUC of 0.88. The augmentation had positive impact in classification of the affected hemisphere as well, where the 3D-DenseNet reached an AUC of 0.93 on both sides.
【5】 Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images 标题:基于未配准多期CT图像的鲁棒端到端肝脏局灶性病变检测 链接:https://arxiv.org/abs/2112.01535
作者:Sang-gil Lee,Eunji Kim,Jae Seok Bae,Jung Hoon Kim,Sungroh Yoon 备注:IEEE TETCI. 14 pages, 8 figures, 5 tables 摘要:肝脏局灶性病变(FLLs)的计算机辅助诊断有助于改进工作流程,实现正确诊断;FLL检测是计算机辅助诊断的第一步。尽管最近基于深度学习的方法在检测FLL方面取得了成功,但目前的方法对于评估失调的多相数据还不够稳健。通过在特征空间中引入注意引导的多相位对齐,本研究提出了一个全自动的端到端学习框架,用于从多相位CT(CT)图像中检测FLL。我们的方法由于其完全基于学习的方法而对错位多相图像具有鲁棒性,这降低了模型性能对配准质量的敏感性,并使模型能够在临床实践中独立部署。对一个包含280名患者的大规模数据集的评估证实,我们的方法优于以前的最新方法,并显著降低了使用错位多期CT图像检测FLL的性能下降。该方法的鲁棒性可以提高基于深度学习的计算机辅助检测系统的临床应用。 摘要:The computer-aided diagnosis of focal liver lesions (FLLs) can help improve workflow and enable correct diagnoses; FLL detection is the first step in such a computer-aided diagnosis. Despite the recent success of deep-learning-based approaches in detecting FLLs, current methods are not sufficiently robust for assessing misaligned multiphase data. By introducing an attention-guided multiphase alignment in feature space, this study presents a fully automated, end-to-end learning framework for detecting FLLs from multiphase computed tomography (CT) images. Our method is robust to misaligned multiphase images owing to its complete learning-based approach, which reduces the sensitivity of the model's performance to the quality of registration and enables a standalone deployment of the model in clinical practice. Evaluation on a large-scale dataset with 280 patients confirmed that our method outperformed previous state-of-the-art methods and significantly reduced the performance degradation for detecting FLLs using misaligned multiphase CT images. The robustness of the proposed method can enhance the clinical adoption of the deep-learning-based computer-aided detection system.
分类|识别(1篇)
【1】 Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts 标题:情绪的形状:通过情绪转换识别对话中的多模态情绪 链接:https://arxiv.org/abs/2112.01938
作者:Harsh Agarwal,Keshav Bansal,Abhinav Joshi,Ashutosh Modi 备注:13 pages 摘要:会话中的情感识别是一个重要而活跃的研究课题。最近的工作表明,在ERC任务中使用多种模式(如文本、音频和视频)的好处。在谈话中,参与者倾向于保持特定的情绪状态,除非某些外部刺激引起变化。在一次谈话中,情绪会不断地起伏波动。受这一观察结果的启发,我们提出了一个多模态ERC模型,并用情绪转移成分对其进行了扩充。提出的情感转移组件是模块化的,可以添加到任何现有的多模态ERC模型中(只需稍作修改),以提高情感识别。我们对该模型的不同变体进行了实验,结果表明,包含情绪转移信号有助于该模型优于现有的ERC多模态模型,从而在MOSEI和IEMOCAP数据集上显示出最先进的性能。 摘要:Emotion Recognition in Conversations (ERC) is an important and active research problem. Recent work has shown the benefits of using multiple modalities (e.g., text, audio, and video) for the ERC task. In a conversation, participants tend to maintain a particular emotional state unless some external stimuli evokes a change. There is a continuous ebb and flow of emotions in a conversation. Inspired by this observation, we propose a multimodal ERC model and augment it with an emotion-shift component. The proposed emotion-shift component is modular and can be added to any existing multimodal ERC model (with a few modifications), to improve emotion recognition. We experiment with different variants of the model, and results show that the inclusion of emotion shift signal helps the model to outperform existing multimodal models for ERC and hence showing the state-of-the-art performance on MOSEI and IEMOCAP datasets.
表征(3篇)
【1】 A Structured Dictionary Perspective on Implicit Neural Representations 标题:关于隐含神经表征的结构化词典视角 链接:https://arxiv.org/abs/2112.01917
作者:Gizem Yüce,Guillermo Ortiz-Jiménez,Beril Besbinar,Pascal Frossard 备注:26 pages, 14 figures 摘要:在能够绕过频谱偏差的新设计的推动下,隐式神经表征(INR)成为信号经典离散化表征的一种有前途的替代方法。然而,尽管INR在实践中取得了成功,但我们仍然缺乏对INR如何表示信号的正确理论描述。在这项工作中,我们旨在填补这一空白,并提出了一个新的统一的角度来理论分析INR。利用谐波分析和深度学习理论的结果,我们发现大多数INR族类似于结构化信号字典,其原子是初始映射频率集的整数次谐波。这种结构允许INR使用数量仅随深度线性增长的参数,以指数增长的频率支持来表达信号。然后,我们利用最近关于经验神经切线核(NTK)的研究结果,探讨了INR的归纳偏置。具体来说,我们证明了NTK的本征函数可以看作是字典原子,其与目标信号的内积决定了重建的最终性能。在这方面,我们发现元学习初始化具有NTK的重塑效应,类似于字典学习,将字典原子构建为元训练过程中看到的示例的组合。我们的结果允许设计和调整新颖的INR体系结构,但也可能引起更广泛的深度学习理论界的兴趣。 摘要:Propelled by new designs that make it possible to circumvent the spectral bias, implicit neural representations (INRs) have recently emerged as a promising alternative to classical discretized representations of signals. Nevertheless, despite their practical success, we still lack a proper theoretical characterization of how INRs represent signals. In this work, we aim to fill this gap, and we propose a novel unified perspective to theoretically analyse INRs. Leveraging results from harmonic analysis and deep learning theory, we show that most INR families are analogous to structured signal dictionaries whose atoms are integer harmonics of the set of initial mapping frequencies. This structure allows INRs to express signals with an exponentially increasing frequency support using a number of parameters that only grows linearly with depth. Afterwards, we explore the inductive bias of INRs exploiting recent results about the empirical neural tangent kernel (NTK). Specifically, we show that the eigenfunctions of the NTK can be seen as dictionary atoms whose inner product with the target signal determines the final performance of their reconstruction. In this regard, we reveal that meta-learning the initialization has a reshaping effect of the NTK analogous to dictionary learning, building dictionary atoms as a combination of the examples seen during meta-training. Our results make it possible to design and tune novel INR architectures, but can also be of interest for the wider deep learning theory community.
【2】 The Representation Jensen-Rényi Divergence 标题:表示Jensen-Rényi散度 链接:https://arxiv.org/abs/2112.01583
作者:Jhoan Keider Hoyos Osorio,Oscar Skean,Austin Brockmeier,Luis Gonzalo Sanchez Giraldo 摘要:在由无限可分核定义的再生核Hilbert空间中,我们引入了一个基于算子的数据分布之间的散度度量。散度的经验估计是使用正定矩阵的特征值来计算的,正定矩阵是通过在成对样本上计算核得到的。新的度量与Jensen-Shannon散度具有相似的性质。基于Gram矩阵的有序谱和与总体量相关的积分算子之间的差异,所提出的估计量的收敛性来自于集中结果。建议的散度度量避免了对数据背后的概率分布的估计。数值实验包括比较分布和应用抽样不平衡数据进行分类,结果表明所提出的散度可以达到最先进的结果。 摘要:We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels. The empirical estimator of the divergence is computed using the eigenvalues of positive definite matrices that are obtained by evaluating the kernel over pairs of samples. The new measure shares similar properties to Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the ordered spectrum of the Gram matrices and the integral operators associated with the population quantities. The proposed measure of divergence avoids the estimation of the probability distribution underlying the data. Numerical experiments involving comparing distributions and applications to sampling unbalanced data for classification show that the proposed divergence can achieve state of the art results.
【3】 Adversarially learning disentangled speech representations for robust multi-factor voice conversion 标题:用于鲁棒多因素语音转换的对抗性学习解缠语音表示 链接:https://arxiv.org/abs/2102.00184
作者:Jie Wang,Jingbei Li,Xintao Zhao,Zhiyong Wu,Shiyin Kang,Helen Meng 摘要:在语音转换(VC)中,将语音分解为非纠缠语音表示对于实现高度可控的风格转换至关重要。VC中传统的语音表征学习方法仅将语音分解为说话人和内容,缺乏对其他韵律相关因素的可控性。针对更多语音因素的最先进的语音表示学习方法正在使用主要的解纠缠算法,如随机重采样和特设瓶颈层大小调整,但这很难确保鲁棒的语音表示解纠缠。为了提高VC中高度可控的风格转换对多种因素的鲁棒性,我们提出了一种基于对抗学习的非纠缠语音表征学习框架。提取了四种表征内容、音色、节奏和音高的语音表示,并通过受BERT启发的对抗性掩码和预测(MAP)网络进一步分解。对抗性网络通过随机掩蔽和预测一种语音表征与另一种语音表征之间的相关性来最小化语音表征之间的相关性。实验结果表明,该框架通过将语音质量MOS从2.79提高到3.30,将MCD从3.89降低到3.58,显著提高了VC对多因素的鲁棒性。 摘要:Factorizing speech as disentangled speech representations is vital to achieve highly controllable style transfer in voice conversion (VC). Conventional speech representation learning methods in VC only factorize speech as speaker and content, lacking controllability on other prosody-related factors. State-of-the-art speech representation learning methods for more speech factors are using primary disentangle algorithms such as random resampling and ad-hoc bottleneck layer size adjustment, which however is hard to ensure robust speech representation disentanglement. To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning. Four speech representations characterizing content, timbre, rhythm and pitch are extracted, and further disentangled by an adversarial Mask-And-Predict (MAP) network inspired by BERT. The adversarial network is used to minimize the correlations between the speech representations, by randomly masking and predicting one of the representations from the others. Experimental results show that the proposed framework significantly improves the robustness of VC on multiple factors by increasing the speech quality MOS from 2.79 to 3.30 and decreasing the MCD from 3.89 to 3.58.
优化|敛散性(3篇)
【1】 Fast L^2 optimal mass transport via reduced basis methods for the Monge-Ampère equation 链接:https://arxiv.org/abs/2112.01878
作者:Shijin Hou,Yanlai Chen,Yinhua Xia 摘要:重复求解参数化最优质量传输(pOMT)问题是图像配准和自适应网格生成等应用中的一项常见任务。因此,开发与全阶模型同样精确的高效简化求解器至关重要。在本文中,我们提出了一种类似机器学习的pOMT方法,该方法采用了一种专为非线性方程设计的新的约化基(RB)技术,即约化剩余约化过配置(R2-ROC)方法,用于参数化Monge-Ampère方程。它建立在窄模板有限差分法(FDM)的基础上,这是一种所谓的真值解算器,我们在本文中针对具有传输边界的Monge-Ampère方程提出了该方法。与R2-ROC方法一起,它允许我们处理与Monge-Ampère方程相关的强而独特的非线性,从而实现在线效率,而无需求助于非线性的任何直接近似。几个具有挑战性的数值试验证明了我们的方法在求解具有各种参数边界条件的Monge-Ampère方程时的准确性和高效性。 摘要:Repeatedly solving the parameterized optimal mass transport (pOMT) problem is a frequent task in applications such as image registration and adaptive grid generation. It is thus critical to develop a highly efficient reduced solver that is as accurate as the full order model. In this paper, we propose such a machine learning-like method for pOMT by adapting a new reduced basis (RB) technique specifically designed for nonlinear equations, the reduced residual reduced over-collocation (R2-ROC) approach, to the parameterized Monge-Ampère equation. It builds on top of a narrow-stencil finite difference method (FDM), a so-called truth solver, which we propose in this paper for the Monge-Ampère equation with a transport boundary. Together with the R2-ROC approach, it allows us to handle the strong and unique nonlinearity pertaining to the Monge-Ampère equation, achieving online efficiency without resorting to any direct approximation of the nonlinearity. Several challenging numerical tests demonstrate the accuracy and high efficiency of our method for solving the Monge-Ampère equation with various parametric boundary conditions.
【2】 Regularized Newton Method with Global O(1/k^2) Convergence 标题:全局O(1/k^2)收敛的正则化牛顿法 链接:https://arxiv.org/abs/2112.02089
作者:Konstantin Mishchenko 备注:19 pages, 1 figure 摘要:我们提出了一种牛顿型方法,该方法在任何初始条件下都能快速收敛,并可用于具有Lipschitz连续Hessian的任意凸目标。我们通过将三次正则化的思想与某种自适应的Levenberg-Marquardt惩罚相结合来实现这一点。特别地,我们证明了由$x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$(其中$H>0$是常数)给出的迭代,以$\mathcal{O}(\frac{1}{k^2})$速率全局收敛。我们的方法是牛顿方法的第一个变种,它具有廉价的迭代和可证明的快速全局收敛性。此外,我们还证明了当目标为强凸时,我们的方法局部超线性收敛。为了提高该方法的性能,我们提出了一种不需要超参数且可证明有效的线搜索方法。 摘要:We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient.
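The update rule quoted in the abstract above can be tried on a toy problem. The following is a minimal one-dimensional sketch, not the author's implementation: the function name `regularized_newton_1d`, the test objective f(x) = sqrt(1 + x^2), the starting point, and the choice H = 1.0 are all assumptions made for illustration. In one dimension the matrix inverse reduces to a scalar division.

```python
import math


def regularized_newton_1d(grad, hess, x0, H, num_iters=30):
    """Scalar version of x_{k+1} = x_k - g_k / (h_k + sqrt(H * |g_k|)).

    grad and hess return f'(x) and f''(x); H > 0 is the constant from the
    abstract (here simply a user-supplied number, an assumption).
    """
    x = x0
    for _ in range(num_iters):
        g = grad(x)
        if g == 0.0:
            break  # stationary point reached
        lam = math.sqrt(H * abs(g))  # adaptive Levenberg-Marquardt term
        x = x - g / (hess(x) + lam)
    return x


# Illustrative convex objective f(x) = sqrt(1 + x^2), minimized at x = 0.
def grad(x):
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    return (1.0 + x * x) ** -1.5


x_star = regularized_newton_1d(grad, hess, x0=3.0, H=1.0)
print(x_star)
```

For this objective the unregularized Newton step maps x to -x^3, so plain Newton diverges from any start with |x| > 1, while the regularized step above shrinks |x| at every iteration.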
【3】 Near-optimal estimation of smooth transport maps with kernel sums-of-squares 标题:基于核平方和的光滑传输映射的近似最优估计 链接:https://arxiv.org/abs/2112.01907
作者:Boris Muzellec,Adrien Vacher,Francis Bach,François-Xavier Vialard,Alessandro Rudi 摘要:最近的研究表明,在光滑条件下,两个分布之间的平方Wasserstein距离可以有效地计算,并具有诱人的统计误差上界。然而,与距离本身不同,生成性建模等应用程序感兴趣的对象是底层的最佳运输地图。因此,需要获得估计地图本身的计算和统计保证。在本文中,我们提出了第一个易于处理的算法,该算法使得映射上的统计$L^2$误差几乎与现有的平滑映射估计的极小极大下界相匹配。我们的方法是基于求解具有无限维平方和的最优传输的半对偶公式,并导出了一个在样本数上具有无维多项式速率的算法,该算法具有潜在的指数维依赖常数。 摘要:It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to be obtained for the estimated maps themselves. In this paper, we propose the first tractable algorithm for which the statistical $L^2$ error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation. Our method is based on solving the semi-dual formulation of optimal transport with an infinite-dimensional sum-of-squares reformulation, and leads to an algorithm which has dimension-free polynomial rates in the number of samples, with potentially exponentially dimension-dependent constants.
预测|估计(4篇)
【1】 User-click Modelling for Predicting Purchase Intent 标题:用于预测购买意向的用户点击建模 链接:https://arxiv.org/abs/2112.02006
作者:Simone Borg Bruun 摘要:本论文对利用机器学习方法对用户行为建模的开放式精算数学问题进行了结构化研究,以预测非寿险产品的购买意愿。对于一家公司来说,了解用户与他们网站的互动是很有价值的,因为它为消费者行为提供了丰富而个性化的见解。大多数现有的用户行为建模研究旨在解释或预测搜索引擎结果页面上的点击,或估计赞助搜索中的点击率。这些模型基于关于用户对网页的检查模式和网页项目表示的概念。通过研究建模用户行为以预测商业网站上的购买意图的问题,我们发现,用户的意图高度依赖于用户浏览网站的方式,即用户访问了多少不同的网页,用户与哪种网页交互,以及用户在每个网页上花费的时间。受这些发现的启发,我们提出了两种不同的方法来表示用户会话的特征,从而产生了两种基于用户点击的购买预测模型:一种基于前馈神经网络,另一种基于递归神经网络。通过将上述两个模型与使用用户人口统计特征的模型进行比较,我们检验了用户点击预测购买意图的区分性。我们的实验结果表明,我们基于点击的模型在标准分类评估指标方面显著优于人口统计模型,并且基于用户点击顺序表示的模型产生的性能略高于基于点击特征工程的模型。 摘要:This thesis contributes a structured inquiry into the open actuarial mathematics problem of modelling user behaviour using machine learning methods, in order to predict purchase intent of non-life insurance products. It is valuable for a company to understand user interactions with their website as it provides rich and individualized insight into consumer behaviour. Most of existing research in user behaviour modelling aims to explain or predict clicks on a search engine result page or to estimate click-through rate in sponsored search. These models are based on concepts about users' examination patterns of a web page and the web page's representation of items. Investigating the problem of modelling user behaviour to predict purchase intent on a business website, we observe that a user's intention yields high dependency on how the user navigates the website in terms of how many different web pages the user visited, what kind of web pages the user interacted with, and how much time the user spent on each web page. Inspired by these findings, we propose two different ways of representing features of a user session leading to two models for user click-based purchase prediction: one based on a Feed Forward Neural Network, and another based on a Recurrent Neural Network. We examine the discriminativeness of user-clicks for predicting purchase intent by comparing the above two models with a model using demographic features of the user. 
Our experimental results show that our click-based models significantly outperform the demographic model, in terms of standard classification evaluation metrics, and that a model based on a sequential representation of user clicks yields slightly greater performance than a model based on feature engineering of clicks.
【2】 Fast Projected Newton-like Method for Precision Matrix Estimation with Nonnegative Partial Correlations 标题:非负部分相关精度矩阵估计的快速投影类牛顿算法 链接:https://arxiv.org/abs/2112.01939
作者:Jiaxi Ying,José Vinícius de M. Cardoso,Jian-Feng Cai,Daniel P. Palomar 备注:43 pages 摘要:我们研究了多元高斯分布中的精度矩阵估计问题,其中所有的偏相关都是非负的,也称为二阶多元全正($\mathrm{MTP}_2$)。近年来,这类模型受到了广泛的关注,主要是由于其有趣的性质,例如,无论潜在维度如何,只需两个观测值,最大似然估计量即存在。在$\mathrm{MTP}_2$约束下,我们将该问题表述为加权$\ell_1$范数正则高斯最大似然估计。在这个方向上,我们提出了一种新的投影类牛顿算法,该算法结合了精心设计的近似牛顿方向,使得我们的算法具有与一阶方法相同的计算阶数和内存开销。我们证明了所提出的投影类牛顿算法收敛到问题的极小值。我们进一步从理论和实验上证明,使用加权$\ell_1$-范数的公式的最小值能够正确地恢复基础精度矩阵的支持,而不需要$\ell_1$-范数方法中存在的非相干条件。涉及合成数据和真实数据的实验表明,从计算时间的角度来看,我们提出的算法比最先进的方法具有更高的效率。最后,我们将我们的方法应用于金融时间序列数据,这是众所周知的显示正相关性的数据,我们观察到在学习的金融网络上模块化值方面的显著性能。 摘要:We study the problem of estimating precision matrices in multivariate Gaussian distributions where all partial correlations are nonnegative, also known as multivariate totally positive of order two ($\mathrm{MTP}_2$). Such models have received significant attention in recent years, primarily due to interesting properties, e.g., the maximum likelihood estimator exists with as few as two observations regardless of the underlying dimension. We formulate this problem as a weighted $\ell_1$-norm regularized Gaussian maximum likelihood estimation under $\mathrm{MTP}_2$ constraints. In this direction, we propose a novel projected Newton-like algorithm that incorporates a well-designed approximate Newton direction, which results in our algorithm having the same orders of computation and memory costs as those of first-order methods. We prove that the proposed projected Newton-like algorithm converges to the minimizer of the problem. We further show, both theoretically and experimentally, that the minimizer of our formulation using the weighted $\ell_1$-norm is able to recover the support of the underlying precision matrix correctly without requiring the incoherence condition present in $\ell_1$-norm based methods. Experiments involving synthetic and real-world data demonstrate that our proposed algorithm is significantly more efficient, from a computational time perspective, than the state-of-the-art methods. 
Finally, we apply our method in financial time-series data, which are well-known for displaying positive dependencies, where we observe a significant performance in terms of modularity value on the learned financial networks.
【3】 Prediction of Household-level Heat-Consumption using PSO enhanced SVR Model 标题:基于粒子群算法改进支持向量机模型的户级热耗预测 链接:https://arxiv.org/abs/2112.01908
作者:Satyaki Chatterjee,Siming Bayer,Andreas Maier 备注:Accepted for NeurIPS Climate Change Workshop 2021 摘要:在应对气候变化的过程中,区域能源系统(DES)供热或供冷的有效需求能源供应操作必不可少。因此,准确预测用户端的热量消耗是实现最佳能源供应的重要第一步。然而,由于热耗数据的非线性和非平稳性,DES的热能需求预测仍然具有挑战性。在这项工作中,我们提出了一个基于核支持向量回归(kSVR)的区域供热系统(DHS)热能消耗预测框架,该框架使用真实世界的智能电表数据。粒子群优化算法(PSO)用于寻找kSVR模型的最优超参数,这使得所提出的方法与最先进的ARIMA模型相比具有优越性。单个电表特定预测和社会消费预测的平均MAPE分别降低至2.07%和2.64%。 摘要:In combating climate change, an effective demand-based energy supply operation of the district energy system (DES) for heating or cooling is indispensable. As a consequence, an accurate forecast of heat consumption on the consumer side poses an important first step towards an optimal energy supply. However, due to the non-linearity and non-stationarity of heat consumption data, the prediction of the thermal energy demand of DES remains challenging. In this work, we propose a forecasting framework for thermal energy consumption within a district heating system (DHS) based on kernel Support Vector Regression (kSVR) using real-world smart meter data. Particle Swarm Optimization (PSO) is employed to find the optimal hyper-parameter for the kSVR model which leads to the superiority of the proposed methods when compared to a state-of-the-art ARIMA model. The average MAPE is reduced to 2.07% and 2.64% for the individual meter-specific forecasting and for forecasting of societal consumption, respectively.
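The abstract above reports its results as mean absolute percentage error (MAPE). As a reminder of what that metric measures, here is a small sketch using the standard textbook definition; the function name `mape` and the sample readings are made up for illustration and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Standard definition: 100/n * sum(|a_i - f_i| / |a_i|).
    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical meter readings and forecasts (illustrative numbers only).
readings = [100.0, 200.0]
predicted = [98.0, 204.0]
print(mape(readings, predicted))
```

Each forecast here is off by 2% of its reading, so the printed value is approximately 2.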
【4】 Differential Property Prediction: A Machine Learning Approach to Experimental Design in Advanced Manufacturing 标题:微分性能预测:先进制造实验设计的机器学习方法 链接:https://arxiv.org/abs/2112.01687
作者:Loc Truong,WoongJo Choi,Colby Wight,Lizzy Coda,Tegan Emerson,Keerti Kappagantula,Henry Kvinge 摘要:先进的制造技术使生产具有最先进性能的材料成为可能。然而,在许多情况下,这些技术的基于物理的模型的开发落后于它们在实验室中的使用。这意味着设计和运行实验主要通过反复试验来进行。这是次优的,因为实验是成本、时间和劳动密集型的。在这项工作中,我们提出了一个机器学习框架,即差分属性分类(DPC),它使实验者能够利用机器学习无与伦比的模式匹配能力来进行数据驱动的实验设计。DPC获取两个可能的实验参数集,并输出一个预测值,该预测值将产生具有操作员指定的更理想特性的材料。我们使用剪切辅助加工和挤出(ShAPE)这一固相加工技术,在AA7075管材制造工艺和机械性能数据上展示了DPC的成功。我们表明,通过关注实验者在多个候选实验参数之间进行选择的需要,我们可以将从加工参数预测材料特性的具有挑战性的回归任务重新构造为机器学习模型可以获得良好性能的分类任务。 摘要:Advanced manufacturing techniques have enabled the production of materials with state-of-the-art properties. In many cases however, the development of physics-based models of these techniques lags behind their use in the lab. This means that designing and running experiments proceeds largely via trial and error. This is sub-optimal since experiments are cost-, time-, and labor-intensive. In this work we propose a machine learning framework, differential property classification (DPC), which enables an experimenter to leverage machine learning's unparalleled pattern matching capability to pursue data-driven experimental design. DPC takes two possible experiment parameter sets and outputs a prediction of which will produce a material with a more desirable property specified by the operator. We demonstrate the success of DPC on AA7075 tube manufacturing process and mechanical property data using shear assisted processing and extrusion (ShAPE), a solid phase processing technology. We show that by focusing on the experimenter's need to choose between multiple candidate experimental parameters, we can reframe the challenging regression task of predicting material properties from processing parameters, into a classification task on which machine learning models can achieve good performance.
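The reframing described above, from regressing a property to classifying which of two parameter sets gives the better property, can be illustrated with a small data-preparation sketch. This is only a guess at the general idea, not the authors' pipeline: the function name `make_pairwise_dataset`, the concatenated feature layout, the tie handling, and the toy numbers are all assumptions.

```python
from itertools import permutations


def make_pairwise_dataset(params, props):
    """Turn (parameter set, measured property) records into ordered pairs.

    Each sample concatenates two parameter sets; the label is 1 if the first
    set produced the larger (assumed more desirable) property, else 0.
    Pairs with equal property values are skipped.
    """
    pairs, labels = [], []
    for i, j in permutations(range(len(params)), 2):
        if props[i] == props[j]:
            continue
        pairs.append(tuple(params[i]) + tuple(params[j]))
        labels.append(1 if props[i] > props[j] else 0)
    return pairs, labels


# Three hypothetical experiments: (parameter A, parameter B) and a measured property.
params = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
props = [10.0, 30.0, 20.0]
pairs, labels = make_pairwise_dataset(params, props)
print(len(pairs), labels)
```

One side effect of this construction is that n experiments yield up to n(n-1) training samples, which may help when experiments are scarce; any standard binary classifier could then be trained on `pairs` and `labels`.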
其他神经网络|深度学习|模型|建模(14篇)
【1】 MD-inferred neural network monoclinic finite-strain hyperelasticity models for β-HMX: Sobolev training and validation against physical constraints 标题:β-HMX的MD-推断神经网络单斜有限应变超弹性模型:针对物理约束的索博列夫训练和验证 链接:https://arxiv.org/abs/2112.02077
作者:Nikolaos N. Vlassis,Puhan Zhao,Ran Ma,Tommy Sewell,WaiChing Sun 备注:29 pages, 17 figures 摘要:我们提出了一个机器学习框架来训练和验证神经网络,以预测单斜有机分子晶体$\beta$-HMX在几何非线性状态下的各向异性弹性响应。使用过滤分子动力学(MD)模拟数据库训练Sobolev范数的神经网络,Sobolev范数使用应力测量和参考构型推导弹性储能泛函。为了提高从学习到的存储能量得出的弹性切线预测的准确性,使用转移学习技术在必要条件下(如强椭圆度、晶体对称性)从数据中引入额外的切线约束对于模型的正确性,要么作为附加物理约束引入,要么纳入验证测试。神经网络的评估基于(1)它们再现MD预测的底线本构响应的准确性,(2)它们的稳定性和唯一性的详细检查,以及(3)有限变形区内连续介质力学理论预测响应的可容许性。我们比较了神经网络在不同Sobolev约束下的训练效率,并针对$\beta$-HMX的MD基准评估了模型的准确性和鲁棒性。 摘要:We present a machine learning framework to train and validate neural networks to predict the anisotropic elastic response of the monoclinic organic molecular crystal $\beta$-HMX in the geometrical nonlinear regime. A filtered molecular dynamic (MD) simulations database is used to train the neural networks with a Sobolev norm that uses the stress measure and a reference configuration to deduce the elastic stored energy functional. To improve the accuracy of the elasticity tangent predictions originating from the learned stored energy, a transfer learning technique is used to introduce additional tangential constraints from the data while necessary conditions (e.g. strong ellipticity, crystallographic symmetry) for the correctness of the model are either introduced as additional physical constraints or incorporated in the validation tests. Assessment of the neural networks is based on (1) the accuracy with which they reproduce the bottom-line constitutive responses predicted by MD, (2) detailed examination of their stability and uniqueness, and (3) admissibility of the predicted responses with respect to continuum mechanics theory in the finite-deformation regime. We compare the neural networks' training efficiency under different Sobolev constraints and assess the models' accuracy and robustness against MD benchmarks for $\beta$-HMX.
【2】 ROCA: Robust CAD Model Retrieval and Alignment from a Single Image 标题:ROCA:单幅图像的稳健CAD模型检索与对齐 链接:https://arxiv.org/abs/2112.01988
作者:Can Gümeli,Angela Dai,Matthias Nießner 摘要:我们介绍了ROCA,一种新颖的端到端方法,它可以从形状数据库检索三维CAD模型并将其与单个输入图像对齐。这使得能够从2D RGB观察中对观察到的场景进行3D感知,其特征是轻量级、紧凑、干净的CAD表示。我们方法的核心是基于密集2D-3D对象对应和Procrustes对齐的可微对齐优化。因此,ROCA可以提供可靠的CAD对齐,同时通过利用2D-3D对应关系来学习几何相似的CAD模型来通知CAD检索。对来自ScanNet的具有挑战性的真实图像进行的实验表明,ROCA在支持检索的CAD对齐精度方面显著提高,从9.5%提高到17.6%。 摘要:We present ROCA, a novel end-to-end approach that retrieves and aligns 3D CAD models from a shape database to a single input image. This enables 3D perception of an observed scene from a 2D RGB observation, characterized as a lightweight, compact, clean CAD representation. Core to our approach is our differentiable alignment optimization based on dense 2D-3D object correspondences and Procrustes alignment. ROCA can thus provide a robust CAD alignment while simultaneously informing CAD retrieval by leveraging the 2D-3D correspondences to learn geometrically similar CAD models. Experiments on challenging, real-world imagery from ScanNet show that ROCA significantly improves on state of the art, from 9.5% to 17.6% in retrieval-aware CAD alignment accuracy.
【3】 Enhancing Deep Neural Networks Testing by Traversing Data Manifold 标题:遍历数据流形增强深度神经网络测试 链接:https://arxiv.org/abs/2112.01956
作者:Yuanyuan Yuan,Qi Pang,Shuai Wang 摘要:我们开发了DEEPTRAVERSAL,一个反馈驱动的框架来测试DNN。DEEPTRAVERSAL首先启动一个离线阶段,将各种形式的媒体数据映射到流形。然后,在在线测试阶段,DEEPTRAVERSAL遍历准备好的流形空间,以最大化DNN覆盖标准并触发预测错误。在我们的评估中,使用了执行各种任务(例如分类、自动驾驶、机器翻译)的DNN和不同类型(图像、音频、文本)的媒体数据。相对于流行的DNN覆盖标准,DEEPTRAVERSAL比以前的方法表现出更好的性能,它可以发现更多和更高质量的错误触发输入。经测试的DNN模型在利用DEEPTRAVERSAL的发现进行修复后,获得了更好的精度。 摘要:We develop DEEPTRAVERSAL, a feedback-driven framework to test DNNs. DEEPTRAVERSAL first launches an offline phase to map media data of various forms to manifolds. Then, in its online testing phase, DEEPTRAVERSAL traverses the prepared manifold space to maximize DNN coverage criteria and trigger prediction errors. In our evaluation, DNNs executing various tasks (e.g., classification, self-driving, machine translation) and media data of different types (image, audio, text) were used. DEEPTRAVERSAL exhibits better performance than prior methods with respect to popular DNN coverage criteria and it can discover a larger number and higher quality of error-triggering inputs. The tested DNN models, after being repaired with findings of DEEPTRAVERSAL, achieve better accuracy.
【4】 You Can't See the Forest for Its Trees: Assessing Deep Neural Network Testing via NeuraL Coverage 标题:从树木看不到森林:通过神经覆盖率评估深度神经网络测试 链接:https://arxiv.org/abs/2112.01955
作者:Yuanyuan Yuan,Qi Pang,Shuai Wang 摘要:本文总结了DNN测试标准的八项设计要求,考虑了分布特性和实际问题。然后,我们提出了一个新的标准NLC,它满足所有这些设计要求。NLC将单个DNN层作为基本计算单元(而不是单个神经元),并捕获神经元输出分布的四个关键特征。因此,NLC被表示为神经覆盖,它更准确地描述了神经网络如何通过近似分布而不是神经元来理解输入。我们证明了NLC与测试套件在许多任务(分类和生成)和数据格式(图像和文本)中的多样性显著相关。它发现DNN预测错误的能力是有希望的。NLC引导的测试输入突变导致暴露错误行为的更高质量和多样性。 摘要:This paper summarizes eight design requirements for DNN testing criteria, taking into account distribution properties and practical concerns. We then propose a new criterion, NLC, that satisfies all of these design requirements. NLC treats a single DNN layer as the basic computational unit (rather than a single neuron) and captures four critical features of neuron output distributions. Thus, NLC is denoted as NeuraL Coverage, which more accurately describes how neural networks comprehend inputs via approximated distributions rather than neurons. We demonstrate that NLC is significantly correlated with the diversity of a test suite across a number of tasks (classification and generation) and data formats (image and text). Its capacity to discover DNN prediction errors is promising. Test input mutation guided by NLC results in a greater quality and diversity of exposed erroneous behaviors.
【5】 Heuristic Search Planning with Deep Neural Networks using Imitation, Attention and Curriculum Learning 标题:基于模仿、注意和课程学习的深度神经网络启发式搜索规划 链接:https://arxiv.org/abs/2112.01918
作者:Leah Chrestien,Tomas Pevny,Antonin Komenda,Stefan Edelkamp 备注:8 pages plus references 摘要:为硬任务规划领域学习一个信息充分的启发式函数是一个难以捉摸的问题。虽然有已知的神经网络结构来表示此类启发式知识,但不清楚学习了哪些具体信息,以及旨在理解结构的技术是否有助于提高启发式知识的质量。本文提出了一个网络模型来学习一个启发式函数,该启发式函数能够通过使用注意机制的最优计划模拟来关联状态空间的遥远部分,从而大大提高了对一个好的启发式函数的学习。为了克服该方法在创建难度越来越大的问题时的局限性,我们演示了课程学习的使用,在训练集中添加新解决的问题实例,这反过来,有助于解决更复杂的问题,远远超过所有现有基线(包括经典规划启发式)的性能。我们证明了它对网格型PDDL域的有效性。 摘要:Learning a well-informed heuristic function for hard task planning domains is an elusive problem. Although there are known neural network architectures to represent such heuristic knowledge, it is not obvious what concrete information is learned and whether techniques aimed at understanding the structure help in improving the quality of the heuristics. This paper presents a network model to learn a heuristic capable of relating distant parts of the state space via optimal plan imitation using the attention mechanism, which drastically improves the learning of a good heuristic function. To counter the limitation of the method in the creation of problems of increasing difficulty, we demonstrate the use of curriculum learning, where newly solved problem instances are added to the training set, which, in turn, helps to solve problems of higher complexities and far exceeds the performances of all existing baselines including classical planning heuristics. We demonstrate its effectiveness for grid-type PDDL domains.
【6】 A Flexible HLS Hoeffding Tree Implementation for Runtime Learning on FPGA 标题:一种灵活的HLS Hoeffding树在FPGA上的运行时学习实现 链接:https://arxiv.org/abs/2112.01875
作者:Luís Miguel Sousa,Nuno Paulino,João Canas Ferreira,João Bispo 摘要:在嵌入式系统中实现机器学习时,决策树因其简单性和可扩展性而经常被首选。Hoeffding树是一种决策树,它利用Hoeffding界限,允许它们学习数据中的模式,而不必连续存储数据样本以备将来重新处理。这使得它们特别适合在嵌入式设备上部署。在这项工作中,我们强调了Hoeffding树的HLS实现的特性。实现参数包括样本的特征大小(D)、输出类的数量(K)以及允许树增长到的最大节点数量(Nd)。我们以Xilinx MPSoC ZCU102为目标,并评估:不同类别数量和特征尺寸的设计资源需求和时钟频率、不同样本大小(N)的多个合成数据集上的执行时间、输出类别数量以及UCI两个数据集的执行时间和精度。对于D3、K5和N40000的问题大小,在103MHz下运行的单个决策树的推理速度比1.2GHz ARM Cortex-A53核心快8.3倍。与Hoeffding树的参考实现相比,我们实现了UCI数据集的可比分类精度。 摘要:Decision trees are often preferred when implementing Machine Learning in embedded systems for their simplicity and scalability. Hoeffding Trees are a type of Decision Trees that take advantage of the Hoeffding Bound to allow them to learn patterns in data without having to continuously store the data samples for future reprocessing. This makes them especially suitable for deployment on embedded devices. In this work we highlight the features of an HLS implementation of the Hoeffding Tree. The implementation parameters include the feature size of the samples (D), the number of output classes (K), and the maximum number of nodes to which the tree is allowed to grow (Nd). We target a Xilinx MPSoC ZCU102, and evaluate: the design's resource requirements and clock frequency for different numbers of classes and feature size, the execution time on several synthetic datasets of varying sample sizes (N), number of output classes and the execution time and accuracy for two datasets from UCI. For a problem size of D3, K5, and N40000, a single decision tree operating at 103MHz is capable of 8.3x faster inference than the 1.2GHz ARM Cortex-A53 core. Compared to a reference implementation of the Hoeffding tree, we achieve comparable classification accuracy for the UCI datasets.
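The Hoeffding Bound mentioned above is what lets such a tree decide on a split after seeing only a stream of samples. The sketch below uses the standard textbook form of the bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)); it is not taken from the HLS design, and the function names, the gain values, and the choice of range R = 1.0 are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability at least 1 - delta, the mean of n observations of a
    variable with range R lies within epsilon of its true mean.
    """
    if n <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical split-quality scores after 1000 samples at a leaf.
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
print(round(eps, 4))
print(should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000))
print(should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=1000))
```

Because epsilon shrinks as n grows, a leaf only needs to keep running statistics rather than the samples themselves, which is the property that makes the approach attractive for memory-constrained devices.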
【7】 Characterizing Performance Bugs in Deep Learning Systems 标题:深度学习系统中性能缺陷的表征 链接:https://arxiv.org/abs/2112.01771
作者:Junming Cao,Bihuan Chen,Chao Sun,Longjie Hu,Xin Peng 摘要:深度学习(DL)已越来越多地应用于各个领域。编程范式从传统系统向DL系统的转变对工程DL系统提出了独特的挑战。性能是一个挑战,DL系统中的性能缺陷(PBs)会导致严重的后果,如过度的资源消耗和财务损失。虽然DL系统中的bug已经得到了广泛的研究,但DL系统中的PBs几乎没有被研究过。为了弥补这一差距,我们提出了第一项综合研究,以描述TensorFlow和Keras开发的DL系统中PBs的症状、根本原因以及引入和暴露阶段,共从225个StackOverflow帖子中收集了238个PBs。我们的发现为开发高性能DL系统以及检测和定位DL系统中的PBs提供了启示。我们还建立了DL系统中56个PBs的第一个基准,并评估了现有方法解决这些问题的能力。此外,我们还开发了一个静态检查器DeepPerf来检测三种类型的PBs,并在130个GitHub项目中识别了488个新PBs。其中62个和18个分别得到了开发人员的确认和修复。 摘要:Deep learning (DL) has been increasingly applied to a variety of domains. The programming paradigm shift from traditional systems to DL systems poses unique challenges in engineering DL systems. Performance is one of the challenges, and performance bugs (PBs) in DL systems can cause severe consequences such as excessive resource consumption and financial loss. While bugs in DL systems have been extensively investigated, PBs in DL systems have hardly been explored. To bridge this gap, we present the first comprehensive study to characterize symptoms, root causes, and introducing and exposing stages of PBs in DL systems developed in TensorFlow and Keras, with a total of 238 PBs collected from 225 StackOverflow posts. Our findings shed light on the implications on developing high performance DL systems, and detecting and localizing PBs in DL systems. We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them. Moreover, we develop a static checker DeepPerf to detect three types of PBs, and identify 488 new PBs in 130 GitHub projects. 62 and 18 of them have been respectively confirmed and fixed by developers.
【8】 Learning Emergent Random Access Protocol for LEO Satellite Networks 标题:面向LEO卫星网络的涌现式随机接入协议学习 链接:https://arxiv.org/abs/2112.01765
作者:Ju-Hyung Lee,Hyowoon Seo,Jihong Park,Mehdi Bennis,Young-Chai Ko 摘要:设想一个由低空地球轨道(LEO)卫星(SAT)组成的巨型星座,以在后第五代(5G)蜂窝系统中提供全球覆盖的SAT网络。LEO卫星网络在时变卫星网络拓扑结构下表现出许多用户的超长链路距离。这使得现有的多址协议,例如为固定地面网络拓扑设计的基于随机接入信道(RACH)的蜂窝协议,不适合。为了克服这个问题,本文提出了一种新的LEO卫星网络无授权随机接入方案,称为涌现随机接入信道协议(eRACH)。与现有的基于模型和标准化协议形成鲜明对比的是,eRACH是一种无模型方法,通过与非平稳网络环境交互,使用多代理深度强化学习(MADRL)而出现。此外,通过利用已知的SAT轨道模式,eRACH不需要在用户之间进行中心协调或额外通信,而通过常规轨道模式稳定训练收敛。与RACH相比,我们通过各种模拟表明,我们提出的eRACH在实现0.989 Jain公平性指数的同时,平均网络吞吐量提高了54.6%,平均访问延迟降低了约两倍。 摘要:A mega-constellation of low-altitude earth orbit (LEO) satellites (SATs) is envisaged to provide a global coverage SAT network in beyond fifth-generation (5G) cellular systems. LEO SAT networks exhibit extremely long link distances of many users under time-varying SAT network topology. This makes existing multiple access protocols, such as random access channel (RACH) based cellular protocol designed for fixed terrestrial network topology, ill-suited. To overcome this issue, in this paper, we propose a novel grant-free random access solution for LEO SAT networks, dubbed emergent random access channel protocol (eRACH). In stark contrast to existing model-based and standardized protocols, eRACH is a model-free approach that emerges through interaction with the non-stationary network environment, using multi-agent deep reinforcement learning (MADRL). Furthermore, by exploiting known SAT orbiting patterns, eRACH does not require central coordination or additional communication across users, while training convergence is stabilized through the regular orbiting patterns. Compared to RACH, we show from various simulations that our proposed eRACH yields 54.6% higher average network throughput with around two times lower average access delay while achieving 0.989 Jain's fairness index.
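The abstract above quotes a Jain's fairness index of 0.989. For readers unfamiliar with the metric, the sketch below computes it from the standard definition J = (sum x_i)^2 / (n * sum x_i^2); the function name and the per-user throughput values are invented for illustration and are unrelated to the paper's simulations.

```python
def jain_fairness_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when every user receives the same share and 1/n when a
    single user receives everything.
    """
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_sq)


# Hypothetical per-user throughputs (arbitrary units).
print(jain_fairness_index([1.0, 1.0, 1.0, 1.0]))  # perfectly fair
print(jain_fairness_index([1.0, 0.0, 0.0, 0.0]))  # one user takes all
print(jain_fairness_index([4.0, 5.0, 5.0, 6.0]))  # mildly uneven
```

A value near 1, such as the 0.989 reported above, therefore indicates that throughput is spread almost evenly across users.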
【9】 Frame Averaging for Equivariant Shape Space Learning 标题:用于等变形状空间学习的帧平均方法 链接:https://arxiv.org/abs/2112.01741
作者:Matan Atzmon,Koki Nagano,Sanja Fidler,Sameh Khamis,Yaron Lipman 摘要:形状空间学习的任务是将一系列形状映射到具有良好泛化特性的潜在表示空间,并将其映射到潜在表示空间。通常,真实世界的形状集合具有对称性,可以将对称性定义为不改变形状本质的变换。将对称性纳入形状空间学习的一种自然方法是要求到形状空间的映射(编码器)和来自形状空间的映射(解码器)与相关对称性相等。在本文中,我们通过引入两个贡献,提出了一个在编码器和解码器中合并等变的框架:(i)调整最近的帧平均(FA)框架,以构建通用的、高效的和最大表达的等变自动编码器;和(ii)构造与应用于形状不同部分的分段欧几里德运动相同的自动编码器。据我们所知,这是第一个完全分段欧氏等变自动编码器构造。我们的框架很简单:它使用标准重建损失,不需要引入新的损失。我们的体系结构是由标准(主干)体系结构构建的,具有适当的帧平均以使它们相等。在使用隐式神经表示的刚性形状数据集和使用基于网格的神经网络的关节形状数据集上对我们的框架进行测试,显示了对看不见的测试形状的最新概括,大大提高了相关基线。特别是,我们的方法在推广到看不见的关节姿势方面有显著的改进。 摘要:The task of shape space learning involves mapping a train set of shapes to and from a latent representation space with good generalization properties. Often, real-world collections of shapes have symmetries, which can be defined as transformations that do not change the essence of the shape. A natural way to incorporate symmetries in shape space learning is to ask that the mapping to the shape space (encoder) and mapping from the shape space (decoder) are equivariant to the relevant symmetries. In this paper, we present a framework for incorporating equivariance in encoders and decoders by introducing two contributions: (i) adapting the recent Frame Averaging (FA) framework for building generic, efficient, and maximally expressive Equivariant autoencoders; and (ii) constructing autoencoders equivariant to piecewise Euclidean motions applied to different parts of the shape. To the best of our knowledge, this is the first fully piecewise Euclidean equivariant autoencoder construction. Training our framework is simple: it uses standard reconstruction losses and does not require the introduction of new losses. Our architectures are built of standard (backbone) architectures with the appropriate frame averaging to make them equivariant. 
Testing our framework on both rigid shapes dataset using implicit neural representations, and articulated shape datasets using mesh-based neural networks show state-of-the-art generalization to unseen test shapes, improving relevant baselines by a large margin. In particular, our method demonstrates significant improvement in generalizing to unseen articulated poses.
【10】 Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research 标题:缩减、重用和再循环:机器学习研究中的数据集寿命 链接:https://arxiv.org/abs/2112.01716
作者:Bernard Koch,Emily Denton,Alex Hanna,Jacob G. Foster 备注:35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia 摘要:基准数据集在机器学习研究的组织中起着核心作用。他们围绕共同的研究问题协调研究人员,并作为实现共同目标进展的衡量标准。尽管基准测试实践在这一领域起着基础性作用,但在机器学习子单元内部或之间,对基准数据集使用和重用的动态性关注相对较少。在本文中,我们深入研究了这些动力学。我们研究了2015-2020年间数据集使用模式在机器学习子单元和时间上的差异。我们发现,越来越多的人关注任务社区中越来越少的数据集,大量采用其他任务中的数据集,以及研究人员在少数精英机构中引入的数据集。我们的研究结果对科学评估、人工智能伦理和领域内的公平/准入具有影响。 摘要:Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we dig into these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015-2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field.
【11】 Contrastive Continual Learning with Feature Propagation 标题:基于特征传播的对比式连续学习 链接:https://arxiv.org/abs/2112.01713
作者:Xuejun Han,Yuhong Guo 摘要:经典的机器学习者被设计为只处理一项任务,而没有能力采用新出现的任务或类,而这种能力在现实世界中更实用,更像人类。为了解决这个缺点,连续的机器学习者被精心设计,以很好地学习一系列任务,不同任务之间的领域和类别发生变化。在本文中,我们提出了一种基于通用特征传播的对比连续学习方法,该方法能够处理多个连续学习场景。具体地说,我们通过特征传播和对比表征学习来对齐当前和以前的表征空间,以在不同任务之间架起领域转移的桥梁。为了进一步减少特征表示的类转移,利用监督对比损失使同一类的示例嵌入比不同类的示例嵌入更接近。大量的实验结果表明,与一组先进的连续学习方法相比,该方法在六个连续学习基准上具有优异的性能。 摘要:Classical machine learners are designed only to tackle one task without capability of adopting new emerging tasks or classes whereas such capacity is more practical and human-like in the real world. To address this shortcoming, continual machine learners are elaborated to commendably learn a stream of tasks with domain and class shifts among different tasks. In this paper, we propose a general feature-propagation based contrastive continual learning method which is capable of handling multiple continual learning scenarios. Specifically, we align the current and previous representation spaces by means of feature propagation and contrastive representation learning to bridge the domain shifts among distinct tasks. To further mitigate the class-wise shifts of the feature representation, a supervised contrastive loss is exploited to make the example embeddings of the same class closer than those of different classes. The extensive experimental results demonstrate the outstanding performance of the proposed method on six continual learning benchmarks compared to a group of cutting-edge continual learning methods.
【12】 Machine Learning Subsystem for Autonomous Collision Avoidance on a small UAS with Embedded GPU 标题:基于嵌入式GPU的小型无人机自主避碰机器学习子系统 链接:https://arxiv.org/abs/2112.01688
作者:Nicholas Polosky,Tyler Gwin,Sean Furman,Parth Barhanpurkar,Jithin Jagannath 备注:IEEE International Workshop on Communication and Networking for Swarms Robotics 摘要:随着基于机器学习的自治模块和嵌入式图形处理单元(GPU)的广泛使用,人们对6G通信网络的无人机系统(UAS)供电解决方案的兴趣大大增加。虽然这些技术彻底改变了无人机解决方案的可能性,但为无人机设计一个可操作、健壮的自治框架仍然是一个多方面的难题。在这项工作中,我们提出了我们新颖的、模块化的UAS自治框架,名为MR-iFLY,并讨论了如何将其扩展以支持6G swarm解决方案。我们首先详细介绍了资源受限设备上基于机器学习的UAS自主性所面临的挑战。接下来,我们将深入描述MR-iFLY的新型深度估计和碰撞避免技术如何应对这些挑战。最后,我们描述了我们用来衡量性能的各种评估标准,展示了我们优化的机器视觉组件如何比基线模型提供高达15倍的加速,并展示了MR-iFLY基于视觉的防撞技术的飞行演示视频。我们认为,这些实证结果证实了MR-iFLY可以通过提供独立的碰撞避免和导航功能来减少6G通信群中节点之间的通信开销。 摘要:Interest in unmanned aerial system (UAS) powered solutions for 6G communication networks has grown immensely with the widespread availability of machine learning based autonomy modules and embedded graphical processing units (GPUs). While these technologies have revolutionized the possibilities of UAS solutions, designing an operable, robust autonomy framework for UAS remains a multi-faceted and difficult problem. In this work, we present our novel, modular framework for UAS autonomy, entitled MR-iFLY, and discuss how it may be extended to enable 6G swarm solutions. We begin by detailing the challenges associated with machine learning based UAS autonomy on resource constrained devices. Next, we describe in depth how MR-iFLY's novel depth estimation and collision avoidance technology meets these challenges. Lastly, we describe the various evaluation criteria we have used to measure performance, show how our optimized machine vision components provide up to 15X speedup over baseline models, and present a flight demonstration video of MR-iFLY's vision-based collision avoidance technology. We argue that these empirical results substantiate MR-iFLY as a candidate for use in reducing communication overhead between nodes in 6G communication swarms by providing standalone collision avoidance and navigation capabilities.
【13】 Challenges and Opportunities in Approximate Bayesian Deep Learning for Intelligent IoT Systems 标题:智能物联网系统近似贝叶斯深度学习面临的挑战和机遇 链接:https://arxiv.org/abs/2112.01675
作者:Meet P. Vadera,Benjamin M. Marlin 摘要:近似贝叶斯深度学习方法在解决智能系统中部署深度学习组件时出现的几个问题方面具有重要的应用前景,包括减少过度自信错误的发生,增强对分布外示例的鲁棒性。然而,现有近似贝叶斯推理方法的计算要求可能使其不适合部署在包括低功耗边缘设备的智能物联网系统中。在本文中,我们提出了一系列用于监督深度学习的近似贝叶斯推理方法,并强调了在当前边缘硬件上应用这些方法的挑战和机遇。我们重点介绍了几种降低模型存储需求和提高计算可伸缩性的潜在解决方案,包括模型修剪和提取方法。 摘要:Approximate Bayesian deep learning methods hold significant promise for addressing several issues that occur when deploying deep learning components in intelligent systems, including mitigating the occurrence of over-confident errors and providing enhanced robustness to out of distribution examples. However, the computational requirements of existing approximate Bayesian inference methods can make them ill-suited for deployment in intelligent IoT systems that include lower-powered edge devices. In this paper, we present a range of approximate Bayesian inference methods for supervised deep learning and highlight the challenges and opportunities when applying these methods on current edge hardware. We highlight several potential solutions to decreasing model storage requirements and improving computational scalability, including model pruning and distillation methods.
【14】 Identifying mass composition of ultra-high-energy cosmic rays using deep learning 标题:基于深度学习的超高能宇宙线质量组成识别 链接:https://arxiv.org/abs/2112.02072
作者:O. Kalashev,I. Kharuk,M. Kuznetsov,G. Rubtsov,T. Sako,Y. Tsunesada,Ya. Zhezher 备注:18 pages, 5 figures 摘要:我们介绍了一种利用深度学习识别超高能宇宙线质量组成的新方法。该方法的关键思想是使用两个神经网络链。第一个网络预测单个事件的主要粒子类型,而第二个网络则推断事件集合的质量组成。我们将此方法应用于望远镜阵列表面探测器读数的蒙特卡罗数据,在该数据上,对于4分量近似,它产生了前所未有的7%的低误差。统计误差被证明低于与选择用于模拟的强子相互作用模型有关的系统误差。 摘要:We introduce a novel method for identifying the mass composition of ultra-high-energy cosmic rays using deep learning. The key idea of the method is to use a chain of two neural networks. The first network predicts the type of a primary particle for individual events, while the second infers the mass composition of an ensemble of events. We apply this method to the Monte-Carlo data for the Telescope Array Surface Detectors readings, on which it yields an unprecedented low error of 7% for 4-component approximation. The statistical error is shown to be inferior to the systematic one related to the choice of the hadronic interaction model used for simulations.
其他(29篇)
【1】 Coupling Vision and Proprioception for Navigation of Legged Robots 标题:腿式机器人导航中的视觉与本体感觉耦合 链接:https://arxiv.org/abs/2112.02094
作者:Zipeng Fu,Ashish Kumar,Ananye Agarwal,Haozhi Qi,Jitendra Malik,Deepak Pathak 备注:Website and videos at this https URL 摘要:我们利用视觉和本体感觉的互补优势,在腿式机器人上实现点目标导航。腿式系统比轮式机器人能够穿越更复杂的地形,但为了充分利用这一能力,我们需要导航系统中的高层路径规划器了解底层移动策略在不同地形上的行走能力。我们通过使用本体感知反馈来估计行走策略的安全操作极限,并感知意外障碍物和地形特性,如视觉可能错过的地面平滑度或柔软度,从而实现这一目标。导航系统使用车载摄像头生成占据地图和相应的代价地图,以到达目标。然后,FMM(快速行进法)规划器生成目标路径。速度命令生成器将此路径作为输入,并结合安全顾问提供的附加约束(意外障碍物和由地形决定的速度限制),为移动策略生成所需速度。与轮式机器人(LoCoBot)基线和其他高层规划与底层控制相互分离的基线相比,我们显示出了优越的性能。我们还展示了我们的系统在四足机器人上的实际部署,该机器人带有机载传感器和计算机。视频在https://navigation-locomotion.github.io/camera-ready 摘要:We exploit the complementary strengths of vision and proprioception to achieve point goal navigation in a legged robot. Legged systems are capable of traversing more complex terrain than wheeled robots, but to fully exploit this capability, we need the high-level path planner in the navigation system to be aware of the walking capabilities of the low-level locomotion policy on varying terrains. We achieve this by using proprioceptive feedback to estimate the safe operating limits of the walking policy, and to sense unexpected obstacles and terrain properties like smoothness or softness of the ground that may be missed by vision. The navigation system uses onboard cameras to generate an occupancy map and a corresponding cost map to reach the goal. The FMM (Fast Marching Method) planner then generates a target path. The velocity command generator takes this as input to generate the desired velocity for the locomotion policy using as input additional constraints, from the safety advisor, of unexpected obstacles and terrain determined speed limits. We show superior performance compared to wheeled robot (LoCoBot) baselines, and other baselines which have disjoint high-level planning and low-level control. We also show the real-world deployment of our system on a quadruped robot with onboard sensors and compute. Videos at https://navigation-locomotion.github.io/camera-ready
【2】 Class-agnostic Reconstruction of Dynamic Objects from Videos 标题:视频中与类无关的动态对象重建 链接:https://arxiv.org/abs/2112.02091
作者:Zhongzheng Ren,Xiaoming Zhao,Alexander G. Schwing 备注:NeurIPS 2021 摘要:我们引入了REDO,一个与类无关的框架来从RGBD或校准视频重建动态对象。与之前的工作相比,我们的问题设置更真实,但更具挑战性,原因有三:1)由于遮挡或相机设置,感兴趣的对象可能永远不会完全可见,但我们的目标是重建完整的形状;2) 我们的目标是处理不同的对象动力学,包括刚体运动、非刚体运动和关节;3) 我们的目标是用一个统一的框架重建不同类别的对象。为了应对这些挑战,我们开发了两个新模块。首先,我们引入一个标准的4D隐式函数,该函数是与聚集的时间视觉线索对齐的像素。其次,我们开发了一个4D转换模块,该模块捕获对象动态以支持时间传播和聚合。我们在合成RGBD视频数据集SAIL-VOS 3D和变形4D++以及真实世界视频数据3DPW的大量实验中研究了REDO的效果。我们发现REDO比最先进的动态重建方法有一定的优势。在消融研究中,我们验证了每个开发的组件。 摘要:We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings an object of interest may never be entirely visible, but we aim to reconstruct the complete shape; 2) we aim to handle different object dynamics including rigid motion, non-rigid motion, and articulation; 3) we aim to reconstruct different categories of objects with one unified framework. To address these challenges, we develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues. Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation. We study the efficacy of REDO in extensive experiments on synthetic RGBD video datasets SAIL-VOS 3D and DeformingThings4D++, and on real-world video data 3DPW. We find REDO outperforms state-of-the-art dynamic reconstruction methods by a margin. In ablation studies we validate each developed component.
【3】 Data-Free Neural Architecture Search via Recursive Label Calibration 标题:基于递归标签校准的无数据神经结构搜索 链接:https://arxiv.org/abs/2112.02086
作者:Zechun Liu,Zhiqiang Shen,Yun Long,Eric Xing,Kwang-Ting Cheng,Chas Leichner 备注:Technical report 摘要:本文旨在探讨在不使用任何原始训练数据的情况下,仅给出预训练模型的神经结构搜索(NAS)的可行性。在现实场景中,这是隐私保护、避免偏见等的重要环境。为了实现这一点,我们首先通过从预先训练的深层神经网络中恢复知识来合成可用数据。然后,我们使用合成数据及其预测的软标签来指导神经结构搜索。我们发现NAS任务需要合成的数据(我们这里的目标是图像域),这些数据具有足够的语义、多样性,并且与自然图像的域间距最小。对于语义,我们提出递归标签校准来产生更多的信息输出。对于多样性,我们提出了一种区域更新策略,以生成更加多样和语义丰富的合成数据。对于最小域间距,我们使用输入和特征级正则化来模拟原始数据在潜在空间中的分布。我们用三种流行的NAS算法来实例化我们提出的框架:DARTS、ProxylessNAS和SPOS。令人惊讶的是,我们的结果首次表明,使用我们的合成数据搜索得到的架构,其精度与使用原始数据搜索得到的架构相当甚至更高,由此得出结论:如果合成方法设计得当,NAS可以有效地进行,而无需访问原始数据(即所谓的自然数据)。我们的代码将公开提供。 摘要:This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model without using any original training data. This is an important circumstance for privacy protection, bias avoidance, etc., in real-world scenarios. To achieve this, we start by synthesizing usable data through recovering the knowledge from a pre-trained deep neural network. Then we use the synthesized data and their predicted soft-labels to guide neural architecture search. We identify that the NAS task requires the synthesized data (we target the image domain here) with enough semantics, diversity, and a minimal domain gap from the natural images. For semantics, we propose recursive label calibration to produce more informative outputs. For diversity, we propose a regional update strategy to generate more diverse and semantically-enriched synthetic data. For minimal domain gap, we use input and feature-level regularization to mimic the original data distribution in latent space. We instantiate our proposed framework with three popular NAS algorithms: DARTS, ProxylessNAS and SPOS. Surprisingly, our results demonstrate that the architectures discovered by searching with our synthetic data achieve accuracy that is comparable to, or even higher than, architectures discovered by searching from the original ones, for the first time, deriving the conclusion that NAS can be done effectively with no need of access to the original (so-called natural) data if the synthesis method is well designed. Our code will be publicly available.
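The abstract above says that synthesized images and their predicted soft labels guide the search, but it does not state the loss. As a minimal sketch, assuming the common choice of cross-entropy against the teacher's soft-label distribution (an assumption, not the authors' confirmed objective), a candidate architecture's logits could be scored as follows. The function names `softmax` and `soft_label_cross_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, soft_labels):
    """Cross-entropy between a soft target distribution and the student's prediction.

    Assumption: soft_labels is a probability vector produced by the pre-trained
    (teacher) model for one synthesized image; it is not taken from the paper's code.
    """
    probs = softmax(student_logits)
    # Terms with zero target mass contribute nothing, so they are skipped.
    return -sum(p * math.log(q) for p, q in zip(soft_labels, probs) if p > 0)


soft = [0.7, 0.1, 0.1, 0.1]
uniform_loss = soft_label_cross_entropy([0.0, 0.0, 0.0, 0.0], soft)
matched_loss = soft_label_cross_entropy([math.log(p) for p in soft], soft)
```

With uniform logits the loss equals log(4) regardless of the target, and it drops to the entropy of the soft label when the student reproduces the teacher's distribution exactly.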
【4】 I-WKNN: Fast-Speed and High-Accuracy WIFI Positioning for Intelligent Stadiums 标题:I-WKNN:智能体育场馆的快速高精度WiFi定位 链接:https://arxiv.org/abs/2112.02058
作者:Zhangzhi Zhao,Zhengying Lou,Ruibo Wang,Qingyao Li,Xing Xu 摘要:在智能体育场馆现有各种无线指纹定位算法的基础上,提出了一种高精度、快速的室内定位算法,即改进加权k近邻(I-WKNN)算法。为了适应体育场馆复杂的环境和高速采样的要求,提出了一种用于离线和在线阶段的AP选择算法。根据智能场馆信号强度分布的特点,提出了一种非对称高斯滤波算法。介绍了定位算法在智能体育场系统中的应用,完成了体育场的数据采集和实时定位。与传统的WKNN和KNN算法相比,I-WKNN算法在指纹定位数据库处理、环境噪声适应性、实时定位精度和定位速度等方面具有优势。实验结果表明,I-WKNN算法在复杂噪声环境下的定位精度和定位时间上具有明显优势,在智能体育场中具有明显的应用潜力。 摘要:Based on various existing wireless fingerprint location algorithms in intelligent sports venues, a high-precision and fast indoor location algorithm, improved weighted k-nearest neighbor (I-WKNN), is proposed. In order to meet the complex environment of sports venues and the demand of high-speed sampling, this paper proposes an AP selection algorithm for offline and online stages. Based on the characteristics of the signal intensity distribution in intelligent venues, an asymmetric Gaussian filter algorithm is proposed. This paper introduces the application of the positioning algorithm in the intelligent stadium system, and completes the data acquisition and real-time positioning of the stadium. Compared with traditional WKNN and KNN algorithms, the I-WKNN algorithm has advantages in fingerprint positioning database processing, environmental noise adaptability, real-time positioning accuracy and positioning speed, etc. The experimental results show that the I-WKNN algorithm has obvious advantages in positioning accuracy and positioning time in a complex noise environment and has obvious application potential in a smart stadium.
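For orientation, the traditional WKNN baseline that the abstract compares against can be sketched as below. This is plain weighted k-nearest-neighbor fingerprinting with inverse-distance weights, not the paper's I-WKNN: the AP selection and asymmetric Gaussian filtering steps are omitted. The function name `wknn_locate`, the data layout, and the sample values are illustrative assumptions.

```python
import math


def wknn_locate(fingerprints, observed_rssi, k=3, eps=1e-6):
    """Estimate (x, y) from an RSSI vector using classic weighted k-nearest neighbors.

    fingerprints: list of ((x, y), [rssi per access point]) reference records
                  collected offline (hypothetical layout).
    observed_rssi: RSSI vector measured online, same access point order.
    """
    scored = []
    for (x, y), ref in fingerprints:
        # Euclidean distance in signal space.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, observed_rssi)))
        scored.append((dist, x, y))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]

    # Inverse-distance weights; eps avoids division by zero on exact matches.
    weights = [1.0 / (dist + eps) for dist, _, _ in nearest]
    total = sum(weights)
    est_x = sum(w * x for w, (_, x, _) in zip(weights, nearest)) / total
    est_y = sum(w * y for w, (_, _, y) in zip(weights, nearest)) / total
    return est_x, est_y


# Illustrative radio map with two access points (made-up values).
radio_map = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((0.0, 10.0), [-90.0, -90.0]),
]
midpoint = wknn_locate(radio_map, [-55.0, -55.0], k=2)
exact = wknn_locate(radio_map, [-40.0, -70.0], k=1)
```

An observation equidistant in signal space from the first two reference points lands halfway between them, while an exact match with `k=1` returns the matching reference location.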
【5】 Multilingual training for Software Engineering 标题:软件工程的多语种训练 链接:https://arxiv.org/abs/2112.02043
作者:Toufique Ahmed,Premkumar Devanbu 备注:Accepted at International Conference on Software Engineering (ICSE-2022) 摘要:训练有素的机器学习模型利用了大量开源软件数据,现在已经成为自动化许多软件工程任务的有趣方法。一些SE任务都采用了这种方法,在过去几年中,通过更好的模型和训练方法,性能逐渐提高。更多、更多样化、干净、有标签的数据更有利于训练;但构建高质量的数据集既耗时又具有挑战性。增加干净标记数据的数量和多样性的方法通常具有广泛的适用性。对于某些语言(如Ruby),标记数据不太丰富;在另一些语言中(例如JavaScript),可用数据可能更多地集中在某些应用程序域上,因此不太多样化。作为绕过此类数据瓶颈的一种方法,我们提供的证据表明,人类用不同语言编写的代码(执行相同的功能)非常相似,特别是保留了标识符命名模式;我们进一步提出证据表明,标识符是软件工程任务训练数据的一个非常重要的元素。我们利用这一相当偶然的现象来寻找证据,证明可用的多语言训练数据(跨不同语言)可以用来提高性能。我们研究了三个不同的任务:代码摘要、代码检索和函数命名。我们注意到,这种数据扩充方法广泛兼容不同的任务、语言和机器学习模型。 摘要:Well-trained machine-learning models, which leverage large amounts of open-source software data, have now become an interesting approach to automating many software engineering tasks. Several SE tasks have all been subject to this approach, with performance gradually improving over the past several years with better models and training methods. More, and more diverse, clean, labeled data is better for training; but constructing good-quality datasets is time-consuming and challenging. Ways of augmenting the volume and diversity of clean, labeled data generally have wide applicability. For some languages (e.g., Ruby) labeled data is less abundant; in others (e.g., JavaScript) the available data may be more focused on some application domains, and thus less diverse. As a way around such data bottlenecks, we present evidence suggesting that human-written code in different languages (which performs the same function), is rather similar, and particularly preserving of identifier naming patterns; we further present evidence suggesting that identifiers are a very important element of training data for software engineering tasks. We leverage this rather fortuitous phenomenon to find evidence that available multilingual training data (across different languages) can be used to amplify performance. We study this for 3 different tasks: code summarization, code retrieval, and function naming. We note that this data-augmenting approach is broadly compatible with different tasks, languages, and machine-learning models.
【6】 A Survey on Concept Drift in Process Mining 标题:流程挖掘中的概念漂移研究综述 链接:https://arxiv.org/abs/2112.02000
作者:Denise Maria Vecino Sato,Sheila Cristiana de Freitas,Jean Paul Barddal,Edson Emilio Scalabrin 摘要:流程挖掘(PM)中的概念漂移是一个挑战,因为经典方法假设流程处于稳定状态,即事件共享相同的流程版本。我们对这些领域的交叉点进行了系统的文献回顾,因此,我们回顾了流程挖掘中的概念漂移,并提出了针对不断变化的环境的漂移检测和在线流程挖掘现有技术的分类。现有工作表明:(i)PM仍然主要集中于离线分析,(ii)由于缺乏通用的评估协议、数据集和度量,过程中概念漂移技术的评估非常繁琐。 摘要:Concept drift in process mining (PM) is a challenge as classical methods assume processes are in a steady-state, i.e., events share the same process version. We conducted a systematic literature review on the intersection of these areas, and thus, we review concept drift in process mining and bring forward a taxonomy of existing techniques for drift detection and online process mining for evolving environments. Existing works depict that (i) PM still primarily focuses on offline analysis, and (ii) the assessment of concept drift techniques in processes is cumbersome due to the lack of common evaluation protocol, datasets, and metrics.
【7】 Survey on English Entity Linking on Wikidata 标题:关于维基数据上英文实体链接的调查 链接:https://arxiv.org/abs/2112.01989
作者:Cedric Möller,Jens Lehmann,Ricardo Usbeck 备注:Disclaimer: Cedric Möller, Jens Lehmann, Ricardo Usbeck, 2021. The definitive, peer reviewed and edited version of this article is published in the Semantic Web Journal, Special issue: Latest Advancements in Linguistic 3 Linked Data, 2021 摘要:Wikidata是一个经常更新的、社区驱动的、多语言的知识图。因此,Wikidata是实体链接的一个有吸引力的基础,最近发表的论文的增加就是明证。本次调查主要关注四个主题:(1)存在哪些Wikidata实体链接数据集,它们的使用范围有多广,以及它们是如何构建的?(2) Wikidata的特性对实体链接数据集的设计有影响吗?如果有,如何影响?(3) 当前的实体链接方法如何利用Wikidata的特定特性?(4) 现有实体链接方法未利用哪些Wikidata特性?这项调查显示,当前Wikidata特定实体链接数据集的注释方案与其他知识图(如DBpedia)的注释方案没有区别。因此,自然适合Wikidata的多语言和时间相关数据集的潜力并没有被释放。此外,我们还表明,大多数实体链接方法使用Wikidata的方式与使用任何其他知识图的方式相同,错失了利用Wikidata特定特性来提高质量的机会。几乎所有的方法都使用特定的属性,如标签,有时使用描述,但忽略了超关系结构等特征。因此,仍然有改进的余地,例如,通过包含超关系图嵌入或类型信息。许多方法还包括来自Wikipedia的信息,它很容易与Wikidata结合,并提供Wikidata所缺乏的有价值的文本信息。 摘要:Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.
【8】 MetaQA: Combining Expert Agents for Multi-Skill Question Answering 标题:MetaQA:组合专家Agent进行多技能问答 链接:https://arxiv.org/abs/2112.01922
作者:Haritz Puerto,Gözde Gül Şahin,Iryna Gurevych 摘要:最近问答(QA)数据集和模型的爆炸式增长,通过在多个数据集上训练模型或结合多个模型,增加了人们对跨多个领域和格式的模型泛化的兴趣。我们认为,尽管多数据集模型的结果很有希望,但某些领域或QA格式可能需要特定的体系结构,因此这些模型的适应性可能会受到限制。此外,当前用于组合模型的方法忽略了问题-答案兼容性等线索。在这项工作中,我们建议将专家代理与一种新颖、灵活且训练高效的体系结构相结合,该体系结构考虑问题、答案预测和答案预测置信度得分,以从候选答案列表中选择最佳答案。通过定量和定性实验,我们表明,我们的模型i)在代理之间建立了协作,在域内和域外场景中都优于以前的多代理和多数据集方法;ii)训练数据效率极高;iii)可以适应任何QA格式。 摘要:The recent explosion of question answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats by either training models on multiple datasets or by combining multiple models. We argue that despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, and thus the adaptability of these models might be limited. In addition, current approaches for combining models disregard cues such as question-answer compatibility. In this work, we propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores to select the best answer among a list of answer candidates. Through quantitative and qualitative experiments we show that our model i) creates a collaboration between agents that outperforms previous multi-agent and multi-dataset approaches in both in-domain and out-of-domain scenarios, ii) is extremely data-efficient to train, and iii) can be adapted to any QA format.
【9】 Hybrid Digital Twin for process industry using Apros simulation environment 标题:基于Apros仿真环境的流程工业混合数字孪生系统 链接:https://arxiv.org/abs/2112.01903
作者:Mohammad Azangoo,Joonas Salmi,Iivo Yrjölä,Jonathan Bensky,Gerardo Santillan,Nikolaos Papakonstantinou,Seppo Sierla,Valeriy Vyatkin 摘要:制作更新的竣工模型在工艺装置的生命周期中起着重要作用。特别是,数字孪生模型必须精确,以保证系统的效率和可靠性。数据驱动模型可以通过考虑不确定性和生命周期相关变化来模拟子系统的最新行为。本文以一个早期实现的原型为例,提出了过程工厂混合数字双模型的逐步概念。它将详细说明使用工艺设备的数据驱动模型更新棕地工艺系统的第一原理模型和数字孪生模型的步骤。还将讨论生成竣工混合数字孪生兄弟的挑战。借助过程历史数据来教授机器学习模型,随着时间的推移,实现的数字孪生可以不断改进,并且可以进一步优化正在进行的工作。 摘要:Making an updated and as-built model plays an important role in the life-cycle of a process plant. In particular, Digital Twin models must be precise to guarantee the efficiency and reliability of the systems. Data-driven models can simulate the latest behavior of the sub-systems by considering uncertainties and life-cycle related changes. This paper presents a step-by-step concept for hybrid Digital Twin models of process plants using an early implemented prototype as an example. It will detail the steps for updating the first-principles model and Digital Twin of a brownfield process system using data-driven models of the process equipment. The challenges for generation of an as-built hybrid Digital Twin will also be discussed. With the help of process history data to teach Machine Learning models, the implemented Digital Twin can be continually improved over time and this work in progress can be further optimized.
【10】 Estimating the Value-at-Risk by Temporal VAE 标题:用时间VAE估计在险价值(Value-at-Risk) 链接:https://arxiv.org/abs/2112.01896
作者:Robert Sicks,Stefanie Grimm,Ralf Korn,Ivo Richert 备注:35 pages 摘要:大型资产组合的风险价值(VaR)估计是金融机构的一项重要任务。由于资产价格的联合对数收益通常可以投影到维度小得多的潜在空间,因此使用变分自动编码器(VAE)估计VaR是一个自然的建议。为了确保在学习序列数据时自动编码器的瓶颈结构,我们使用了一种时态VAE(TempVAE),它避免了观测变量的自回归结构。然而,金融数据的低信噪比与VAE的自动修剪特性相结合,通常使得VAE的使用容易出现后验坍塌。所以,我们建议对正则化项进行退火来减轻这种影响。因此,TempVAE的自动修剪工作正常,这也为VaR带来了出色的估计结果,在应用于实际数据时,它优于经典的GARCH类型和历史模拟方法。 摘要:Estimation of the value-at-risk (VaR) of a large portfolio of assets is an important task for financial institutions. As the joint log-returns of asset prices can often be projected to a latent space of a much smaller dimension, the use of a variational autoencoder (VAE) for estimating the VaR is a natural suggestion. To ensure the bottleneck structure of autoencoders when learning sequential data, we use a temporal VAE (TempVAE) that avoids an auto-regressive structure for the observation variables. However, the low signal-to-noise ratio of financial data in combination with the auto-pruning property of a VAE typically makes the use of a VAE prone to posterior collapse. Therefore, we propose to use annealing of the regularization to mitigate this effect. As a result, the auto-pruning of the TempVAE works properly which also results in excellent estimation results for the VaR that beats classical GARCH-type and historical simulation approaches when applied to real data.
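The historical simulation approach named as a baseline in the abstract above can be illustrated with a short sketch. It shows only the textbook empirical-quantile estimate of VaR, not the TempVAE method; the function name `historical_var`, the sign convention (losses are negated returns), and the sample returns are assumptions made for illustration.

```python
import math


def historical_var(returns, alpha=0.95):
    """Empirical value-at-risk from past portfolio returns.

    Returns the smallest observed loss L such that at least a fraction alpha
    of the observed losses are <= L. Losses are defined as negated returns.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    losses = sorted(-r for r in returns)
    # Index of the empirical alpha-quantile (lower-tail convention).
    idx = math.ceil(alpha * len(losses)) - 1
    idx = min(max(idx, 0), len(losses) - 1)
    return losses[idx]


# Made-up daily portfolio returns, for illustration only.
sample_returns = [-0.05, -0.02, 0.0, 0.01, 0.03]
worst_case = historical_var(sample_returns, alpha=1.0)
median_loss = historical_var(sample_returns, alpha=0.5)
```

With `alpha=1.0` the estimate is simply the worst observed loss, which shows why the method depends heavily on the length and representativeness of the historical window.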
【11】 Image-to-image Translation as a Unique Source of Knowledge 标题:作为一种独特知识来源的图像到图像转换 链接:https://arxiv.org/abs/2112.01873
作者:Alejandro D. Mousist 摘要:图像到图像(I2I)转换是一种将数据从一个域转换到另一个域的既定方法,但在处理SAR/光学卫星图像等差异很大的域时,转换后的图像在目标域中的可用性以及原始域有多少信息被转换到目标域仍然不够清楚。本文通过使用多种最新的I2I算法将标记数据集从光学域转换到SAR域,从目标域中传输的特征中学习,并在稍后评估从原始数据集传输了多少信息,从而解决了这一问题。除此之外,还建议将堆叠作为一种结合从不同I2I转换中学习到的知识的方法,并与单个模型进行了对比评估。 摘要:Image-to-image (I2I) translation is an established way of translating data from one domain to another but the usability of the translated images in the target domain when working with such dissimilar domains as the SAR/optical satellite imagery ones and how much of the origin domain is translated to the target domain is still not clear enough. This article addresses this by performing translations of labelled datasets from the optical domain to the SAR domain with different I2I algorithms from the state-of-the-art, learning from transferred features in the destination domain and evaluating later how much from the original dataset was transferred. Added to this, stacking is proposed as a way of combining the knowledge learned from the different I2I translations and evaluated against single models.
【12】 Discovery of Crime Event Sequences with Constricted Spatio-Temporal Sequential Patterns 标题:具有压缩时空序列模式的犯罪事件序列的发现 链接:https://arxiv.org/abs/2112.01863
作者:Piotr S. Maciąg,Robert Bembenik,Artur Dubrawski 备注:37 pages 摘要:在这篇文章中,我们介绍了一种新型的时空序列模式,称为压缩时空序列(CSTS)模式,并深入分析了它们的特性。我们证明了CSTS模式集是可以在给定数据集中发现的所有时空序列模式的简明表示。为了测量发现的CSTS模式的重要性,我们采用了参与指数测量。我们还提供了CSTS Miner:一种在事件数据中发现所有参与索引强CSTS模式的算法。我们使用两个与犯罪相关的数据集:匹兹堡警察事件记录数据集和波士顿犯罪事件报告数据集对所提出的算法进行了实验评估。在实验中,将CSTS-Miner算法与其他四种最先进的算法:STS-Miner、CSTPM、STBFM和CST-SPMiner进行了比较。实验结果表明,该算法比其他算法发现的模式要少得多。最后,我们提供了所提出的CSTS Miner算法发现的有趣的犯罪相关模式的示例。 摘要:In this article, we introduce a novel type of spatio-temporal sequential patterns called Constricted Spatio-Temporal Sequential (CSTS) patterns and thoroughly analyze their properties. We demonstrate that the set of CSTS patterns is a concise representation of all spatio-temporal sequential patterns that can be discovered in a given dataset. To measure significance of the discovered CSTS patterns we adapt the participation index measure. We also provide CSTS-Miner: an algorithm that discovers all participation index strong CSTS patterns in event data. We experimentally evaluate the proposed algorithms using two crime-related datasets: Pittsburgh Police Incident Blotter Dataset and Boston Crime Incident Reports Dataset. In the experiments, the CSTS-Miner algorithm is compared with the other four state-of-the-art algorithms: STS-Miner, CSTPM, STBFM and CST-SPMiner. As the results of experiments suggest, the proposed algorithm discovers much fewer patterns than the other selected algorithms. Finally, we provide the examples of interesting crime-related patterns discovered by the proposed CSTS-Miner algorithm.
【13】 Episodic Policy Gradient Training 标题:情节策略梯度训练 链接:https://arxiv.org/abs/2112.01853
作者:Hung Le,Majid Abdolshah,Thommen K. George,Kien Do,Dung Nguyen,Svetha Venkatesh 备注:19 pages 摘要:我们介绍了一种新的策略梯度方法的训练过程,其中情景记忆用于动态优化强化学习算法的超参数。与其他超参数搜索不同,我们将超参数调度描述为一个标准的马尔可夫决策过程,并使用情节记忆来存储使用的超参数及其训练上下文的结果。在任何策略更新步骤中,策略学习者参考存储的经验,并使用记忆确定的新超参数自适应地重新配置其学习算法。这种机制被称为情节策略梯度训练(EPGT),实现了情节学习过程,并在一次运行中联合学习策略和学习算法的超参数。在连续和离散环境下的实验结果表明,使用该方法可以提高各种策略梯度算法的性能。 摘要:We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcome of used hyperparameters and their training contexts. At any policy update step, the policy learner refers to the stored experiences, and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed as Episodic Policy Gradient Training (EPGT), enables an episodic learning process, and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
【14】 Automatic evaluation of scientific abstracts through natural language processing 标题:基于自然语言处理的科技摘要自动评价 链接:https://arxiv.org/abs/2112.01842
作者:Lucas G. O. Lopes,Thales M. A. Vieira,William W. M. Lira 摘要:这项工作提出了一个框架来分类和评估不同的研究摘要文本,这些文本侧重于过程及其应用的描述。在这种背景下,本文提出了自然语言处理算法来分类、分割和评估科学工作的结果。首先,提出的框架采用文本分类方法,根据所要解决的问题将摘要文本分类。然后,将摘要文本分割为问题描述、方法和结果。最后,基于对摘要结果的情感分析,对摘要的方法进行排名。建议的框架使我们能够快速找出解决特定问题的最佳方法并对其排序。为了验证所提出的框架,对采油异常相关的摘要进行了实验,并取得了令人满意的结果。 摘要:This work presents a framework to classify and evaluate distinct research abstract texts which are focused on the description of processes and their applications. In this context, this paper proposes natural language processing algorithms to classify, segment and evaluate the results of scientific work. Initially, the proposed framework categorizes the abstract texts according to the problems intended to be solved by employing a text classification approach. Then, the abstract text is segmented into problem description, methodology and results. Finally, the methodology of the abstract is ranked based on the sentiment analysis of its results. The proposed framework allows us to quickly rank the best methods to solve specific problems. To validate the proposed framework, experiments were run on oil production anomaly abstracts, with promising results.
【15】 Semantic Segmentation of Legal Documents via Rhetorical Roles 标题:基于修辞角色的法律文本语义切分 链接:https://arxiv.org/abs/2112.01836
作者:Vijit Malik,Rishabh Sanjay,Shouvik Kumar Guha,Shubham Kumar Nigam,Angshuman Hazarika,Arnab Bhattacharya,Ashutosh Modi 备注:16 pages 摘要:法律文档是非结构化的,使用法律术语,并且具有相当长的长度,因此很难通过传统的文本处理技术自动处理。如果文档可以在语义上分割为连贯的信息单元,那么法律文档处理系统将大大受益。本文提出了一个修辞角色(RR)系统,用于将法律文件分割为语义连贯的单元:事实、论点、法规、问题、先例、裁决和比率。在法律专家的帮助下,我们提出了一套13个细粒度的修辞角色标签,并创建了一个新的法律文件语料库,用建议的RR注释。我们开发了一个将文档分割成修辞角色单元的系统。特别是,我们开发了一个基于多任务学习的深度学习模型,将文档修辞角色标签转换作为分割法律文档的辅助任务。我们对各种深度学习模型进行了广泛的实验,以预测文档中的修辞角色,与现有模型相比,该模型表现出了更高的性能。此外,我们将RR应用于预测法律案件的判决,并表明与基于Transformer的模型相比,RR的使用增强了预测。 摘要:Legal documents are unstructured, use legal jargon, and have considerable length, making it difficult to process automatically via conventional text processing techniques. A legal document processing system would benefit substantially if the documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting rhetorical roles in a document, and the proposed model shows superior performance over the existing models. Further, we apply RR for predicting the judgment of legal cases and show that the use of RR enhances the prediction compared to the transformer-based models.
【16】 The UniNAS framework: combining modules in arbitrarily complex configurations with argument trees 标题:UniNAS框架:使用参数树组合任意复杂配置中的模块 链接:https://arxiv.org/abs/2112.01796
作者:Kevin Alexander Laube 备注:a laxly written technical presentation of UniNAS and Argument Trees, the code is publicly available 摘要:将代码设计得既简洁又能提供选择,这是一个走钢丝的过程。优化器和数据集等附加模块使框架对更广泛的受众有用,但增加的复杂性很快成为一个问题。框架参数可能仅适用于某些模块,而不适用于其他模块,它们相互排斥或相互依赖,通常以不明确的方式存在。即便如此,许多框架仍仅限于少数特定用例。本文介绍了UniNAS的基本概念,UniNAS是一个框架,旨在结合各种神经结构搜索方法。由于它们在优化器和网络数量、超参数优化、网络设计、候选操作等方面存在差异,传统方法无法解决该任务。相反,每个模块定义自己的超参数和模块需求的局部树结构。配置文件指定使用哪些模块、它们使用的参数以及它们进而使用的其他模块。这种参数树的概念允许在复杂配置中组合和重用模块,同时避免上述许多问题。参数树也可以从图形用户界面进行配置,这样就可以在不编写一行代码的情况下设计和更改实验。UniNAS可在以下网址公开获取:https://github.com/cogsys-tuebingen/uninas 摘要:Designing code to be simplistic yet to offer choice is a tightrope walk. Additional modules such as optimizers and data sets make a framework useful to a broader audience, but the added complexity quickly becomes a problem. Framework parameters may apply only to some modules but not others, be mutually exclusive or depend on each other, often in unclear ways. Even so, many frameworks are limited to a few specific use cases. This paper presents the underlying concept of UniNAS, a framework designed to incorporate a variety of Neural Architecture Search approaches. Since they differ in the number of optimizers and networks, hyper-parameter optimization, network designs, candidate operations, and more, a traditional approach can not solve the task. Instead, every module defines its own hyper-parameters and a local tree structure of module requirements. A configuration file specifies which modules are used, their used parameters, and which other modules they use in turn. This concept of argument trees enables combining and reusing modules in complex configurations while avoiding many problems mentioned above. Argument trees can also be configured from a graphical user interface so that designing and changing experiments becomes possible without writing a single line of code. UniNAS is publicly available at https://github.com/cogsys-tuebingen/uninas
【17】 Prescriptive Process Monitoring: Quo Vadis? 标题:规范性流程监控:何去何从? 链接:https://arxiv.org/abs/2112.01769
作者:Kateryna Kubrak,Fredrik Milani,Alexander Nolte,Marlon Dumas 摘要:规定性流程监控方法旨在通过在运行时建议干预措施来优化业务流程,以防止出现负面结果或表现不佳的情况。近年来,人们提出了各种规定性的过程监控方法。本文通过系统文献综述(SLR)研究该领域的现有方法。为了构建该领域,本文提出了一个根据绩效目标、绩效指标、干预类型、建模技术、数据输入和干预策略来描述规范性流程监控方法的框架。SLR为未来的研究提供了挑战和领域的见解,这些挑战和领域可以增强规范性过程监控方法的有用性和适用性。该文件强调需要在现实环境中验证现有和新方法,将干预类型扩展到与时间和成本角度相关的干预类型之外,并设计考虑因果关系和二阶效应的政策。 摘要:Prescriptive process monitoring methods seek to optimize a business process by recommending interventions at runtime to prevent negative outcomes or poorly performing cases. In recent years, various prescriptive process monitoring methods have been proposed. This paper studies existing methods in this field via a Systematic Literature Review (SLR). In order to structure the field, the paper proposes a framework for characterizing prescriptive process monitoring methods according to their performance objective, performance metrics, intervention types, modeling techniques, data inputs, and intervention policies. The SLR provides insights into challenges and areas for future research that could enhance the usefulness and applicability of prescriptive process monitoring methods. The paper highlights the need to validate existing and new methods in real-world settings, to extend the types of interventions beyond those related to the temporal and cost perspectives, and to design policies that take into account causality and second-order effects.
【18】 Hamiltonian prior to Disentangle Content and Motion in Image Sequences 标题:用于解耦图像序列中内容与运动的哈密顿先验 链接:https://arxiv.org/abs/2112.01641
作者:Asif Khan,Amos Storkey 备注:Controllable Generative Modeling in Language and Vision Workshop at NeurIPS 2021 摘要:我们提出了一个高维序列数据的深层潜变量模型。我们的模型将潜在空间分解为内容和运动变量。为了对不同的动力学进行建模,我们将运动空间划分为子空间,并为每个子空间引入唯一的哈密顿算符。哈密顿公式提供可逆动力学,学习约束运动路径以保持不变特性。运动空间的显式分裂将哈密顿量分解为对称群,并给出动力学的长期可分性。这种分离还意味着可以学习易于理解和控制的表达。我们展示了我们的模型用于交换两个视频的运动、从给定图像生成各种动作序列和无条件序列生成的实用性。 摘要:We present a deep latent variable model for high dimensional sequential data. Our model factorises the latent space into content and motion variables. To model the diverse dynamics, we split the motion space into subspaces, and introduce a unique Hamiltonian operator for each subspace. The Hamiltonian formulation provides reversible dynamics that learn to constrain the motion path to conserve invariant properties. The explicit split of the motion space decomposes the Hamiltonian into symmetry groups and gives long-term separability of the dynamics. This split also means representations can be learnt that are easy to interpret and control. We demonstrate the utility of our model for swapping the motion of two videos, generating sequences of various actions from a given image and unconditional sequence generation.
【19】 A Survey on Awesome Korean NLP Datasets 标题:优秀韩语NLP数据集综述 链接:https://arxiv.org/abs/2112.01624
作者:Byunghyun Ban 备注:11 pages, 1 horizontal page for large table 摘要:基于英语的数据集通常可从Kaggle、GitHub或最近发表的论文中获得。尽管使用英语数据集进行的基准测试足以展示新模型和方法的性能,但研究人员仍需要在基于韩语的数据集上训练和验证模型,以开发适合韩语处理的技术或产品。本文介绍了15个流行的基于韩语的NLP数据集,并总结了一些细节,如规模、许可证、存储库以及受这些数据集启发的其他研究结果。此外,我还提供了带有数据集样本或统计数据的高分辨率说明。数据集的主要特征显示在一个表中,为研究人员提供数据集的快速摘要。 摘要:English based datasets are commonly available from Kaggle, GitHub, or recently published papers. Although benchmark tests with English datasets are sufficient to show off the performances of new models and methods, a researcher still needs to train and validate the models on Korean based datasets to produce a technology or product suitable for Korean processing. This paper introduces 15 popular Korean based NLP datasets with summarized details such as volume, license, repositories, and other research results inspired by the datasets. Also, I provide high-resolution instructions with sample or statistics of datasets. The main characteristics of datasets are presented on a single table to provide a rapid summarization of datasets for researchers.
【20】 Neurosymbolic Systems of Perception & Cognition: The Role of Attention 标题:感知和认知的神经符号系统:注意的作用 链接:https://arxiv.org/abs/2112.01603
作者:Hugo Latapie,Ozkan Kilic,Kristinn R. Thorisson,Pei Wang,Patrick Hammer 摘要:以累积学习为目标的认知架构必须提供必要的信息和控制结构,以允许代理从他们的经验中进行增量和自主学习。这包括管理代理的目标,以及在其感知信息堆栈中不断地将感官信息与这些目标关联。学习代理的环境越多样化,这些机制就必须越通用和灵活,以处理更广泛的相关模式、任务和目标结构。虽然许多研究人员都同意,不同抽象层次的信息可能在组成、结构和处理机制上有所不同,但研究界对这些差异的细节并不普遍认同。二进制处理架构(通常称为System-1和System-2)分别被提出作为低层和高层信息的认知处理模型。我们假设认知不是以这种方式二元的,任何抽象层次的知识都涉及我们所说的神经符号信息,这意味着高层次和低层次的数据都必须包含符号和亚符号信息。此外,我们认为,高层次和低层次数据抽象处理之间的主要区别因素在很大程度上可以归因于所涉及的注意机制的性质。我们描述了这一观点背后的关键论点,并回顾了文献中的相关证据。 摘要:A cognitive architecture aimed at cumulative learning must provide the necessary information and control structures to allow agents to learn incrementally and autonomously from their experience. This involves managing an agent's goals as well as continuously relating sensory information to these in its perception-cognition information stack. The more varied the environment of a learning agent is, the more general and flexible must be these mechanisms to handle a wider variety of relevant patterns, tasks, and goal structures. While many researchers agree that information at different levels of abstraction likely differs in its makeup and structure and processing mechanisms, agreement on the particulars of such differences is not generally shared in the research community. A binary processing architecture (often referred to as System-1 and System-2) has been proposed as a model of cognitive processing for low- and high-level information, respectively. We posit that cognition is not binary in this way and that knowledge at any level of abstraction involves what we refer to as neurosymbolic information, meaning that data at both high and low levels must contain both symbolic and subsymbolic information. Further, we argue that the main differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms. We describe the key arguments behind this view and review relevant evidence from the literature.
【21】 Online Search With Best-Price and Query-Based Predictions 标题:基于最优价格和基于查询的预测的在线搜索 链接:https://arxiv.org/abs/2112.01592
作者:Spyros Angelopoulos,Shahin Kamali,Dehou Zhang 备注:22 pages, 5 figures 摘要:在在线(时间序列)搜索问题中,玩家会看到一系列在线显示的价格。在问题的标准定义中,对于每个披露的价格,参与者必须在不知道未来价格(除了其极值的上限和下限)的情况下,不可撤销地决定是否接受或拒绝该价格,并且目标是最小化竞争比,即序列中的最高价格与玩家选择的价格之间的最坏情况比率。该问题描述了在面对暴露样本的不确定性时决策的若干应用。以前关于这个问题的工作基本上假设了极端情况,要么玩家几乎没有关于输入的信息,要么玩家得到了一些强大的、无错误的建议。在这项工作中,我们研究学习增强算法,其中有一个潜在的错误预测有关的输入。具体而言,我们考虑两种不同的设置:预测与序列中的最大价格有关的设置,以及作为对多个二进制查询的响应而获得预测的设置。对于这两种设置,我们提供了搜索算法最坏情况下性能的严格或接近严格的上下界,作为预测误差的函数。我们还提供了从证券交易市场获得的数据的实验结果,证实了理论分析,并解释了我们的技术如何适用于其他学习增强应用程序。 摘要:In the online (time-series) search problem, a player is presented with a sequence of prices which are revealed in an online manner. In the standard definition of the problem, for each revealed price, the player must decide irrevocably whether to accept or reject it, without knowledge of future prices (other than an upper and a lower bound on their extreme values), and the objective is to minimize the competitive ratio, namely the worst-case ratio between the maximum price in the sequence and the one selected by the player. The problem formulates several applications of decision-making in the face of uncertainty on the revealed samples. Previous work on this problem has largely assumed extreme scenarios in which either the player has almost no information about the input, or the player is provided with some powerful, and error-free advice. In this work, we study learning-augmented algorithms, in which there is a potentially erroneous prediction concerning the input. Specifically, we consider two different settings: the setting in which the prediction is related to the maximum price in the sequence, as well as the setting in which the prediction is obtained as a response to a number of binary queries. For both settings, we provide tight, or near-tight upper and lower bounds on the worst-case performance of search algorithms as a function of the prediction error. 
We also provide experimental results on data obtained from stock exchange markets that confirm the theoretical analysis, and explain how our techniques can be applicable to other learning-augmented applications.
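As a point of reference for the setting above, here is a minimal sketch of the classical prediction-free baseline for online search: the reservation-price policy, which accepts the first price at or above sqrt(m*M) when prices are known to lie in [m, M]. This is not the learning-augmented algorithm of the paper; the function names and the convention that the player must take the last price if nothing was accepted are assumptions made for illustration.

```python
import math


def reservation_price_search(prices, m, M):
    """Accept the first price >= sqrt(m * M); otherwise take the last price.

    Assumes every price lies in [m, M] with 0 < m <= M, and that a player
    who never accepts is forced to trade at the final price.
    """
    if not prices:
        raise ValueError("prices must be non-empty")
    threshold = math.sqrt(m * M)
    for price in prices:
        if price >= threshold:
            return price
    return prices[-1]


def competitive_ratio(prices, m, M):
    """Ratio between the best price in hindsight and the accepted price."""
    return max(prices) / reservation_price_search(prices, m, M)
```

Under these assumptions the ratio never exceeds sqrt(M/m): an accepted price is at least sqrt(m*M) while the maximum is at most M, and if nothing is accepted every price is below sqrt(m*M) while the forced price is at least m.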
【22】 HMC with Normalizing Flows 标题:结合归一化流的HMC 链接:https://arxiv.org/abs/2112.01586
作者:Sam Foreman,Taku Izubuchi,Luchang Jin,Xiao-Yong Jin,James C. Osborn,Akio Tomiya 备注:7 pages, 6 figures, presented at The 38th International Symposium on Lattice Field Theory, LATTICE2021 26th-30th July, 2021 Zoom/Gather @ Massachusetts Institute of Technology 摘要:我们建议在哈密顿蒙特卡罗(HMC)的分子动力学更新中使用规范化流作为可训练核。通过学习简化动力学的(可逆)变换,我们可以在生成独立配置方面胜过传统方法。我们表明,使用精心构建的网络体系结构,我们的方法可以轻松地扩展到大的晶格体积,只需最少的再训练工作。我们的实现的源代码在以下网站公开https://github.com/nftqcd/fthmc. 摘要:We propose using Normalizing Flows as a trainable kernel within the molecular dynamics update of Hamiltonian Monte Carlo (HMC). By learning (invertible) transformations that simplify our dynamics, we can outperform traditional methods at generating independent configurations. We show that, using a carefully constructed network architecture, our approach can be easily scaled to large lattice volumes with minimal retraining effort. The source code for our implementation is publicly available online at https://github.com/nftqcd/fthmc.
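For readers unfamiliar with the molecular dynamics update mentioned above, the following is a minimal sketch of plain HMC on a one-dimensional target, with no normalizing flow involved. The names `leapfrog` and `hmc_step`, the unit-mass Gaussian momentum, and the scalar state are illustrative assumptions and are not taken from the linked repository.

```python
import math
import random


def leapfrog(x, p, grad_U, step, n_steps):
    """Integrate Hamiltonian dynamics for H(x, p) = U(x) + p^2 / 2."""
    p = p - 0.5 * step * grad_U(x)      # initial half kick
    for i in range(n_steps):
        x = x + step * p                # full drift
        if i < n_steps - 1:
            p = p - step * grad_U(x)    # full kick between drifts
    p = p - 0.5 * step * grad_U(x)      # final half kick
    return x, p


def hmc_step(x, U, grad_U, step, n_steps, rng):
    """One HMC update: resample momentum, integrate, Metropolis accept/reject."""
    p0 = rng.gauss(0.0, 1.0)
    x_new, p_new = leapfrog(x, p0, grad_U, step, n_steps)
    h_old = U(x) + 0.5 * p0 * p0
    h_new = U(x_new) + 0.5 * p_new * p_new
    # Accept with probability min(1, exp(h_old - h_new)).
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return x, False
```

The leapfrog map is reversible (flip the momentum and integrate again to return to the start), which is the property that keeps the accept/reject step valid.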
【23】 Improving mathematical questioning in teacher training 标题:在教师训练中改进数学提问 链接:https://arxiv.org/abs/2112.01537
作者:Debajyoti Datta,Maria Phillips,James P Bywater,Jennifer Chiu,Ginger S. Watson,Laura E. Barnes,Donald E Brown 备注:Accepted to appear at the NeurIPS 2021 Human Centered AI Workshop (HCAI). arXiv admin note: text overlap with arXiv:2112.00985 摘要:高保真、基于人工智能的模拟教室系统使教师能够演练有效的教学策略。然而,以对话为导向的开放式对话,如向学生讲授比例因子,可能难以建模。本文介绍了一个高保真的、基于人工智能的课堂模拟器,以帮助教师演练基于研究的数学提问技能。我们采用以人为中心的方法来设计我们的系统,依靠深度学习、不确定性量化和自然语言处理方面的进步,同时承认会话代理对于特定教学需求的局限性。通过在模拟过程中直接使用专家的输入,我们演示了如何实现较高的对话成功率和用户满意度。 摘要:High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factor can be difficult to model. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skills. We take a human-centered approach to designing our system, relying on advances in deep learning, uncertainty quantification and natural language processing while acknowledging the limitations of conversational agents for specific pedagogical needs. Using experts' input directly during the simulation, we demonstrate how conversation success rate and high user satisfaction can be achieved.
【24】 Chronological Causal Bandits 标题:时序因果Bandit 链接:https://arxiv.org/abs/2112.01819
作者:Neil Dhir 备注:10 pages, accepted at the NeurIPS 2021 workshop Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice 摘要:本文研究了多臂bandit(MAB)问题的一个实例,特别是多个因果MAB在同一动力系统中按时间顺序运行的情况。实际上,每个bandit的奖励分布由相同的非平凡依赖结构控制,即一个动态因果模型。之所以称为动态,是因为我们允许每个因果MAB依赖于其前一个MAB,从而能够在智能体之间传递信息。我们的贡献,即时序因果Bandit(CCB),适用于因果效应随时间变化、且可借助同一系统中早期干预获得信息的离散决策场景。本文给出了CCB在一个玩具问题上的一些早期结果。 摘要:This paper studies an instance of the multi-armed bandit (MAB) problem, specifically where several causal MABs operate chronologically in the same dynamical system. Practically the reward distribution of each bandit is governed by the same non-trivial dependence structure, which is a dynamic causal model. Dynamic because we allow for each causal MAB to depend on the preceding MAB and in doing so are able to transfer information between agents. Our contribution, the Chronological Causal Bandit (CCB), is useful in discrete decision-making settings where the causal effects are changing across time and can be informed by earlier interventions in the same system. In this paper, we present some early findings of the CCB as demonstrated on a toy problem.
【25】 Computation of conditional expectations with guarantees 标题:有担保的条件期望的计算 链接:https://arxiv.org/abs/2112.01804
作者:Patrick Cheridito,Balint Gersey 摘要:理论上,给定一个$d$维随机向量$X$的平方可积随机变量$Y$的条件期望可以通过最小化所有Borel可测函数$f\colon\mathbb{R}^d\to\mathbb{R}$上$Y$和$f(X)$之间的均方距离来获得。然而,在许多应用中,这个最小化问题不能精确地解决,相反,必须使用一种数值方法来计算合适的Borel函数子族上的近似最小值。结果的质量取决于亚家族的充分性和数值方法的性能。在本文中,我们推导了最小均方距离的期望值表示,在许多应用中,它可以用标准蒙特卡罗平均值有效地近似。这使我们能够保证给定条件期望的任何数值近似的准确性。我们通过在不同的具体例子中评估通过线性、多项式以及神经网络回归获得的近似条件期望的质量来说明该方法。 摘要:Theoretically, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. However, in many applications this minimization problem cannot be solved exactly, and instead, a numerical method that computes an approximate minimum over a suitable subfamily of Borel functions has to be used. The quality of the result depends on the adequacy of the subfamily and the performance of the numerical method. In this paper, we derive an expected value representation of the minimal mean square distance which in many applications can efficiently be approximated with a standard Monte Carlo average. This enables us to provide guarantees for the accuracy of any numerical approximation of a given conditional expectation. We illustrate the method by assessing the quality of approximate conditional expectations obtained by linear, polynomial as well as neural network regression in different concrete examples.
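The role of the minimal mean squared distance can be seen from the standard $L^2$ projection identity, sketched below. This is textbook material rather than the paper's own expected value representation, which the abstract does not spell out; $f$ denotes any Borel function with $f(X)$ square-integrable.

```latex
\begin{align*}
\mathbb{E}\big[(Y - f(X))^2\big]
  &= \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big]
   + \mathbb{E}\big[(\mathbb{E}[Y \mid X] - f(X))^2\big], \\
\text{hence}\quad
\mathbb{E}\big[(\mathbb{E}[Y \mid X] - f(X))^2\big]
  &= \mathbb{E}\big[(Y - f(X))^2\big]
   - \min_{g}\, \mathbb{E}\big[(Y - g(X))^2\big].
\end{align*}
```

The cross term vanishes because $Y - \mathbb{E}[Y \mid X]$ is orthogonal to every square-integrable function of $X$. The second line shows why an estimate of the minimal mean squared distance turns the observable error $\mathbb{E}[(Y - f(X))^2]$ of a fitted regression $f$ into a bound on its distance to the true conditional expectation.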
【26】 High-Precision Inversion of Dynamic Radiography Using Hydrodynamic Features 标题:利用流体力学特征进行高精度动态射线成像反演 链接:https://arxiv.org/abs/2112.01627
作者:Maliha Hossain,Balasubramanya T. Nadiga,Oleg Korobkin,Marc L. Klasky,Jennifer L. Schei,Joshua W. Burby,Michael T. McCann,Trevor Wilcox,Soumi De,Charles A. Bouman 备注:Submitted to Optics Express 摘要:射线照相术通常用于探测动态系统中复杂的、不断演变的密度场,从而深入了解底层物理。这项技术已应用于许多领域,包括材料科学、冲击物理、惯性约束聚变和其他国家安全应用。然而,在许多此类应用中,噪声、散射、复杂光束动力学等导致的复杂性妨碍了密度重建的准确性,无法以足够的置信度识别基础物理。因此,静态/动态射线照相的密度重建通常仅限于在许多此类应用中识别不连续特征,如裂纹和空洞。在这项工作中,我们提出了一种从射线图像的时间序列重建密度的全新方法。仅使用射线照片中可识别的稳健特征,我们使用机器学习方法,即条件生成对抗网络(cGAN),将其与基本的流体动力学运动方程相结合,以确定动态射线照片序列中的密度场。接下来,我们寻求通过参数估计和投影到流体动力流形的过程来进一步增强基于ML的密度重建的流体动力一致性。在这种情况下,我们注意到,从训练数据给出的流体动力歧管到考虑的参数空间中的测试数据之间的距离既可作为预测稳健性的诊断,也可用于增强训练数据库,期望后者将进一步减少未来的密度重建误差。最后,我们证明了该方法在捕获允许的流体动力路径方面优于传统的射线照相重建,即使存在相对少量的散射。 摘要:Radiography is often used to probe complex, evolving density fields in dynamic systems and in so doing gain insight into the underlying physics. This technique has been used in numerous fields including materials science, shock physics, inertial confinement fusion, and other national security applications. In many of these applications, however, complications resulting from noise, scatter, complex beam dynamics, etc. prevent the reconstruction of density from being accurate enough to identify the underlying physics with sufficient confidence. As such, density reconstruction from static/dynamic radiography has typically been limited to identifying discontinuous features such as cracks and voids in a number of these applications. In this work, we propose a fundamentally new approach to reconstructing density from a temporal sequence of radiographic images. Using only the robust features identifiable in radiographs, we combine them with the underlying hydrodynamic equations of motion using a machine learning approach, namely, conditional generative adversarial networks (cGAN), to determine the density fields from a dynamic sequence of radiographs. 
Next, we seek to further enhance the hydrodynamic consistency of the ML-based density reconstruction through a process of parameter estimation and projection onto a hydrodynamic manifold. In this context, we note that the distance from the hydrodynamic manifold given by the training data to the test data in the parameter space considered both serves as a diagnostic of the robustness of the predictions and serves to augment the training database, with the expectation that the latter will further reduce future density reconstruction errors. Finally, we demonstrate the ability of this method to outperform a traditional radiographic reconstruction in capturing allowable hydrodynamic paths even when relatively small amounts of scatter are present.
【27】 LeapfrogLayers: A Trainable Framework for Effective Topological Sampling 标题:LeapfrogLayers:一种可训练的有效拓扑采样框架 链接:https://arxiv.org/abs/2112.01582
作者:Sam Foreman,Xiao-Yong Jin,James C. Osborn 备注:10 pages, 12 figures, presented at the 38th International Symposium on Lattice Field Theory, LATTICE2021 26th-30th July, 2021, Zoom/Gather @ Massachusetts Institute of Technology 摘要:我们介绍了LeapfrogLayers,这是一种可逆的神经网络结构,可以通过训练有效地对2D$U(1)$格点规范理论的拓扑结构进行采样。与传统的HMC相比,我们在拓扑电荷的积分自相关时间上有了改进,并提出了将我们的模型扩展到更大晶格体积的方法。我们的实现是开源的,并在github上公开提供https://github.com/saforem2/l2hmc-qcd 摘要:We introduce LeapfrogLayers, an invertible neural network architecture that can be trained to efficiently sample the topology of a 2D $U(1)$ lattice gauge theory. We show an improvement in the integrated autocorrelation time of the topological charge when compared with traditional HMC, and propose methods for scaling our model to larger lattice volumes. Our implementation is open source, and is publicly available on github at https://github.com/saforem2/l2hmc-qcd
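To make "topological charge" concrete, here is a minimal sketch of a common integer-valued definition for a 2D $U(1)$ lattice with periodic boundaries: sum the plaquette angles after wrapping each into [-pi, pi) and divide by 2*pi. The abstract does not state which convention the authors use, so the link layout (`theta_x[x][y]`, `theta_y[x][y]` on a square L x L lattice) and the function names are assumptions for illustration only.

```python
import math


def wrap(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def plaquette_angles(theta_x, theta_y):
    """Oriented sum of the four link angles around each plaquette."""
    size = len(theta_x)
    angles = []
    for x in range(size):
        for y in range(size):
            xp, yp = (x + 1) % size, (y + 1) % size  # periodic neighbours
            angles.append(
                theta_x[x][y] + theta_y[xp][y] - theta_x[x][yp] - theta_y[x][y]
            )
    return angles


def topological_charge(theta_x, theta_y):
    """Sum of wrapped plaquette angles divided by 2*pi."""
    total = sum(wrap(a) for a in plaquette_angles(theta_x, theta_y))
    return total / (2.0 * math.pi)
```

With periodic boundaries the unwrapped plaquette angles sum to zero, so the wrapped sum is a multiple of 2*pi and the result is an integer up to floating-point error. A sampler that rarely changes this integer has a long autocorrelation time for the charge, which is the quantity the abstract reports improving.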
【28】 Invariant Priors for Bayesian Quadrature 标题:贝叶斯求积的不变先验 链接:https://arxiv.org/abs/2112.01578
作者:Masha Naslidnyk,Javier Gonzalez,Maren Mahsereci 摘要:贝叶斯求积(BQ)是一种基于模型的数值积分方法,能够通过编码和利用手头积分任务的已知结构来提高样本效率。在本文中,我们探讨了在输入域中一组双射变换,特别是一些幺正变换(如旋转、轴翻转或点对称)下被积函数不变性的编码先验。我们在几个合成应用和一个实际应用中展示了与标准贝叶斯求积相比优越性能的初步结果。 摘要:Bayesian quadrature (BQ) is a model-based numerical integration method that is able to increase sample efficiency by encoding and leveraging known structure of the integration task at hand. In this paper, we explore priors that encode invariance of the integrand under a set of bijective transformations in the input domain, in particular some unitary transformations, such as rotations, axis-flips, or point symmetries. We show initial results on superior performance in comparison to standard Bayesian quadrature on several synthetic and one real world application.
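One standard way to encode invariance under a finite group of transformations in a Gaussian process prior is to average a base kernel over all pairs of group elements. The sketch below illustrates that construction for an axis flip; the abstract does not say whether the paper uses exactly this form, and the names `rbf`, `invariant_kernel`, and `FLIP_GROUP` are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential base kernel on tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def invariant_kernel(x, y, group, base=rbf):
    """Average the base kernel over all pairs of group elements.

    Assumes `group` is a finite list of callables that is closed under
    composition, so the result is unchanged when x or y is transformed.
    """
    total = sum(base(g(x), h(y)) for g in group for h in group)
    return total / (len(group) ** 2)


def identity(v):
    return tuple(v)


def flip_first_axis(v):
    return (-v[0],) + tuple(v[1:])


# Two-element group: do nothing, or mirror the first coordinate.
FLIP_GROUP = [identity, flip_first_axis]
```

A Gaussian process with this kernel has sample paths satisfying f(x) = f(g(x)) for every g in the group, so observations at one point inform the integrand at all of its mirror images.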
【29】 Automatic tumour segmentation in H&E-stained whole-slide images of the pancreas 标题:胰腺H&E染色全片图像中肿瘤的自动分割 链接:https://arxiv.org/abs/2112.01533
作者:Pierpaolo Vendittelli,Esther M. M. Smeets,Geert Litjens 摘要:胰腺癌很快将成为西方社会癌症相关死亡的第二大原因。CT、MRI和超声波等成像技术通常有助于提供初步诊断,但组织病理学评估仍然是最终确认疾病存在和预后的金标准。近年来,机器学习方法和病理组学流程在改善其他癌症类型(如乳腺癌和前列腺癌)的诊断和预后方面显示出潜力。在这些流程中,关键的第一步通常是识别和分割肿瘤区域。理想情况下,此步骤是自动完成的,以避免耗时的手动标注。我们提出了一种多任务卷积神经网络来平衡疾病检测和分割精度。我们在29名患者的数据集(共58张切片)上以不同的分辨率验证了我们的方法。最佳单任务分割网络在15.56$\mu$m的分辨率下实现了0.885(0.122)IQR的Dice中位数。我们的多任务网络在此基础上有所改进,Dice得分中位数为0.934(0.077)IQR。 摘要:Pancreatic cancer will soon be the second leading cause of cancer-related death in Western society. Imaging techniques such as CT, MRI and ultrasound typically help provide the initial diagnosis, but histopathological assessment is still the gold standard for final confirmation of disease presence and prognosis. In recent years, machine learning approaches and pathomics pipelines have shown potential in improving diagnostics and prognostics in other cancerous entities, such as breast and prostate cancer. A crucial first step in these pipelines is typically identification and segmentation of the tumour area. Ideally this step is done automatically to prevent time-consuming manual annotation. We propose a multi-task convolutional neural network to balance disease detection and segmentation accuracy. We validated our approach on a dataset of 29 patients (for a total of 58 slides) at different resolutions. The best single-task segmentation network achieved a median Dice of 0.885 (0.122) IQR at a resolution of 15.56 $\mu$m. Our multi-task network improved on that with a median Dice score of 0.934 (0.077) IQR.
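The Dice score reported above measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch on flat binary masks, with a median taken across slides; the function names and the choice to return 1.0 when both masks are empty are assumptions, not details from the paper.

```python
import statistics


def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * intersection / size


def median_dice(mask_pairs):
    """Median Dice over an iterable of (pred, target) mask pairs."""
    return statistics.median(dice_score(p, t) for p, t in mask_pairs)
```

For example, a prediction that marks two pixels and shares one of them with a two-pixel reference scores 2 * 1 / (2 + 2) = 0.5.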
机器翻译,仅供参考