
Statistics arXiv Daily Digest [2021-08-20]

By arXiv Daily Academic Digest (WeChat official account). Originally published 2021-08-20; posted to this column 2021-08-24.


stat (Statistics): 24 papers in total

【1】 Transfer learning in genome-wide association studies with knockoffs
Link: https://arxiv.org/abs/2108.08813

Authors: Shuangning Li, Zhimei Ren, Chiara Sabatti, Matteo Sesia
Abstract: This paper presents and compares alternative transfer learning methods that can increase the power of conditional testing via knockoffs by leveraging prior information in external data sets collected from different populations or measuring related outcomes. The relevance of this methodology is explored in particular within the context of genome-wide association studies, where it can be helpful to address the pressing need for principled ways to suitably account for, and efficiently learn from, the genetic variation associated with diverse ancestries. Finally, we apply these methods to analyze several phenotypes in the UK Biobank data set, demonstrating that transfer learning helps knockoffs discover more numerous associations in the data collected from minority populations, potentially opening the way to the development of more accurate polygenic risk scores.

【2】 The effect of the number of distractors and the "None of the above" - "All of the above" options in multiple choice questions
Link: https://arxiv.org/abs/2108.08777

Authors: Anna Helga Jonsdottir, Thorarinn Jonmundsson, Inga Huld Armann, Birna Borg Gunnarsdottir, Gunnar Stefansson
Affiliation: University of Iceland
Abstract: Multiple choice questions (MCQs) are commonly used for assessment in higher education. With the increased use of online examination, it is likely that MCQs will be used even more in years to come. It is therefore of interest to examine some characteristics of this type of question, such as the effect of the number of distractors used and of the "None of the above" (NOTA) or "All of the above" (AOTA) options. The tutor-web is an open-source, online drilling system that is freely available to anyone with access to the Internet. The system was designed for teaching mathematics and statistics but can in principle be used for other subjects as well. It offers thousands of multiple choice questions at high school and university level. In addition to being a tool used by students for learning, it has also been used as a testbed for research on web-assisted education. The tutor-web system was used both as a learning tool and as a testing tool in a university course on mathematical statistics in the spring of 2020. Around 300 students were enrolled in the course, providing tens of thousands of answers to MCQs designed to investigate the effect of the number of distractors and the use of NOTA and AOTA options in questions. The main findings of the study were that the probability of answering a question correctly was highest when an AOTA option was used as a distractor and when NOTA and AOTA were not used in questions. The probability of answering a question correctly decreased with the number of distractors.
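The kind of analysis the abstract describes can be illustrated with a logistic regression of answer correctness on item features. A minimal sketch on synthetic data (variable names and effect sizes are hypothetical, and the actual study may well have used a more elaborate model):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Hypothetical item features: number of distractors (2-5) and
# indicators for "None of the above" / "All of the above" options.
n_distractors = rng.integers(2, 6, size=n)
nota = rng.integers(0, 2, size=n)
aota = rng.integers(0, 2, size=n)

# Simulate correctness with assumed effects (illustration only):
# probability decreases with distractor count, shifts with NOTA/AOTA.
logit = 1.5 - 0.3 * n_distractors - 0.2 * nota + 0.4 * aota
correct = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([n_distractors, nota, aota]))
model = sm.Logit(correct, X).fit(disp=0)
print(model.params)  # recovered effects of distractor count, NOTA, AOTA
```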

【3】 SNIP: An Adaptation of Sorted Neighborhood Methods for Deduplicating Pedigree Data
Link: https://arxiv.org/abs/2108.08773

Authors: Theodore Huang, Matthew Ploenzke, Danielle Braun
Affiliations: Department of Biostatistics, Harvard T.H. Chan School of Public Health; Department of Data Science, Dana-Farber Cancer Institute
Comments: 39 pages, 22 figures (including supplementary materials)
Abstract: Pedigree data contain family history information that is used to analyze hereditary diseases. These clinical data sets may contain duplicate records due to the same family visiting a clinic multiple times or a clinician entering multiple versions of the family for testing purposes. Inferences drawn from the data, or using them for training or validation without removing the duplicates, could lead to invalid conclusions, so identifying the duplicates is essential. Since family structures can be complex, existing deduplication algorithms cannot be applied directly. We first motivate the importance of deduplication by examining the impact of pedigree duplicates on the training and validation of a familial risk prediction model. We then introduce an unsupervised algorithm, which we call SNIP (Sorted NeIghborhood for Pedigrees), that builds on the sorted neighborhood method to efficiently find and classify pairwise comparisons by leveraging the inherent hierarchical nature of the pedigrees. We conduct a simulation study to assess the performance of the algorithm and find parameter configurations under which the algorithm is able to accurately detect the duplicates. We then apply the method to data from the Risk Service, which includes over 300,000 pedigrees at high risk of hereditary cancers, and uncover large clusters of potential duplicate families. After removing 104,520 pedigrees (33% of the original data), the resulting Risk Service dataset can now be used for future analysis, training, and validation. The algorithm is available as an R package, snipR, at https://github.com/bayesmendel/snipR.
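For intuition, the classic sorted-neighborhood idea that SNIP adapts can be sketched in a few lines: sort records on a blocking key, then compare only records within a sliding window. (A generic sketch with hypothetical record fields, not the authors' snipR implementation, which additionally exploits the hierarchical pedigree structure.)

```python
from difflib import SequenceMatcher

def sorted_neighborhood(records, key, window=3, threshold=0.9):
    """Return candidate duplicate pairs via the sorted-neighborhood method.

    records: list of dicts; key: field used as the sorting/blocking key.
    Only records within `window` positions of each other after sorting
    are compared, avoiding the quadratic all-pairs comparison.
    """
    ordered = sorted(records, key=lambda r: r[key])
    pairs = []
    for i, rec in enumerate(ordered):
        for j in range(i + 1, min(i + window, len(ordered))):
            other = ordered[j]
            sim = SequenceMatcher(None, rec[key], other[key]).ratio()
            if sim >= threshold:
                pairs.append((rec, other, sim))
    return pairs

# Toy example with hypothetical pedigree proband names.
records = [{"proband": "Smith, John"}, {"proband": "Smith, Jon"},
           {"proband": "Doe, Jane"}]
for a, b, s in sorted_neighborhood(records, "proband", window=2, threshold=0.8):
    print(a["proband"], "<->", b["proband"], round(s, 2))
```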

【4】 Combining Real-World and Randomized Control Trial Data Using Data-Adaptive Weighting via the On-Trial Score
Link: https://arxiv.org/abs/2108.08756

Authors: Joanna Harton, Brian Segal, Ronac Mamtani, Nandita Mitra, Rebecca Hubbard
Comments: Presented at JSM 2020 and the ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop 2020; submitted to Pharmaceutical Statistics on 8/17/21
Abstract: Clinical trials with a hybrid control arm (a control arm constructed from a combination of randomized patients and real-world data on patients receiving usual care in standard clinical practice) have the potential to decrease the cost of randomized trials while increasing the proportion of trial patients given access to novel therapeutics. However, due to stringent trial inclusion criteria and differences in care and data quality between trials and community practice, trial patients may have systematically different outcomes compared to their real-world counterparts. We propose a new method for analyses of trials with a hybrid control arm that efficiently controls bias and type I error. Under our proposed approach, selected real-world patients are weighted by a function of the "on-trial score," which reflects their similarity to trial patients. In contrast to previously developed hybrid control designs that assign the same weight to all real-world patients, our approach upweights real-world patients who more closely resemble randomized control patients, while dissimilar patients are discounted. Estimates of the treatment effect are obtained via Cox proportional hazards models. We compare our approach to existing approaches via simulations and apply these methods to a study using electronic health record data. Our proposed method is able to control type I error, minimize bias, and decrease variance when compared to using only trial data in nearly all scenarios examined. Therefore, our new approach can be used when conducting clinical trials by augmenting the standard-of-care arm with weighted patients from the EHR to increase power without inducing bias.
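A rough sketch of the weighting scheme described above: fit a model for trial membership to obtain each real-world patient's on-trial score, turn the score into a weight, and fit a weighted Cox model. (Synthetic data and hypothetical column names; the odds-type weight below is one plausible choice, not necessarily the paper's exact weight function.)

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 600

# Hypothetical pooled data: trial controls (on_trial=1) and RWD patients.
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "ecog": rng.integers(0, 3, n),
    "on_trial": rng.integers(0, 2, n),
})
df["time"] = rng.exponential(12, n)
df["event"] = rng.integers(0, 2, n)

# On-trial score: probability of being a trial patient given covariates.
score_model = LogisticRegression().fit(df[["age", "ecog"]], df["on_trial"])
p = score_model.predict_proba(df[["age", "ecog"]])[:, 1]

# Trial patients keep weight 1; RWD patients are weighted by a function
# of the score (odds weighting here, assumed for illustration).
df["w"] = np.where(df["on_trial"] == 1, 1.0, p / (1 - p))

cph = CoxPHFitter()
cph.fit(df[["age", "ecog", "time", "event", "w"]],
        duration_col="time", event_col="event", weights_col="w", robust=True)
cph.print_summary()
```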

【5】 A Framework for an Assessment of the Kernel-target Alignment in Tree Ensemble Kernel Learning
Link: https://arxiv.org/abs/2108.08752

Authors: Dai Feng, Richard Baumgartner
Affiliations: Data and Statistical Sciences, AbbVie Inc., North Chicago, IL, USA; Biometrics Research, Merck & Co., Inc., Kenilworth, NJ, USA
Abstract: Kernels ensuing from tree ensembles such as random forest (RF) or gradient boosted trees (GBT), when used for kernel learning, have been shown to be competitive with their respective tree ensembles (particularly in higher dimensional scenarios). On the other hand, it has also been shown that the performance of kernel algorithms depends on the degree of kernel-target alignment. However, the kernel-target alignment for kernel learning based on tree ensembles has not been investigated, and filling this gap is the main goal of our work. Using the eigenanalysis of the kernel matrix, we demonstrate that for continuous targets good performance of tree-based kernel learning is associated with strong kernel-target alignment. Moreover, we show that well-performing tree-ensemble-based kernels are characterized by strong target-aligned components that are expressed through scalar products between the eigenvectors of the kernel matrix and the target. This suggests that when tree-ensemble-based kernel learning is successful, relevant information for the supervised problem is concentrated near a lower-dimensional manifold spanned by the target-aligned components. Persistence of the strong target-aligned components in tree-ensemble-based kernels is further supported by sensitivity analysis via landmark learning. In addition to a comprehensive simulation study, we also provide experimental results from several real-life data sets that are in line with the simulations.
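The main quantities can be made concrete in a few lines: a tree-ensemble kernel built from leaf co-occurrence, the kernel-target alignment $\langle K, yy^\top\rangle_F / (\|K\|_F \|yy^\top\|_F)$, and the scalar products between the kernel eigenvectors and the target. A sketch assuming a random-forest regressor on synthetic data (not the authors' code):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

X, y = make_friedman1(n_samples=300, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Tree-ensemble kernel: fraction of trees in which two points share a leaf.
leaves = rf.apply(X)                      # (n_samples, n_trees) leaf ids
K = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

# Kernel-target alignment <K, y y^T>_F / (||K||_F * ||y y^T||_F),
# computed with a centered target, where <K, y y^T>_F = y^T K y.
yc = y - y.mean()
alignment = (yc @ K @ yc) / (np.linalg.norm(K) * np.dot(yc, yc))
print("kernel-target alignment:", round(alignment, 3))

# Target-aligned components: scalar products of eigenvectors with the target.
eigvals, eigvecs = np.linalg.eigh(K)      # ascending eigenvalue order
proj = eigvecs.T @ yc
print("share of ||y||^2 captured by the top-5 eigenvectors:",
      round(np.sum(proj[-5:] ** 2) / np.dot(yc, yc), 3))
```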

【6】 Clustering dynamics on graphs: from spectral clustering to mean shift through Fokker-Planck interpolation
Link: https://arxiv.org/abs/2108.08687

Authors: Katy Craig, Nicolás García Trillos, Dejan Slepčev
Abstract: In this work we build a unifying framework to interpolate between density-driven and geometry-based algorithms for data clustering and, specifically, to connect the mean shift algorithm with spectral clustering at discrete and continuum levels. We seek this connection through the introduction of Fokker-Planck equations on data graphs. Besides introducing new forms of mean shift algorithms on graphs, we provide new theoretical insights on the behavior of the family of diffusion maps in the large sample limit, as well as new connections between diffusion maps and mean shift dynamics on a fixed graph. Several numerical examples illustrate our theoretical findings and highlight the benefits of interpolating density-driven and geometry-based clustering algorithms.
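As a reference point for the density-driven end of this interpolation, plain Euclidean mean shift can be sketched in a few lines; the paper's graph-based variants replace these Gaussian-kernel averages with dynamics driven by a Fokker-Planck equation on a data graph:

```python
import numpy as np

def mean_shift(X, bandwidth=0.5, steps=50):
    """Move each point to the weighted mean of its neighbors (Gaussian kernel).

    Points drift toward modes of the kernel density estimate; clusters
    are the basins of attraction of those modes.
    """
    Z = X.copy()
    for _ in range(steps):
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * bandwidth ** 2))
        Z = (W @ X) / W.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
modes = mean_shift(X)
print(np.unique(modes.round(1), axis=0))  # roughly two cluster modes
```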

【7】 Item Response Theory -- A Statistical Framework for Educational and Psychological Measurement
Link: https://arxiv.org/abs/2108.08604

Authors: Yunxiao Chen, Xiaoou Li, Jingchen Liu, Zhiliang Ying
Affiliations: London School of Economics and Political Science; University of Minnesota; Columbia University
Abstract: Item response theory (IRT) has become one of the most popular statistical models for psychometrics, a field of study concerned with the theory and techniques of psychological measurement. The IRT models are latent factor models tailored to the analysis, interpretation, and prediction of individuals' behaviors in answering a set of measurement items that typically involve categorical response data. Many important questions of measurement are directly or indirectly answered through the use of IRT models, including scoring individuals' test performances, validating a test scale, and linking two tests, among others. This paper provides a review of item response theory, including its statistical framework and psychometric applications. We establish connections between item response theory and related topics in statistics, including empirical Bayes, nonparametric methods, matrix completion, regularized estimation, and sequential analysis. Possible future directions of IRT are discussed from the perspective of statistical learning.
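Concretely, a workhorse model in this family is the two-parameter logistic (2PL) IRT model, where person $i$ answers item $j$ correctly with probability $P(Y_{ij}=1)=\sigma(a_j(\theta_i-b_j))$. A minimal simulate-and-fit sketch using joint maximum likelihood (illustrative only; production IRT software typically maximizes a marginal likelihood):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n_person, n_item = 500, 10
theta = rng.normal(size=n_person)                 # latent abilities
a_true = rng.uniform(0.8, 2.0, n_item)            # discriminations
b_true = rng.normal(size=n_item)                  # difficulties
Y = rng.binomial(1, expit(a_true * (theta[:, None] - b_true)))

def neg_loglik(params):
    """Joint negative log-likelihood over abilities and item parameters."""
    th = params[:n_person]
    a = params[n_person:n_person + n_item]
    b = params[n_person + n_item:]
    p = np.clip(expit(a * (th[:, None] - b)), 1e-9, 1 - 1e-9)
    return -(Y * np.log(p) + (1 - Y) * np.log(1 - p)).sum()

x0 = np.concatenate([np.zeros(n_person), np.ones(n_item), np.zeros(n_item)])
fit = minimize(neg_loglik, x0, method="L-BFGS-B")
print("correlation of estimated vs true difficulties:",
      np.corrcoef(fit.x[n_person + n_item:], b_true)[0, 1].round(2))
```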

【8】 Bayesian sample size determination for diagnostic accuracy studies
Link: https://arxiv.org/abs/2108.08594

Authors: Kevin J. Wilson, S. Faye Williamson, A. Joy Allen, Cameron J. Williams, Thomas P. Hellyer, B. Clare Lendrem
Affiliations: School of Mathematics, Statistics and Physics, Newcastle University; Biostatistics Research Group, Population Health Sciences Institute, Newcastle University, U.K.; NIHR Newcastle In Vitro Diagnostics Co-operative, Newcastle
Abstract: The development of a new diagnostic test ideally follows a sequence of stages which, amongst other aims, evaluate technical performance. This includes an analytical validity study, a diagnostic accuracy study and an interventional clinical utility study. Current approaches to the design and analysis of the diagnostic accuracy study can suffer from prohibitively large sample sizes and interval estimates with undesirable properties. In this paper, we propose a novel Bayesian approach which takes advantage of information available from the analytical validity stage. We utilise assurance to calculate the required sample size based on the target width of a posterior probability interval and can choose to use or disregard the data from the analytical validity study when subsequently inferring measures of test accuracy. Sensitivity analyses are performed to assess the robustness of the proposed sample size to the choice of prior, and prior-data conflict is evaluated by comparing the data to the prior predictive distributions. We illustrate the proposed approach using a motivating real-life application involving a diagnostic test for ventilator associated pneumonia. Finally, we compare the properties of the proposed approach against commonly used alternatives. The results show that by making better use of existing data from earlier studies, the assurance-based approach can not only reduce the required sample size when compared to alternatives, but can also produce more reliable sample sizes for diagnostic accuracy studies.
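The assurance calculation can be sketched with a Beta-Binomial model for test sensitivity: draw the true sensitivity from a prior (for example, one informed by the analytical validity stage), simulate a study of size n, and record how often the 95% posterior interval is narrower than the target width. (Hypothetical prior and target width; the paper's setting is richer.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

a0, b0 = 20, 5            # Beta(20, 5) prior for sensitivity (assumed)
target_width = 0.15       # desired width of the 95% posterior interval

def assurance(n, sims=2000):
    """P(95% posterior interval width <= target) under the prior predictive."""
    hits = 0
    for _ in range(sims):
        sens = rng.beta(a0, b0)            # draw the true sensitivity
        x = rng.binomial(n, sens)          # diseased subjects testing positive
        post = stats.beta(a0 + x, b0 + n - x)
        lo, hi = post.ppf(0.025), post.ppf(0.975)
        hits += (hi - lo) <= target_width
    return hits / sims

for n in range(20, 201, 20):
    if assurance(n) >= 0.9:
        print("smallest n on this grid with assurance >= 0.9:", n)
        break
```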

【9】 Empirical process theory for nonsmooth functions under functional dependence
Link: https://arxiv.org/abs/2108.08512

Authors: Nathawut Phandoidaen, Stefan Richter
Affiliation: Institut für angewandte Mathematik, Universität Heidelberg
Comments: arXiv admin note: substantial text overlap with arXiv:2007.05737
Abstract: We provide an empirical process theory for locally stationary processes over nonsmooth function classes. An important novelty over other approaches is the use of the flexible functional dependence measure to quantify dependence. A functional central limit theorem and nonasymptotic maximal inequalities are provided. The theory is used to prove the functional convergence of the empirical distribution function (EDF) and to derive uniform convergence rates for kernel density estimators, both for stationary and locally stationary processes. A comparison with earlier results based on other measures of dependence is carried out.

【10】 Seven Principles for Rapid-Response Data Science: Lessons Learned from Covid-19 Forecasting
Link: https://arxiv.org/abs/2108.08445

Authors: Bin Yu, Chandan Singh
Affiliations: Statistics Department and EECS Department, University of California, Berkeley
Comments: 5 pages, submission for the special issue of "Statistical Science" on the COVID-19 response
Abstract: In this article, we take a step back to distill seven principles out of our experience in the spring of 2020, when our 12-person rapid-response team used skills of data science and beyond to help distribute Covid PPE. This process included tapping into domain knowledge of epidemiology and medical logistics chains, curating a relevant data repository, developing models for short-term county-level death forecasting in the US, and building a website for sharing visualization (an automated AI machine). The principles are described in the context of working with Response4Life, a then-new nonprofit organization, to illustrate their necessity. Many of these principles overlap with those of standard data-science teams, but an emphasis is put on dealing with problems that require rapid response, often resembling agile software development.

【11】 Bayesian Semiparametric Hidden Markov Tensor Partition Models for Longitudinal Data with Local Variable Selection
Link: https://arxiv.org/abs/2108.08439

Authors: Giorgio Paulon, Peter Müller, Abhra Sarkar
Affiliations: Department of Statistics and Data Sciences and Department of Mathematics, The University of Texas at Austin, Austin, TX, USA
Abstract: We present a flexible Bayesian semiparametric mixed model for longitudinal data analysis in the presence of potentially high-dimensional categorical covariates. Building on a novel hidden Markov tensor decomposition technique, our proposed method allows the fixed effects components to vary between dependent random partitions of the covariate space at different time points. The mechanism not only allows different sets of covariates to be included in the model at different time points but also allows the selected predictors' influences to vary flexibly over time. Smooth time-varying additive random effects are used to capture subject-specific heterogeneity. We establish posterior convergence guarantees for both function estimation and variable selection. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method's empirical performance through synthetic experiments and demonstrate its practical utility through real-world applications.

【12】 Estimating the natural indirect effect and the mediation proportion via the product method
Link: https://arxiv.org/abs/2108.08417

Authors: Chao Cheng, Donna Spiegelman, Fan Li
Abstract: The natural indirect effect (NIE) and mediation proportion (MP) are two measures of primary interest in mediation analysis. The standard approach for estimating the NIE and MP is through the product method, which involves a model for the outcome conditional on the mediator and exposure and another model describing the exposure-mediator relationship. The purpose of this article is to comprehensively develop and investigate the finite-sample performance of NIE and MP estimators via the product method. For four common data types, we propose closed-form interval estimators via the theory of estimating equations and the multivariate delta method, and evaluate their empirical performance relative to the bootstrap approach. In addition, we have observed that the rare outcome assumption is frequently invoked to approximate the NIE and MP with a binary outcome, although this approximation may lead to non-negligible bias when the outcome is common. We therefore introduce the exact expressions for the NIE and MP with a binary outcome without the rare outcome assumption and compare their performance with the approximate estimators. Based upon these theoretical developments and empirical studies, we offer several practical recommendations to inform practice. An R package, mediateP, is developed to implement the methods for point and variance estimation discussed in this paper.
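For the simplest continuous-outcome case the product method is compact: with mediator model $M=\alpha_0+\alpha_1 A+\varepsilon$ and outcome model $Y=\beta_0+\beta_1 A+\beta_2 M+\varepsilon$, the NIE is $\alpha_1\beta_2$, the NDE is $\beta_1$, and MP = NIE/(NIE + NDE). A sketch with a bootstrap interval for the NIE (synthetic data; the closed-form delta-method intervals studied in the paper are implemented in mediateP):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
A = rng.binomial(1, 0.5, n)                 # exposure
M = 0.5 * A + rng.normal(size=n)            # mediator
Y = 0.3 * A + 0.8 * M + rng.normal(size=n)  # outcome

def product_method(A, M, Y):
    """Return (NIE, MP) from the two regressions of the product method."""
    alpha = sm.OLS(M, sm.add_constant(A)).fit().params           # [a0, a1]
    beta = sm.OLS(Y, sm.add_constant(np.column_stack([A, M]))).fit().params
    nie = alpha[1] * beta[2]    # natural indirect effect
    nde = beta[1]               # natural direct effect
    return nie, nie / (nie + nde)

nie, mp = product_method(A, M, Y)

# Nonparametric bootstrap for an interval estimate of the NIE.
idx = [rng.integers(0, n, n) for _ in range(500)]
boot = np.array([product_method(A[i], M[i], Y[i])[0] for i in idx])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"NIE = {nie:.3f} (95% CI {lo:.3f}, {hi:.3f}); MP = {mp:.3f}")
```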

【13】 Transfer learning of individualized treatment rules from experimental to real-world data
Link: https://arxiv.org/abs/2108.08415

Authors: Lili Wu, Shu Yang
Affiliation: Department of Statistics, North Carolina State University
Abstract: Individualized treatment effects lie at the heart of precision medicine. Interpretable individualized treatment rules (ITRs) are desirable for clinicians or policymakers due to their intuitive appeal and transparency. The gold-standard approach to estimating ITRs is randomized experiments, where subjects are randomized to different treatment groups and bias is minimized to the extent possible. However, experimental data are limited in external validity because of their selection restrictions and therefore are not representative of the target real-world population. Conventional learning methods of optimal interpretable ITRs for a target population based only on experimental data are biased. On the other hand, real-world data (RWD) are becoming popular and provide a representative sample of the population. To learn generalizable optimal interpretable ITRs, we propose an integrative transfer learning method based on weighting schemes to calibrate the covariate distribution of the experiment to that of the RWD. We show that the proposed ITR estimator has a theoretical guarantee of risk consistency. We evaluate the finite-sample performance of the transfer learner through simulation and apply it to a real data application of a job training program.

【14】 Learning Equilibria in Matching Markets from Bandit Feedback
Link: https://arxiv.org/abs/2108.08843

Authors: Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael I. Jordan, Jacob Steinhardt
Affiliation: UC Berkeley (EECS and Statistics)
Abstract: Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. However, since preferences are inherently uncertain during learning, the classical notion of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) is unattainable in these settings. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that "optimism in the face of uncertainty," the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.

【15】 Do Vision Transformers See Like Convolutional Neural Networks?
Link: https://arxiv.org/abs/2108.08810

Authors: Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy
Affiliation: Google Research, Brain Team
Abstract: Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers. We study the ramifications for spatial localization, demonstrating that ViTs successfully preserve input spatial information, with noticeable effects from different classification methods. Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
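Comparisons of internal representations in this line of work are commonly made with centered kernel alignment (CKA) between layer activations. A minimal linear-CKA sketch, with random matrices standing in for actual ViT/CNN activations (the paper's exact similarity measure and models are not reproduced here):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices (n_examples x n_features).

    CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered columns;
    it is invariant to rotations and isotropic scalings of either space.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X) ** 2
            / (np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y)))

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 64))            # stand-in layer activations
Q, _ = np.linalg.qr(rng.normal(size=(64, 64))) # a random rotation
print(linear_cka(acts_a, acts_a @ Q))          # ~1.0: same features, rotated
print(linear_cka(acts_a, rng.normal(size=(512, 64))))  # low: unrelated features
```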

【16】 Threshold Phenomena in Learning Halfspaces with Massart Noise
Link: https://arxiv.org/abs/2108.08767

Authors: Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
Affiliations: UW Madison; UC San Diego
Abstract: We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals. In the Massart noise model, an adversary is allowed to flip the label of each point $\mathbf{x}$ with probability $\eta(\mathbf{x}) \leq \eta$, for some parameter $\eta \in [0,1/2]$. The goal of the learner is to output a hypothesis with misclassification error $\mathrm{opt} + \epsilon$, where $\mathrm{opt}$ is the error of the target halfspace. Prior work studied this problem assuming that the target halfspace is homogeneous and that the parameter $\eta$ is strictly smaller than $1/2$. We explore how the complexity of the problem changes when either of these assumptions is removed, establishing the following threshold phenomena: For $\eta = 1/2$, we prove a lower bound of $d^{\Omega(\log(1/\epsilon))}$ on the complexity of any Statistical Query (SQ) algorithm for the problem, which holds even for homogeneous halfspaces. On the positive side, we give a new learning algorithm for arbitrary halfspaces in this regime with sample complexity and running time $O_\epsilon(1) \, d^{O(\log(1/\epsilon))}$. For $\eta < 1/2$, we establish a lower bound of $d^{\Omega(\log(1/\gamma))}$ on the SQ complexity of the problem, where $\gamma = \max\{\epsilon, \min\{\mathbf{Pr}[f(\mathbf{x}) = 1], \mathbf{Pr}[f(\mathbf{x}) = -1]\}\}$ and $f$ is the target halfspace. In particular, this implies an SQ lower bound of $d^{\Omega(\log(1/\epsilon))}$ for learning arbitrary Massart halfspaces (even for small constant $\eta$). We complement this lower bound with a new learning algorithm for this regime with sample complexity and runtime $d^{O_{\eta}(\log(1/\gamma))} \mathrm{poly}(1/\epsilon)$. Taken together, our results qualitatively characterize the complexity of learning halfspaces in the Massart model.

【17】 Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation
Link: https://arxiv.org/abs/2108.08765

Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
Comments: 54 pages, in submission
Abstract: In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and the reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected a priori. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves an $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2} + H^2d^{3/2}N_2^{-1/2} + H^{3/2}dN_1^{-1/2})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.

【18】 Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity
Link: https://arxiv.org/abs/2108.08758

Authors: Praneeth Vepakomma, Yulia Kempner, Ramesh Raskar
Comments: SubSetML: Subset Selection in Machine Learning: From Theory to Practice
Abstract: Classes of set functions along with a choice of ground set are a bedrock to determine and develop corresponding variants of greedy algorithms to obtain efficient solutions for combinatorial optimization problems. The class of approximate constrained submodular optimization has seen huge advances at the intersection of good computational efficiency, versatility and approximation guarantees, while exact solutions for unconstrained submodular optimization are NP-hard. What is an alternative for situations when submodularity does not hold? Can efficient and globally exact solutions be obtained? We introduce one such new frontier: the class of quasi-concave set functions induced as a dual class to monotone linkage functions. We provide a parallel algorithm with a time complexity over $n$ processors of $\mathcal{O}(n^2 g) + \mathcal{O}(\log{\log{n}})$, where $n$ is the cardinality of the ground set and $g$ is the complexity to compute the monotone linkage function that induces a corresponding quasi-concave set function via a duality. The complexity reduces to $\mathcal{O}(g n \log(n))$ on $n^2$ processors and to $\mathcal{O}(g n)$ on $n^3$ processors. Our algorithm provides a globally optimal solution to a maxi-min problem, as opposed to submodular optimization, which is approximate. We show a potential for widespread applications via an example of diverse feature subset selection with exact global maxi-min guarantees, upon showing that a statistical dependency measure called distance correlation can be used to induce a quasi-concave set function.
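The statistical dependency measure mentioned at the end, distance correlation, is itself easy to compute; a sketch of the sample version (the paper's quasi-concave set-function construction built on top of it is not reproduced here):

```python
import numpy as np
from scipy.spatial.distance import cdist

def distance_correlation(x, y):
    """Sample distance correlation between two samples (rows = observations)."""
    def centered(z):
        D = cdist(z, z)  # pairwise Euclidean distances
        # Double-center: subtract row means, column means, add grand mean.
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
print(distance_correlation(x, x ** 2).round(2))   # detects nonlinear dependence
print(distance_correlation(x, rng.normal(size=(500, 1))).round(2))  # near 0
```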

【19】 odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks
Link: https://arxiv.org/abs/2108.08734

Authors: Ilie Sarpe, Fabio Vandin
Affiliation: Department of Information Engineering, University of Padova, Padova, Italy
Comments: 14 pages, 8 figures, accepted at CIKM 2021
Abstract: Counting the number of occurrences of small connected subgraphs, called temporal motifs, has become a fundamental primitive for the analysis of temporal networks, whose edges are annotated with the time of the event they represent. One of the main complications in studying temporal motifs is the large number of motifs that can be built even with a limited number of vertices or edges. As a consequence, since in many applications motifs are employed for exploratory analyses, the user needs to iteratively select and analyze several motifs that represent different aspects of the network, resulting in an inefficient, time-consuming process. This problem is exacerbated in large networks, where the analysis of even a single motif is computationally demanding. As a solution, in this work we propose and study the problem of simultaneously counting the number of occurrences of multiple temporal motifs, all corresponding to the same (static) topology (e.g., a triangle). Given that for large temporal networks computing the exact counts is unfeasible, we propose odeN, a sampling-based algorithm that provides an accurate approximation of all the counts of the motifs. We provide analytical bounds on the number of samples required by odeN to compute rigorous, probabilistic, relative approximations. Our extensive experimental evaluation shows that odeN enables the approximation of the counts of motifs in temporal networks in a fraction of the time needed by state-of-the-art methods, and that it also reports more accurate approximations than such methods.

【20】 Teaching Uncertainty Quantification in Machine Learning through Use Cases
Link: https://arxiv.org/abs/2108.08712

Authors: Matias Valdenegro-Toro
Comments: 2nd Teaching in Machine Learning Workshop, camera ready, 5 pages, 3 figures
Abstract: Uncertainty in machine learning is not generally taught as general knowledge in machine learning course curricula. In this paper we propose a short curriculum for a course about uncertainty in machine learning, and complement the course with a selection of use cases aimed to trigger discussion and let students play with the concepts of uncertainty in a programming setting. Our use cases cover the concept of output uncertainty, Bayesian neural networks and weight distributions, sources of uncertainty, and out-of-distribution detection. We expect that this curriculum and set of use cases motivates the community to adopt these important concepts into courses for safety in AI.

【21】 Neural density estimation and uncertainty quantification for laser induced breakdown spectroscopy spectra
Link: https://arxiv.org/abs/2108.08709

Authors: Katiana Kontolati, Natalie Klein, Nishant Panda, Diane Oyen
Affiliations: Johns Hopkins University, Baltimore, MD; Los Alamos National Laboratory, Los Alamos, NM
Comments: 5 pages, 3 figures
Abstract: Constructing probability densities for inference in high-dimensional spectral data is often intractable. In this work, we use normalizing flows on structured spectral latent spaces to estimate such densities, enabling downstream inference tasks. In addition, we evaluate a method for uncertainty quantification when predicting unobserved state vectors associated with each spectrum. We demonstrate the capability of this approach on laser-induced breakdown spectroscopy data collected by the ChemCam instrument on the Mars rover Curiosity. Using our approach, we are able to generate realistic spectral samples and to accurately predict state vectors with associated well-calibrated uncertainties. We anticipate that this methodology will enable efficient probabilistic modeling of spectral data, leading to potential advances in several areas, including out-of-distribution detection and sensitivity analysis.

【22】 On Accelerating Distributed Convex Optimizations
Link: https://arxiv.org/abs/2108.08670

Authors: Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra
Affiliations: Department of Electrical and Computer Engineering and Department of Mechanical Engineering, University of Maryland, College Park, U.S.A.; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Abstract: This paper studies a distributed multi-agent convex optimization problem. The system comprises multiple agents, each with a set of local data points and an associated local cost function. The agents are connected to a server, and there is no inter-agent communication. The agents' goal is to learn a parameter vector that optimizes the aggregate of their local costs without revealing their local data points. In principle, the agents can solve this problem by collaborating with the server using the traditional distributed gradient-descent method. However, when the aggregate cost is ill-conditioned, the gradient-descent method (i) requires a large number of iterations to converge, and (ii) is highly unstable against process noise. We propose an iterative pre-conditioning technique to mitigate the deleterious effects of the cost function's conditioning on the convergence rate of distributed gradient-descent. Unlike conventional pre-conditioning techniques, the pre-conditioner matrix in our proposed technique updates iteratively to facilitate implementation on the distributed network. In the distributed setting, we provably show that the proposed algorithm converges linearly with an improved rate of convergence relative to the traditional and adaptive gradient-descent methods. Additionally, for the special case when the minimizer of the aggregate cost is unique, our algorithm converges superlinearly. We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems and emulating neural network training via a noisy quadratic model, thereby signifying the proposed algorithm's efficiency for distributively solving non-convex optimization. Moreover, we empirically show that the proposed algorithm results in faster training without compromising the generalization performance.
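The core idea can be sketched on an ill-conditioned quadratic: in addition to the usual gradient step on the iterate, a preconditioner matrix is itself updated iteratively toward the inverse Hessian instead of being computed once up front. (A simplified single-machine sketch with assumed step sizes and an assumed form of the preconditioner update, not the authors' distributed server-agent protocol.)

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.normal(size=(d, d))
H = A @ A.T + np.eye(d)        # Hessian of f(x) = 0.5 x^T H x - b^T x
b = rng.normal(size=d)
x_star = np.linalg.solve(H, b)

lam_max = np.linalg.norm(H, 2)
x = np.zeros(d)
K = np.eye(d) / lam_max        # preconditioner; scaled for stable first steps
alpha = 1.0 / lam_max ** 2     # step size for the K-update (assumed)
delta = 0.9                    # step size for the x-update (assumed)

for _ in range(1000):
    # Push K toward H^{-1}: a gradient step on 0.5 * ||H K - I||_F^2.
    K -= alpha * H @ (H @ K - np.eye(d))
    # Pre-conditioned gradient step on x.
    x -= delta * K @ (H @ x - b)

print("final error:", np.linalg.norm(x - x_star))
```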

【23】 Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
Link: https://arxiv.org/abs/2108.08655

Authors: Ziheng Wang, Justin Sirignano
Abstract: Actor-critic algorithms are widely used in reinforcement learning, but are challenging to mathematically analyze due to the online arrival of non-i.i.d. data samples. The distribution of the data samples dynamically changes as the model is updated, introducing a complex feedback loop between the data distribution and the reinforcement learning algorithm. We prove that, under a time rescaling, the online actor-critic algorithm with tabular parametrization converges to an ordinary differential equation (ODE) as the number of updates becomes large. The proof first establishes the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the data samples around a dynamic probability measure, which is a function of the evolving actor model, vanish as the number of updates becomes large. Once the ODE limit has been derived, we study its convergence properties using a two time-scale analysis which asymptotically de-couples the critic ODE from the actor ODE. The convergence of the critic to the solution of the Bellman equation and of the actor to the optimal policy are proven. In addition, a convergence rate to this global minimum is also established. Our convergence analysis holds under specific choices for the learning rates and exploration rates in the actor-critic algorithm, which could provide guidance for the implementation of actor-critic algorithms in practice.

【24】 The Bootstrap for Dynamical Systems
Link: https://arxiv.org/abs/2108.08461

Authors: Kasun Fernando, Nan Zou
Comments: 51 pages, 4 figures
Abstract: Despite their deterministic nature, dynamical systems often exhibit seemingly random behaviour. Consequently, a dynamical system is usually represented by a probabilistic model of which the unknown parameters must be estimated using statistical methods. When measuring the uncertainty of such parameter estimation, the bootstrap stands out as a simple but powerful technique. In this paper, we develop the bootstrap for dynamical systems and establish not only its consistency but also its second-order efficiency via a novel continuous Edgeworth expansion for dynamical systems. This is the first time such continuous Edgeworth expansions have been studied. Moreover, we verify the theoretical results about the bootstrap via computer simulations.
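For intuition, here is a moving-block bootstrap applied to the time average of a chaotic map: because consecutive iterates are dependent, whole blocks of the trajectory are resampled rather than individual points. (A generic block bootstrap sketch; the paper's construction and its second-order analysis are specific to dynamical systems and are not reproduced here.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Trajectory of the logistic map x -> 4x(1-x), a classic chaotic system.
n, burn = 2000, 100
x = 0.2
traj = []
for t in range(n + burn):
    x = 4.0 * x * (1.0 - x)
    if t >= burn:
        traj.append(x)
traj = np.asarray(traj)

stat = traj.mean()        # statistic of interest: the time average

# Moving-block bootstrap: resample overlapping blocks to respect dependence.
block, B = 50, 2000
starts = rng.integers(0, n - block, size=(B, n // block))
boot = np.array([traj[(s[:, None] + np.arange(block)).ravel()].mean()
                 for s in starts])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"time average {stat:.4f}, 95% bootstrap CI ({lo:.4f}, {hi:.4f})")
# For this map the invariant density is the arcsine law on [0, 1], whose
# mean is 0.5, so the interval should cover 0.5.
```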
