
arXiv Daily Statistics Digest [7.13]

Author: 公众号-arXiv每日学术速递
Published: 2021-07-27 10:49:42

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features!

stat (Statistics): 54 papers in total

【1】 Statistical Modeling for Practical Pooled Testing During the COVID-19 Pandemic

Authors: Saskia Comess, Hannah Wang, Susan Holmes, Claire Donnat
Affiliations: Department of Statistics, Stanford University
Link: https://arxiv.org/abs/2107.05619
Abstract: Pooled testing offers an efficient solution to the unprecedented testing demands of the COVID-19 pandemic, although with potentially lower sensitivity and increased costs to implementation in some settings. Assessments of this trade-off typically assume pooled specimens are independent and identically distributed. Yet, in the context of COVID-19, these assumptions are often violated: testing done on networks (housemates, spouses, co-workers) captures correlated individuals, while infection risk varies substantially across time, place and individuals. Neglecting dependencies and heterogeneity may bias established optimality grids and induce a sub-optimal implementation of the procedure. As a lesson learned from this pandemic, this paper highlights the necessity of integrating field sampling information with statistical modeling to efficiently optimize pooled testing. Using real data, we show that (a) greater gains can be achieved at low logistical cost by exploiting natural correlations (non-independence) between samples -- allowing improvements in sensitivity and efficiency of up to 30% and 90% respectively; and (b) these gains are robust despite substantial heterogeneity across pools (non-identical). Our modeling results complement and extend the observations of Barak et al. (2021), who report an empirical sensitivity well beyond expectations. Finally, we provide an interactive tool for selecting an optimal pool size using contextual information.
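As a toy illustration of the efficiency gains the paper describes, the sketch below compares two-stage Dorfman pooling under independent versus household-clustered infections. This is a simplified Monte Carlo stand-in, not the authors' model: the prevalence, pool size, and perfect-test assumptions are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def dorfman_tests(statuses, pool_size):
    """Number of tests used by two-stage Dorfman pooling:
    one test per pool, plus individual retests for positive pools."""
    n = len(statuses)
    tests = 0
    for i in range(0, n, pool_size):
        pool = statuses[i:i + pool_size]
        tests += 1                      # pooled test
        if pool.any():                  # positive pool -> retest everyone in it
            tests += len(pool)
    return tests

# Independent infections vs. household-correlated infections
# (here an infected household has every member infected).
n, k, p = 10000, 5, 0.02
indep = rng.random(n) < p
households = rng.random(n // k) < p
clustered = np.repeat(households, k)    # members of a household share status

# With clustered sampling, pooling by household concentrates positives
# into fewer pools, so fewer pools need retesting.
print(dorfman_tests(indep, k) / n)      # tests per person, independent
print(dorfman_tests(clustered, k) / n)  # tests per person, correlated pools
```

Pooling along natural clusters lowers the expected number of tests per person, mirroring the paper's point that exploiting non-independence yields gains at low logistical cost.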

【2】 Choosing Imputation Models

Authors: Moritz Marbach
Affiliations: Texas A&M University, The Bush School of Government & Public Service
Link: https://arxiv.org/abs/2107.05427
Abstract: Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between different imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to those of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.

【3】 Cohesion and Repulsion in Bayesian Distance Clustering

Authors: Abhinav Natarajan, Maria De Iorio, Andreas Heinecke, Emanuel Mayer, Simon Glenn
Affiliations: Yale-NUS College; National University of Singapore; University of Oxford
Link: https://arxiv.org/abs/2107.05414
Abstract: Clustering in high dimensions poses many statistical challenges. While traditional distance-based clustering methods are computationally feasible, they lack probabilistic interpretation and rely on heuristics for estimation of the number of clusters. On the other hand, probabilistic model-based clustering techniques often fail to scale, and devising algorithms that are able to effectively explore the posterior space is an open problem. Based on recent developments in Bayesian distance-based clustering, we propose a hybrid solution that entails defining a likelihood on pairwise distances between observations. The novelty of the approach consists in including both cohesion and repulsion terms in the likelihood, which allows for cluster identifiability. This implies that clusters are composed of objects which have small "dissimilarities" among themselves (cohesion) and similar dissimilarities to observations in other clusters (repulsion). We show how this modelling strategy has interesting connections with existing proposals in the literature as well as a decision-theoretic interpretation. The proposed method is computationally efficient and applicable to a wide variety of scenarios. We demonstrate the approach in a simulation study and an application in digital numismatics.

【4】 Metalearning Linear Bandits by Prior Update

Authors: Amit Peleg, Naama Pearl, Ron Meir
Affiliations: Technion, Israel; University of Haifa, Israel
Link: https://arxiv.org/abs/2107.05320
Abstract: Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior, while in practice, such information is often lacking, and needs to be estimated through learning. This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance. In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of the algorithm that uses the true prior. Next, we address the task of learning the prior through metalearning, where a learner updates its estimate of the prior across multiple task instances in order to improve performance on future tasks. The estimated prior is then updated within each task based on incoming observations, while actions are selected in order to maximize expected reward. In this work we apply this scheme within a linear bandit setting, and provide algorithms and regret bounds, demonstrating its effectiveness, as compared to an algorithm that knows the correct prior. Our results hold for a broad class of algorithms, including, for example, Thompson Sampling and Information Directed Sampling.
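A toy illustration of why the prior matters, not the paper's metalearning procedure: Thompson sampling in a Gaussian linear bandit run once with a well-specified prior and once with a badly misspecified one. The arm set, noise level, and horizon are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_linear(theta_true, arms, prior_mean, prior_cov, T=500, noise=0.1):
    """Thompson sampling for a linear bandit with a Gaussian prior
    N(prior_mean, prior_cov) on the parameter; returns cumulative regret."""
    cov_inv = np.linalg.inv(prior_cov)
    b = cov_inv @ prior_mean
    best = max(arm @ theta_true for arm in arms)
    regret = 0.0
    for _ in range(T):
        cov = np.linalg.inv(cov_inv)
        theta_sample = rng.multivariate_normal(cov @ b, cov)
        arm = max(arms, key=lambda a: a @ theta_sample)   # act on the sample
        reward = arm @ theta_true + noise * rng.normal()
        cov_inv += np.outer(arm, arm) / noise**2           # conjugate Gaussian update
        b += arm * reward / noise**2
        regret += best - arm @ theta_true
    return regret

theta = np.array([1.0, 0.5])
arms = [np.array(a) for a in [(1, 0), (0, 1), (0.7, 0.7)]]
good = thompson_linear(theta, arms, prior_mean=theta, prior_cov=0.1 * np.eye(2))
bad = thompson_linear(theta, arms, prior_mean=-theta, prior_cov=0.1 * np.eye(2))
print(good, bad)
```

A confidently wrong prior steers exploration away from the optimal arm and the regret gap persists, which is the failure mode the paper's closeness condition on the prior estimate rules out.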

【5】 A stochastic Gauss-Newton algorithm for regularized semi-discrete optimal transport

Authors: Bernard Bercu, Jérémie Bigot, Sébastien Gadat, Emilia Siviero
Affiliations: Institut de Mathématiques de Bordeaux et CNRS, Université de Bordeaux; Toulouse School of Economics, Université Toulouse Capitole; LTCI, Télécom Paris, Institut Polytechnique de Paris
Link: https://arxiv.org/abs/2107.05291
Abstract: We introduce a new second order stochastic algorithm to estimate the entropically regularized optimal transport cost between two probability measures. The source measure can be chosen arbitrarily, either absolutely continuous or discrete, while the target measure is assumed to be discrete. To solve the semi-dual formulation of such a regularized and semi-discrete optimal transportation problem, we propose to consider a stochastic Gauss-Newton algorithm that uses a sequence of data sampled from the source measure. This algorithm is shown to be adaptive to the geometry of the underlying convex optimization problem with no important hyperparameter to be accurately tuned. We establish the almost sure convergence and the asymptotic normality of various estimators of interest that are constructed from this stochastic Gauss-Newton algorithm. We also analyze their non-asymptotic rates of convergence for the expected quadratic risk in the absence of strong convexity of the underlying objective function. The results of numerical experiments from simulated data are also reported to illustrate the finite sample properties of this Gauss-Newton algorithm for stochastic regularized optimal transport, and to show its advantages over the use of the stochastic gradient descent, stochastic Newton and ADAM algorithms.
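To make the semi-dual objective concrete, here is a hedged sketch that maximizes it with plain averaged stochastic gradient ascent rather than the paper's stochastic Gauss-Newton step; the target points, ground cost, and regularization level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Semi-discrete entropic OT: source mu = N(0,1), discrete target nu
# uniform on points y_j.  Maximize the semi-dual H(v) = E_X[h(X, v)]
# by averaged stochastic gradient ascent over samples from the source.
y = np.array([-2.0, 0.0, 1.0, 3.0])
nu = np.full(len(y), 0.25)
eps = 0.5                               # entropic regularization

def grad_h(x, v):
    """Stochastic gradient of the semi-dual at a source sample x."""
    cost = 0.5 * (x - y) ** 2           # quadratic ground cost
    logits = (v - cost) / eps + np.log(nu)
    chi = np.exp(logits - logits.max())
    chi /= chi.sum()                    # conditional transport weights
    return nu - chi

v = np.zeros(len(y))
v_bar = np.zeros(len(y))
for t in range(1, 20001):
    x = rng.normal()                    # sample from the source measure
    v += (1.0 / np.sqrt(t)) * grad_h(x, v)
    v_bar += (v - v_bar) / t            # Polyak-Ruppert averaging

print(np.round(v_bar - v_bar.mean(), 2))  # potentials (defined up to a constant)
```

At the optimum the expected transport weights match the target marginal nu; the paper's Gauss-Newton step replaces the simple 1/sqrt(t) schedule with a curvature-adapted one, removing the step-size tuning.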

【6】 Multiplicative deconvolution in survival analysis under dependency

Authors: Sergio Brenner Miguel, Nathawut Phandoidaen
Affiliations: Institut für angewandte Mathematik, Universität Heidelberg
Note: 29 pages, 2 figures, 2 tables
Link: https://arxiv.org/abs/2107.05267
Abstract: We study the non-parametric estimation of an unknown survival function S with support on R+ based on a sample with multiplicative measurement errors. The proposed fully data-driven procedure is based on the estimation of the Mellin transform of the survival function S and a regularisation of the inverse of the Mellin transform by a spectral cut-off. The resulting bias-variance trade-off is dealt with by a data-driven choice of the cut-off parameter. In order to discuss the bias term, we consider the Mellin-Sobolev spaces which characterize the regularity of the unknown survival function S through the decay of its Mellin transform. For the analysis of the variance term, we consider the i.i.d. case and incorporate dependent observations in the form of Bernoulli shift processes and beta mixing sequences. Additionally, we show minimax-optimality of the spectral cut-off estimator over Mellin-Sobolev spaces.

【7】 Consensus as a Nash Equilibrium of a stochastic differential game

Authors: Paramahansa Pramanik
Affiliations: Department of Mathematics and Statistics, University of South Alabama, Mobile, AL, USA
Link: https://arxiv.org/abs/2107.05183
Abstract: In this paper a consensus has been constructed in a social network which is modeled by a stochastic differential game played by agents of that network. Each agent independently minimizes a cost function which represents their motives. A conditionally expected integral cost function has been considered under an agent's opinion filtration. The dynamic cost functional is minimized subject to stochastic differential opinion dynamics. As the opinion dynamics represent an agent's differences of opinion from the others as well as from their previous opinions, random influences and stubbornness make them more volatile. An agent uses their rate of change of opinion at a certain time point as a control input. This turns out to be a non-cooperative stochastic differential game which has a feedback Nash equilibrium. A Feynman-type path integral approach has been used to determine an optimal feedback opinion and control, which is a new approach in this literature. Later in the paper an explicit solution of a feedback Nash equilibrium opinion is determined.
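The opinion dynamics can be sketched with a stylized Euler-Maruyama simulation. The drift below (pull toward the network average plus stubbornness toward the initial opinion) is a simplified stand-in for the paper's game-theoretic feedback control; all coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Euler-Maruyama simulation of stylized stochastic opinion dynamics:
# each agent drifts toward the network average (consensus pull) and
# toward its initial opinion (stubbornness), plus random influences.
n_agents, T, dt = 50, 5.0, 0.01
a, b, sigma = 1.0, 0.3, 0.2             # consensus pull, stubbornness, noise

x0 = rng.uniform(-1, 1, n_agents)       # initial opinions
x = x0.copy()
for _ in range(int(T / dt)):
    drift = a * (x.mean() - x) + b * (x0 - x)
    x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_agents)

print(x0.std(), x.std())                # opinions concentrate around a consensus
```

The spread of opinions shrinks over time but does not vanish: stubbornness and random influences keep the consensus noisy, which is the volatility the abstract points to.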

【8】 Discovery of Bayes' Table at Tunbridge Wells

Authors: David C. Schneider, Roy Thompson
Affiliations: Department of Ocean Sciences, Memorial University, St. John's, NL, Canada; Tunbridge Wells, Kent, UK
Note: 6 pages, 2 figures
Link: https://arxiv.org/abs/2107.05145
Abstract: In 1755 Thomas Bayes expressed an interest in the problem of combining repeated measurements of the location of a star. Bayes described a tandem set-up of a ball thrown on a table, followed by repeated throws of a second ball. Bayes' table has long been taken as a billiard table, for which there is no evidence. We report the discovery of Bayes' table, a bowling green located half a km uphill (SE) from the meeting house where Bayes served as minister for two decades. Bayes' drawing shows a rectangular space marked off in yards, which allows calculation of an interval measurement of uncertainty. The Bayes rule interval from 2.5% to 97.5% spans 0.56 - 0.42 = 0.12 perches, equivalent to 0.61 m. The discovery of Bayes' table establishes the physical basis for Bayes' symmetrical probability model, a fixed-parameter binomial ({\theta} = 0.5). The discovery establishes Bayes as the founder of statistical science, defined as the application of mathematics to scientific measurement.

【9】 Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning

Authors: Pierre C. Bellec, Yiwei Shen
Affiliations: Department of Statistics, Rutgers University
Link: https://arxiv.org/abs/2107.05143
Abstract: This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty when the noise distribution has heavy tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hat\beta(y,X)$ where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i-x_i^\top\hat\beta$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require the knowledge of the noise distribution or of the covariance of the design. Simulated data confirm the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy of the out-of-sample error. Finally, our results reveal new relationships between the derivatives of $\hat\beta(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.

【10】 Rank-based Bayesian variable selection for genome-wide transcriptomic analyses

Authors: Emilie Eliseussen, Thomas Fleischer, Valeria Vitelli
Affiliations: Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Norway; Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Norway
Note: 23 pages, 11 figures
Link: https://arxiv.org/abs/2107.05072
Abstract: Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume only a subset of non-noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite the evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based transcriptomic analysis. Making use of data rankings instead of the actual continuous measurements increases the robustness of conclusions when compared to classical statistical methods, and embedding variable selection into the inferential tasks allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. We test our approach on simulated data using several data generating procedures, demonstrating the versatility and robustness of the method under different scenarios. We then use the novel approach to analyse genome-wide RNAseq gene expression data from ovarian cancer samples: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the method's usefulness in the context of signature discovery for cancer genomics. Moreover, the ability to also perform uncertainty quantification plays a key role in the subsequent biological investigation.

【11】 Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper

Authors: Shami Nisimov, Yaniv Gurwicz, Raanan Y. Rohekar, Gal Novik
Affiliations: Intel Labs
Note: The 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), Workshop on Tractable Probabilistic Modeling
Link: https://arxiv.org/abs/2107.05001
Abstract: Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, in which a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched for. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few statistical tests as possible. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.

【12】 A prediction perspective on the Wiener-Hopf equations for discrete time series

Authors: Suhasini Subba Rao, Junho Yang
Affiliations: Texas A&M University, College Station, TX, U.S.A.
Link: https://arxiv.org/abs/2107.04994
Abstract: The Wiener-Hopf equations are a Toeplitz system of linear equations that have several applications in time series. These include the update and prediction step of the stationary Kalman filter equations and the prediction of bivariate time series. The Wiener-Hopf technique is the classical tool for solving the equations, and is based on a comparison of coefficients in a Fourier series expansion. The purpose of this note is to revisit the (discrete) Wiener-Hopf equations and obtain an alternative expression for the solution that is more in the spirit of time series analysis. Specifically, we propose a solution to the Wiener-Hopf equations that combines linear prediction with deconvolution. The solution of the Wiener-Hopf equations requires one to obtain the spectral factorization of the underlying spectral density function. For general spectral density functions this is infeasible. Therefore, it is usually assumed that the spectral density is rational, which allows one to obtain a computationally tractable solution. This leads to an approximation error when the underlying spectral density is not a rational function. We use the proposed solution together with Baxter's inequality to derive an error bound for the rational spectral density approximation.
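The Toeplitz structure is easy to exhibit: for an AR(1) process the Wiener-Hopf (here, Yule-Walker) system recovers the known one-step prediction coefficients. The model and order are illustrative.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# One-step prediction of a stationary series solves the Toeplitz system
# Gamma phi = gamma, with Gamma the autocovariance matrix and gamma the
# vector of lag-1..p autocovariances.  For an AR(1) with coefficient rho
# the best linear predictor is known in closed form, phi = (rho, 0, ..., 0),
# which the Toeplitz solve recovers exactly.
rho, p = 0.6, 5
acf = rho ** np.arange(p + 1)           # AR(1) autocovariances (unit-variance scale)
phi = solve_toeplitz(acf[:p], acf[1:p + 1])
print(np.round(phi, 6))
```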

【13】 Extrapolation Estimation for Parametric Regression with Normal Measurement Error

Authors: Kanwal Ayub, Weixing Song
Affiliations: Department of Statistics, Kansas State University, Manhattan, KS
Note: 30 pages, 8 figures, 3 tables
Link: https://arxiv.org/abs/2107.04923
Abstract: For general parametric regression models with covariates contaminated with normal measurement errors, this paper proposes an accelerated version of the classical simulation extrapolation algorithm to estimate the unknown parameters in the regression function. By applying the conditional expectation directly to the target function, the proposed algorithm successfully removes the simulation step, by generating an estimation equation either for immediate use or for extrapolation, thus significantly reducing the computational time. Large sample properties of the resulting estimator, including the consistency and the asymptotic normality, are thoroughly discussed. Potential wide applications of the proposed estimation procedure are illustrated by examples, simulation studies, as well as a real data analysis.
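A compact sketch of the classical simulation extrapolation (SIMEX) idea that the paper accelerates: add extra measurement error at increasing levels, refit, then extrapolate back to the no-error level. The accelerated version removes the simulation step; here we keep it for clarity, with illustrative error variances.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear model y = beta*x + e with x observed as w = x + u, u ~ N(0, s2).
# Naive OLS on w is attenuated; SIMEX adds extra noise at levels lam,
# refits, and extrapolates the slope back to lam = -1 (no error).
n, beta, s2 = 5000, 2.0, 0.5
x = rng.normal(size=n)
w = x + rng.normal(scale=np.sqrt(s2), size=n)
y_data = beta * x + rng.normal(scale=0.2, size=n)

def ols_slope(u, v):
    return np.cov(u, v)[0, 1] / np.var(u)

lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for lam in lams:
    # average over simulated remeasurements at the inflated error level
    sims = [ols_slope(w + rng.normal(scale=np.sqrt(lam * s2), size=n), y_data)
            for _ in range(20)]
    slopes.append(np.mean(sims))

# quadratic extrapolation to lam = -1, i.e. zero measurement error
coef = np.polyfit(lams, slopes, 2)
simex = np.polyval(coef, -1.0)
naive = slopes[0]
print(naive, simex)                     # simex should be closer to beta = 2
```

The naive slope is attenuated toward zero while the extrapolated SIMEX slope recovers most of the bias; the extrapolation-function mismatch that remains is one motivation for more refined estimating-equation versions.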

【14】 Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition

Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin, Joel L. Horowitz, Jian Huang
Note: Guohao Shen and Yuling Jiao contributed equally to this work. Co-corresponding authors: Yuanyuan Lin (email: ylin@sta.cuhk.edu.hk) and Jian Huang (email: jian-huang@uiowa.edu)
Link: https://arxiv.org/abs/2107.04907
Abstract: This paper considers the problem of nonparametric quantile regression under the assumption that the target conditional quantile function is a composition of a sequence of low-dimensional functions. We study the nonparametric quantile regression estimator using deep neural networks to approximate the target conditional quantile function. For convenience, we shall refer to such an estimator as a deep quantile regression (DQR) estimator. We establish non-asymptotic error bounds for the excess risk and the mean integrated squared errors of the DQR estimator. Our results show that the DQR estimator has an oracle property in the sense that it achieves the nonparametric minimax optimal rate determined by the intrinsic dimension of the underlying compositional structure of the conditional quantile function, not the ambient dimension of the predictor. Therefore, DQR is able to mitigate the curse of dimensionality under the assumption that the conditional quantile function has a compositional structure. To establish these results, we analyze the approximation error of a composite function by neural networks and show that the error rate only depends on the dimensions of the component functions, instead of the ambient dimension of the function. We apply our general results to several important statistical models often used in mitigating the curse of dimensionality, including the single index, the additive, the projection pursuit, the univariate composite, and the generalized hierarchical interaction models. We explicitly describe the prefactors in the error bounds in terms of the dimensionality of the data and show that the prefactors depend on the dimensionality linearly or quadratically in these models.
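The pinball (check) loss at the heart of quantile regression can be sketched with a linear model standing in for the deep network; the data-generating process, step size, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def pinball(u, tau):
    """Check (pinball) loss on residuals u, used to fit the tau-th quantile."""
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

# Fit the 0.9 conditional quantile of y = x + heteroscedastic noise by
# subgradient descent on the pinball loss; DQR would replace the linear
# model f(x) = a*x + b with a deep network.
n, tau = 4000, 0.9
x = rng.uniform(0, 2, n)
y_data = x + (0.5 + 0.5 * x) * rng.normal(size=n)

a, b = 0.0, 0.0
lr = 0.05
for _ in range(3000):
    u = y_data - (a * x + b)
    g = np.where(u > 0, -tau, 1 - tau)  # d pinball / d prediction
    a -= lr * np.mean(g * x)
    b -= lr * np.mean(g)

# True 0.9 quantile: x + (0.5 + 0.5 x) * z_0.9 with z_0.9 ~ 1.2816,
# i.e. intercept ~ 0.64 and slope ~ 1.64.
print(a, b)
```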

【15】 The EAS approach to variable selection for multivariate response data in high-dimensional settings

Authors: Salil Koner, Jonathan P. Williams
Affiliations: Department of Statistics, North Carolina State University
Link: https://arxiv.org/abs/2107.04873
Abstract: In this paper, we extend the epsilon admissible subsets (EAS) model selection approach, from its original construction in the high-dimensional linear regression setting, to an EAS framework for performing group variable selection in the high-dimensional multivariate regression setting. Assuming a matrix-normal linear model, we show that the EAS strategy is asymptotically consistent if there exists a sparse, true data generating set of predictors. Nonetheless, our EAS strategy is designed to estimate a posterior-like, generalized fiducial distribution over a parsimonious class of models in the setting of correlated predictors and/or in the absence of a sparsity assumption. The effectiveness of our approach, to this end, is demonstrated empirically in simulation studies, and is compared to other state-of-the-art model/variable selection procedures.

【16】 A test for normality and independence based on characteristic function

Authors: Wiktor Ejsmont, Bojana Milošević, Marko Obradović
Link: https://arxiv.org/abs/2107.04845
Abstract: In this article we prove a generalization of the Ejsmont characterization of the multivariate normal distribution. Based on it, we propose a new test for independence and normality. The test uses an integral of the squared modulus of the difference between the product of empirical characteristic functions and some constant. Special attention is given to the case of testing univariate normality, in which we derive the test statistic explicitly in terms of the Bessel function, and the case of testing bivariate normality and independence. The tests show strong performance in comparison to some popular, powerful competitors.
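A rough sketch of a characteristic-function-based normality statistic in the spirit of the test: integrate the squared modulus of the gap between the empirical characteristic function and the standard normal CF. The weight function and grid are illustrative, not the paper's exact statistic.

```python
import numpy as np

rng = np.random.default_rng(7)

def ecf_stat(x, ts):
    """Weighted distance between the empirical characteristic function of
    standardized data and the standard normal CF exp(-t^2/2)."""
    z = (x - x.mean()) / x.std()
    ecf = np.exp(1j * np.outer(ts, z)).mean(axis=1)
    w = np.exp(-ts ** 2)                # weight keeps the integral finite
    integrand = np.abs(ecf - np.exp(-ts ** 2 / 2)) ** 2 * w
    return float(np.sum(integrand) * (ts[1] - ts[0]))  # Riemann sum

ts = np.linspace(-5, 5, 401)
normal_sample = rng.normal(size=500)
exp_sample = rng.exponential(size=500)  # clearly non-normal

print(ecf_stat(normal_sample, ts), ecf_stat(exp_sample, ts))
```

The statistic is near zero for a normal sample and noticeably larger for a skewed one; calibrating its null distribution is where the paper's explicit Bessel-function form becomes useful.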

【17】 On Estimating Optimal Regime for Treatment Initiation Time Based on Restricted Mean Residual Lifetime

Authors: Xin Chen, Rui Song, Jiajia Zhang, Swann Arp Adams, Liuquan Sun, Wenbin Lu
Link: https://arxiv.org/abs/2107.04839
Abstract: When to initiate treatment on patients is an important problem in many medical studies such as AIDS and cancer. In this article, we formulate the treatment initiation time problem for time-to-event data and propose an optimal individualized regime that determines the best treatment initiation time for individual patients based on their characteristics. Different from existing optimal treatment regimes where treatments are undertaken at a pre-specified time, here new challenges arise from the complicated missing mechanisms in treatment initiation time data and the continuous treatment rule in terms of initiation time. To tackle these challenges, we propose to use restricted mean residual lifetime as a value function to evaluate the performance of different treatment initiation regimes, and develop a nonparametric estimator for the value function, which is consistent even when treatment initiation times are not completely observable and their distribution is unknown. We also establish the asymptotic properties of the resulting estimator in the decision rule and its associated value function estimator. In particular, the asymptotic distribution of the estimated value function is nonstandard, following a weighted chi-squared distribution. The finite-sample performance of the proposed method is evaluated by simulation studies and is further illustrated with an application to breast cancer data.

【18】 Cluster Regularization via a Hierarchical Feature Regression

Authors: Johann Pfitzinger
Affiliations: Goethe University, Frankfurt am Main
Link: https://arxiv.org/abs/2107.04831
Abstract: Prediction tasks with high-dimensional, nonorthogonal predictor sets pose a challenge for least squares based fitting procedures. A large and productive literature exists discussing various regularized approaches to improving the out-of-sample robustness of parameter estimates. This paper proposes a novel cluster-based regularization, the hierarchical feature regression (HFR), which mobilizes insights from the domains of machine learning and graph theory to estimate parameters along a supervised hierarchical representation of the predictor set, shrinking parameters towards group targets. The method is innovative in its ability to estimate optimal compositions of predictor groups, as well as the group targets, endogenously. The HFR can be viewed as a supervised factor regression, with the strength of shrinkage governed by a penalty on the extent of idiosyncratic variation captured in the fitting process. The method demonstrates good predictive accuracy and versatility, outperforming a panel of benchmark regularized estimators across a diverse set of simulated regression tasks, including dense, sparse and grouped data generating processes. An application to the prediction of economic growth is used to illustrate the HFR's effectiveness in an empirical setting, with favorable comparisons to several frequentist and Bayesian alternatives.

【19】 Convergence Analysis of Schrödinger-Föllmer Sampler without Convexity 标题:无凸性条件下Schrödinger-Föllmer采样器的收敛性分析

作者:Yuling Jiao,Lican Kang,Yanyan Liu,Youzhou Zhou 机构:School of Mathematics and Statistics, Wuhan University 备注:arXiv admin note: text overlap with arXiv:2106.10880 链接:https://arxiv.org/abs/2107.04766 摘要:Schrödinger-Föllmer采样器(SFS)是一种新颖而高效的方法,可在不依赖遍历性的情况下从(可能未归一化的)分布中采样。SFS基于单位区间上Schrödinger-Föllmer扩散过程 $$\mathrm{d} X_{t}=-\nabla U\left(X_t, t\right) \mathrm{d} t+\mathrm{d} B_{t}, \quad t \in[0,1],\quad X_0=0$$ 的Euler-Maruyama离散化;该过程将时刻零的简并分布输运到时刻一的目标分布。在\cite{sfs21}中,SFS的相合性是在一个较强的假设下建立的,即势$U(x,t)$(关于$t$)一致地(关于$x$)强凸。本文在目标分布相对于标准正态分布的密度比满足某些光滑性与有界性条件的前提下,给出了SFS在Wasserstein距离下的非渐近误差界,而不要求势的强凸性。 摘要:Schr\"{o}dinger-F\"{o}llmer sampler (SFS) is a novel and efficient approach for sampling from possibly unnormalized distributions without ergodicity. SFS is based on the Euler-Maruyama discretization of the Schr\"{o}dinger-F\"{o}llmer diffusion process $$\mathrm{d} X_{t}=-\nabla U\left(X_t, t\right) \mathrm{d} t+\mathrm{d} B_{t}, \quad t \in[0,1],\quad X_0=0$$ on the unit interval, which transports the degenerate distribution at time zero to the target distribution at time one. In \cite{sfs21}, the consistency of SFS is established under the restrictive assumption that the potential $U(x,t)$ is uniformly (in $t$) strongly convex (in $x$). In this paper we provide a nonasymptotic error bound for SFS in Wasserstein distance under some smoothness and boundedness conditions on the density ratio of the target distribution over the standard normal distribution, but without requiring strong convexity of the potential.
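As a sketch of the Euler-Maruyama discretisation underlying SFS (the actual SFS drift is defined through the density ratio of the target over the standard normal; the zero drift below is a placeholder chosen purely so the output is checkable, since then $X_1 = B_1 \sim N(0,1)$):

```python
import numpy as np

def euler_maruyama(drift, n_steps=200, n_paths=5000, seed=0):
    """Simulate dX_t = drift(X_t, t) dt + dB_t on [0, 1] with X_0 = 0."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros(n_paths)
    for k in range(n_steps):
        x = x + drift(x, k * dt) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return x  # samples of X_1

# Zero drift: the scheme transports the point mass at 0 to N(0, 1) at time one.
samples = euler_maruyama(lambda x, t: np.zeros_like(x))
```

Swapping in an estimate of the Schrödinger-Föllmer drift for the placeholder turns this loop into the SFS scheme the paper analyses.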

【20】 Gaussian Process Subspace Regression for Model Reduction 标题:基于高斯过程子空间回归的模型降阶

作者:Ruda Zhang,Simon Mak,David Dunson 机构:edu)‡Department of Statistical Science, Duke University, edu)§Department of Mathematics and Department of Statistical Science 备注:20 pages, 4 figures; with supplementary material 链接:https://arxiv.org/abs/2107.04668 摘要:子空间值函数存在于许多问题中,包括参数降阶建模(PROM)。在PROM中,每个参数点可以与一个子空间相关联,用于大系统矩阵的Petrov-Galerkin投影。以前近似这类函数的工作是在流形上使用插值,这可能是不准确和缓慢的。为了解决这个问题,我们提出了一种新的用于子空间预测的贝叶斯非参数模型:高斯过程子空间回归(GPS)模型。这种方法既有内在的又有外在的:利用欧氏空间上的多元高斯分布,在固定维子空间的Grassmann流形上建立联合概率模型。GPS采用了一种简单而通用的相关结构和一种有原则的模型选择方法。它的预测分布允许一种解析形式,允许在参数空间上进行有效的子空间预测。对于PROM,GPS以不依赖于系统维数的计算复杂度,在保留局部简化模型精度的新参数点处提供概率预测,因此适合于在线计算。我们给出了四个数值例子来比较我们的方法与子空间插值,以及两种插值局部简化模型的方法。总的来说,GPS是数据效率最高的,比子空间插值计算效率更高,并且可以通过不确定性量化进行平滑预测。 摘要:Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM). In PROM, each parameter point can be associated with a subspace, which is used for Petrov-Galerkin projections of large system matrices. Previous efforts to approximate such functions use interpolations on manifolds, which can be inaccurate and slow. To tackle this, we propose a novel Bayesian nonparametric model for subspace prediction: the Gaussian Process Subspace regression (GPS) model. This method is extrinsic and intrinsic at the same time: with multivariate Gaussian distributions on the Euclidean space, it induces a joint probability model on the Grassmann manifold, the set of fixed-dimensional subspaces. The GPS adopts a simple yet general correlation structure, and a principled approach for model selection. Its predictive distribution admits an analytical form, which allows for efficient subspace prediction over the parameter space. For PROM, the GPS provides a probabilistic prediction at a new parameter point that retains the accuracy of local reduced models, at a computational complexity that does not depend on system dimension, and thus is suitable for online computation. 
We give four numerical examples to compare our method to subspace interpolation, as well as two methods that interpolate local reduced models. Overall, GPS is the most data efficient, more computationally efficient than subspace interpolation, and gives smooth predictions with uncertainty quantification.

【21】 Relative Performance of Fisher Information in Interval Estimation 标题:区间估计中Fisher信息的相对性能

作者:Sihang Jiang 机构:Department of Engineering Systems and Environment, University of Virginia, Charlottesville, United States 备注:11 pages 链接:https://arxiv.org/abs/2107.04620 摘要:最大似然估计及其相应的置信域是统计推断中常用的估计。在实际应用中,人们通常根据极大似然估计的渐近正态分布,利用给定样本数据的Fisher信息构造近似置信域。两个常见的Fisher信息矩阵(FIMs,对于多变量参数)是观测FIM(负对数似然函数的Hessian矩阵)和期望FIM(观测FIM的期望)。在本文中,我们证明了在一定条件下,在均方误差(MSE)准则下,具有期望FIM的MLE的每个元素的近似置信区间至少与具有观测FIM的MLE的近似置信区间一样精确。 摘要:Maximum likelihood estimates and corresponding confidence regions of the estimates are commonly used in statistical inference. In practice, people often construct approximate confidence regions with the Fisher information at given sample data based on the asymptotic normal distribution of the MLE (maximum likelihood estimate). Two common Fisher information matrices (FIMs, for multivariate parameters) are the observed FIM (the Hessian matrix of negative log-likelihood function) and the expected FIM (the expectation of the observed FIM). In this article, we prove that under certain conditions and with an MSE (mean-squared error) criterion, approximate confidence interval of each element of the MLE with the expected FIM is at least as accurate as that with the observed FIM.
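As a concrete illustration (our construction, not the paper's): for the Cauchy location model the observed FIM is data-dependent while the expected FIM is exactly $n/2$, so the two Wald-type intervals genuinely differ in finite samples:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, n = 3.0, 400
x = theta_true + rng.standard_cauchy(n)   # Cauchy location sample

def neg_loglik(theta):
    # Negative log-likelihood, up to additive constants.
    return np.sum(np.log1p((x - theta) ** 2))

# MLE by a fine grid search around the sample median.
grid = np.linspace(np.median(x) - 2.0, np.median(x) + 2.0, 4001)
theta_hat = grid[np.argmin([neg_loglik(t) for t in grid])]

# Observed FIM: curvature of the negative log-likelihood at the MLE.
h = 1e-3
observed_fim = (neg_loglik(theta_hat + h) - 2.0 * neg_loglik(theta_hat)
                + neg_loglik(theta_hat - h)) / h ** 2
expected_fim = n / 2.0  # per-observation Fisher information of the Cauchy location model

ci_observed = (theta_hat - 1.96 / np.sqrt(observed_fim),
               theta_hat + 1.96 / np.sqrt(observed_fim))
ci_expected = (theta_hat - 1.96 / np.sqrt(expected_fim),
               theta_hat + 1.96 / np.sqrt(expected_fim))
```

The paper's MSE comparison concerns exactly intervals of this form, built from the two competing curvature estimates.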

【22】 Understanding the Communist Party of China's Information Operations 标题:理解中国共产党的信息作战

作者:Rohit Dube 机构:Independent Researcher, California, USA 备注:5 pages 链接:https://arxiv.org/abs/2107.05602 摘要:众所周知,中国共产党从事影响舆论的信息行动。在这篇文章中,我们试图了解共产党在最近的信息行动中所使用的策略——一种影响香港民主运动的叙事。我们使用一个包含帐户信息和tweets的Twitter数据集进行操作。我们的研究表明,香港的操作至少部分是由人类手动进行的,而不是完全由自动化机器人进行的。我们还显示,中共在对中国异见人士的人身攻击和COVID-19上的信息中,夹杂着中共对行动期间抗议活动的看法。最后,我们得出结论,Twitter数据集中的信息操作网络是为了放大其他地方产生的内容,而不是用原始内容影响叙事。 摘要:The Communist Party of China is known to engage in Information Operations to influence public opinion. In this paper, we seek to understand the tactics used by the Communist Party in a recent Information Operation - the one conducted to influence the narrative around the pro-democracy movement in Hong Kong. We use a Twitter dataset containing account information and tweets for the operation. Our research shows that the Hong Kong operation was (at least) partially conducted manually by humans rather than entirely by automated bots. We also show that the Communist Party mixed in personal attacks on Chinese dissidents and messages on COVID-19 with the party's views on the protests during the operation. Finally, we conclude that the Information Operation network in the Twitter dataset was set up to amplify content generated elsewhere rather than to influence the narrative with original content.

【23】 Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates 标题:基于随机雅可比估计的非线性最小二乘大规模机器学习

作者:Johannes J. Brust 机构: Note that in ( 1) both 1Department of Mathematics, University of California SanDiego 备注:None 链接:https://arxiv.org/abs/2107.05598 摘要:对于机器学习中的大型非线性最小二乘损失函数,我们利用了模型参数的数量通常超过一批数据的特性。这意味着在损失的Hessian中有一个低秩结构,这使得能够有效地计算搜索方向。利用这一性质,我们开发了两种估计雅可比矩阵的算法,并与现有的方法进行了比较。 摘要:For large nonlinear least squares loss functions in machine learning we exploit the property that the number of model parameters typically exceeds the data in one batch. This implies a low-rank structure in the Hessian of the loss, which enables effective means to compute search directions. Using this property, we develop two algorithms that estimate Jacobian matrices and perform well when compared to state-of-the-art methods.
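The kind of low-rank structure described above can be seen in a damped Gauss-Newton step: when the batch size $m$ is much smaller than the parameter count $p$, the identity $(J^\top J + \lambda I)^{-1} J^\top = J^\top (J J^\top + \lambda I)^{-1}$ lets the search direction be computed from an $m \times m$ system instead of a $p \times p$ one (a generic sketch, not the authors' specific algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 8, 200                      # batch size m far below parameter count p
J = rng.standard_normal((m, p))    # Jacobian of the residuals w.r.t. the parameters
r = rng.standard_normal(m)         # residual vector
lam = 1e-2                         # damping

# Naive damped Gauss-Newton step: solve a p x p system.
step_primal = np.linalg.solve(J.T @ J + lam * np.eye(p), J.T @ r)

# Low-rank form: solve only an m x m system, then map back through J^T.
step_dual = J.T @ np.linalg.solve(J @ J.T + lam * np.eye(m), r)
```

Both expressions give the same direction; the second costs $O(m^2 p)$ rather than $O(p^3)$.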

【24】 Investor Behavior Modeling by Analyzing Financial Advisor Notes: A Machine Learning Perspective 标题:基于机器学习的财务顾问笔记分析投资者行为建模

作者:Cynthia Pagliaro,Dhagash Mehta,Han-Tai Shiao,Shaofei Wang,Luwei Xiong 机构:The Vanguard Group, Malvern, PA, USA 备注:8 pages, 2 column format, 7 figures+5 tables 链接:https://arxiv.org/abs/2107.05592 摘要:对投资者行为进行建模对于确定财务顾问的行为辅导机会至关重要。借助自然语言处理(NLP),我们分析了一个非结构化(文本)的数据集,这些数据集是金融顾问在每次投资者谈话后所做的总结笔记,从而首次深入了解顾问与投资者之间的互动。这些洞察用于预测不利市场条件下的投资者需求;从而使顾问能够指导投资者,帮助避免不当的财务决策。首先,我们执行主题建模以深入了解新出现的主题和趋势。基于这一观点,我们构建了一个监督分类模型来预测在市场波动期间,建议投资者需要行为指导的概率。据我们所知,我们的研究是第一个利用非结构化数据探索顾问与投资者关系的研究。这项工作可能对传统的和新兴的金融咨询服务模式(如robo咨询)产生深远的影响。 摘要:Modeling investor behavior is crucial to identifying behavioral coaching opportunities for financial advisors. With the help of natural language processing (NLP) we analyze an unstructured (textual) dataset of financial advisors' summary notes, taken after every investor conversation, to gain first ever insights into advisor-investor interactions. These insights are used to predict investor needs during adverse market conditions; thus allowing advisors to coach investors and help avoid inappropriate financial decision-making. First, we perform topic modeling to gain insight into the emerging topics and trends. Based on this insight, we construct a supervised classification model to predict the probability that an advised investor will require behavioral coaching during volatile market periods. To the best of our knowledge, ours is the first work on exploring the advisor-investor relationship using unstructured data. This work may have far-reaching implications for both traditional and emerging financial advisory service models like robo-advising.

【25】 Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings 标题:差分隐私随机优化:凸与非凸设置下的新结果

作者:Cristóbal Guzmán,Raef Bassily,Michael Menart 机构:The Ohio State University; Department of Applied Mathematics, University of Twente; Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile 链接:https://arxiv.org/abs/2107.05585 摘要:研究了凸和非凸设置下的差分隐私随机优化问题。对于凸情形,我们重点研究非光滑广义线性损失(GLL)族。我们针对$\ell_2$设置的算法在近线性时间内达到最优的超额总体风险,而已知最好的针对一般凸损失的差分隐私算法需要超线性时间。我们针对$\ell_1$设置的算法具有接近最优的超额总体风险$\tilde{O}\big(\sqrt{\frac{\log{d}}{n}}\big)$,并绕开了[AFKT21]对一般非光滑凸损失的维数相关下界。在差分隐私非凸设置下,我们提出了若干逼近总体风险平稳点的新算法。对于具有光滑损失和多面体约束的$\ell_1$情形,我们给出了第一个在线性时间内近乎与维数无关的速率$\tilde O\big(\frac{\log^{2/3}{d}}{{n^{1/3}}}\big)$。对于有约束的$\ell_2$情形,在光滑损失下,我们得到了速率为$\tilde O\big(\frac{1}{n^{3/10}d^{1/10}}+\big(\frac{d}{n^2}\big)^{1/5}\big)$的线性时间算法。最后,对于$\ell_2$情形,我们首次给出了非光滑弱凸随机优化的方法,其速率为$\tilde O\big(\frac{1}{n^{1/4}}+\big(\frac{d}{n^2}\big)^{1/6}\big)$,当$d=O(\sqrt{n})$时与现有最好的非私有算法相匹配。我们还将上述非凸$\ell_2$设置的所有结果推广到$\ell_p$设置($1<p\leq 2$),速率仅增加(关于维数的)多对数因子。 摘要:We study differentially private stochastic optimization in convex and non-convex settings. For the convex case, we focus on the family of non-smooth generalized linear losses (GLLs). Our algorithm for the $\ell_2$ setting achieves optimal excess population risk in near-linear time, while the best known differentially private algorithms for general convex losses run in super-linear time. Our algorithm for the $\ell_1$ setting has nearly-optimal excess population risk $\tilde{O}\big(\sqrt{\frac{\log{d}}{n}}\big)$, and circumvents the dimension dependent lower bound of [AFKT21] for general non-smooth convex losses. In the differentially private non-convex setting, we provide several new algorithms for approximating stationary points of the population risk. For the $\ell_1$-case with smooth losses and polyhedral constraint, we provide the first nearly dimension independent rate, $\tilde O\big(\frac{\log^{2/3}{d}}{{n^{1/3}}}\big)$ in linear time. For the constrained $\ell_2$-case, with smooth losses, we obtain a linear-time algorithm with rate $\tilde O\big(\frac{1}{n^{3/10}d^{1/10}}+\big(\frac{d}{n^2}\big)^{1/5}\big)$. Finally, for the $\ell_2$-case we provide the first method for {\em non-smooth weakly convex} stochastic optimization with rate $\tilde O\big(\frac{1}{n^{1/4}}+\big(\frac{d}{n^2}\big)^{1/6}\big)$ which matches the best existing non-private algorithm when $d= O(\sqrt{n})$. We also extend all our results above for the non-convex $\ell_2$ setting to the $\ell_p$ setting, where $1 < p \leq 2$, with only polylogarithmic (in the dimension) overhead in the rates.

【26】 Forster Decomposition and Learning Halfspaces with Noise 标题:带噪声的半空间的Forster分解与学习

作者:Ilias Diakonikolas,Daniel M. Kane,Christos Tzamos 机构:University of Wisconsin-Madison, University of California, San Diego 链接:https://arxiv.org/abs/2107.05582 摘要:Forster变换是将分布转化为具有良好反集中特性的分布的操作。虽然Forster变换并不总是存在,但是我们证明了任何分布都可以有效地分解为几个分布的不相交混合,Forster变换存在并且可以有效地计算。作为这一结果的主要应用,我们得到了第一个多项式时间算法,用于Massart噪声模型中半空间的分布无关PAC学习,具有强多项式样本复杂度,即独立于样本的比特复杂度。以前的学习算法都是用位复杂度多项式来表示样本复杂度的,尽管这种依赖性在理论上是不必要的。 摘要:A Forster transform is an operation that turns a distribution into one with good anti-concentration properties. While a Forster transform does not always exist, we show that any distribution can be efficiently decomposed as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently. As the main application of this result, we obtain the first polynomial-time algorithm for distribution-independent PAC learning of halfspaces in the Massart noise model with strongly polynomial sample complexity, i.e., independent of the bit complexity of the examples. Previous algorithms for this learning problem incurred sample complexity scaling polynomially with the bit complexity, even though such a dependence is not information-theoretically necessary.

【27】 Unifying the effective reproduction number, incidence, and prevalence under a stochastic age-dependent branching process 标题:随机年龄相关分枝过程下有效繁殖数、发生率和流行率的统一

作者:Tresnia Berah,Thomas A. Mellan,Xenia Miscouridou,Swapnil Mishra,Kris V. Parag,Mikko S. Pakkanen,Samir Bhatt 机构:Department of Infectious Disease Epidemiology, Imperial College London; Department of Mathematics, Imperial College London 链接:https://arxiv.org/abs/2107.05579 摘要:更新方程是一种常用的方法,用于模拟疫情暴发中的新增感染数(发病率)。此前尚无工作尝试给出一组统一的更新方程,使发病率、患病率和累积发病率都能从同一随机过程中恢复。本文从具有时变再生数的年龄相关分支过程出发,推导出这样一组更新方程。我们的新推导采用测度论方法,给出了完全自洽的数学阐述。我们发现,流行病学中常用于模拟发病率的更新方程是我们方程的一个等价特例。我们还表明,这些方程在以下意义上是内在一致的:在患病率与发病率之间常用的反推(back calculation)框架下,它们可以相互关联。我们引入了一种计算高效的离散化格式来求解这些更新方程;该算法依赖行求和与元素乘法,具有高度可并行性。最后,我们在概率编程语言Stan中给出一个简单的模拟例子,在单一时变再生数和世代间隔下联合拟合发病率和患病率。 摘要:Renewal equations are a popular approach used in modelling the number of new infections (incidence) in an outbreak. A unified set of renewal equations where incidence, prevalence and cumulative incidence can all be recovered from the same stochastic process has not previously been attempted. Here, we derive a set of renewal equations from an age-dependent branching process with a time-varying reproduction number. Our new derivation utilises a measure-theoretic approach and yields a fully self-contained mathematical exposition. We find that the renewal equations commonly used in epidemiology for modelling incidence are an equivalent special case of our equations. We show that our equations are internally consistent in the sense that they can be separately linked under the common back-calculation approach between prevalence and incidence. We introduce a computationally efficient discretisation scheme to solve these renewal equations, and this algorithm is highly parallelisable as it relies on row sums and elementwise multiplication. Finally, we present a simple simulation example in the probabilistic programming language Stan where we jointly fit incidence and prevalence under a single time-varying reproduction number and generation interval.
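A minimal discretisation of the incidence renewal equation $I_t = R_t \sum_{s \ge 1} g_s I_{t-s}$, with the row-sum/elementwise-multiplication structure the authors mention (the generation-interval pmf and constant $R_t$ below are arbitrary illustrative choices):

```python
import numpy as np

T = 60
R = np.full(T, 1.5)                # time-varying reproduction number (constant here)
g = np.array([0.25, 0.5, 0.25])    # generation-interval pmf over lags 1, 2, 3
incidence = np.zeros(T)
incidence[0] = 10.0                # seed infections

for t in range(1, T):
    lags = np.arange(1, min(t, len(g)) + 1)
    # Elementwise multiply past incidence with the generation interval, then sum.
    incidence[t] = R[t] * np.sum(g[lags - 1] * incidence[t - lags])
```

With $R_t > 1$ the recursion produces the expected exponential growth; prevalence follows by a further convolution with a duration-of-infection distribution, which is the back-calculation link the abstract refers to.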

【28】 Impact of Scene-Specific Enhancement Spectra on Matched Filter Greenhouse Gas Retrievals from Imaging Spectroscopy 标题:场景特定增强光谱对成像光谱匹配滤波温室气体反演的影响

作者:Markus D. Foote,Philip E. Dennison,Patrick R. Sullivan,Kelly B. O'Neill,Andrew K. Thorpe,David R. Thompson,Daniel H. Cusworth,Riley Duren,Sarang C. Joshi 机构:Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA, Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA, Department of Geography, University of Utah, Salt Lake City, UT, USA 备注:14 pages, 5 figures, 3 tables 链接:https://arxiv.org/abs/2107.05578 摘要:匹配滤波(MF)技术已被广泛用于从成像光谱数据集中反演温室气体增强量。尽管已经提出了多种算法技术和改进,但自定量MF反演方法问世以来,用于估计浓度增强量的温室气体目标光谱基本未曾改变。反演得到的甲烷和二氧化碳增强量的大小,以及由此得到的积分质量增强量(IME)和点源排放通量估计,在很大程度上依赖于该目标光谱。目前的标准做法是利用分子吸收系数来构建单位增强目标光谱,但这并未考虑温室气体背景浓度的吸收、太阳与传感器几何以及大气水汽吸收。我们将几何与大气参数引入场景特定(SS)单位增强光谱的生成,从而提供与所有MF温室气体反演技术兼容的目标光谱。对甲烷羽流而言,使用标准通用增强光谱得到的IME相对于SS增强光谱的差异在-22%到+28.7%之间。由于通用光谱与SS光谱在光谱形状上的差异,甲烷羽流IME的差异除了与几何和大气参数有关,还与地表光谱特征相关。二氧化碳羽流的IME差异更大:相对于SS增强光谱,通用增强光谱得到的积分质量增强量差异在-76.1%到-48.1%之间。在风况相同的假设下,由这些积分增强量计算的通量也会以相同的百分比变化。甲烷和二氧化碳的IME对太阳天顶角和地面高程的变化最为敏感。SS目标光谱可以提高在几何与大气条件各异的场景集合中温室气体反演及通量估计的可信度。 摘要:Matched filter (MF) techniques have been widely used for retrieval of greenhouse gas enhancements from imaging spectroscopy datasets. While multiple algorithmic techniques and refinements have been proposed, the greenhouse gas target spectrum used for concentration enhancement estimation has remained largely unaltered since the introduction of quantitative MF retrievals. The magnitude of retrieved methane and carbon dioxide enhancements, and thereby integrated mass enhancement (IME) and estimated flux of point-source emitters, is heavily dependent on this target spectrum. The current standard use of molecular absorption coefficients to create unit enhancement target spectra does not account for absorption by background concentrations of greenhouse gases, solar and sensor geometry, or atmospheric water vapor absorption. We introduce geometric and atmospheric parameters into the generation of scene-specific (SS) unit enhancement spectra to provide target spectra that are compatible with all MF greenhouse gas retrieval techniques. For methane plumes, IME resulting from use of standard, generic enhancement spectra varied from -22% to +28.7% compared to SS enhancement spectra. Due to differences in spectral shape between the generic and SS enhancement spectra, differences in methane plume IME were linked to surface spectral characteristics in addition to geometric and atmospheric parameters. IME differences were larger for carbon dioxide plumes, with generic enhancement spectra producing integrated mass enhancements of -76.1% to -48.1% compared to SS enhancement spectra. Fluxes calculated from these integrated enhancements would vary by the same percentages, assuming equivalent wind conditions. Methane and carbon dioxide IME were most sensitive to changes in solar zenith angle and ground elevation. SS target spectra can improve confidence in greenhouse gas retrievals and flux estimates across collections of scenes with diverse geometric and atmospheric conditions.

【29】 Strong recovery of geometric planted matchings 标题:几何种植匹配的强恢复

作者:Dmitriy Kunisky,Jonathan Niles-Weed 机构:Department of Mathematics, Courant Institute of Mathematical Sciences, New York University 备注:47 pages, 8 figures 链接:https://arxiv.org/abs/2107.05567 摘要:我们研究在$\mathbb{R}^d$中$n$个点的无标记集合与这些点的微小随机扰动之间高效恢复匹配的问题。我们考虑这样一个模型:初始点是i.i.d.标准高斯向量,并通过加入方差为$\sigma^2$的i.i.d.高斯向量加以扰动。在这种设置下,最大似然估计(MLE)可作为线性指派问题的解在多项式时间内求得。我们对$\sigma^2$建立了阈值,使得在$d$为常数和$d=d(n)$任意增长两种情形下,MLE都能完全恢复种植匹配(不产生错误)或强恢复种植匹配(产生$o(n)$个错误)。在这两个阈值之间,我们证明对某个显式的$\delta\in(0,1)$,MLE会产生$n^{\delta+o(1)}$个错误。这些结果将近来关于恢复种植在边权独立随机图中的匹配的一系列工作推广到几何设置。我们的证明技术依赖于利用一阶矩和二阶矩方法,仔细分析大型弱相依随机图中部分匹配的组合结构。 摘要:We study the problem of efficiently recovering the matching between an unlabelled collection of $n$ points in $\mathbb{R}^d$ and a small random perturbation of those points. We consider a model where the initial points are i.i.d. standard Gaussian vectors, perturbed by adding i.i.d. Gaussian vectors with variance $\sigma^2$. In this setting, the maximum likelihood estimator (MLE) can be found in polynomial time as the solution of a linear assignment problem. We establish thresholds on $\sigma^2$ for the MLE to perfectly recover the planted matching (making no errors) and to strongly recover the planted matching (making $o(n)$ errors) both for $d$ constant and $d = d(n)$ growing arbitrarily. Between these two thresholds, we show that the MLE makes $n^{\delta + o(1)}$ errors for an explicit $\delta \in (0, 1)$. These results extend to the geometric setting a recent line of work on recovering matchings planted in random graphs with independently-weighted edges. Our proof techniques rely on careful analysis of the combinatorial structure of partial matchings in large, weakly dependent random graphs using the first and second moment methods.
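The MLE-as-linear-assignment formulation can be reproduced directly with `scipy.optimize.linear_sum_assignment` (the noise level and sizes below are illustrative choices placed well inside the perfect-recovery regime):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d, sigma = 200, 10, 0.05
x = rng.standard_normal((n, d))                    # initial points
perm = rng.permutation(n)                          # unknown relabelling
y = x[perm] + sigma * rng.standard_normal((n, d))  # perturbed, shuffled copy

# MLE of the matching: the assignment minimising total squared distance.
cost = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)   # n x n cost matrix
_, recovered = linear_sum_assignment(cost)         # y[i] is matched to x[recovered[i]]
errors = int(np.sum(recovered != perm))
```

Raising `sigma` toward the thresholds identified in the paper moves the solver from the perfect-recovery regime into the partial-recovery one.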

【30】 Inference on Individual Treatment Effects in Nonseparable Triangular Models 标题:不可分三角模型中个体处理效应的推断

作者:Jun Ma,Vadim Marmer,Zhengfei Yu 机构:Renmin University of China; Vancouver School of Economics, University of British Columbia; University of Tsukuba 链接:https://arxiv.org/abs/2107.05559 摘要:在具有二元内生处理和二元工具变量的不可分三角模型中,Vuong和Xu(2017)证明了个体处理效应(ITE)是可识别的。Feng、Vuong和Xu(2019)证明,以非参数估计的ITE作为观测值的核密度估计量对ITE的密度是一致相合的。本文建立了Feng、Vuong和Xu(2019)密度估计量的渐近正态性,并表明尽管ITE的估计误差具有更快的收敛速度,它们对密度估计量的渐近分布仍有不可忽略的影响。我们提出了考虑ITE估计误差并作偏差校正的、渐近有效的ITE密度标准误。此外,我们利用非参数或刀切乘子自助法临界值构造了ITE密度的一致置信带。我们的一致置信带渐近地具有正确的覆盖概率且误差率为多项式阶,可用于对ITE分布形状的推断。 摘要:In nonseparable triangular models with a binary endogenous treatment and a binary instrumental variable, Vuong and Xu (2017) show that the individual treatment effects (ITEs) are identifiable. Feng, Vuong and Xu (2019) show that a kernel density estimator that uses nonparametrically estimated ITEs as observations is uniformly consistent for the density of the ITE. In this paper, we establish the asymptotic normality of the density estimator of Feng, Vuong and Xu (2019) and show that despite their faster rate of convergence, ITEs' estimation errors have a non-negligible effect on the asymptotic distribution of the density estimator. We propose asymptotically valid standard errors for the density of the ITE that account for estimated ITEs as well as bias correction. Furthermore, we develop uniform confidence bands for the density of the ITE using nonparametric or jackknife multiplier bootstrap critical values. Our uniform confidence bands have correct coverage probabilities asymptotically with polynomial error rates and can be used for inference on the shape of the ITE's distribution.

【31】 Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing 标题:基于广义图绘制的强化学习中更好的拉普拉斯表示

作者:Kaixin Wang,Kuangqi Zhou,Qixin Zhang,Jie Shao,Bryan Hooi,Jiashi Feng 机构: ex-Equal contribution 1National University of Singapore 2CityUniversity of Hong Kong 3ByteDance AI lab 备注:ICML 2021 链接:https://arxiv.org/abs/2107.05545 摘要:拉普拉斯表示法以状态转移图的拉普拉斯矩阵的特征向量作为状态嵌入,为状态提供了简洁、信息丰富的表示,近年来在强化学习中受到越来越多的关注。这样的表示捕获了底层状态空间的几何结构,并且有利于RL任务,例如选项发现和奖励成形。为了在大的(甚至是连续的)状态空间中逼近拉普拉斯表示,最近的工作提出最小化一个谱图绘制目标,然而它除了特征向量之外还有无穷多个全局极小值。因此,他们所学的拉普拉斯表示可能不同于基本真理。为了解决这个问题,我们将图的绘制目标转化为一个广义形式,并导出一个新的学习目标,证明了它的特征向量是唯一的全局极小值。它使学习高质量的拉普拉斯表示,忠实地接近地面真理。我们通过在一组gridworld和连续控制环境上的综合实验来验证这一点。此外,我们发现,我们学习拉普拉斯表示导致更多的探索性选择和更好的奖励塑造。 摘要:The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.
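For intuition, the exact Laplacian representation of a small chain-shaped state space can be computed directly; this is the ground truth that the graph-drawing objectives approximate (the chain graph is an illustrative stand-in for a gridworld):

```python
import numpy as np

n = 20                                  # states of a 1-D chain "gridworld"
A = np.zeros((n, n))
i = np.arange(n - 1)
A[i, i + 1] = A[i + 1, i] = 1.0         # edges between neighbouring states
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian

evals, evecs = np.linalg.eigh(L)        # eigenvalues in ascending order
embedding = evecs[:, 1:4]               # drop the constant eigenvector; 3-d state embedding
```

The first nontrivial eigenvector varies monotonically along the chain, which is why these embeddings capture the geometry of the state space and are useful for reward shaping and option discovery.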

【32】 Rate-Exponent Region for a Class of Distributed Hypothesis Testing Against Conditional Independence Problems 标题:一类针对条件独立性的分布式假设检验问题的速率-指数区域

作者:Abdellatif Zaidi 机构:† Universit´e Paris-Est, Champs-sur-Marne , France, ∤ Mathematical and Algorithmic Sciences Lab., Paris Research Center, Huawei France 备注:Submitted for possible publication in the IEEE Transactions of Information Theory. arXiv admin note: substantial text overlap with arXiv:1904.03028, arXiv:1811.03933 链接:https://arxiv.org/abs/2107.05538 摘要:我们研究一类针对条件独立性问题的$K$-编码器假设检验。在第二类错误指数服从第一类错误率(常数)上界$\epsilon$的最小化准则下,我们刻画了离散无记忆和无记忆向量高斯设置下的编码率和指数集。对于DM设置,我们提供了一个相反的证明,并证明了它是使用Rahman和Wagner的量子化Bin测试方案实现的。对于无记忆向量高斯背景,我们利用debruijn恒等式和Fisher信息的性质建立了一个紧外界。结果表明,对于无记忆矢量高斯信源,采用高斯测试信道的量化Bin测试方案,速率指数区域被耗尽;并且没有由于限制传感器的编码器不采用分时而导致的性能损失。此外,我们还研究了源(不一定是高斯)具有有限差分熵,传感器在零假设下的观测噪声是高斯噪声的问题。对于这个模型,我们的主要结果是指数率函数的上界。该界反映了相应的显式下界,除了下界包含源功率(方差),而上界包含源熵功率。所建立的界限的一部分效用是用来研究渐近指数/速率和分布式检测引起的损失,作为传感器数量的函数。 摘要:We study a class of $K$-encoder hypothesis testing against conditional independence problems. Under the criterion that stipulates minimization of the Type II error exponent subject to a (constant) upper bound $\epsilon$ on the Type I error rate, we characterize the set of encoding rates and exponent for both discrete memoryless and memoryless vector Gaussian settings. For the DM setting, we provide a converse proof and show that it is achieved using the Quantize-Bin-Test scheme of Rahman and Wagner. For the memoryless vector Gaussian setting, we develop a tight outer bound by means of a technique that relies on the de Bruijn identity and the properties of Fisher information. In particular, the result shows that for memoryless vector Gaussian sources the rate-exponent region is exhausted using the Quantize-Bin-Test scheme with Gaussian test channels; and there is no loss in performance caused by restricting the sensors' encoders not to employ time sharing. Furthermore, we also study a variant of the problem in which the source, not necessarily Gaussian, has finite differential entropy and the sensors' observations noises under the null hypothesis are Gaussian. 
For this model, our main result is an upper bound on the exponent-rate function. The bound is shown to mirror a corresponding explicit lower bound, except that the lower bound involves the source power (variance) whereas the upper bound has the source entropy power. Part of the utility of the established bound is for investigating asymptotic exponent/rates and losses incurred by distributed detection as function of the number of sensors.

【33】 Recent advances in Bayesian optimization with applications to parameter reconstruction in optical nano-metrology 标题:贝叶斯优化及其在光学纳米计量参数重构中的应用研究进展

作者:Matthias Plock,Sven Burger,Philipp-Immanuel Schneider 机构:Zuse Institute Berlin, Takustraße , Berlin, Germany, JCMwave GmbH, Bolivarallee , Berlin, Germany 备注:None 链接:https://arxiv.org/abs/2107.05499 摘要:参数重构是光学纳米计量学中的一个常见问题。它通常涉及一组测量值,人们试图对测量过程的数值模型进行拟合。模型评估通常涉及求解麦克斯韦方程组,因此非常耗时。这使得重建的计算要求很高。有几种方法可以使模型与测量值相吻合。一方面,贝叶斯优化方法通过训练一个偏差平方和的机器学习模型,实现了昂贵的黑盒优化问题的高效重构。另一方面,曲线拟合算法,如Levenberg-Marquardt方法,考虑了所有模型输出和相应测量值之间的偏差,从而实现了快速的局部收敛。本文提出了一种结合这两种方法的贝叶斯目标向量优化方案。我们比较了该方法与标准的Levenberg-Marquardt算法、传统的贝叶斯优化方案、L-BFGS-B和Nelder-Mead单纯形算法的性能。作为纳米计量学问题的替身,我们采用了NIST标准参考数据库中的非线性最小二乘问题。我们发现,该方法通常使用较少的模型函数调用比任何竞争方案,以实现类似的重建性能。 摘要:Parameter reconstruction is a common problem in optical nano metrology. It generally involves a set of measurements, to which one attempts to fit a numerical model of the measurement process. The model evaluation typically involves to solve Maxwell's equations and is thus time consuming. This makes the reconstruction computationally demanding. Several methods exist for fitting the model to the measurements. On the one hand, Bayesian optimization methods for expensive black-box optimization enable an efficient reconstruction by training a machine learning model of the squared sum of deviations. On the other hand, curve fitting algorithms, such as the Levenberg-Marquardt method, take the deviations between all model outputs and corresponding measurement values into account which enables a fast local convergence. In this paper we present a Bayesian Target Vector Optimization scheme which combines these two approaches. We compare the performance of the presented method against a standard Levenberg-Marquardt-like algorithm, a conventional Bayesian optimization scheme, and the L-BFGS-B and Nelder-Mead simplex algorithms. As a stand-in for problems from nano metrology, we employ a non-linear least-square problem from the NIST Standard Reference Database. 
We find that the presented method generally uses fewer calls of the model function than any of the competing schemes to achieve similar reconstruction performance.
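A bare-bones version of the Levenberg-Marquardt baseline, fitting a two-parameter exponential decay to synthetic measurements (the model, noise level, and starting point are illustrative assumptions, not the NIST problem used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 40)
p_true = np.array([2.5, 1.3])
y = p_true[0] * np.exp(-p_true[1] * t) + 0.01 * rng.standard_normal(t.size)

def residuals(p):
    # Deviations between model outputs and measurements, as in the abstract.
    return p[0] * np.exp(-p[1] * t) - y

def jacobian(p):
    e = np.exp(-p[1] * t)
    return np.stack([e, -p[0] * t * e], axis=1)

p = np.array([1.0, 1.0])   # starting guess
lam = 1e-3                 # damping parameter
for _ in range(100):
    r, J = residuals(p), jacobian(p)
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam * 0.5   # accept the step, trust the model more
    else:
        lam *= 2.0                     # reject the step, damp harder
```

In the paper's hybrid scheme, the per-output structure exploited by this kind of update is combined with a Gaussian-process surrogate so that each expensive model evaluation is used more efficiently.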

【34】 Prequential MDL for Causal Structure Learning with Neural Networks 标题:基于前序(prequential)MDL的神经网络因果结构学习

作者:Jorg Bornschein,Silvia Chiappa,Alan Malek,Rosemary Nan Ke 机构:DeepMind, London 链接:https://arxiv.org/abs/2107.05481 摘要:从观测数据中学习贝叶斯网络的结构和因果关系是许多科学技术领域的共同目标。我们证明,当使用灵活且过参数化的神经网络来建模观测变量之间的条件概率分布时,前序(prequential)最小描述长度原理(MDL)可以用来为贝叶斯网络导出一个实用的评分函数。MDL是奥卡姆剃刀的一种体现,我们无需依赖诱导稀疏性的先验或其他需要调节的正则化项,即可获得合理且简约的图结构。在经验上,我们在合成数据和真实数据上都取得了有竞争力的结果。即使变量之间存在强非线性关系(以往方法在这种情形下举步维艰且通常失败),该评分也往往能恢复正确的结构。此外,我们还讨论了当观测来自经历分布偏移的数据源时,前序评分与近期"由适应速度推断因果结构"的工作之间的联系。 摘要:Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data. The score often recovers the correct structure even in the presence of strongly nonlinear relationships between variables; a scenario where prior approaches struggle and usually fail. Furthermore we discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.
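The prequential scoring idea in miniature (a Bernoulli toy model standing in for the paper's neural conditional distributions; the Laplace-smoothed predictor is an illustrative choice): the score of a model is the total code length accumulated by predicting each observation from its predecessors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = (rng.random(500) < 0.8).astype(int)   # Bernoulli(0.8) observations

def prequential_codelength(x, alpha=1.0):
    """Bits to encode x sequentially with a Laplace-smoothed Bernoulli predictor."""
    ones, bits = 0, 0.0
    for t, xt in enumerate(x):
        p1 = (ones + alpha) / (t + 2.0 * alpha)   # predictive P(x_t = 1 | x_<t)
        bits += -np.log2(p1 if xt == 1 else 1.0 - p1)
        ones += xt
    return bits

adaptive_bits = prequential_codelength(x)
uniform_bits = float(len(x))   # a fixed fair-coin model costs 1 bit per symbol
```

Under prequential MDL the model with the shorter total code length is preferred; the paper applies the same principle with neural networks as the sequential predictors for each candidate graph.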

【35】 Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration 标题:通过自下而上的特征恢复实现测量漂移的无源自适应

作者:Cian Eastwood,Ian Mason,Christopher K. I. Williams,Bernhard Schölkopf 机构:† School of Informatics, University of Edinburgh, ‡ Alan Turing Institute, London, § Max Planck Institute for Intelligent Systems, Tübingen 链接:https://arxiv.org/abs/2107.05446 摘要:源自由域自适应(SFDA)的目的是在自适应过程中,在不访问源域数据的情况下,将源域中已标记数据训练的模型自适应到目标域中未标记的数据。现有的SFDA方法利用熵最小化技术:(i)只适用于分类(ii)破坏模型校准;并且(iii)依赖源模型在目标域中实现良好的特征空间类分离。我们针对一种特别普遍的领域转移(称为测量转移)来解决这些问题,其特征是测量系统的变化(例如传感器或照明的变化)。在源域中,我们存储了源数据下特征分布的轻量级和灵活的近似值。在目标域,我们采用特征抽取器,使得目标数据下的近似特征分布与源数据上的近似特征分布重新对齐。我们称这种方法为特征恢复(FR),因为它试图从目标域中提取与先前从源域中提取的语义相同的特征。我们还提出了自底向上的特征恢复(BUFR),这是一种自底向上的特征恢复训练方案,通过在网络的后一层保留学习到的结构来提高性能。通过实验,我们证明了BUFR在精度、校准和数据效率方面往往优于现有的SFDA方法,同时对源模型在目标域的性能依赖性较小。 摘要:Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain. We address these issues for a particularly pervasive type of domain shift called measurement shift, characterized by a change in measurement system (e.g. a change in sensor or lighting). In the source domain, we store a lightweight and flexible approximation of the feature distribution under the source data. In the target domain, we adapt the feature-extractor such that the approximate feature distribution under the target data realigns with that saved on the source. We call this method Feature Restoration (FR) as it seeks to extract features with the same semantics from the target domain as were previously extracted from the source. We additionally propose Bottom-Up Feature Restoration (BUFR), a bottom-up training scheme for FR which boosts performance by preserving learnt structure in the later layers of a network. 
Through experiments we demonstrate that BUFR often outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain.

【36】 Learning Expected Emphatic Traces for Deep RL 标题:学习深度强化学习中的期望强调迹

作者:Ray Jiang,Shangtong Zhang,Veronica Chelu,Adam White,Hado van Hasselt 机构:DeepMind, London, UK, University of Oxford, Oxford, UK, McGill University, Montreal, QC, Canada, Edmonton, Canada 链接:https://arxiv.org/abs/2107.05405 摘要:离策略采样和经验回放是提高样本效率、扩展无模型时序差分学习方法的关键。当与函数逼近(如神经网络)相结合时,这种组合被称为"致命三元组"(deadly triad),具有潜在的不稳定性。近年来的研究表明,将强调加权与多步更新相结合,可以在大规模下获得稳定且良好的性能。然而,这类方法通常仅限于按顺序采样完整轨迹,以计算所需的强调权重。在本文中,我们研究如何将强调权重与从回放缓冲区采样的非顺序离线数据相结合。我们提出了一种可与回放相结合的多步强调加权,以及一种时间反转的$n$步TD学习算法来学习所需的强调权重。我们证明了这些状态权重与以前的方法相比减少了方差,同时提供收敛性保证。我们在Atari 2600视频游戏上大规模测试了该方法,并观察到新的X-ETD($n$)智能体优于基线智能体,突出了我们方法的可扩展性和广泛适用性。 摘要:Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods. When combined with function approximation, such as neural networks, this combination is known as the deadly triad and is potentially unstable. Recently, it has been shown that stability and good performance at scale can be achieved by combining emphatic weightings and multi-step updates. This approach, however, is generally limited to sampling complete trajectories in order, to compute the required emphatic weighting. In this paper we investigate how to combine emphatic weightings with non-sequential, off-line data sampled from a replay buffer. We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting. We show that these state weightings reduce variance compared with prior approaches, while providing convergence guarantees. We tested the approach at scale on Atari 2600 video games, and observed that the new X-ETD($n$) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.

【37】 Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping 标题:带提前停止的梯度下降(GD)训练浅层过参数化神经网络的非参数回归

作者:Ilja Kuzborskij,Csaba Szepesvári 机构:DeepMind, London, Csaba Szepesv´ari, DeepMind, Canada and University of Alberta, Edmonton 备注:COLT 2021 链接:https://arxiv.org/abs/2107.05341 摘要:研究了梯度下降(GD)训练过参数化浅层神经网络学习Lipschitz回归函数的能力。为了避免在有噪声标签的情况下,训练到几乎零训练误差的神经网络在该类上不一致的问题,我们提出了一个允许我们显示最优速率的提前停止规则。这提供了Hu等人(2021年)研究$\ell 2$-正则化GD在非参数回归中训练浅层网络的性能的另一种结果,该方法完全依赖于无限宽网络(神经切线核(NTK))近似。在这里,我们提出了一个简单的分析,它是基于输入空间的划分参数(如1-最近邻规则的情况),再加上训练的神经网络在GD训练时对其输入是平滑的。在无噪声的情况下,证明不依赖于任何核化,可以看作是有限宽度的结果。在标签噪声的情况下,通过稍微修改校样,使用Yao、Rosasco和Caponnetto(2007)的技术来控制噪声。 摘要:We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noisy labels, neural networks trained to nearly zero training error are inconsistent on this class, we propose an early stopping rule that allows us to show optimal rates. This provides an alternative to the result of Hu et al. (2021) who studied the performance of $\ell 2$ -regularized GD for training shallow networks in nonparametric regression which fully relied on the infinite-width network (Neural Tangent Kernel (NTK)) approximation. Here we present a simpler analysis which is based on a partitioning argument of the input space (as in the case of 1-nearest-neighbor rule) coupled with the fact that trained neural networks are smooth with respect to their inputs when trained by GD. In the noise-free case the proof does not rely on any kernelization and can be regarded as a finite-width result. In the case of label noise, by slightly modifying the proof, the noise is controlled using a technique of Yao, Rosasco, and Caponnetto (2007).
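论文中达到最优速率的停止规则来自其理论分析;作为对"提前停止"机制本身的一个极简纯Python示意(并非论文的具体规则,网络规模、学习率等常数均为随意取值),下面用全批量梯度下降训练一个浅层ReLU网络拟合带噪声的Lipschitz目标,并在留出集损失连续若干步不再改善时停止:

```python
import math
import random

random.seed(0)

# 玩具数据:Lipschitz 目标 y = |x| 加标签噪声。
n = 150
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [abs(x) + random.gauss(0, 0.1) for x in xs]
x_tr, y_tr = xs[:120], ys[:120]          # 训练集
x_va, y_va = xs[120:], ys[120:]          # 用于停止规则的留出集

# 浅层过参数化网络:m 个 ReLU 单元,所有权重都用 GD 训练。
m = 10
w = [random.gauss(0, 1) for _ in range(m)]
b = [random.gauss(0, 1) for _ in range(m)]
a = [random.gauss(0, 1 / math.sqrt(m)) for _ in range(m)]

def predict(x):
    return sum(a[j] * max(0.0, w[j] * x + b[j]) for j in range(m))

def mse(xs_, ys_):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs_, ys_)) / len(xs_)

lr, patience = 0.01, 30
best_va, best_step, bad = float("inf"), 0, 0
va_history = []
for step in range(2000):
    ga, gw, gb = [0.0] * m, [0.0] * m, [0.0] * m
    for x, y in zip(x_tr, y_tr):         # 训练 MSE 的全批量梯度
        err = 2.0 * (predict(x) - y) / len(x_tr)
        for j in range(m):
            pre = w[j] * x + b[j]
            ga[j] += err * max(0.0, pre)
            if pre > 0.0:
                gw[j] += err * a[j] * x
                gb[j] += err * a[j]
    for j in range(m):
        a[j] -= lr * ga[j]; w[j] -= lr * gw[j]; b[j] -= lr * gb[j]
    va = mse(x_va, y_va)
    va_history.append(va)
    if va < best_va:
        best_va, best_step, bad = va, step, 0
    else:
        bad += 1
        if bad >= patience:              # 留出损失不再改善:提前停止
            break
```

在有标签噪声时,继续训练到近零训练误差会开始拟合噪声,留出损失即随之回升,这正是提前停止要截断的阶段。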

【38】 Structured Directional Pruning via Perturbation Orthogonal Projection 标题:基于扰动正交投影的结构化定向剪枝

作者:Yinchuan Li,Xiaofeng Liu,Yunfeng Shao,Qing Wang,Yanhui Geng 机构:Yunfeng Shao, Huawei Noah’s Ark Lab, Qing Wang†, Tianjin University, Yanhui Geng 链接:https://arxiv.org/abs/2107.05328 摘要:结构化剪枝是减少神经网络计算量的一种有效的压缩技术,通常通过添加扰动来削减网络参数,代价是训练损失略微增大。一种更合理的做法是沿着优化器(如随机梯度下降)找到的平坦极小值谷寻找一个稀疏的极小点,从而保持训练损失不变。为实现这一目标,我们提出基于"将扰动正交投影到平坦极小值谷上"的结构化方向剪枝。我们还提出了一种快速求解器sDprun,并进一步证明它在充分训练后渐近地实现方向剪枝。在CIFAR-10和CIFAR-100数据集上使用VGG-Net和ResNet进行的实验表明,该方法在不重新训练的情况下获得了最先进的剪枝后精度(如VGG16、CIFAR-10任务上为93.97%)。在MNIST、CIFAR-10和CIFAR-100数据集上使用DNN、VGG-Net和WRN28X10进行的实验表明,该方法执行结构化方向剪枝,最终到达与优化器相同的极小值谷。 摘要:Structured pruning is an effective compression technique to reduce the computation of neural networks, which is usually achieved by adding perturbations to reduce network parameters at the cost of slightly increasing training loss. A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by optimizers, i.e. stochastic gradient descent, which keeps the training loss constant. To achieve this goal, we propose the structured directional pruning based on orthogonal projecting the perturbations onto the flat minimum valley. We also propose a fast solver sDprun and further prove that it achieves directional pruning asymptotically after sufficient training. Experiments using VGG-Net and ResNet on CIFAR-10 and CIFAR-100 datasets show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining. Experiments using DNN, VGG-Net and WRN28X10 on MNIST, CIFAR-10 and CIFAR-100 datasets demonstrate our method performs structured directional pruning, reaching the same minimum valley as the optimizer.
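作为示意(并非论文的sDprun求解器),下面的纯Python片段演示"把剪枝扰动正交投影到与损失梯度垂直的方向"这一核心步骤:投影后的扰动在一阶意义上不改变训练损失。其中的权重与梯度数值均为虚构:

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project_orthogonal(delta, g):
    """去掉扰动 delta 沿梯度 g 的分量:投影后满足 <g, delta_proj> = 0,
    即扰动落在一阶"平坦"方向上,损失只在二阶意义上变化。"""
    gg = dot(g, g)
    if gg == 0.0:
        return list(delta)
    c = dot(delta, g) / gg
    return [d - c * gi for d, gi in zip(delta, g)]

# 一个把第 3 个权重置零的剪枝扰动(权重与梯度数值均为虚构)。
w = [0.8, -1.2, 0.05, 2.0]
delta = [0.0, 0.0, -w[2], 0.0]
g = [0.3, -0.1, 0.2, 0.4]        # 当前点的损失梯度(虚构)
delta_proj = project_orthogonal(delta, g)
```

真实方法是在整个平坦极小值谷(而非单一梯度方向的正交补)上投影并求稀疏解,这里只演示正交投影这一几何要素。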

【39】 Learning interaction rules from multi-animal trajectories via augmented behavioral models 标题:通过增广行为模型从多动物轨迹中学习交互规则

作者:Keisuke Fujii,Naoya Takeishi,Kazushi Tsutsui,Emyo Fujioka,Nozomi Nishiumi,Ryooya Tanaka,Mika Fukushiro,Kaoru Ide,Hiroyoshi Kohno,Ken Yoda,Susumu Takahashi,Shizuko Hiryu,Yoshinobu Kawahara 机构:Nagoya University, RIKEN Center for Advanced Intelligence Project, JST PRESTO, University of Applied Sciences, and Arts Western Switzerland, Doshisha University, National Institute, for Basic Biology, Tokai University, Kyushu University 备注:22 pages, 4 figures 链接:https://arxiv.org/abs/2107.05326 摘要:从运动序列中提取生物制剂的相互作用规律在各个领域都是一个挑战。格兰杰因果关系是一个实用的框架,分析相互作用的观测时间序列数据;然而,这一框架忽视了动物行为中的生成过程结构,这可能导致解释问题,有时还可能导致对因果关系的错误评估。在本文中,我们提出了一个新的框架,学习格兰杰因果关系从多动物的轨迹,通过增强理论为基础的行为模型与解释数据驱动模型。我们采用神经网络来扩充时变动态系统所描述的不完全多智能体行为模型。为了有效和可解释的学习,我们的模型利用了分离导航和运动过程的基于理论的架构,以及用于可靠行为建模的理论指导的正则化。这可以提供随时间变化的格兰杰因果效应的可解释迹象,即当特定的其他因素导致接近或分离时。在使用合成数据集的实验中,我们的方法取得了比各种基线更好的性能。然后,我们分析了小鼠、苍蝇、鸟类和蝙蝠的多种动物数据集,验证了我们的方法并获得了新的生物学见解。 摘要:Extracting the interaction rules of biological agents from moving sequences pose challenges in various domains. Granger causality is a practical framework for analyzing the interactions from observed time-series data; however, this framework ignores the structures of the generative process in animal behaviors, which may lead to interpretational problems and sometimes erroneous assessments of causality. In this paper, we propose a new framework for learning Granger causality from multi-animal trajectories via augmented theory-based behavioral models with interpretable data-driven models. We adopt an approach for augmenting incomplete multi-agent behavioral models described by time-varying dynamical systems with neural networks. For efficient and interpretable learning, our model leverages theory-based architectures separating navigation and motion processes, and the theory-guided regularization for reliable behavioral modeling. This can provide interpretable signs of Granger-causal effects over time, i.e., when specific others cause the approach or separation. 
In experiments using synthetic datasets, our method achieved better performance than various baselines. We then analyzed multi-animal datasets of mice, flies, birds, and bats, which verified our method and obtained novel biological insights.

【40】 Continuous Time Bandits With Sampling Costs 标题:具有采样成本的连续时间多臂老虎机

作者:Rahul Vaze,Manjesh K. Hanawal 机构:School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, Maharastra , India, Industrial Engineering and Operations Research, Indian Institute of Technology, Bombay, Mumbai, Maharashtra , India 链接:https://arxiv.org/abs/2107.05289 摘要:我们考虑一个连续时间的多臂强盗问题(CTMAB),其中学习者可以在给定的间隔中采样任意次数的臂,并从每个样本获得随机回报,然而,增加采样频率会导致附加惩罚/成本。因此,作为采样频率的函数,在获得大量报酬和产生采样成本之间存在权衡。其目标是设计一个最小化后悔的学习算法,即oracle策略的收益与学习算法的收益之差。CTMAB与通常的多臂bandit问题(MAB)有本质的不同,例如,在CTMAB中,即使是单臂情况也是不平凡的,因为最佳采样频率取决于需要估计的臂的平均值。我们首先建立任何算法可达到的遗憾的下界,然后提出达到对数因子下界的算法。对于单臂情形,我们证明了遗憾的下界是$\Omega((\logt)^2/\mu)$,其中$\mu$是臂的平均值,$T$是时间范围。对于多臂情形,我们证明了遗憾的下界是$\Omega((\logt)^2\mu/\Delta^2)$,其中$\mu$现在表示最佳臂的平均值,$\Delta$是最佳臂和次最佳臂的平均值之差。然后,我们提出一个算法,实现绑定到常数项。 摘要:We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample, however, increasing the frequency of sampling incurs an additive penalty/cost. Thus, there is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency. The goal is to design a learning algorithm that minimizes regret, that is defined as the difference of the payoff of the oracle policy and that of the learning algorithm. CTMAB is fundamentally different than the usual multi-arm bandit problem (MAB), e.g., even the single-arm case is non-trivial in CTMAB, since the optimal sampling frequency depends on the mean of the arm, which needs to be estimated. We first establish lower bounds on the regret achievable with any algorithm and then propose algorithms that achieve the lower bound up to logarithmic factors. For the single-arm case, we show that the lower bound on the regret is $\Omega((\log T)^2/\mu)$, where $\mu$ is the mean of the arm, and $T$ is the time horizon. 
For the multiple arms case, we show that the lower bound on the regret is $\Omega((\log T)^2 \mu/\Delta^2)$, where $\mu$ now represents the mean of the best arm, and $\Delta$ is the difference of the mean of the best and the second-best arm. We then propose an algorithm that achieves the bound up to constant terms.

【41】 Prb-GAN: A Probabilistic Framework for GAN Modelling 标题:Prb-GAN:GAN建模的概率框架

作者:Blessen George,Vinod K. Kurmi,Vinay P. Namboodiri 机构:Indian Institute of Technology Kanpur, Kanpur, India 链接:https://arxiv.org/abs/2107.05241 摘要:生成性对抗网络(Generative敌对网络,GANs)是一种非常流行的生成真实感图像的网络,但它往往存在训练不稳定和模式丢失的问题。为了在GAN合成数据中获得更大的多样性,解决模丢失问题至关重要。我们的工作探讨了概率方法的GAN建模,可以让我们解决这些问题。我们提出了一种新的变分算法Prb-GANs,它利用dropout在网络参数上建立一个分布,并通过变分推理学习后验参数。我们用简单和复杂的数据集从理论上描述和实验上验证了这种方法的好处。我们使用不确定性度量的概念来研究进一步的改进。通过对GAN各网络的损耗函数进行进一步的修正,我们可以得到GAN性能改善的结果。我们的方法非常简单,只需对现有GAN结构进行很少的修改。 摘要:Generative adversarial networks (GANs) are very popular to generate realistic images, but they often suffer from the training instability issues and the phenomenon of mode loss. In order to attain greater diversity in GAN synthesized data, it is critical to solving the problem of mode loss. Our work explores probabilistic approaches to GAN modelling that could allow us to tackle these issues. We present Prb-GANs, a new variation that uses dropout to create a distribution over the network parameters with the posterior learnt using variational inference. We describe theoretically and validate experimentally using simple and complex datasets the benefits of such an approach. We look into further improvements using the concept of uncertainty measures. Through a set of further modifications to the loss functions for each network of the GAN, we are able to get results that show the improvement of GAN performance. Our methods are extremely simple and require very little modification to existing GAN architecture.

【42】 Fock State-enhanced Expressivity of Quantum Machine Learning Models 标题:Fock态增强量子机器学习模型的表达能力

作者:Beng Yee Gan,Daniel Leykam,Dimitris G. Angelakis 机构:Centre for Quantum Technologies, National University of Singapore, Science Drive , Singapore , School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece , ) 备注:12 pages, 6 figures 链接:https://arxiv.org/abs/2107.05224 摘要:数据嵌入过程是量子机器学习的瓶颈之一,可能会抵消任何量子加速。鉴于此,需要更有效的数据编码策略。我们提出了一种基于光子的玻色子数据编码方案,该方案使用较少的编码层嵌入经典数据点,并通过将数据点映射到高维Fock空间来避免对非线性光学元件的需要。电路的表现力可以通过输入光子的数量来控制。我们的工作揭示了量子光子学在量子机器学习模型表达能力方面的独特优势。利用光子数相关的表达能力,我们提出了三种不同的噪声中尺度量子兼容二值分类方法,它们具有不同的所需资源比例,适用于不同的监督分类任务。 摘要:The data-embedding process is one of the bottlenecks of quantum machine learning, potentially negating any quantum speedups. In light of this, more effective data-encoding strategies are necessary. We propose a photonic-based bosonic data-encoding scheme that embeds classical data points using fewer encoding layers and circumventing the need for nonlinear optical components by mapping the data points into the high-dimensional Fock space. The expressive power of the circuit can be controlled via the number of input photons. Our work shed some light on the unique advantages offers by quantum photonics on the expressive power of quantum machine learning models. By leveraging the photon-number dependent expressive power, we propose three different noisy intermediate-scale quantum-compatible binary classification methods with different scaling of required resources suitable for different supervised classification tasks.

【43】 Stateful Detection of Model Extraction Attacks 标题:模型提取攻击的有状态检测

作者:Soham Pal,Yash Gupta,Aditya Kanade,Shirish Shevade 机构:Indian Institute of Science, Bangalore, nference 链接:https://arxiv.org/abs/2107.05166 摘要:机器学习即服务提供商通过应用程序编程接口(API)向开发人员公开机器学习(ML)模型。最近的工作表明,攻击者可以利用这些API,通过使用自己选择的样本查询这些ML模型,从而提取这些模型的良好近似值。我们提出VarDetect,一个有状态的监视器,它可以跟踪这样一个服务的用户所做的查询的分布,来检测模型提取攻击。VarDetect利用改进的变分自动编码器学习到的潜在分布,将三种类型的攻击者样本从良性样本中鲁棒地分离出来,并成功地为每种类型发出警报。此外,由于VarDetect被部署为一种自动防御机制,提取的替代模型显示出预期的较差性能和可转移性。最后,我们证明了即使是预先知道VarDetect部署的自适应攻击者,也能被它检测到。 摘要:Machine-Learning-as-a-Service providers expose machine learning (ML) models through application programming interfaces (APIs) to developers. Recent work has shown that attackers can exploit these APIs to extract good approximations of such ML models, by querying them with samples of their choosing. We propose VarDetect, a stateful monitor that tracks the distribution of queries made by users of such a service, to detect model extraction attacks. Harnessing the latent distributions learned by a modified variational autoencoder, VarDetect robustly separates three types of attacker samples from benign samples, and successfully raises an alarm for each. Further, with VarDetect deployed as an automated defense mechanism, the extracted substitute models are found to exhibit poor performance and transferability, as intended. Finally, we demonstrate that even adaptive attackers with prior knowledge of the deployment of VarDetect, are detected by it.

【44】 Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks 标题:使用过参数化浅层神经网络的基于能量模型的对偶训练

作者:Carles Domingo-Enrich,Alberto Bietti,Marylou Gabrié,Joan Bruna,Eric Vanden-Eijnden 机构:Vanden-Eijndena, Courant Institute of Mathematical Sciences, New York University, Center for Data Science, New York University, Center for Computational Mathematics, Flatiron Institute 链接:https://arxiv.org/abs/2107.05134 摘要:基于能量的模型(EBMs)是一种生成模型,通常通过极大似然估计进行训练。由于需要对与能量相关的吉布斯分布进行采样,这种方法在训练能量非凸的一般情况下具有挑战性。利用一般的Fenchel对偶结果,我们导出了在主动(又称特征学习)和惰性状态下,具有浅超参数化神经网络能量的对偶到极大似然EBMs的变分原理。在主动状态下,这种对偶形式导致了一种训练算法,其中一个同时更新样本空间中的粒子和能量参数空间中的神经元。我们还考虑了该算法的一种变体,其中粒子有时在从数据集抽取的随机样本中重新启动,并且表明在每次迭代步骤中执行这些重启对应于得分匹配训练。在我们的对偶算法中使用中间参数设置,从而提供了一种在最大似然和分数匹配训练之间插值的方法。这些结果在简单的数值实验中得到了说明。 摘要:Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is nonconvex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the active (aka feature-learning) and lazy regimes. In the active regime, this dual formulation leads to a training algorithm in which one updates concurrently the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. Using intermediate parameter setups in our dual algorithm thereby gives a way to interpolate between maximum likelihood and score matching training. These results are illustrated in simple numerical experiments.

【45】 Jaynes & Shannon's Constrained Ignorance and Surprise 标题:Jaynes与Shannon的受约束无知与惊异

作者:Thomas Cailleteau 机构:Sant Job Skolaj-Lise, Kerguestenen Straed, BroAnOriant, Breizh 链接:https://arxiv.org/abs/2107.05008 摘要:在这篇简短的文章中,我们提出一种从纯变分方法出发、利用约束导出香农熵表达式的新途径,并讨论其在理论物理与应用物理中的可能应用。基于Edwin T. Jaynes的工作,我们的结果并非全新,但其推导所处的框架可以引出一个高度自洽的形式体系,其中最大熵原理自然浮现。在该框架下给出"无知"的一般定义之后,我们用两种方式导出熵的一般期望表达式:第一种带有先入之见,对熵函数的形状有一个模糊的预期;第二种考虑一般情形,先验上一无所知。最后比较了两种思路的优点。 摘要:In this simple article, with possible applications in theoretical and applied physics, we suggest an original way to derive the expression of Shannon's entropy from a purely variational approach, using constraints. Based on the work of Edwin T. Jaynes, our results are not fundamentally new but the context in which they are derived might, however, lead to a remarkably consistent formalism, where the maximum entropy principle appears naturally. After having given a general definition of "ignorance" in this framework, we derive the somehow general expected expression for the entropy using two approaches. In the first, one is biased and has a vague idea of the shape of the entropy function. In the second, we consider the general case, where nothing is a priori known. The merits of both ways of thinking are compared.
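作为补充示意(并非本文的具体推导,只是对约束变分观点下最大熵原理的通用演示):在给定均值约束下,最大熵分布具有Gibbs形式 p_i ∝ exp(λ x_i),拉格朗日乘子 λ 可用二分法求解;当均值约束恰为无约束均值时,解退化为均匀分布,熵达到 log n:

```python
import math

def entropy(p):
    """香农熵 H(p) = -sum p_i log p_i(自然对数)。"""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

xs = [0, 1, 2, 3]                       # 取值集合

def gibbs(lam):
    """均值约束下的最大熵分布:p_i ∝ exp(lam * x_i)。"""
    w = [math.exp(lam * x) for x in xs]
    z = sum(w)
    return [wi / z for wi in w]

def solve_lambda(mu, lo=-50.0, hi=50.0):
    """对乘子 lam 做二分,使 E[X] = mu(均值随 lam 单调递增)。"""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        m = sum(p * x for p, x in zip(gibbs(mid), xs))
        if m < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(1.5)   # 1.5 恰为均匀分布的均值 -> lam ≈ 0
p = gibbs(lam)
```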

【46】 Inference for the proportional odds cumulative logit model with monotonicity constraints for ordinal predictors and ordinal response 标题:带有序预测变量与有序响应单调性约束的比例优势累积Logit模型的推断

作者:Javier Espinosa,Christian Hennig 机构:University of Santiago of Chile (USACH), Economics, Economics Department, Chile., Universita di Bologna, Department of Statistical Sciences “Paolo Fortunati”, Italy. 链接:https://arxiv.org/abs/2107.04946 摘要:比例优势累积logit模型(POCLM)是顺序反应的标准回归模型。通过对相应参数的单调性约束,可以结合预测因子的有序性。结果表明,对于无约束模型和约束模型参数空间内参数集的参数,由最优化定义的估计,如极大似然估计,是渐近等价的。这是用来推导渐近置信域和测试的约束模型,涉及到简单的修改有限样本。通过仿真研究了置信域的有限样本覆盖概率。测试关注单个变量的影响、单调性和指定的单调性方向。该方法适用于与学校绩效评估相关的真实数据。 摘要:The proportional odds cumulative logit model (POCLM) is a standard regression model for an ordinal response. Ordinality of predictors can be incorporated by monotonicity constraints for the corresponding parameters. It is shown that estimators defined by optimization, such as maximum likelihood estimators, for an unconstrained model and for parameters in the interior set of the parameter space of a constrained model are asymptotically equivalent. This is used in order to derive asymptotic confidence regions and tests for the constrained model, involving simple modifications for finite samples. The finite sample coverage probability of the confidence regions is investigated by simulation. Tests concern the effect of individual variables, monotonicity, and a specified monotonicity direction. The methodology is applied on real data related to the assessment of school performance.
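作为示意(截距与系数数值均为虚构),比例优势累积logit模型按 P(Y≤j|x)=sigmoid(θ_j−η) 给出各类别概率;对一个有序预测变量,其各水平的系数可要求单调,如下面的草图所示:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pocl_probs(x_level, thetas, betas):
    """比例优势累积 logit 模型的类别概率:
    P(Y <= j | x) = sigmoid(theta_j - beta_{x_level})。
    betas 是一个有序预测变量各水平的系数,这里要求非降(单调性约束)。"""
    assert all(b1 <= b2 for b1, b2 in zip(betas, betas[1:]))
    eta = betas[x_level]
    cum = [sigmoid(t - eta) for t in thetas] + [1.0]   # 累积概率
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# 虚构参数:4 个响应类别 -> 3 个递增截距;预测变量有 3 个水平,系数单调。
thetas = [-1.0, 0.5, 2.0]
betas = [0.0, 0.8, 1.5]
probs = pocl_probs(2, thetas, betas)
```

系数单调意味着预测变量水平越高,概率质量越系统地移向更高的响应类别。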

【47】 Kernel Mean Estimation by Marginalized Corrupted Distributions 标题:基于边缘化污染分布的核均值估计

作者:Xiaobo Xia,Shuo Shan,Mingming Gong,Nannan Wang,Fei Gao,Haikun Wei,Tongliang Liu 机构:The University of Sydney; ,Southeast University;, The University of Melbourne; ,Xidian University;, Hangzhou Dianzi University 链接:https://arxiv.org/abs/2107.04855 摘要:估计再生核Hilbert空间中的核均值是许多核学习算法的关键。给定一个有限样本,目标核均值的标准估计是经验均值。以往的研究表明,收缩方法可以构造出更好的估计量。在这项工作中,我们提出了一种新的核均值估计方法,称为边缘化核均值估计方法,它在已知分布的噪声下估计核均值。理论上,我们证明了边缘化核均值估计在核均值估计中引入了隐正则化。实验结果表明,边缘化核均值估计得到的估计误差比现有估计要小得多。 摘要:Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms. Given a finite sample, the standard estimate of the target kernel mean is the empirical average. Previous works have shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates kernel mean under the corrupted distribution. Theoretically, we show that the marginalized kernel mean estimator introduces implicit regularization in kernel mean estimation. Empirically, we show on a variety of datasets that the marginalized kernel mean estimator obtains much lower estimation error than the existing estimators.
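作为示意:标准的经验核均值嵌入是核函数取值的样本平均;对一维高斯核与已知方差的高斯噪声,对噪声取期望有封闭形式(带宽被加宽)。下面的草图只说明"对已知污染分布边缘化"的思想,并非论文估计量的完整实现,样本数值为虚构:

```python
import math

def k(x, y, gamma):
    """一维高斯核 exp(-(x - y)^2 / (2 gamma^2))。"""
    return math.exp(-((x - y) ** 2) / (2 * gamma ** 2))

def empirical_mean_embedding(sample, y, gamma):
    """标准估计:在点 y 处对核函数取值做样本平均。"""
    return sum(k(x, y, gamma) for x in sample) / len(sample)

def marginalized_mean_embedding(sample, y, gamma, sigma):
    """对 N(0, sigma^2) 噪声取期望后的核均值:对高斯核,期望仍是
    高斯形式,但带宽加宽为 gamma^2 + sigma^2(一维情形的封闭形式)。"""
    g2 = gamma ** 2 + sigma ** 2
    scale = math.sqrt(gamma ** 2 / g2)
    return sum(scale * math.exp(-((x - y) ** 2) / (2 * g2))
               for x in sample) / len(sample)

sample = [-0.3, 0.1, 0.4, 1.2]
mu_emp = empirical_mean_embedding(sample, 0.0, gamma=1.0)
mu_marg = marginalized_mean_embedding(sample, 0.0, gamma=1.0, sigma=0.5)
```

当噪声方差 sigma=0 时,两种估计完全一致;sigma>0 起到类似收缩/正则化的平滑作用。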

【48】 L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation 标题:L2M:优化驱动二阶矩估计的实用后验拉普拉斯逼近

作者:Christian S. Perone,Roberto Pereira Silveira,Thomas Paula 备注:6 pages, 1 figure, accepted for ICML 2021 UDL Workshop 链接:https://arxiv.org/abs/2107.04695 摘要:深度神经网络的不确定性量化是近年来发展起来的。在这项工作中,我们重温拉普拉斯近似,一个经典的后验近似方法,是计算吸引力。然而,我们不需要计算曲率矩阵,而是证明了在某些正则性条件下,利用梯度二阶矩可以很容易地构造拉普拉斯近似。这个数量已经由Adagrad的许多指数移动平均变量(如Adam和RMSprop)估计,但是传统上在训练之后被丢弃。我们证明了我们的方法(L2M)不需要改变模型或优化,只需几行代码就可以得到合理的结果,并且除了优化器已经计算的内容外,它不需要任何额外的计算步骤,而不引入任何新的超参数。我们希望我们的方法能为深层神经网络中利用优化器已经计算出的量进行不确定性估计开辟新的研究方向。 摘要:Uncertainty quantification for deep neural networks has recently evolved through many techniques. In this work, we revisit Laplace approximation, a classical approach for posterior approximation that is computationally attractive. However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our method can open new research directions on using quantities already computed by optimizers for uncertainty estimation in deep neural networks.
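下面用一个一维高斯均值的玩具问题示意L2M的思路(仅为假设性草图,并非论文的精确公式):在类似Adam/RMSprop的优化过程中维护梯度平方的指数滑动平均,训练结束后把它当作曲率代理、以 n·v_hat 近似后验精度构造拉普拉斯近似,并与该模型已知的精确后验方差比较:

```python
import math
import random

random.seed(1)

# 玩具模型:方差已知的一维高斯均值估计,
# 单样本负对数似然(忽略常数)为 (theta - x)^2 / (2 sigma^2)。
sigma = 2.0
n = 5000
data = [random.gauss(3.0, sigma) for _ in range(n)]

theta, lr, beta2 = 0.0, 0.05, 0.999
v, steps = 0.0, 0            # Adam/RMSprop 风格的梯度平方滑动平均
for epoch in range(3):
    for x in data:
        g = (theta - x) / sigma ** 2          # 单样本梯度
        v = beta2 * v + (1 - beta2) * g * g
        steps += 1
        v_hat = v / (1 - beta2 ** steps)      # 偏差修正
        theta -= lr * g / (math.sqrt(v_hat) + 1e-8)

# L2M 思路(示意):复用优化器本来就在维护的二阶矩 v_hat 作为曲率代理,
# 以 n * v_hat 近似后验精度,得到拉普拉斯近似的后验方差。
laplace_var = 1.0 / (n * v_hat)
exact_var = sigma ** 2 / n    # 平坦先验下该模型的精确后验方差
```

在最优点附近,单样本梯度的二阶矩近似等于单样本Fisher信息,因此 n·v_hat 可充当总曲率的估计,这正是"不必另算曲率矩阵"的卖点。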

【49】 Hölder Bounds for Sensitivity Analysis in Causal Reasoning 标题:因果推理中灵敏度分析的Hölder界

作者:Serge Assaad,Shuxi Zeng,Henry Pfister,Fan Li,Lawrence Carin 机构:Department of Electrical & Computer Engineering, Duke University; Department of Statistical Science, Duke University 备注:Workshop on the Neglected Assumptions in Causal Inference at the International Conference on Machine Learning (ICML), 2021 链接:https://arxiv.org/abs/2107.04661 摘要:考虑到存在未观察到的混杂因素U,我们研究治疗T对结果Y影响的区间估计。利用Hölder不等式,我们根据未测量混杂的程度(即U->T的强度和U->Y的强度),导出了一组关于混杂偏倚|E[Y|T=t]-E[Y|do(T=t)]|的界。当U独立于T,或给定T时U独立于Y(即不存在未观察到的混杂)时,这些界是紧的。我们重点关注该界的一个特例,它取决于分布p(U)与p(U|T=t)之间的总变差距离,以及条件期望结果E[Y|U=u,T=t]相对平均期望结果E[Y|T=t]在U的所有可能取值上的最大偏差。我们讨论了该界可能的校准策略,以获得治疗效应的区间估计,并使用合成和半合成数据集对界进行了实验验证。 摘要:We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using H\"older's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (when there is no unobserved confounding). We focus on a special case of this bound depending on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to get interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.
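在离散混杂因素的玩具例子中,可以直接验证摘要所述的特例界:混杂偏倚不超过 2·TV(p(U), p(U|T=t)) 乘以条件期望结果的最大偏差(系数2取决于总变差距离的定义约定,可能与原文表述略有出入;各概率与期望数值均为虚构):

```python
# 离散玩具例子:混杂因素 U ∈ {0,1},考察处理水平 T=1;数值均为虚构。
p_u = [0.6, 0.4]                 # p(U)
p_u_given_t1 = [0.3, 0.7]        # p(U | T=1)
ey_u_t1 = [1.0, 3.0]             # E[Y | U=u, T=1]

ey_t1 = sum(p * e for p, e in zip(p_u_given_t1, ey_u_t1))   # E[Y | T=1]
ey_do_t1 = sum(p * e for p, e in zip(p_u, ey_u_t1))         # E[Y | do(T=1)]
bias = abs(ey_t1 - ey_do_t1)

# Hölder 型界:|偏倚| <= 2 * TV(p(U), p(U|T=1)) * max_u |E[Y|u,1] - E[Y|1]|
tv = 0.5 * sum(abs(a - b) for a, b in zip(p_u, p_u_given_t1))
max_dev = max(abs(e - ey_t1) for e in ey_u_t1)
bound = 2.0 * tv * max_dev
```

推导要点:偏倚可写成 Σ_u (E[Y|u,t]−E[Y|t])(p(u|t)−p(u)),对该和式用 Hölder 不等式的 (∞,1) 情形即得上界。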

【50】 The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders 标题:可逆性对变分自编码器中编码器表示复杂度的影响

作者:Divyansh Pareek,Andrej Risteski 机构:Machine Learning Department, Carnegie Mellon University 备注:34 pages 链接:https://arxiv.org/abs/2107.04652 摘要:训练和使用基于现代神经网络的潜变量生成模型(如变分自动编码器)通常需要同时训练一个生成方向和一个推理(编码)方向,该方向近似于潜变量的后验分布。因此,问题出现了:为了能够准确地模拟给定生成模型的后验分布,推理模型需要有多复杂?在本文中,我们确定了一个重要性质的生成地图影响所需的编码器的大小。我们证明,如果生成映射是“强可逆的”(在某种意义上,我们适当地形式化),那么推理模型不需要复杂得多。相反,我们证明了存在不可逆生成映射,对于不可逆生成映射,编码方向需要指数地大(在计算复杂性的标准假设下)。重要的是,我们不要求生成模型是分层可逆的,这是许多相关文献假设的,并且在实践中使用的许多架构(例如卷积和基于池的网络)都不满足。因此,我们为经验智慧提供了理论支持,即当数据位于低维流形上时,学习深层生成模型更困难。 摘要:Training and using modern neural-network based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential(encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be, in order to be able to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map impacting the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps, for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, which a lot of the related literature assumes and isn't satisfied by many architectures used in practice (e.g. convolution and pooling based networks). Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.

【51】 Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization 标题:线上的准确率:论分布外与分布内泛化的强相关性

作者:John Miller,Rohan Taori,Aditi Raghunathan,Shiori Sagawa,Pang Wei Koh,Vaishaal Shankar,Percy Liang,Yair Carmon,Ludwig Schmidt 机构:edu‡Tel Aviv University 链接:https://arxiv.org/abs/2107.04649 摘要:为了使机器学习系统可靠,我们必须了解它们在看不见的分布环境中的性能。在本文中,我们实证表明,在广泛的模型和分布转移中,分布外绩效与分布内绩效密切相关。具体来说,我们展示了CIFAR-10和ImageNet变体的分布内和分布外性能之间的强相关性,这是一项从YCB对象派生的合成姿态估计任务,FMoW荒野中的卫星图像分类和iWildCam荒野中的野生动物分类。模型结构、超参数、训练集大小和训练持续时间之间存在很强的相关性,并且比现有领域适应理论所期望的更精确。为了完成这幅图,我们还调查了相关性较弱的情况,例如CIFAR-10-C和camelon17野生动物组织分类数据集的一些合成分布转移。最后,我们提供了一个基于高斯数据模型的候选理论,说明了分布偏移引起的数据协方差变化如何影响观测到的相关性。 摘要:For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.

【52】 Global sensitivity analysis of (a)symmetric energy harvesters 标题:(非)对称能量采集器的全局灵敏度分析

作者:João Pedro Norenberg,Americo Cunha Jr,Samuel da Silva,Paulo Sérgio Varoto 链接:https://arxiv.org/abs/2107.04647 摘要:在实际的能量采集器中,参数变化不可避免,并可能决定系统性能的关键方面,特别是在对小扰动敏感的系统中。由此出发,本工作旨在识别具有非线性压电耦合的(非)对称双稳态能量采集器动力学中最关键的参数,同时考虑其物理参数和激励参数的变化。为此,基于Sobol指数的全局灵敏度分析通过条件方差的正交分解来刻画回收功率对采集器各参数的依赖性。该技术量化了每个参数单独以及联合贡献于模型总方差的方差份额。结果表明,激励频率和幅值、非对称偏置角以及电学域中的压电耦合是影响平均采集功率的最主要参数。研究还表明,参数的重要性排序可能随稳定状态的不同而改变。由此可以更深入地理解所分析的系统,识别主导动力学行为变化的关键参数,为非线性能量采集器的鲁棒设计和预测提供有力工具。 摘要:Parametric variability is inevitable in actual energy harvesters and can define crucial aspects of the system performance, especially in susceptible systems to small perturbations. In this way, this work aims to identify the most critical parameters in the dynamics of (a)symmetric bistable energy harvesters with nonlinear piezoelectric coupling, considering the variability of their physical and excitation parameters. For this purpose, a global sensitivity analysis based on the Sobol' indices is performed by an orthogonal decomposition in terms of conditional variances to access the dependence of the recovered power concerning the harvester parameters. This technique quantifies the variance concerning each parameter individually and jointly regarding the total variation of the model. The results indicate that the frequency and amplitude of excitation, asymmetric bias angle, and piezoelectric coupling at the electrical domain are the most influential parameters that affect the mean power harvested. It has also been shown that the order of importance of the parameters can change from stable conditions. In possession of this, a better understanding of the system under analysis is obtained, identifying vital parameters that rule the change of dynamic behavior and constituting a powerful tool in the robust design and prediction of nonlinear harvesters.
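作为示意(并非论文的采集器模型),下面用纯Python的pick-freeze蒙特卡洛估计一个简单加性模型的一阶Sobol指数;对该虚构模型,解析值为 S1=0.1、S2=0.9:

```python
import random

random.seed(42)

def model(x1, x2):
    # 虚构的双输入模型:第二个输入影响更强(解析一阶指数 S1=0.1, S2=0.9)。
    return 1.0 * x1 + 3.0 * x2

N = 50000
A = [(random.random(), random.random()) for _ in range(N)]
B = [(random.random(), random.random()) for _ in range(N)]
yA = [model(*ab) for ab in A]
mean = sum(yA) / N
var = sum((y - mean) ** 2 for y in yA) / N

def first_order(i):
    """pick-freeze 估计一阶 Sobol 指数:
    S_i = Cov(Y_A, Y_B(第 i 列替换为 A 的第 i 列)) / Var(Y)。"""
    cov = 0.0
    for a, ya, b in zip(A, yA, B):
        mixed = list(b)
        mixed[i] = a[i]              # "冻结"第 i 个坐标
        cov += (ya - mean) * (model(*mixed) - mean)
    return cov / N / var

s1, s2 = first_order(0), first_order(1)
```

一阶指数 S_i = Var(E[Y|X_i])/Var(Y) 正是摘要中"条件方差正交分解"所量化的单参数方差份额。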

【53】 Algorithmic Causal Effect Identification with causaleffect 标题:使用causaleffect的算法化因果效应识别

作者:Martí Pedemonte,Jordi Vitrià,Álvaro Parafita 机构:Universitat de Barcelona, Department of Mathematics and Computer Science 备注:40 pages, 27 figures 链接:https://arxiv.org/abs/2107.04632 摘要:当我们理解因果关系时,我们作为一个物种的进化向前迈出了一大步。对于某些事件,这些关联可能微不足道,但它们并不在复杂的场景中。为了严格证明某些事件是由其他事件引起的,引入了$do$-算子及其相关规则,将因果理论和因果推理形式化。本报告的主要目标是回顾并在Python中实现一些从观测数据计算条件和非条件因果查询的算法。为此,我们首先介绍了概率论和图论的一些基本背景知识,然后介绍了用于构建算法的因果理论的重要结果。然后,我们深入研究了Shpitser和Pearl在2006年提出的识别算法,并解释了我们在Python中的实现。主识别算法可以看作是$do$-演算规则的重复应用,它最终或者从实验概率返回因果查询的表达式,或者不能识别因果效应,在这种情况下,因果效应是不可识别的。我们将介绍我们新开发的Python库并给出一些使用示例。 摘要:Our evolution as a species made a huge step forward when we understood the relationships between causes and effects. These associations may be trivial for some events, but they are not in complex scenarios. To rigorously prove that some occurrences are caused by others, causal theory and causal inference were formalized, introducing the $do$-operator and its associated rules. The main goal of this report is to review and implement in Python some algorithms to compute conditional and non-conditional causal queries from observational data. To this end, we first present some basic background knowledge on probability and graph theory, before introducing important results on causal theory, used in the construction of the algorithms. We then thoroughly study the identification algorithms presented by Shpitser and Pearl in 2006, explaining our implementation in Python alongside. The main identification algorithm can be seen as a repeated application of the rules of $do$-calculus, and it eventually either returns an expression for the causal query from experimental probabilities or fails to identify the causal effect, in which case the effect is non-identifiable. We introduce our newly developed Python library and give some usage examples.
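识别算法在最简单的情形下退化为熟知的调整公式。下面用一个虚构的离散联合分布演示后门调整 P(y|do(t)) = Σ_u P(y|t,u)P(u),并与观测条件分布对比(这只是对do-演算结论的示意,并非causaleffect库的API):

```python
# 虚构的离散联合分布 p[u][t][y](总和为 1),U 同时影响 T 和 Y。
p = {
    0: {0: {0: 0.24, 1: 0.06}, 1: {0: 0.10, 1: 0.10}},
    1: {0: {0: 0.04, 1: 0.06}, 1: {0: 0.04, 1: 0.36}},
}

def p_u(u):
    return sum(p[u][t][y] for t in (0, 1) for y in (0, 1))

def p_y_given_tu(y, t, u):
    return p[u][t][y] / (p[u][t][0] + p[u][t][1])

def p_y_do_t(y, t):
    """后门调整公式:P(y | do(t)) = sum_u P(y | t, u) P(u)。"""
    return sum(p_y_given_tu(y, t, u) * p_u(u) for u in (0, 1))

def p_y_given_t(y, t):
    """观测条件分布 P(y | t),用于对比。"""
    num = sum(p[u][t][y] for u in (0, 1))
    den = sum(p[u][t][yy] for u in (0, 1) for yy in (0, 1))
    return num / den

interventional = p_y_do_t(1, 1)    # ≈ 0.70
observational = p_y_given_t(1, 1)  # ≈ 0.767,混杂使观测值偏大
```

两个量的差异正说明了为何需要识别算法:仅靠 P(y|t) 一般无法回答干预问题。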

【54】 Exact simulation of continuous max-id processes 标题:连续max-id过程的精确模拟

作者:Florian Brück 机构:Technical University Munich 链接:https://arxiv.org/abs/2107.04630 摘要:我们提供了两种精确模拟可交换最大(最小)id随机过程和随机向量的算法。我们的算法只需要模拟有限Poisson随机测度,避免了计算指数测度的条件分布。 摘要:We provide two algorithms for the exact simulation of exchangeable max-(min-)id stochastic processes and random vectors. Our algorithms only require the simulation of finite Poisson random measures and avoid the necessity of computing conditional distributions of exponent measures.
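作为示意,下面给出一个经典的Schlather型构造:用单位速率Poisson过程的到达时刻 Γ_i 与有界谱变量 W_ij ∈ (0,1) 精确模拟可交换max-stable向量 Z_j = max_i W_ij/Γ_i;由于 W ≤ 1,一旦 1/Γ_i 低于当前最小坐标即可截断,因此只需有限个Poisson点。论文的算法覆盖更一般的max-id指数测度,此处仅演示"有限Poisson随机测度即可精确模拟"的思想:

```python
import random

random.seed(7)

def simulate_max_stable(d):
    """用有限个 Poisson 点精确模拟一个可交换的 max-stable 向量:
    Z_j = max_i W_ij / Gamma_i,其中 Gamma_i 是单位速率 Poisson
    过程的到达时刻,W_ij 为独立同分布的 Uniform(0,1) 谱变量。
    由于 W <= 1,一旦 1/Gamma_i 不超过当前最小坐标,后续点都
    不可能再改变任何坐标,因此可以在有限步精确截断。"""
    z = [0.0] * d
    gamma = 0.0
    while True:
        gamma += random.expovariate(1.0)      # 下一个 Poisson 到达时刻
        if min(z) > 0.0 and 1.0 / gamma <= min(z):
            return z                           # 后续点不再起作用
        for j in range(d):
            z[j] = max(z[j], random.uniform(0.0, 1.0) / gamma)

z = simulate_max_stable(3)
```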

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-07-13,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号。
