Finance/Speech/Audio Processing Academic Digest [8.17]


Published 2021-08-24 16:18:38






【1】 Causal Impact Of European Union Emission Trading Scheme On Firm Behaviour And Economic Performance: A Study Of German Manufacturing Firms
Link: https://arxiv.org/abs/2108.07163

Authors: Nitish Gupta, Jay Shah, Satwik Gupta, Ruchir Kaul
Affiliation: Indian Institute of Technology Madras
Comments: 19 pages
Abstract: In this paper, we estimate the causal impact (i.e. the Average Treatment Effect on the Treated, ATT) of the EU ETS on GHG emissions and firm competitiveness (primarily measured by employment, turnover, and export levels) by combining a difference-in-differences approach with semi-parametric matching techniques and estimators, and we investigate the effect of the EU ETS on the economic performance of these German manufacturing firms using a Stochastic Production Frontier model.
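The difference-in-differences component of this design can be illustrated in its simplest 2x2 form. The sketch below is a hypothetical example, not the authors' code: it assumes one pre-policy and one post-policy outcome per firm, omits the matching step and the stochastic frontier model, and relies on the usual parallel-trends assumption.

```python
import numpy as np

def did_att(y_pre, y_post, treated):
    """2x2 difference-in-differences estimate of the ATT.

    y_pre, y_post: one outcome per firm before and after the policy
                   (e.g. log emissions or log turnover).
    treated: True for firms covered by the scheme, False for controls.
    """
    y_pre = np.asarray(y_pre, dtype=float)
    y_post = np.asarray(y_post, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    change = y_post - y_pre
    # The controls' average change stands in for what would have happened to
    # the treated firms without the policy (parallel-trends assumption).
    return float(change[treated].mean() - change[~treated].mean())

# Hypothetical toy data: all firms share a common trend of +1, and the two
# treated firms additionally fall by 2 after the policy starts.
y_pre = [10.0, 12.0, 9.0, 11.0]
y_post = [9.0, 11.0, 10.0, 12.0]
treated = [True, True, False, False]
print(did_att(y_pre, y_post, treated))  # -2.0
```

In a matched design, roughly speaking, each treated firm's change would be compared with the change of its matched controls rather than with the average over all unregulated firms.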

【2】 Dynamic Monopoly Pricing With Multiple Varieties: Trading Up
Link: https://arxiv.org/abs/2108.07146

Authors: Stefan Buehler, Nicolas Eschenbaum
Affiliation: University of St, Institute of Economics
Abstract: This paper studies dynamic monopoly pricing for a broad class of Coasian and Non-Coasian settings. We show that the driving force behind pricing dynamics is the seller's incentive to trade up consumers to higher-valued consumption options. In Coasian settings, consumers can be traded up from the static optimum, and pricing dynamics arise until all trading-up opportunities are exhausted. In Non-Coasian settings, consumers cannot be traded up from the static optimum, and no pricing dynamics arise. Hence, dynamic monopoly pricing can be characterized by checking for trading-up opportunities in the static optimum.

【3】 Study Of German Manufacturing Firms: Causal Impact Of European Union Emission Trading Scheme On Firm Behaviour And Economic Performance
Link: https://arxiv.org/abs/2108.07116

Authors: Nitish Gupta, Ruchir Kaul, Satwik Gupta, Jay Shah
Affiliation: Indian Institute of Technology Madras
Comments: 23 pages
Abstract: The results based on nonparametric nearest-neighbour matching suggest a statistically significant positive effect of the EU ETS on the economic performance of the regulated firms during Phase I of the EU ETS. A year-by-year analysis shows that the effect was only significant during the first year of Phase I. The EU ETS, therefore, had a particularly strong effect when it was introduced. It is important to note that the EU ETS does not affect all firms in the manufacturing sector homogeneously. We found a significant positive impact of the EU ETS on the economic performance of regulated firms in the paper industry.
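A nonparametric nearest-neighbour matching estimator of the ATT can be sketched as below. This is a hypothetical illustration: it assumes one-to-one matching with replacement on standardised covariates with Euclidean distance. The abstract does not give the paper's covariates, distance metric or number of neighbours, and the function name and toy data are made up.

```python
import numpy as np

def nn_matching_att(X, y, treated):
    """ATT from 1-nearest-neighbour matching with replacement.

    X: (n, k) pre-treatment covariates, y: (n,) outcomes,
    treated: (n,) boolean mask of regulated firms.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Standardise covariates so that no single variable dominates the distance.
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / scale
    Xt, Xc = Xs[treated], Xs[~treated]
    # Euclidean distance from every treated unit to every control unit.
    dist = np.linalg.norm(Xt[:, None, :] - Xc[None, :, :], axis=2)
    match = dist.argmin(axis=1)  # closest control for each treated unit
    return float(np.mean(y[treated] - y[~treated][match]))

# Hypothetical data: one covariate (firm size), two treated and two control firms.
X = [[1.0], [5.0], [1.1], [5.2]]
y = [3.0, 8.0, 2.0, 6.0]
treated = [True, True, False, False]
print(nn_matching_att(X, y, treated))  # 1.5
```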

【4】 α-Hypergeometric Uncertain Volatility Models and their Connection to 2BSDEs
Link: https://arxiv.org/abs/2108.06965

Authors: Zaineb Mezdoud, Carsten Hartmann, Mohamed Riad Remita, Omar Kebiri
Affiliations: Laboratory of Probabilities and Statistics, University of Badji-Mokhtar Annaba, Algeria; Institute of Mathematics, Brandenburgische Technische Universität Cottbus-Senftenberg, Germany
Comments: 15 pages, 1 figure
Abstract: In this article we propose an $\alpha$-hypergeometric model with uncertain volatility (UV) for which we derive a worst-case scenario for option pricing. The approach is based on the connection between a certain class of nonlinear partial differential equations of HJB type (G-HJB equations), which govern the nonlinear expectation of the UV model and provide an alternative to the difficult model calibration problem of UV models, and second-order backward stochastic differential equations (2BSDEs). Using asymptotic analysis for the G-HJB equation and the equivalent 2BSDE representation, we derive a limit model that provides an accurate description of the worst-case price scenario in cases when the bounds of the UV model are slowly varying. The analytical results are tested by numerical simulations using a deep-learning-based approximation of the underlying 2BSDE.
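To show what a worst-case (upper) price under uncertain volatility means, here is a sketch for the classical Black-Scholes-Barenblatt setting: lognormal dynamics with volatility known only to lie in [sig_min, sig_max], solved with an explicit finite-difference scheme. This is a swapped-in textbook model, not the paper's α-hypergeometric model, and it uses neither 2BSDEs nor deep learning. Grid sizes and parameters are illustrative assumptions.

```python
import math
import numpy as np

def uv_upper_price(payoff, s0, T, r, sig_min, sig_max, s_max=None, n_s=200):
    """Upper (worst case for the seller) price under uncertain volatility.

    Marches the Black-Scholes-Barenblatt equation
        V_t + 0.5 * sig(G)^2 * S^2 * G + r * S * V_S - r * V = 0,  G = V_SS,
    backward from the payoff, choosing sig = sig_max where G >= 0 and sig_min
    elsewhere. Explicit scheme: central differences for G, upwind (forward)
    differences for V_S, time step small enough for a monotone scheme.
    """
    s_max = s_max if s_max is not None else 4.0 * s0
    S = np.linspace(0.0, s_max, n_s + 1)
    dS = S[1] - S[0]
    rate_bound = (sig_max * s_max / dS) ** 2 + r * s_max / dS + r
    n_t = int(math.ceil(T * rate_bound / 0.9))
    dt = T / n_t
    V = payoff(S).astype(float)
    Si = S[1:-1]
    for _ in range(n_t):
        gamma = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dS**2
        delta = (V[2:] - V[1:-1]) / dS
        sig = np.where(gamma >= 0.0, sig_max, sig_min)
        V_new = np.empty_like(V)
        V_new[1:-1] = V[1:-1] + dt * (0.5 * sig**2 * Si**2 * gamma
                                      + r * Si * delta - r * V[1:-1])
        V_new[0] = V[0] * math.exp(-r * dt)      # at S = 0 the value is only discounted
        V_new[-1] = 2.0 * V_new[-2] - V_new[-3]  # linear (zero-gamma) far-field boundary
        V = V_new
    return float(np.interp(s0, S, V))

# Hypothetical usage: at-the-money call, volatility known only to lie in [10%, 30%].
call = lambda S: np.maximum(S - 100.0, 0.0)
print(uv_upper_price(call, s0=100.0, T=1.0, r=0.05, sig_min=0.1, sig_max=0.3))
```

For a convex payoff such as a vanilla call, gamma is non-negative everywhere. The upper price should therefore match the Black-Scholes price at sig_max up to discretisation error. The volatility switch only matters for payoffs whose gamma changes sign, such as call spreads.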

【5】 Moral-hazard-free insurance: mean-variance premium principle and rank-dependent utility
Link: https://arxiv.org/abs/2108.06940

Authors: Zuo Quan Xu
Abstract: This study examines a Pareto-optimal insurance problem in which the insured maximizes her rank-dependent utility and the insurer employs the mean-variance premium principle. To eliminate possible moral hazard issues, we only consider moral-hazard-free insurance contracts that satisfy the incentive compatibility constraint. The insurance problem is first formulated as a non-concave maximization problem involving a Choquet expectation, then turned into a concave quantile optimization problem, and finally solved by the calculus of variations. The optimal contract is expressed by a semi-linear second-order double-obstacle ordinary differential equation with a nonlocal operator. When the probability weighting function has a density, an effective numerical method is proposed to compute the optimal contract.
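Rank-dependent utility evaluates a prospect through a Choquet integral with respect to distorted probabilities. Below is a minimal sketch for a discrete prospect. The weighting function, utility and numbers are hypothetical, and the paper's contract optimisation (quantile formulation, double-obstacle ODE) is not reproduced.

```python
import numpy as np

def rdu_value(outcomes, probs, u, w):
    """Rank-dependent utility of a discrete prospect (a Choquet integral of u(X)).

    Outcomes are ranked from best to worst; the i-th ranked outcome receives the
    decision weight w(q_i) - w(q_{i-1}), where q_i is the total probability of
    the i best outcomes and w is increasing with w(0) = 0 and w(1) = 1.
    """
    x = np.asarray(outcomes, dtype=float)
    p = np.asarray(probs, dtype=float)
    order = np.argsort(-x)                  # best outcome first
    x, p = x[order], p[order]
    q = np.minimum(np.cumsum(p), 1.0)       # guard against rounding above 1
    q_prev = np.concatenate(([0.0], q[:-1]))
    weights = w(q) - w(q_prev)
    return float(np.sum(weights * u(x)))

# Hypothetical prospect: lose 100 or nothing with equal probability, linear utility.
identity = lambda z: z
print(rdu_value([0.0, -100.0], [0.5, 0.5], identity, identity))          # -50.0
print(rdu_value([0.0, -100.0], [0.5, 0.5], identity, lambda q: q**2))    # -75.0
```

With w(q) = q the value reduces to expected utility. A convex w shifts decision weight towards worse outcomes, which is the pessimism that makes insurance attractive to an RDU maximiser.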

【6】 G3M Impermanent Loss Dynamics
Link: https://arxiv.org/abs/2108.06593

Authors: Nassib Boueri
Comments: 8 pages, 9 figures
Abstract: Geometric Mean Market Makers (G3M) such as Uniswap, Sushiswap or Balancer are key building blocks of the nascent Decentralised Finance system. We establish no-arbitrage bounds for the wealth process of such Automated Market Makers in the presence of transaction fees and highlight the dynamics of their so-called Impermanent Losses, which are incurred due to negative convexity and essentially void the benefits of portfolio diversification within G3Ms. We then turn to empirical data to establish whether transaction fee income has historically been high enough to offset Impermanent Losses and allow G3M investments to outperform their continually rebalanced constant-mix portfolio counterparts. It appears that the median liquidity pool had a net ROI of zero when Impermanent Losses are taken into account. The cross-sectional dispersion of ROI has, however, been high, and the pool net ROI ranking has been significantly autocorrelated for several weeks. This suggests that G3M pools are not yet efficiently arbitraged, as agents may have ex-ante knowledge of which G3M pools are likely to be far better investment propositions than others. We finally focus on the UniswapV3 protocol, which introduced the notion of concentrated liquidity ranges, and show that such a position can be replicated by leveraging a classic UniswapV2 pool while simultaneously hedging part of the underlying token price exposure. As such, the Impermanent Loss dynamics described herein also apply to UniswapV3 pools.
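For a two-token G3M with weights w and 1 - w, arbitrage keeps the pool's value proportional to p^w, where p is the price of token X in units of token Y. A buy-and-hold portfolio, by contrast, grows linearly in p. This gives the fee-free impermanent loss formula sketched below. It is a standard derivation, not taken from the paper, and it ignores transaction fees and the paper's no-arbitrage bounds.

```python
def impermanent_loss(price_ratio: float, weight: float = 0.5) -> float:
    """Fee-free impermanent loss of a two-token G3M pool relative to holding.

    price_ratio: r = p1 / p0, the change in the price of token X quoted in Y.
    weight: pool weight w of token X (0.5 for a constant-product pool).
    Returns r**w / (w*r + 1 - w) - 1, which is <= 0 by the weighted AM-GM
    inequality and equals 0 only when r == 1.
    """
    if price_ratio <= 0 or not 0.0 < weight < 1.0:
        raise ValueError("need price_ratio > 0 and 0 < weight < 1")
    r, w = price_ratio, weight
    return r**w / (w * r + 1.0 - w) - 1.0

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, round(impermanent_loss(r), 4))
```

With equal weights, a doubling and a halving of the price both leave the pool about 5.7% behind holding, and a fourfold move costs 20%. This concave shortfall is the negative convexity the abstract refers to.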

【7】 From bid-ask credit default swap quotes to risk-neutral default probabilities using distorted expectations
Link: https://arxiv.org/abs/2108.06578

Authors: Matteo Michielon, Asma Khedher, Peter Spreij
Affiliations: Quantitative Analysis and Quantitative Development, ABN AMRO Bank N.V., Amsterdam, The Netherlands; Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands
Abstract: Risk-neutral default probabilities can be implied from credit default swap (CDS) market quotes. In practice, mid CDS quotes are used as inputs, as their risk-neutral counterparts are not observable. We show how to imply risk-neutral default probabilities directly from bid and ask quotes by formulating the CDS calibration problem to bid and ask market quotes within the conic finance framework. Assuming the risk-neutral distribution of the default time to be driven by a Poisson process, we prove, under mild liquidity-related assumptions, that the calibration problem admits a unique solution that also allows the implied liquidity of the market to be calculated jointly.
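The link between CDS spreads and default probabilities under a constant-intensity (Poisson) default model can be illustrated with the well-known "credit triangle" approximation, spread ≈ λ(1 - R), applied separately to the bid and the ask. This is a swapped-in rough approximation, not the paper's conic-finance calibration. The 40% recovery rate, the flat curves and the quotes are assumptions.

```python
import math

def hazard_from_spread(spread_bp: float, recovery: float = 0.4) -> float:
    """Flat default intensity from a CDS spread via the credit-triangle
    approximation spread ~ lambda * (1 - R) (flat curves, continuous premium)."""
    return (spread_bp / 1e4) / (1.0 - recovery)

def default_prob(hazard: float, t: float) -> float:
    """P(tau <= t) when default is the first jump of a Poisson process."""
    return 1.0 - math.exp(-hazard * t)

# Hypothetical 5-year quotes in basis points: one intensity per side of the market.
for side, spread in (("bid", 95.0), ("ask", 105.0)):
    lam = hazard_from_spread(spread)
    print(side, round(lam, 5), round(default_prob(lam, 5.0), 4))
```

Feeding only the mid quote into such a mapping corresponds to the common practice described in the abstract. The paper instead calibrates a single model to both sides of the quote, which also yields an implied liquidity measure.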

【8】 Logistics and trade flows in selected ECOWAS Countries: An empirical verification
Link: https://arxiv.org/abs/2108.06441

Authors: Eriamiatoe Efosa Festus
Affiliation: Department of Economics
Abstract: This study investigates the role of logistics and its six components in trade flows in selected Economic Community of West African States (ECOWAS) countries. The impact of other macroeconomic variables on trade flows was also investigated. Ten countries were selected over an eight-year period. We decomposed trade flows into import and export trade. The World Bank Logistics Performance Index (LPI) was used as a measure of logistics performance. The LPI has six components, and the impact of these components on trade flows was also examined. The fixed-effects model was used to explain the cross-country results obtained. The results showed that logistics has no significant impact on either imports or exports; thus logistics plays no role in trade flows among the selected ECOWAS countries. Except for the timeliness of shipments in reaching the final destination (CRC), the components of logistics have no impact on trade flows. Income was found to be positively related to imports. Exchange rate, consumption, money supply, reserves and tariffs have no significant impact on imports. Relative import price has an inverse and significant relationship with imports. GDP has a positive and significant impact on export trade. The study also found FDI, savings, exchange rate and labour to have an insignificant impact on exports. Finally, we found that logistics is not a driver of trade among the selected ECOWAS countries.
The study recommended the introduction of a single window system and improved border management in order to reduce logistics-related costs and thereby enhance trade.
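A country fixed-effects (within) regression of the kind described can be sketched as follows. The variable names and synthetic panel are hypothetical, chosen only to mirror ten countries over eight years. Standard errors, time effects and the paper's actual specification are omitted.

```python
import numpy as np

def within_fe(y, X, entity):
    """Fixed-effects (within) estimator: demean y and each column of X within
    each entity (e.g. country), then run OLS without an intercept.
    X must be 2-D with shape (n, k)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    entity = np.asarray(entity)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(entity):
        m = entity == g
        y_dm[m] -= y[m].mean()
        X_dm[m] -= X[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic panel: 10 countries x 8 years with country-specific intercepts and
# true slopes 0.7 (e.g. log income) and -0.3 (e.g. log relative import price).
rng = np.random.default_rng(0)
country = np.repeat(np.arange(10), 8)
X = rng.normal(size=(80, 2))
alpha = rng.normal(scale=5.0, size=10)[country]
y = alpha + X @ np.array([0.7, -0.3]) + rng.normal(scale=0.1, size=80)
print(within_fe(y, X, country))
```

Demeaning within each country removes the country intercepts entirely. The slopes are therefore identified only from variation over time inside each country, which is what distinguishes the fixed-effects model from pooled OLS.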

【9】 Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach
Link: https://arxiv.org/abs/2108.06655

Authors: Yanwei Jia, Xun Yu Zhou
Comments: 46 pages, 9 figures
Abstract: We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean-square sense. This method interprets the classical gradient Monte Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with "test functions". Solving these equations in different ways recovers various classical TD algorithms, such as TD($\lambda$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
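With a value function that is linear in its parameters, minimising the time-discretised martingale loss reduces to a least-squares regression of the reward-to-go on the approximation at every time point. This is why it recovers gradient Monte Carlo. The toy example below is a hypothetical illustration, not the authors' code. X is a standard Brownian motion started uniformly in [-2, 2], the only reward is X_T^2 at the horizon, and the value function is linear in the features x^2, T - t and 1. Since E[X_T^2 | X_t = x] = x^2 + (T - t), the true parameters are (1, 1, 0).

```python
import numpy as np

def fit_value_martingale_loss(n_paths=20000, n_steps=50, T=1.0, seed=0):
    """Fit V(t, x) = theta . [x^2, T - t, 1] by minimising the time-discretised
    martingale loss  sum over paths and t_k of (G - V(t_k, X_{t_k}))^2,
    where G = X_T^2 is the reward-to-go (no running reward, no discounting).
    With a linear-in-parameters V this is ordinary least squares."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x0 = rng.uniform(-2.0, 2.0, size=n_paths)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    X = np.concatenate([x0[:, None], x0[:, None] + np.cumsum(dW, axis=1)], axis=1)
    t = np.linspace(0.0, T, n_steps + 1)[:-1]       # t_0, ..., t_{n-1}
    x_t = X[:, :-1].ravel()                          # path-major ordering
    tau = np.tile(T - t, n_paths)                    # time to horizon, same ordering
    G = np.repeat(X[:, -1] ** 2, n_steps)            # terminal reward, repeated per path
    features = np.column_stack([x_t**2, tau, np.ones_like(x_t)])
    theta, *_ = np.linalg.lstsq(features, G, rcond=None)
    return theta

theta = fit_value_martingale_loss()
print(theta)  # should be close to [1, 1, 0]
```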


【1】 Convolutive Prediction for Reverberant Speech Separation
Link: https://arxiv.org/abs/2108.07194

Authors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
Affiliation: Mitsubishi Electric Research Laboratories (MERL), USA
Comments: in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
Abstract: We investigate the effectiveness of convolutive prediction, a novel formulation of linear prediction for speech dereverberation, for speaker separation in reverberant conditions. The key idea is to first use a deep neural network (DNN) to estimate the direct-path signal of each speaker, and then identify delayed and decayed copies of the estimated direct-path signal. Such copies are likely due to reverberation, and can be directly removed for dereverberation or used as extra features for another DNN to perform better dereverberation and separation. To identify such copies, we efficiently solve a linear regression problem per frequency in the time-frequency (T-F) domain to estimate the underlying room impulse response (RIR). In the multi-channel extension, we perform minimum variance distortionless response (MVDR) beamforming on the outputs of convolutive prediction. The beamforming and dereverberation results are used as extra features for a second DNN to perform better separation and dereverberation. State-of-the-art results are obtained on the SMS-WSJ corpus.
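The per-frequency regression step can be sketched as a complex least-squares fit. In each frequency bin, the mixture STFT is regressed onto K delayed copies of the estimated direct-path STFT. The copies at non-zero lags are then treated as reverberation and subtracted. This is a hypothetical single-source, noise-free toy: the number of taps, the lag at which reverberation is assumed to start and the function name are illustrative assumptions, and the DNNs and MVDR beamforming from the paper are not included.

```python
import numpy as np

def convolutive_prediction(Y, S_hat, n_taps=4, delay=1):
    """Per-frequency least-squares fit of the mixture STFT Y onto delayed copies
    of an estimated direct-path STFT S_hat (complex arrays, shape (frames, freqs)).

    Returns (g, Y_derev): g[f] holds n_taps filter taps for frequency bin f, and
    Y_derev is Y with the predicted copies at lags >= delay removed.
    """
    n_frames, n_freqs = Y.shape
    g = np.zeros((n_freqs, n_taps), dtype=complex)
    Y_derev = Y.astype(complex)
    for f in range(n_freqs):
        # Column k holds S_hat delayed by k frames (zero-padded at the start).
        A = np.zeros((n_frames, n_taps), dtype=complex)
        for k in range(n_taps):
            A[k:, k] = S_hat[: n_frames - k, f]
        g[f], *_ = np.linalg.lstsq(A, Y[:, f], rcond=None)
        # Copies at lags >= delay are treated as reverberation and subtracted.
        Y_derev[:, f] -= A[:, delay:] @ g[f, delay:]
    return g, Y_derev

# Hypothetical toy: 200 frames, 3 bins, reverberation modelled as a 4-tap filter.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 3)) + 1j * rng.normal(size=(200, 3))
g_true = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
Y = np.zeros_like(S)
for f in range(3):
    for k in range(4):
        Y[k:, f] += g_true[f, k] * S[: 200 - k, f]
g_est, Y_derev = convolutive_prediction(Y, S)
print(np.allclose(g_est, g_true))  # True
```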

【2】 NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition
Link: https://arxiv.org/abs/2108.07118

Authors: Seyed Omid Sadjadi
Abstract: This document provides a brief description of the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) conversational telephone speech (CTS) Superset. The CTS Superset has been created in an attempt to provide the research community with a large-scale dataset along with uniform metadata that can be used to effectively train and develop telephony (narrowband) speaker recognition systems. It contains a large number of telephony speech segments from more than 6800 speakers, with speech durations distributed uniformly in the [10s, 60s] range. The segments have been extracted from the source corpora used to compile prior SRE datasets (SRE1996-2012), including the Greybeard corpus as well as the Switchboard and Mixer series collected by the Linguistic Data Consortium (LDC). In addition to the brief description, we also report speaker recognition results on the NIST 2020 CTS Speaker Recognition Challenge, obtained using a system trained with the CTS Superset. The results will serve as a reference baseline for the challenge.
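Speaker recognition baselines are commonly summarised by the equal error rate (EER): the operating point where the false-rejection and false-acceptance rates coincide. NIST evaluations also report detection-cost metrics, which are not shown here. The sketch below is a simple threshold sweep under that definition and is not the challenge's official scoring tool. The function name and scores are hypothetical.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and return the
    mean of FRR and FAR at the threshold where the two are closest."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, None
    for thr in np.unique(np.concatenate([tgt, non])):
        frr = np.mean(tgt < thr)    # same-speaker trials wrongly rejected
        far = np.mean(non >= thr)   # different-speaker trials wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return float(eer)

# Hypothetical verification scores (higher means "same speaker").
print(equal_error_rate([1.0, 3.0], [0.0, 2.0]))  # 0.5
```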

【3】 Cross-modal Spectrum Transformation Network For Acoustic Scene classification
Link: https://arxiv.org/abs/2108.06401

Authors: Yang Liu, Alexandros Neophytou, Sunando Sengupta, Eric Sommerlade
Affiliations: Department of Electrical and Electronic Engineering, University of Surrey, UK; Microsoft Corporation, Reading, UK
Abstract: Convolutional neural networks (CNNs) with log-mel spectrum features have shown promising results for acoustic scene classification tasks. However, the performance of these CNN-based classifiers is still lacking, as they do not generalise well to unknown environments. To address this issue, we introduce an acoustic spectrum transformation network in which traditional log-mel spectra are transformed into imagined visual features (IVF). The imagined visual features are learned by exploiting the relationship between audio and visual features present in video recordings. An auto-encoder is used to encode images as visual features, and a transformation network learns how to generate imagined visual features from log-mel spectra. Our model is trained on a large dataset of YouTube videos. We test our proposed method on the scene classification tasks of DCASE and ESC-50, where our method outperforms other spectrum features, especially for unseen environments.
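For reference, the log-mel input that such CNN baselines consume can be computed as below. This is a plain NumPy sketch under assumed parameters: 16 kHz audio, a 512-point FFT, a 10 ms hop and 40 HTK-style triangular mel filters. The abstract does not state the paper's settings, and the imagined-visual-feature transformation network is not shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel(signal, sr=16000, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    """Log-mel spectrogram, shape (frames, n_mels): Hann-windowed frames,
    power spectrum, triangular mel filters, natural log."""
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1)) ** 2
    # Filter edges equally spaced on the mel scale, mapped to FFT bin indices.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, centre, right = bins[m], bins[m + 1], bins[m + 2]
        for k in range(left, centre):       # rising edge of the triangle
            fb[m, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            fb[m, k] = (right - k) / (right - centre)
    return np.log(power @ fb.T + eps)

sr = 16000
t = np.arange(sr) / sr
feats = log_mel(np.sin(2 * np.pi * 1000.0 * t), sr=sr)
print(feats.shape)  # (97, 40)
```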

【4】 Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment
Link: https://arxiv.org/abs/2108.06943

Authors: Yuanyuan Liu, Nelly Penttilä, Tiina Ihalainen, Juulia Lintula, Rachel Convey, Okko Räsänen
Affiliations: Unit of Computing Sciences, Tampere University, Finland; Dept. of Signal Processing and Acoustics, Aalto University, Finland
Abstract: Imprecise vowel articulation can be observed in people with Parkinson's disease (PD). Acoustic features measuring vowel articulation have been demonstrated to be effective indicators of PD in its assessment. The standard clinical vowel articulation features, vowel working space area (VSA), vowel articulation index (VAI) and formant centralization ratio (FCR), are derived from the first two formants of the three corner vowels /a/, /i/ and /u/. Conventionally, the corner vowels must be annotated manually in the speech data before vowel articulation can be measured, which is time-consuming. The present work aims to reduce human effort in the clinical analysis of PD speech by proposing an automatic pipeline for vowel articulation assessment. The method is based on automatic corner vowel detection using a language-universal phoneme recognizer, followed by statistical analysis of the formant data. The approach removes the need for prior knowledge of the spoken content and the language in question.
Experimental results on a Finnish PD speech corpus demonstrate the efficacy and reliability of the proposed automatic method in deriving VAI, VSA, FCR and F2i/F2u (the second formant ratio for vowels /i/ and /u/). The automatically computed parameters are shown to be highly correlated with features computed from manual annotations of corner vowels. In addition, automatically and manually computed vowel articulation features have comparable correlations with experts' ratings of speech intelligibility, voice impairment and overall severity of communication disorder. The language independence of the proposed approach is further validated on a Spanish PD database, PC-GITA, as well as on the TORGO corpus of English dysarthric speech.
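The four corner-vowel measures named in this abstract have simple closed forms once mean F1/F2 values for /a/, /i/ and /u/ are available. The sketch below uses the definitions common in the clinical literature:

- VSA is the area of the /a/-/i/-/u/ triangle in the F1-F2 plane.
- VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a).
- FCR = 1 / VAI.

The formant values in the usage example are hypothetical, and the automatic vowel detection pipeline itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class Formants:
    f1: float  # first formant, Hz
    f2: float  # second formant, Hz

def vowel_metrics(a: Formants, i: Formants, u: Formants) -> dict:
    """Corner-vowel articulation measures from mean F1/F2 of /a/, /i/, /u/."""
    # Triangular vowel space area in the (F1, F2) plane (shoelace formula), Hz^2.
    vsa = 0.5 * abs(i.f1 * (a.f2 - u.f2) + a.f1 * (u.f2 - i.f2) + u.f1 * (i.f2 - a.f2))
    # VAI: formants that drop under vowel centralisation (F2i, F1a) over those that rise.
    vai = (i.f2 + a.f1) / (i.f1 + u.f1 + u.f2 + a.f2)
    return {"VSA": vsa, "VAI": vai, "FCR": 1.0 / vai, "F2i/F2u": i.f2 / u.f2}

m = vowel_metrics(a=Formants(800, 1300), i=Formants(300, 2300), u=Formants(300, 800))
print(m["VSA"])  # 375000.0
```

Centralised (less precise) articulation shrinks the triangle and lowers VAI, so VSA and VAI fall while FCR rises.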

【5】 GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Link: https://arxiv.org/abs/2108.06890

Authors: Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Hong-Gyu Jung, Seong-Whan Lee
Affiliation: Korea University
Comments: Accepted at the IEEE International Conference on Systems, Man, and Cybernetics (SMC 2021)
Abstract: Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system that aims to reproduce a novel speaker's voice from a small amount of training data. While numerous attempts have been made at few-shot speaker adaptation, there is still a gap in speaker similarity to the target speaker, depending on the amount of data. To bridge the gap, we propose GC-TTS, which achieves high-quality speaker adaptation with significantly improved speaker similarity. Specifically, we leverage two geometric constraints to learn discriminative speaker representations. Here, a TTS model is pre-trained on base speakers with a sufficient amount of data, and then fine-tuned for novel speakers on a few minutes of data with the two geometric constraints. The two geometric constraints enable the model to extract discriminative speaker embeddings from limited data, which leads to the synthesis of intelligible speech. We discuss and verify the effectiveness of GC-TTS by comparing it with popular and essential methods. The experimental results demonstrate that GC-TTS generates high-quality speech from only a few minutes of training data, outperforming standard techniques in terms of speaker similarity to the target speaker.


【1】 Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment
Link: https://arxiv.org/abs/2108.06943

Cross-listed: authors, affiliations and abstract are identical to entry 【4】 in the preceding list.

【2】 GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Link: https://arxiv.org/abs/2108.06890

Cross-listed: authors, affiliation, comments and abstract are identical to entry 【5】 in the preceding list.

【3】 Convolutive Prediction for Reverberant Speech Separation
Link: https://arxiv.org/abs/2108.07194

Cross-listed: authors, affiliation, comments and abstract are identical to entry 【1】 in the preceding list.

【4】 NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition
Link: https://arxiv.org/abs/2108.07118

Cross-listed: author and abstract are identical to entry 【2】 in the preceding list.

【5】 Investigating Bias In Automatic Toxic Comment Detection: An Empirical Study
Link: https://arxiv.org/abs/2108.06487

Authors: Ayush Kumar, Pratik Kumar
Affiliation: Georgia Institute of Technology, Atlanta, US
Abstract: With the surge in online platforms, there has been an upsurge in user engagement on these platforms via comments and reactions. A large portion of such textual comments are abusive, rude and offensive to the audience. When machine learning systems are in place to check the comments coming onto a platform, biases present in the training data get passed on to the classifier, leading to discrimination against certain classes, religions and genders. In this work, we evaluate different classifiers and features to estimate the bias in these classifiers along with their performance on the downstream task of toxicity classification. Results show that improvement in the performance of automatic toxic comment detection models is positively correlated with mitigating biases in these models. In our work, an LSTM with an attention mechanism proved to be a better modelling strategy than a CNN model. Further analysis shows that fastText embeddings are marginally preferable to GloVe embeddings for training toxic comment detection models. Deeper analysis reveals that such automatic models are particularly biased with respect to specific identity groups even when the model has a high AUC score. Finally, in an effort to mitigate bias in toxicity detection models, a multi-task setup trained with toxicity sub-types as an auxiliary task proved useful, leading to up to a 0.26% (6% relative) gain in AUC scores.
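A high overall AUC can hide poor ranking on comments that mention a particular identity group, which is the kind of gap this abstract describes. The sketch below computes AUC via its pairwise (Mann-Whitney) definition and a subgroup AUC restricted to one group's comments. Subgroup AUC is one of several bias metrics used in the toxicity literature; the abstract does not say whether the authors use exactly this one. Function names and scores are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC as the probability that a random toxic comment is scored above a
    random non-toxic one (ties count one half): the Mann-Whitney statistic."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    diff = s[y][:, None] - s[~y][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def subgroup_auc(scores, labels, in_group):
    """AUC computed only on comments that mention a given identity group."""
    m = np.asarray(in_group, dtype=bool)
    return auc(np.asarray(scores)[m], np.asarray(labels)[m])

# Hypothetical classifier scores, toxicity labels and identity-mention flags.
scores = [0.9, 0.4, 0.5, 0.1]
labels = [1, 1, 0, 0]
mentions = [True, False, True, False]
print(auc(scores, labels))                     # 0.75
print(subgroup_auc(scores, labels, mentions))  # 1.0
```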

【6】 Cross-modal Spectrum Transformation Network For Acoustic Scene classification
Link: https://arxiv.org/abs/2108.06401

Cross-listed: authors, affiliations and abstract are identical to entry 【3】 in the preceding list.

Shared from the WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest). Originally published: 2021-08-17.
