SP Module 3 – Digital Speech Signals

杨丝儿
Published 2022-11-10 08:28:32
Included in the column: 杨丝儿的小站

Time domain

Sound is a wave of pressure travelling through a medium, such as air. We can plot the variation in pressure (captured by a microphone) against time to visualise the waveform.

Sound source

Air flow from the lungs is the power source for generating a basic source of sound, either using the vocal folds or at a constriction made anywhere in the vocal tract.

The airflow itself is slow: it only supplies the power. What actually generates sound is a change in air pressure, and at the vocal folds this change takes the form of a repeating pulse.

Periodic signal

The vocal folds block air flow from the lungs, burst open under pressure to create a glottal pulse, then rapidly close. This repeats, creating a periodic signal.
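As a rough illustration, the repeating glottal pulses can be approximated by a pulse train with one pulse per fundamental period. This is a simplified sketch, not a model of real glottal flow: the function name `pulse_train` and its parameters are hypothetical, and each glottal pulse is assumed to be a single unit-amplitude sample.

```python
def pulse_train(f0_hz, sample_rate_hz, duration_s):
    """Hypothetical helper: one unit pulse per fundamental period.

    Simplifying assumption: each glottal pulse is reduced to a single
    sample of value 1.0; real glottal pulses have a shape.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    period_samples = sample_rate_hz / f0_hz  # T0 = 1 / F0, expressed in samples
    signal = [0.0] * n_samples
    k = 0
    while round(k * period_samples) < n_samples:
        signal[int(round(k * period_samples))] = 1.0  # k-th glottal pulse
        k += 1
    return signal


# A 100 Hz fundamental sampled at 16 kHz repeats every 160 samples.
x = pulse_train(f0_hz=100, sample_rate_hz=16000, duration_s=0.05)
print([i for i, v in enumerate(x) if v == 1.0])  # → [0, 160, 320, 480, 640]
```

The spacing between pulses is the fundamental period, so raising `f0_hz` packs the pulses closer together.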

Pitch

Periodic signals are perceived as having a pitch. The physical property of fundamental frequency relates to the perceptual quantity of pitch.

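Pitch perception is logarithmic, not linear, with base 2: doubling the fundamental frequency raises the pitch by one octave, i.e. to the same musical note one octave higher.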
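To make the base-2 relationship concrete, here is a small sketch that converts a pair of fundamental frequencies into a pitch interval in octaves and semitones. The function names are hypothetical, and the 12-semitones-per-octave scale is the standard equal-tempered musical convention, assumed here rather than taken from the module.

```python
import math


def octaves_between(f1_hz, f2_hz):
    """Hypothetical helper: pitch interval in octaves = log2 of the frequency ratio."""
    return math.log2(f2_hz / f1_hz)


def semitones_between(f1_hz, f2_hz):
    """Hypothetical helper: interval in semitones (assumes 12 equal-tempered semitones per octave)."""
    return 12 * octaves_between(f1_hz, f2_hz)


# Equal steps in Hz are not equal steps in pitch:
print(octaves_between(100, 200))  # 100 Hz to 200 Hz: 1 octave
print(octaves_between(200, 300))  # 200 Hz to 300 Hz: only about 0.585 of an octave
print(octaves_between(200, 400))  # 200 Hz to 400 Hz: 1 octave again
```

The same 100 Hz step is a full octave at low frequencies but a smaller interval higher up, which is what "logarithmic, not linear" means here.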

Digital signal

To do speech processing with a computer, we need to convert sound first to an analogue electrical signal, and then to a digital representation of that signal.

Sampling a waveform (an analogue wave) digitises it in two ways: the sampling rate (or sampling frequency) digitises time, and quantisation (or bit depth) digitises amplitude. Together, these two determine the quality of the digital sound.
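The two kinds of digitisation can be sketched in a few lines of Python. This is an illustrative sketch under simple assumptions: a pure sine wave stands in for the analogue signal, the function name is hypothetical, and quantisation maps values to signed integers by rounding to the nearest level, as in common linear PCM formats.

```python
import math


def sample_and_quantise(freq_hz, sample_rate_hz, n_samples, bit_depth):
    """Hypothetical helper: sample a unit-amplitude sine and quantise it.

    Sampling digitises time (one value every 1/sample_rate_hz seconds);
    quantisation digitises amplitude (assumed signed integer levels).
    """
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit audio
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of the n-th sample
        value = math.sin(2 * math.pi * freq_hz * t)  # "analogue" value in [-1, 1]
        samples.append(round(value * max_level))     # nearest quantisation level
    return samples


print(sample_and_quantise(1000, 8000, 8, bit_depth=16))
print(sample_and_quantise(1000, 8000, 8, bit_depth=3))  # → [0, 2, 3, 2, 0, -2, -3, -2]
```

With only 3 bits the sine is reduced to a handful of coarse steps, while 16 bits preserve its shape closely; the sampling rate, by contrast, controls how many points per cycle we get.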

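Aliasing happens when the sampling rate is too low for the signal: any component above half the sampling rate (the Nyquist frequency) cannot be represented correctly and instead appears in the samples as a spurious lower-frequency wave. To avoid aliasing, we must remove all analogue components above the Nyquist frequency, using a low-pass filter, before sampling.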
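A small sketch of why the cut-off is half the sampling rate: after sampling, a pure tone above the Nyquist frequency produces exactly the same samples as a lower-frequency tone, so the two can no longer be told apart. The helper names are hypothetical, and the example uses cosines so that the aliased samples match exactly (for sines, the alias comes out with inverted sign).

```python
import math


def alias_frequency(freq_hz, sample_rate_hz):
    """Hypothetical helper: frequency (0 to Nyquist) at which a pure tone appears after sampling."""
    f = freq_hz % sample_rate_hz       # sampling cannot distinguish f from f + k * fs
    return min(f, sample_rate_hz - f)  # fold anything above Nyquist back down


def sampled_cosine(freq_hz, sample_rate_hz, n_samples):
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


fs = 10_000                        # Nyquist frequency is 5 kHz
print(alias_frequency(7_000, fs))  # → 3000
a = sampled_cosine(7_000, fs, 20)
b = sampled_cosine(3_000, fs, 20)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # → True: identical samples
```

Once sampled, nothing in the digital signal says whether the original was 7 kHz or 3 kHz, which is why the filtering has to happen in the analogue domain.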

Short-term analysis

Because speech sounds change over time, we need to analyse only short regions of the signal. We convert the speech signal into a sequence of frames.

To define a frame, we apply a window function that cuts a short section out of the waveform.

Different window functions lead to different results. If we simply use a rectangular (0/1) window, the frame starts and ends abruptly, creating discontinuities at its edges; when we analyse this frame, we are analysing not only the speech but also those artefacts. Instead, we can use a tapered window, which cuts the frame out with a window function that tapers towards the edges. Think of it as a fade-in and a fade-out.
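Framing and windowing can be sketched as follows. The module does not name a particular tapered window, so this sketch assumes the Hann window, one common choice; the 25 ms frame length and 10 ms hop at 16 kHz are also common conventions assumed here, and the helper names are hypothetical.

```python
import math


def hann_window(length):
    """Hann window (assumed taper): 0 at both edges, rising to 1 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frames(signal, frame_length, hop, window=None):
    """Hypothetical helper: cut the signal into frames of frame_length samples,
    starting a new frame every `hop` samples, and multiply each by the window."""
    if window is None:
        window = [1.0] * frame_length  # rectangular (0/1) window
    out = []
    for start in range(0, len(signal) - frame_length + 1, hop):
        chunk = signal[start:start + frame_length]
        out.append([s * w for s, w in zip(chunk, window)])
    return out


# 0.1 s of a 200 Hz tone at 16 kHz, cut into 400-sample frames every 160 samples.
fs = 16_000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 10)]
tapered = frames(x, 400, 160, window=hann_window(400))
print(len(tapered))                   # number of frames
print(tapered[0][0], tapered[0][-1])  # both edges are faded out to zero
```

Passing no `window` gives the rectangular cut, whose frames can start and end at non-zero values; the Hann-windowed frames always fade to zero at the edges.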

Series expansion

Speech is hard to analyse directly in the time domain. So we need to convert it to the frequency domain using Fourier analysis, which is a special case of series expansion.

For an analogue signal, reconstructing the original exactly requires adding together an infinite number of terms.

However, a digital signal contains only a finite amount of information, so a finite number of basis functions is enough to reconstruct it exactly. Another way of saying this is that the basis functions are themselves digital signals, and the highest-frequency one is at the Nyquist frequency, which is half the sampling rate.

In practice, we calculate the coefficient for every possible frequency (every basis function), then add the weighted basis functions together to reconstruct the original signal.
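A minimal sketch of this series expansion for one frame of N samples, assuming N is even for simplicity: each coefficient comes from correlating the frame with one basis function, and adding the weighted basis functions back together reproduces the samples exactly. The basis here is written as separate cosines and sines (up to the Nyquist frequency, N/2 cycles per frame) rather than complex exponentials; the function names are hypothetical, and a real system would use an FFT library instead of these O(N²) loops.

```python
import math


def fourier_coefficients(x):
    """Hypothetical helper: coefficients of x (length N, N even) on cosine and sine
    basis functions with k = 0 .. N/2 cycles per frame (N/2 is the Nyquist frequency)."""
    N = len(x)
    a, b = [], []
    for k in range(N // 2 + 1):
        scale = 1 / N if k in (0, N // 2) else 2 / N  # normalisation for each basis function
        a.append(scale * sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N)))
        b.append(scale * sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N)))
    return a, b


def reconstruct(a, b, N):
    """Add up the weighted basis functions to rebuild the N samples."""
    return [
        sum(a[k] * math.cos(2 * math.pi * k * n / N) + b[k] * math.sin(2 * math.pi * k * n / N)
            for k in range(len(a)))
        for n in range(N)
    ]


x = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.8, -0.6]  # an arbitrary 8-sample frame
a, b = fourier_coefficients(x)
y = reconstruct(a, b, len(x))
print(len(a))  # → 5: frequencies of 0, 1, 2, 3 and 4 (= Nyquist) cycles per frame
print(max(abs(p - q) for p, q in zip(x, y)) < 1e-9)  # → True: exact reconstruction
```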

One application of this is removing noise or unneeded detail: if we stop adding terms before we reach the highest frequencies, we get a smoother curve.
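The smoothing idea can be sketched by computing the full set of coefficients, setting the high-frequency ones to zero (i.e. stopping the series early), and adding up only what remains. This sketch uses a direct complex DFT written out by hand for clarity; the function names and the cut-off parameter `keep` are hypothetical, and the "noise" is just a fast sine wave so that the result can be checked exactly.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]


def smooth_by_truncation(x, keep):
    """Hypothetical helper: keep only terms below `keep` cycles per frame, drop the rest."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        if min(k, N - k) >= keep:  # bins k and N - k hold the same frequency
            X[k] = 0
    return idft(X)


N = 64
slow = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]           # 2 cycles per frame
x = [s + 0.3 * math.sin(2 * math.pi * 20 * n / N) for n, s in enumerate(slow)]  # plus fast ripple
y = smooth_by_truncation(x, keep=5)
print(max(abs(p - q) for p, q in zip(y, slow)) < 1e-9)  # → True: ripple removed
```

Choosing `keep` is the trade-off: too low and real detail is lost along with the noise.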

Fourier analysis

We can express any signal as a sum of sine waves that form a series. This takes us from the time domain to the frequency domain.

A spectrum plots magnitude (in dB) against frequency (in kHz).
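As a sketch of what such a plot contains, the function below computes a magnitude spectrum for one frame, converting magnitudes to decibels with 20·log10 and bin indices to kHz. The name is hypothetical, the dB values are relative (no reference level is set), and the small floor added before the logarithm is an assumption to avoid taking the log of zero.

```python
import cmath
import math


def magnitude_spectrum_db(x, sample_rate_hz):
    """Hypothetical helper: (frequencies in kHz, magnitudes in dB) from 0 Hz to Nyquist."""
    N = len(x)
    freqs_khz, mags_db = [], []
    for k in range(N // 2 + 1):
        X_k = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        freqs_khz.append(k * sample_rate_hz / N / 1000)
        mags_db.append(20 * math.log10(abs(X_k) + 1e-12))  # assumed floor avoids log(0)
    return freqs_khz, mags_db


fs = 8000
x = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(64)]  # a 1 kHz tone
f, m = magnitude_spectrum_db(x, fs)
peak = max(range(len(m)), key=lambda i: m[i])
print(f[peak])  # → 1.0 (kHz)
```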

The basis functions are orthogonal, which means the coefficients are unique: there is only one set of weights that reproduces a given signal.
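Orthogonality can be checked numerically. The sketch below (hypothetical helper names, frame length N = 16 chosen arbitrarily) builds the sampled cosine and sine basis functions for one frame and confirms that every pair has zero inner product. Because there are exactly N of them for N samples, each frame has exactly one set of coefficients.

```python
import math


def basis(kind, k, N):
    """Hypothetical helper: sampled cosine or sine with k cycles per N-sample frame."""
    f = math.cos if kind == "cos" else math.sin
    return [f(2 * math.pi * k * n / N) for n in range(N)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


N = 16
funcs = [("cos", k) for k in range(N // 2 + 1)] + [("sin", k) for k in range(1, N // 2)]
vectors = [basis(kind, k, N) for kind, k in funcs]

# Every pair of distinct basis functions has (numerically) zero inner product.
orthogonal = all(abs(dot(vectors[i], vectors[j])) < 1e-9
                 for i in range(len(vectors)) for j in range(len(vectors)) if i != j)
print(orthogonal)    # → True
print(len(vectors))  # → 16: one basis function per sample
```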

Frequency domain

We complete our understanding of Fourier analysis with a look at the phase of the component sine waves, and the effect of changing the analysis frame duration.

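We usually neglect the phase information. Phase only determines where in its cycle each component sine wave starts; changing it alters the shape of the waveform, but not the amount of each frequency present, so for most purposes it matters much less than the magnitudes.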
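A small sketch of what neglecting phase means: two frames built from the same sine-wave components, but with different starting phases, have different waveforms yet identical magnitude spectra. The component frequencies, amplitudes, and phase offsets are arbitrary values chosen for illustration, and the helper name is hypothetical.

```python
import cmath
import math


def magnitudes(x):
    """Hypothetical helper: DFT magnitudes for bins 0 .. N/2 (phase discarded)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]


N = 32
# Same two components (3 and 5 cycles per frame), different starting phases.
x1 = [math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N)
      for n in range(N)]
x2 = [math.cos(2 * math.pi * 3 * n / N + 1.0) + 0.5 * math.cos(2 * math.pi * 5 * n / N - 2.0)
      for n in range(N)]

print(max(abs(a - b) for a, b in zip(x1, x2)) > 0.1)  # → True: waveforms differ
print(max(abs(a - b) for a, b in zip(magnitudes(x1), magnitudes(x2))) < 1e-9)  # → True
```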

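The larger the analysis frame (the more samples it contains), the more basis functions there are, and the more closely they are spaced in frequency, so the frequency resolution becomes finer.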
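The relationship can be sketched directly: for a frame of N samples there are N/2 + 1 basis frequencies between 0 Hz and the Nyquist frequency, spaced fs/N apart, so doubling the frame length halves the spacing while the highest frequency stays at Nyquist. The function name and the 16 kHz sampling rate are assumptions for illustration.

```python
def basis_frequencies_hz(frame_length, sample_rate_hz):
    """Hypothetical helper: basis-function frequencies (0 Hz up to Nyquist)
    for a frame of frame_length samples, spaced sample_rate_hz / frame_length apart."""
    return [k * sample_rate_hz / frame_length for k in range(frame_length // 2 + 1)]


fs = 16_000
for frame_length in (160, 320, 640):  # 10 ms, 20 ms and 40 ms at 16 kHz
    freqs = basis_frequencies_hz(frame_length, fs)
    print(frame_length, len(freqs), freqs[1], freqs[-1])
# → 160 81 100.0 8000.0
# → 320 161 50.0 8000.0
# → 640 321 25.0 8000.0
```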

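The magnitude spectrum removes the time information within the analysis frame. Put another way, we decompose the time-domain waveform into frequency and amplitude information: the spectrum shows how much of each frequency the frame contains, but not when within the frame it occurs.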

Summary

Beyond pitch, there is prosody, which refers collectively to the fundamental frequency, duration, and amplitude of speech sounds (and sometimes also voice quality). When we generate synthetic speech, we have to give it appropriate prosody if we want it to sound natural.

Now that we are in the frequency domain, the next step is to find evidence there of the periodicity in the speech signal: the harmonics. The spectral envelope is the other half of the picture, revealing what the vocal tract does to that sound source.

Origin: Module 3 – Digital Speech Signals. Translated and edited by YangSier (Homepage).

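This article is part of the Tencent Cloud Self-Media Sync Exposure Program and is shared from the author's personal site/blog.
Originally published: 2022-10-01. In case of infringement, please contact cloudcommunity@tencent.com for removal.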
