Phones are not a suitable unit for waveform concatenation, so we use diphones, which capture coarticulation.
A diphone starts at the middle of one phone and ends at the middle of the next.
Coarticulation is the overlapping of adjacent articulations, or the influence of a phoneme on the phonemes around it. Because of coarticulation, the middles of phones are more stable in their spectral properties than their edges. Concatenating diphones, which are joined at phone middles, should therefore lead to smoother joins.
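As a rough illustration of where diphone boundaries fall, the sketch below turns a phone segmentation into diphones that run from one phone's midpoint to the next phone's midpoint. The `(label, start, end)` tuple format, the function name `phones_to_diphones` and the timings for "cat" are all hypothetical, chosen only for the example.

```python
from typing import List, Tuple

# A phone segment: (label, start_time_s, end_time_s). Hypothetical format.
Segment = Tuple[str, float, float]


def phones_to_diphones(segments: List[Segment]) -> List[Tuple[str, float, float]]:
    """Return diphones as (name, start, end), each spanning from the midpoint
    of one phone to the midpoint of the following phone."""
    diphones = []
    for (left, s1, e1), (right, s2, e2) in zip(segments, segments[1:]):
        start = (s1 + e1) / 2.0  # middle of the left phone (spectrally stable)
        end = (s2 + e2) / 2.0    # middle of the right phone (spectrally stable)
        diphones.append((f"{left}-{right}", start, end))
    return diphones


if __name__ == "__main__":
    # Hypothetical segmentation of "cat", padded with silence at both ends.
    segs = [("sil", 0.00, 0.10), ("k", 0.10, 0.18), ("ae", 0.18, 0.30),
            ("t", 0.30, 0.38), ("sil", 0.38, 0.50)]
    for d in phones_to_diphones(segs):
        print(d)
```

A sequence of N phones yields N-1 diphones, and each diphone ends exactly where the next one starts, so every join lands in the middle of a phone rather than at a phone boundary.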
Concatenation of waveforms is a simple way of making synthetic speech, but we need to take care over how we do it.
Cross-fading between two waveforms is an effective way to avoid some of the artefacts of concatenation.
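A minimal sketch of cross-fading, assuming both waveforms are NumPy arrays at the same sample rate and using a linear fade, which is only one possible choice. The names `crossfade_concat` and `overlap` are hypothetical.

```python
import numpy as np


def crossfade_concat(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Concatenate two waveforms, cross-fading over `overlap` samples.

    The tail of `a` fades out linearly while the head of `b` fades in,
    so the join has no abrupt step in amplitude.
    """
    if overlap <= 0:
        return np.concatenate([a, b])
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the inputs")
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out  # the two gains sum to 1 at every sample
    mixed = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], mixed, b[overlap:]])
```

The output is `overlap` samples shorter than plain concatenation. Because the two gains sum to one, joining two identical constant signals gives back a constant; an equal-power fade is a common alternative when the two sides are uncorrelated.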
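The pitch period is a fundamental building block of voiced speech waveforms, and it offers a route to source-filter separation in the time domain.
A voiced speech waveform can be viewed as a sequence of overlapping pitch periods, each one being the response of the vocal tract to a single impulse from the source.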
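A toy numerical check of this view, with an assumed sample rate, an assumed F0 and a made-up decaying resonance standing in for the vocal tract: filtering an impulse train gives exactly the same signal as adding one shifted copy of the impulse response per pitch period. Because the impulse response here is longer than $T_0$, neighbouring copies overlap.

```python
import numpy as np

fs = 16000            # sample rate in Hz (assumed)
f0 = 100.0            # fundamental frequency in Hz (assumed)
t0 = int(fs / f0)     # pitch period T0 in samples

# Toy "vocal tract" impulse response: a decaying resonance (made-up values).
n = np.arange(400)    # longer than T0, so consecutive responses overlap
h = np.exp(-n / 80.0) * np.sin(2 * np.pi * 500.0 * n / fs)

# Source: an impulse train with one impulse per pitch period.
num_periods = 5
source = np.zeros(num_periods * t0)
source[::t0] = 1.0

# Filtering the source is convolving it with h.
speech = np.convolve(source, h)

# Equivalent view: one copy of h per impulse, shifted to that impulse, summed.
overlapped = np.zeros(len(source) + len(h) - 1)
for k in range(num_periods):
    start = k * t0
    overlapped[start:start + len(h)] += h

print(np.allclose(speech, overlapped))
```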
For each pitch mark, extract one pitch period using a tapered window centred on the mark, with the window duration set to twice the fundamental period $T_0$.
Overlap-add these windowed pitch periods to reconstruct a signal very similar to the original. The whole process is called copy synthesis.
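A minimal sketch of copy synthesis, assuming the pitch marks are already known and the pitch period `t0` (in samples) is constant; a real system would use the local period around each mark. The taper is a periodic Hann window (an assumed choice), and the function names are hypothetical.

```python
import numpy as np


def extract_pitch_periods(x, marks, t0):
    """Cut one windowed segment of length 2*t0 centred on each pitch mark.

    Assumes every mark m satisfies t0 <= m <= len(x) - t0.
    """
    n = np.arange(2 * t0)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / (2 * t0))  # periodic Hann, peak at centre
    return [x[m - t0:m + t0] * window for m in marks]


def overlap_add(segments, marks, t0, length):
    """Place each windowed segment centred on its pitch mark and sum."""
    y = np.zeros(length)
    for seg, m in zip(segments, marks):
        y[m - t0:m + t0] += seg
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t0 = 80
    x = rng.standard_normal(1000)
    marks = list(range(t0, len(x) - t0 + 1, t0))
    y = overlap_add(extract_pitch_periods(x, marks, t0), marks, t0, len(x))
    # Between the first and last pitch marks the reconstruction matches x.
    print(np.allclose(y[marks[0]:marks[-1] + 1], x[marks[0]:marks[-1] + 1]))
```

With marks spaced exactly one window half-length apart, neighbouring periodic Hann windows sum to one, which is why the reconstruction is exact between the first and last marks. Real pitch marks are irregular, so the reconstruction is then only approximately equal to the original.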
Applying overlap-add techniques to pitch period waveforms allows the modification of F0 and duration without changing the phone identity.
Time-domain pitch-synchronous overlap-and-add
Raising F0: place pitch periods closer to each other.
Lowering F0: place pitch periods further apart from each other.
Increasing duration: make a copy of one pitch period and insert it into the sequence.
Decreasing duration: delete one pitch period.
Diphone synthesis:
Unit selection:
Choice of units to concatenate depends on:
A non-mathematical illustration of the equivalence of convolution (in the time domain), multiplication of magnitude spectra, and addition of log magnitude spectra.
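A small numerical check of the same equivalence, using random arrays as stand-ins for an excitation signal and a vocal-tract impulse response (assumed, not real speech). The FFTs are zero-padded to the full convolution length so that the FFT's circular convolution matches the linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.standard_normal(64)  # stand-in for an excitation signal
filt = rng.standard_normal(16)    # stand-in for a vocal-tract impulse response

# Time domain: linear convolution, length 64 + 16 - 1.
y = np.convolve(source, filt)

# Zero-pad every FFT to the full output length.
n_fft = len(y)
Y = np.fft.rfft(y, n_fft)
S = np.fft.rfft(source, n_fft)
H = np.fft.rfft(filt, n_fft)

# Magnitude spectra multiply...
mag_product_ok = np.allclose(np.abs(Y), np.abs(S) * np.abs(H))
# ...so log magnitude spectra add.
log_sum_ok = np.allclose(np.log(np.abs(Y)), np.log(np.abs(S)) + np.log(np.abs(H)))
print(mag_product_ok, log_sum_ok)
```

The additive form is what makes source and filter separable with simple subtraction or smoothing in the log-magnitude domain.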