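6.4 Speech Synthesis
Speech synthesis, also called text-to-speech (TTS), is a technology that converts arbitrary input text into the corresponding speech. It produces human speech by artificial means and can read out any text in real time as standard, fluent speech, in effect giving the machine an artificial mouth. It draws on acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. The main problem it solves is how to convert text into audible sound, that is, how to make a machine speak like a person.
Using speech synthesis to attack voiceprint recognition is arguably an extreme application of the technology. First, a well-performing TTS system (which may include components such as a vocoder, synthesizer, encoder, and decoder) is pre-trained on a large amount of data. Next, speech data from the target person is collected and labeled to build a training set, and the pre-trained model is fine-tuned on it. The result is a TTS system that is closely fitted to the target speaker. Speech synthesized with this system is then used to attack the voiceprint recognition system, and this method can deceive such systems fairly effectively.
A general structure of a speech synthesis system is shown in the figure below.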
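To make the attack workflow described in this section more concrete, the following is a minimal sketch of why fine-tuning matters from the verifier's point of view. It assumes, purely for illustration, that the voiceprint recognition system compares fixed-length speaker embeddings by cosine similarity against a threshold; the source does not specify the scoring method. The function names, the 4-dimensional vectors, and the threshold of 0.8 are all hypothetical, and no real TTS or speaker-embedding model is involved.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enrolled, probe, threshold=0.8):
    """Accept the probe if its embedding is close enough to the enrolled one.

    The threshold value is an assumption chosen for this toy example.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Hypothetical speaker embeddings (real systems use far more dimensions).
enrolled_target = [0.9, 0.1, 0.3, 0.2]     # target person's enrolled voiceprint
generic_tts = [0.1, 0.8, 0.2, 0.5]         # speech from the pre-trained TTS only
finetuned_tts = [0.85, 0.15, 0.35, 0.2]    # speech after fine-tuning on the target

print("generic TTS accepted:   ", verify(enrolled_target, generic_tts))
print("fine-tuned TTS accepted:", verify(enrolled_target, finetuned_tts))
```

In this toy setup the generic voice is rejected while the fine-tuned voice is accepted, which mirrors the claim that a TTS system closely fitted to the target speaker is what makes the attack effective. Real verification back ends and embeddings will behave differently.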
6.5 Adversarial Example Attacks
7. Conclusion