Style-Code Method for Multi-Style Parametric Text-to-Speech Synthesis
Keywords:
text-to-speech synthesis, expressive speech synthesis, deep neural networks, speech style, style code, one-hot vector
Abstract
Modern text-to-speech systems generally achieve good intelligibility. One of their main drawbacks is a lack of expressiveness compared with natural human speech. It is very unpleasant when an automated system conveys positive and negative messages in exactly the same way. The introduction of parametric methods in speech synthesis made it possible to easily change speaker characteristics and speaking styles. This paper presents a simple method for incorporating styles into synthesized speech by using style codes. The proposed method requires just a couple of minutes of speech in the target style and a moderate amount of neutral speech. It is successfully applied to both hidden Markov model (HMM) and deep neural network (DNN) based synthesis by giving the style code as an additional input to the model. Listening tests confirmed that DNN synthesis achieves better style expressiveness than HMM synthesis. It is also shown that the quality of speech synthesized by DNNs in a given style is comparable to that of speech synthesized in the neutral style, although the neutral speech database is about 10 times larger. DNN-based TTS with style codes is further investigated by comparing the quality of speech produced by single-style and multi-style modeling systems. Objective and subjective measures confirmed that there is no significant difference between these two approaches.
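As a rough illustration of giving a style code to the model as an additional input, the sketch below appends a one-hot style vector to each frame of input features. It is only a minimal sketch under stated assumptions: the style names, the feature dimensions, and the helper names `style_code` and `add_style_code` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical style inventory (an assumption; the abstract does not list the styles).
STYLES = ["neutral", "happy", "sad"]


def style_code(style: str, styles: Sequence[str] = STYLES) -> List[float]:
    """Return a one-hot vector with 1.0 at the position of `style`."""
    if style not in styles:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in styles]


def add_style_code(frames: Sequence[Sequence[float]], style: str) -> List[List[float]]:
    """Append the same style code to every frame's input feature vector."""
    code = style_code(style)
    return [list(frame) + code for frame in frames]


# Two dummy frames with 4 placeholder input features each.
frames = [[0.1, 0.0, 1.0, 0.5], [0.2, 1.0, 0.0, 0.5]]
augmented = add_style_code(frames, "happy")
```

Under these assumptions, each augmented frame has 4 + 3 = 7 values, and the last three identify the requested style. A single multi-style model could then be trained on such inputs from all styles together.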
Published
2018-10-01
How to Cite
Suzić, S., Delić, T., Ostrogonac, S., Đurić, S., & Pekar, D. (2018). Style-Code Method for Multi-Style Parametric Text-to-Speech Synthesis. SPIIRAS Proceedings, 5(60), 216-240. https://doi.org/10.15622/sp.60.8
Section
Artificial Intelligence, Knowledge and Data Engineering