Fig. 7 | Computational Cognitive Science

From: Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis

Schematic representation of the auditory-visual speech synthesis system. Given a new sentence to pronounce, the program acts as a MaryTTS client and requests the audio signal corresponding to this sentence together with the list of phonemes and their durations (provided by the prosodic module embedded in the MaryTTS software). From the list of phonemes, a second program performs the visual synthesis: it searches the multimodal dictionary for the best sequence of diphones according to selection and concatenation costs. This sequence is then processed to match the expected durations and to minimize the gaps at each boundary. Finally, the acoustic signal and the variations of the articulatory parameters are passed to the 3D Player, which animates the ECA accordingly.
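
The diphone-selection step described above can be pictured as a Viterbi-style dynamic-programming search over the candidate units in the multimodal dictionary, trading off selection costs (mismatch with the target durations) against concatenation costs (articulatory gaps at unit boundaries). The sketch below is a minimal illustration of that idea; the names (Diphone, selection_cost, concatenation_cost, select_diphones) and the cost definitions are hypothetical simplifications, not the authors' implementation.

```python
# Illustrative sketch of diphone unit selection: a dynamic-programming search
# that minimizes the sum of selection costs and concatenation costs.
# All names and cost functions are assumptions for illustration only.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Diphone:
    label: str                 # e.g. "a-b"
    duration: float            # recorded duration in the corpus (s)
    start_params: List[float]  # articulatory parameters at the left boundary
    end_params: List[float]    # articulatory parameters at the right boundary

def selection_cost(unit: Diphone, target_duration: float) -> float:
    # Penalize candidates whose recorded duration differs from the target.
    return abs(unit.duration - target_duration)

def concatenation_cost(prev: Diphone, cur: Diphone) -> float:
    # Penalize articulatory gaps at the join between consecutive units.
    return sum(abs(a - b) for a, b in zip(prev.end_params, cur.start_params))

def select_diphones(targets: List[str],
                    target_durations: List[float],
                    dictionary: Dict[str, List[Diphone]]) -> List[Diphone]:
    """Return the minimal-cost sequence of diphones for the target labels."""
    candidates = [dictionary[t] for t in targets]
    # cost[i][j]: best cumulative cost using candidate j at position i
    cost = [[selection_cost(u, target_durations[0]) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            best_prev = min(
                range(len(candidates[i - 1])),
                key=lambda j: cost[i - 1][j]
                + concatenation_cost(candidates[i - 1][j], u))
            row_cost.append(cost[i - 1][best_prev]
                            + concatenation_cost(candidates[i - 1][best_prev], u)
                            + selection_cost(u, target_durations[i]))
            row_back.append(best_prev)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back the minimal-cost path.
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return list(reversed(path))
```

The selected sequence would then be time-stretched to the durations predicted by the prosodic module and smoothed at each join, before the articulatory trajectories are handed to the 3D Player.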
