Episode: 2792
Title: HPR2792: Playing around with text to speech synthesis on Linux
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2792/hpr2792.mp3
Transcribed: 2025-10-19 16:53:30

---

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag. The summary is: Playing around with different text to speech programs to see what is possible.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Well, hello everybody. This is me again, Jeroen Baten, your friendly neighborhood geek, whatever. I'm still self-employed doing Linux and Python programming. So that's mainly geek stuff. Everything from mainframes via AS/400s (I'm referencing previous podcasts here) to Raspberry Pis and, well, whatnot.

This one is about text to speech synthesis. I found myself contemplating the intro to every HPR recording. You know, the electronic voice we all know and love. Well, the latter is for me actually a bit of a question. Is this the best there is? I don't know. I mean, we can choose to use it as a community for history's sake or simply because we like the geeky sound of it. I don't care. But I was just wondering, is this the best there is on Linux? And, well, I don't know. Why not fumble around a bit and see what I will find?

So I downloaded a bunch of text to speech, TTS, software packages. I downloaded some software from GitHub repositories and I started playing. Well, to really compare things, we need to have a reference for our quest. So the first one of course is to make a recording with the eSpeak program, and it is clearly the one in use at HPR. It sounds like a robot, but at the same time it's pretty easy to understand.
Albeit that the sound is rather mechanical. So this is what eSpeak makes of the intro string that I'm using.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Well, I think you've recognized this sound, right? So what I did is I made a small shell script that you can find in the show notes. I'll paste it there. And I started to test out other programs as well. I defined the introductory string to this podcast as my reference text and had several text to speech programs say it.

It seems that eSpeak can also do some Scottish, and since I fell in love with the city of Edinburgh, I wanted to give it a shot. So here goes.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Well, what's there to say? I mean, yeah, it does have a distant-sounding Scottish accent, but it's way in the distance. It's at the horizon. So, well, I wouldn't exactly call it [unclear]. Okay.

Anyway, is there more? Well, yeah, eSpeak also has three male US voices and, well, I'm just putting them one after the other, so I won't blabber a long time about it. But again, here is eSpeak with male US voices one, two, three.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Well, I guess it's best not to dwell too much on this, I think. Clearly, from the eSpeak point of view, HPR has chosen the best alternative there is, and all the others are, okay, yeah, pathetic is the word that comes to mind. And I'm Dutch, so I'm not even a native speaker. I guess you can think of other words to value this quality of sound. Okay.

Well, the next program I found was Flite, F-L-I-T-E, and as you can hear, it's not really an improvement, but, you know, for comparison's sake, here goes again.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Yeah, I know. Well, the upside to this is, I hope that someday I will be just as old as this guy. I mean, 180, you know, so I can spend every day after retirement hacking on computers for the rest of my life, for another 100 years. I mean, what's not to like?

Okay, but although this sounds pretty, well, okay, not so good, there's also a female alternative, and it's, well, although better, still a long way from pleasant. And, well, being the sucker for a nice womanly voice that I am, I'm going to treat you to this one as well.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

I know. This is definitely not somebody I would fall in love with, but, well, you can hear that it's a woman at least.

Okay, so after this, I wanted to try out a program called Festival. Everybody says it's the best around, so let's try this out. It clearly is an advanced system where you can play with speech synthesis to your heart's content, and, well, it can be pretty complex if you want it to be. But I don't want to do that.
I don't have the knowledge of how to tweak such a system, nor do I have the inclination. I just want to compare some programs and hear how they sound.

Now, Festival comes with several voices. I tried out a bunch of them and this is what I got. The first voice, well, it does not bode well. So again, here goes.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Yeah, I know. In a way it's reminiscent of the old guy's voice we heard earlier, but it's a little slower, it's a little clearer and, well, it doesn't do it for me yet.

Anyway, the second one of Festival is a female voice and, I'll tell you this in advance, it sounds a whole lot better.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

I know. It's still not the female voice that would tickle your fancy, I believe they say, or maybe it doesn't tickle. I don't know.

Anyway, using the voice selection, I also tried out four different US male voices and, just for the sake of brevity, I will put them here again one after the other. So I'll shut up for a while and let you enjoy these four voices.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Yeah, for all this Festival program is cracked up to be, I'm bored out of my skull by the way they sound. But there is a funny thing to this, because you can even make Festival sing. Well, don't get your hopes up. I mean, even I sing better, but mechanical singing is possible. So just a little scale of festivities coming your way.

Yeah, it's the life of the party, and the greatest hits album is just around the corner, almost done. Well, then, so that's it for Festival.

Well, then I stumbled upon this next gem. It's from the GNUstep project and the program is simply called say. That's it. Say. The downside is that it has no command line argument to save to a file, but it's open source, so if you'd like to save to a file, it's pretty easy to implement. But I found an Ubuntu PPA called Sound Recorder, and that is what it does. It just records whatever you send out through your speaker line and puts it into a WAV file. So I started Sound Recorder, hit the record button, and then issued a say command on the command line and voila. After hitting stop recording, I had managed to grab myself the audio. And the result, as you might expect, sounds just like this.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

So it's clearly not perfect, but, well, like I said, for a simple say, I mean, it's a three-letter command. What else did you expect?
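The command-line engines heard so far (eSpeak, Flite and Festival) can all be driven from one small script. The sketch below is not the shell script from the show notes; it is a minimal Python illustration under stated assumptions. The helper names `build_commands` and `run_available` are hypothetical, the voice choices (`en-sc`, `en-us+m1`, `slt`) are guesses at what was used in the episode, and Festival is assumed to be reachable through its `text2wave` helper reading the text from standard input.

```python
import shutil
import subprocess
from pathlib import Path

# Reference text as read out in the episode intro.
REFERENCE_TEXT = (
    "This is HPR episode 2792 entitled Playing Around with Text to Speech "
    "Synthesis on Linux and is part of the series soundscapes. "
    "It is hosted by Jeroen Baten and is about 20 minutes long "
    "and carries a clean flag."
)


def build_commands(text, outdir="."):
    """Return {label: argv} for each engine/voice combination (hypothetical helper)."""
    out = Path(outdir)

    def wav(label):
        return str(out / f"{label}.wav")

    return {
        # eSpeak: -v selects a voice, -w writes a WAV file instead of playing.
        "espeak-default": ["espeak", "-w", wav("espeak-default"), text],
        "espeak-scottish": ["espeak", "-v", "en-sc", "-w", wav("espeak-scottish"), text],
        "espeak-us-m1": ["espeak", "-v", "en-us+m1", "-w", wav("espeak-us-m1"), text],
        # Flite: -t takes the text, -o the output file, -voice a built-in voice.
        "flite-default": ["flite", "-t", text, "-o", wav("flite-default")],
        "flite-slt": ["flite", "-voice", "slt", "-t", text, "-o", wav("flite-slt")],
        # Festival: text2wave is assumed to read the text from stdin here.
        "festival-default": ["text2wave", "-o", wav("festival-default")],
    }


def run_available(commands, text):
    """Run every command whose program is installed; return the labels that ran."""
    done = []
    for label, argv in commands.items():
        if shutil.which(argv[0]) is None:
            continue  # engine not installed, skip it
        stdin_text = text if argv[0] == "text2wave" else None
        subprocess.run(argv, input=stdin_text, text=True, check=True)
        done.append(label)
    return done
```

Calling `run_available(build_commands(REFERENCE_TEXT, "out"), REFERENCE_TEXT)` would write one WAV file per installed engine into an existing `out` directory, which makes side-by-side listening easy.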
So next I started, well, next I started looking using a search engine, because I was at the end of my rope looking at whatever was available in the standard distribution of Linux I was using, Ubuntu in this case. So I started using a search engine. Was there maybe something not in the standard repositories?

That's how I found, for instance, the Merlin project on GitHub. It needs g++ and Python, and after compilation it even takes over six hours to have Python and NumPy do the machine learning of a model to use. After all this time, expectations were high. I mean, it took, I believe, some voice sounds, whatever, and put them into a machine learning model, and after that, wow, I would expect a pretty decent sound. So I tried to get it to speak my own strings, but I just couldn't get it to work. And after fumbling around with it for several hours, I finally gave up. Yeah, there is a command line command there that you should just feed a text file and it would speak, but it didn't work. So here's one of their own standard test, well, demonstration WAV files. And, well, you judge for yourself.

He read his fragments aloud. Typhoid, did I tell you? But she had become an automaton. At the best, they were necessary accessories. You were making them talk shop, Ruth charged him.

Yeah, what can I say? I know, I was equally unimpressed.

Well, that's when I ran into Mary Text to Speech. Mary Text to Speech is written in Java. So after a download from GitHub and reading the readme, I was able to build the thing. After that I had to start a GUI tool and select the languages and voices I wanted to download. So for English, UK and US voices, that meant downloading almost one gig of data. After that, I could fire up the Mary application's web server. I opened a browser window, got a screen and started typing in a text box. Hitting the speak button generated an audio file or audio stream, and a right click on that link, save as, got me a WAV file.
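The same Mary web server that the browser page talks to can also be queried from a script, which avoids the right-click-and-save step. The following is a minimal sketch, assuming the server listens on its usual default of `http://localhost:59125` and accepts the `process` request parameters shown; the function names `mary_url` and `save_wav` and the example voice name are hypothetical and depend on which voices were installed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed default address of a locally running Mary server.
MARY_BASE = "http://localhost:59125"


def mary_url(text, locale="en_GB", voice=None, base=MARY_BASE):
    """Build the request URL that asks the server for a WAV rendering of text."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice:
        # Voice names depend on what was downloaded with the installer GUI.
        params["VOICE"] = voice
    return f"{base}/process?{urlencode(params)}"


def save_wav(text, path, **kwargs):
    """Fetch the synthesized audio and write it to path (needs a running server)."""
    with urlopen(mary_url(text, **kwargs)) as response, open(path, "wb") as handle:
        handle.write(response.read())
```

With the server running, `save_wav("Hello from HPR", "mary.wav", locale="en_US", voice="cmu-slt-hsmm")` would store the result directly; the voice name here is only an example and may not match the voices heard in the episode.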
So I was fumbling a bit, but anyway, it's got a built-in web server. You start it, you open a browser window and point it towards the web server. There is a text box on the screen, you type in some text, you hit the speak button and it generates an audio stream, right? And if you right click, you can save as, and that's the way to get a WAV file.

Well, the first voice that you can choose from is an English, as in UK, female voice and, well, you won't believe what you're going to hear now.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Well, now, that's a voice that I could listen to. I'm not going to say this is the woman I want to marry, apart from the fact that she's digital and lives somewhere in Bitland, but I do like the sound. It's almost human. It's clear and it's got some human intonation in it.

The second one, well, it's not really better, but for the sake of completeness, let's hear this one as well. So this is the second English female, well, female so woman's voice.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Yeah, I know. It's not perfect, but at this stage we are discussing more the fact that it's maybe too expressive than anything else. And compared to what we've heard so far, I think we're becoming spoiled.

So there's more in Mary Text to Speech. Like I said, downloadable from GitHub. But wait, you know, I'll just let you hear some other files that I made, without me blabbering in between. So after a few seconds, here we go.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Yeah, I know. She just sounds like a grandmother. A young grandmother, but a grandmother nonetheless.

Well, there's also a UK male voice, but, well, you decide for yourself.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

Well, so much for the UK male voice. Then there's one last one, because we're coming to the end, and it's a US female voice and, well, it sounds pretty business oriented. I don't know what it is, but that's just how I think she sounds. And at the same time it's the last WAV file that I have for you. So let's do this one more time.

This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.

At the same time, she sounds really, really depressing. So using this kind of speech synthesis in a voice recognition system or a mental health hotline, I don't think that's a really good idea. Probably best to use real humans anyway. You know, the ones with a beating heart and some sense and feeling and stuff.

The funny thing is, when I made this podcast, I had to devise a string to test all the voices, and I guessed it would take me about 20 minutes to make this recording. And, well, guess what? I even managed to make this one approximately 20 minutes long. So that's more luck than wisdom, as we say in the Netherlands.

This is it for this time. I hope you liked it. I hope you liked all the voices. The script I used is in the show notes. Take care. Have a nice day.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the Binary Revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.