Episode: 2792
Title: HPR2792: Playing around with text to speech synthesis on Linux
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2792/hpr2792.mp3
Transcribed: 2025-10-19 16:53:30

---

This is HPR episode 2792 entitled
Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
The summary is:
Playing around with different text to speech programs to see what is possible.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Well, hello everybody. This is me again, Jeroen Baten, your friendly neighborhood geek,
whatever. I'm still self-employed doing Linux Python programming.
See, that main geek stuff. Everything from mainframes via AS/400s
(I'm referencing previous podcasts here) to Raspberry Pis and, well, whatnot.
This one is about text to speech synthesis.
I found myself contemplating the intro to every HPR recording.
You know, the electronic voice we all know and love.
Well, the latter is for me actually a bit of a question.
Is this the best there is? I don't know.
I mean, we can choose to use it as a community for history's sake, or simply because we like the geeky
sound of it. I don't care. But I was just wondering, is this the best there is on Linux?
And well, I don't know. Why not fumble around a bit and see what I can find?
So I downloaded a bunch of text to speech (TTS) software packages.
I downloaded some software from GitHub repositories and I started playing.
Well, to really compare things, we need a reference for our quest.
So the first one, of course, is to make a recording with the eSpeak program, and it is clearly the
one in use at HPR. It sounds like a robot, but at the same time it's pretty easy to understand,
albeit that the sound is rather mechanical. So this is what eSpeak makes of the intro
string that I'm using. This is HPR episode 2792 entitled Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about
20 minutes long and carries a clean flag. Well, I think you've recognized this sound, right?
So what I did is I made a small shell script that you can find in the show notes. I'll paste it
there. Then I started to test out other programs as well. I defined the introductory string to this
podcast as my reference text and made several text to speech programs say it.
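A minimal sketch of that kind of comparison script, assuming eSpeak is installed as `espeak`. The host's actual script is a shell script in the show notes and is not reproduced here; this Python version only illustrates the idea of feeding one reference string to several voices. The voice names in `ESPEAK_VOICES` and the output file names are assumptions, since the episode does not say which voice identifiers were used, and the reference text is abbreviated to its first sentence.

```python
import shutil
import subprocess

# Abbreviated reference text: the first sentence of the intro string.
REFERENCE_TEXT = (
    "This is HPR episode 2792 entitled Playing Around with Text to Speech "
    "Synthesis on Linux and is part of the series Soundscapes."
)

# Hypothetical voice selection; None means eSpeak's default voice.
# Run `espeak --voices=en` to see which names your installation offers.
ESPEAK_VOICES = {
    "default": None,
    "scottish": "en-sc",
    "us_male_1": "en-us+m1",
    "us_male_2": "en-us+m2",
    "us_male_3": "en-us+m3",
}


def espeak_command(text, wav_path, voice=None):
    """Build an eSpeak command line that writes speech to a WAV file."""
    cmd = ["espeak", "-w", wav_path]      # -w: write output to a WAV file
    if voice is not None:
        cmd += ["-v", voice]              # -v: select a voice
    cmd.append(text)
    return cmd


def render_all(text=REFERENCE_TEXT):
    """Render the text once per voice; report and skip if eSpeak is missing."""
    if shutil.which("espeak") is None:
        print("espeak not found on PATH, nothing rendered")
        return []
    written = []
    for label, voice in ESPEAK_VOICES.items():
        wav_path = f"espeak_{label}.wav"  # hypothetical naming scheme
        subprocess.run(espeak_command(text, wav_path, voice), check=True)
        written.append(wav_path)
    return written
```

Calling `render_all()` would leave one WAV file per voice in the current directory, which can then be played back to back for comparison.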
It seems that eSpeak can also do some Scottish, and since I fell in love with the city of Edinburgh,
I wanted to give it a shot. So here goes. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by
Jeroen Baten and is about 20 minutes long and carries a clean flag. Well, what's there to say?
I mean, yeah, it does have a distant-sounding Scottish accent, but it's way in the
distance. It's at the horizon. So, well, I wouldn't exactly call it exercise. Okay.
Anyway, is there more? Well, yeah, eSpeak also has three male US voices, and well,
I'm just putting them one after the other, so I won't blabber a long time about it. But
again, here is eSpeak with male US voices one, two, three.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen
Baten and is about 20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing
Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted
by Jeroen Baten and is about 20 minutes long and carries a clean flag. Well, I guess it's best
not to dwell too much on this, I think. Clearly, from the eSpeak point of view, HPR has chosen
the best alternative there is, and all the others are, okay, yeah, pathetic is the word that comes
to mind. And I'm Dutch, so I'm not even a native speaker. I guess you can think of other words to
rate this quality of sound. Okay. Well, the next program I found was Flite, F-L-I-T-E,
and as you can hear, it's not really an improvement, but you know, for comparison's sake,
here goes again. This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis
on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20
minutes long and carries a clean flag. Yeah, I know. Well, the upside to this is, I hope that someday
I will be just as old as this guy. I mean, 180, you know, so I can spend every day after retirement
hacking into computers for the rest of my life, for another 100 years. I mean, what's not to like?
Okay, but although this sounds pretty, well, okay, not so good, there's also a female alternative,
and it's, well, although better, still a long way from pleasant. And well, being the sucker for a
nice womanly voice that I am, I'm going to treat you to this one as well. This is HPR episode 2792
entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
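The two Flite samples above could be reproduced roughly as follows. This is a sketch under assumptions: the episode does not name the voices, so `kal` (a male voice commonly built into Flite) and `slt` (a female voice) are guesses, and `flite -lv` should list what a given build actually provides. The helper names are hypothetical.

```python
import subprocess


def flite_command(text, wav_path, voice=None):
    """Build a Flite command line that writes speech to a WAV file."""
    cmd = ["flite"]
    if voice is not None:
        cmd += ["-voice", voice]          # e.g. "kal" or "slt" (assumed names)
    cmd += ["-t", text, "-o", wav_path]   # -t: text to speak, -o: output file
    return cmd


def render(text, wav_path, voice=None):
    """Run Flite; raises CalledProcessError if the synthesis fails."""
    subprocess.run(flite_command(text, wav_path, voice), check=True)
```

For example, `render("Hello from Flite", "flite_slt.wav", voice="slt")` would write the female-voice version, provided that voice exists in the installed build.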
I know. This is definitely not somebody I would fall in love with, but, well, you can hear
that it's a woman, at least. Okay, so after this, I wanted to try out a program called Festival.
Everybody says it's the best around, so let's try this out. It clearly is an advanced system where
you can play with speech synthesis to your heart's content, and, well, it can be pretty complex
if you want it to be. But I don't want to do that. I don't have the knowledge of how to tweak such a
system, nor do I have the inclination. I just want to compare some programs and
hear how they sound. Now Festival comes with several voices. I tried out a bunch of them
and this is what I got. The first voice, well, it does not bode well. So again, here goes.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. In a way it's reminiscent of the old guy's voice
we heard earlier, but it's a little slower. It's a little clearer and, well, it doesn't do it for
me yet. Anyway, the second one of Festival is a female voice, and I'll tell you this in
advance, it sounds a whole lot better. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by
Jeroen Baten and is about 20 minutes long and carries a clean flag. I know. It's still not
the female voice that would tickle your fancy, I believe they say, or maybe it doesn't
tickle. I don't know. Anyway, using the voice selection, I also tried out four different US
male voices, and just for the sake of brevity, I will put them here again one after the other.
So I'll shut up for a while and let you enjoy these four voices.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is
part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and
carries a clean flag. This is HPR episode 2792 entitled Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about
20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing
Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted
by Jeroen Baten and is about 20 minutes long and carries a clean flag.
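One way to get WAV files like these out of Festival is its `text2wave` helper, which reads a text file and can evaluate a Scheme expression first to pick a voice. The sketch below assumes that approach; whether the host used `text2wave` is not stated, and voice names such as `kal_diphone` depend entirely on which Festival voice packages are installed.

```python
import os
import subprocess
import tempfile


def text2wave_command(text_path, wav_path, voice=None):
    """Build a text2wave command line; `voice` is a Festival voice name."""
    cmd = ["text2wave", "-o", wav_path]
    if voice is not None:
        # Festival selects a voice by calling the function (voice_<name>).
        cmd += ["-eval", f"(voice_{voice})"]
    cmd.append(text_path)
    return cmd


def render(text, wav_path, voice=None):
    """Write the text to a temporary file and synthesize it with text2wave."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text + "\n")
        text_path = handle.name
    try:
        subprocess.run(text2wave_command(text_path, wav_path, voice), check=True)
    finally:
        os.unlink(text_path)  # clean up the temporary text file
```

Looping `render()` over a list of installed voice names would produce one file per voice, much like the four samples played one after the other.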
Yeah, for all this Festival program is cracked up to be, I find them boring out of my skull,
the way they sound. But there is a funny thing to this, because you can even make Festival sing.
Well, don't get your hopes up. I mean, even I sing better, but mechanical singing is
possible. So just a little scale of festivities coming your way.
Yeah, it's the life of the party, and the greatest hits album is just around the corner,
almost done. Well then, so that's it for Festival. Then I stumbled upon this next gem.
It's from the GNUstep project and the program is simply called say. That's it. Say. The downside is
that it has no command line argument to save to a file, but it's open source, so if you'd like
to save to a file, it's pretty easy to implement. But I found an Ubuntu PPA called Sound Recorder.
And that is what it does. It just records whatever you send out through your speaker line
and puts it into a WAV file. So I started Sound Recorder, hit the record button,
and then issued a say command on the command line, and voila. After hitting stop recording,
I had managed to grab myself the audio. And the result, as you might expect, sounds just like this.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. So it's clearly not perfect, but well, like I said, for a simple say,
I mean, it's a three-letter command. What else did you expect?
So next I started looking using a search engine, because I was at the end
of my rope looking at whatever was available in the standard Linux distribution I was using,
Ubuntu in this case. So I started using a search engine. Was there maybe something not in the
standard repositories? That's how I found, for instance, the Merlin project on GitHub. It needs
G++ and Python, and after compilation it even takes over six hours to have Python and NumPy
do the machine learning of a model to use. After all this time, expectations were high. I mean, it took,
I believe, a whole set of voice sounds and put them into a machine learning model, and
after that, wow, I would expect a pretty decent sound. So I tried to get it to speak my own strings,
but I just couldn't get it to work. And after fumbling around with it for several hours,
I finally gave up. Yeah, there is a command there that you should just be able to feed a text file
and it would speak, but it didn't work. So here's one of their own standard test,
well, demonstration WAV files. And well, you judge for yourself.
He read his fragments aloud. Typhoid, did I tell you? But she had become an automaton.
At the best, they were necessary accessories. You were making them talk shop, Ruth charged him.
Yeah, what can I say? I know, I was equally unimpressed. Well, that's when I ran into Mary
Text to Speech. Mary Text to Speech is written in Java. So after a download from GitHub and reading
the readme, I was able to build the thing. After that, I had to start a GUI tool and select the
languages and voices I wanted to download. For English, the UK and US voices, that meant downloading
almost one gig of data. After that, I could fire up the Mary application's web server. I opened
a browser window, got a screen, and started typing in a text box. Hitting the speak button
generated an audio file, or audio stream, and a right click on that link and "save as" got me a
WAV file. I was fumbling a bit there, so once more: it's got a built-in web server. You start it,
you open a browser window and point it towards the web server. There is a text box on the screen,
you type in some text, you hit the speak button, and it generates an audio stream, right?
And if you right click, you can save as, and that's the way to get a WAV file.
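The browser steps above go through the server's HTTP interface, so the same request can in principle be scripted. The sketch below is an assumption-heavy illustration, not the host's method: it assumes a MaryTTS 5.x server on its usual default port 59125 with a `/process` endpoint that accepts the parameters shown. The locale values and any voice name are placeholders that depend on what was installed through the GUI tool.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def mary_url(text, locale="en_GB", voice=None, host="localhost", port=59125):
    """Build a request URL for a locally running MaryTTS server (assumed API)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice is not None:
        params["VOICE"] = voice  # hypothetical: must match an installed voice
    return f"http://{host}:{port}/process?{urlencode(params)}"


def save_wav(text, wav_path, **kwargs):
    """Fetch synthesized audio and write it to disk; returns bytes written."""
    with urlopen(mary_url(text, **kwargs)) as response:
        data = response.read()
    with open(wav_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

With the server running, `save_wav("Hello from Mary", "mary_uk.wav")` would replace the type, click, and "save as" sequence with a single call.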
Well, the first voice that you can choose from is an English, as in UK, female voice, and well,
you won't believe what you're going to hear now. This is HPR episode 2792 entitled
Playing Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
Well, now, that's a voice that I could listen to. I'm not going to say this is the woman I want to
marry, apart from the fact that she's digital and lives somewhere out there in Bitland,
but I do like the sound. It's almost human. It's clear and it's got some human
intonation in it. The second one, well, it's not really better, but for the sake of completeness,
let's hear this one as well. So this is the second English female voice.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. It's not perfect, but at this stage we are going to
discuss more the fact that it's maybe too expressive than anything else. And compared to what
we've heard so far, I think we're becoming spoiled. So there's more in Mary Text to Speech.
Like I said, downloadable from GitHub. But wait, you know, I'll just let you hear some
other files that I made, without me blabbering in between. So after a few seconds, here we go.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. It just sounds like a grandmother. A young grandmother,
but a grandmother nonetheless. Well, there's also a UK male voice, but
well, you decide for yourself. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. Well, so much for the UK male voice.
Then there's one last one, because we're coming to the end, and it's a US female voice, and
well, it sounds pretty business oriented. I don't know what it is, but that's just how I think she
sounds. And at the same time it's the last WAV file that I have for you.
So let's do this one more time. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. At the same time, she sounds really,
really depressing. So using this kind of speech synthesis in a voice recognition system
or a mental health hotline, I don't think that's a really good idea. Probably best to use
real humans anyway. You know, the ones with a beating heart and some sense and feeling and stuff.
The funny thing is, when I made this podcast, I had to devise a string to test all the voices.
And I guessed it would take me about 20 minutes to make this recording. And well, guess what?
I even managed to make this one approximately 20 minutes long. So that's more luck than wisdom, as
we say in the Netherlands. This is it for this time. I hope you liked it. I hope you liked all the
voices. The script I used is in the show notes. Take care, and have a nice day.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog
Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated, today's show is released under
a Creative Commons Attribution-ShareAlike 3.0 license.