Episode: 2792
Title: HPR2792: Playing around with text to speech synthesis on Linux
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2792/hpr2792.mp3
Transcribed: 2025-10-19 16:53:30

---

This is HPR episode 2792 entitled
Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
The summary is:
Playing around with different text to speech programs to see what is possible.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Well, hello everybody. This is me again, Jeroen Baten, your friendly neighborhood geek,
whatever. I'm still self-employed doing Linux Python programming.
See that main geek stuff. Everything from mainframes via AS/400s,
I'm referencing here previous podcasts, to Raspberry Pis and, well, whatnot.
This one is about text to speech synthesis.
I found myself contemplating the intro to every HPR recording.
You know, the electronic voice we all know and love.
Well, the latter is for me actually a little question.
Is this the best there is? I don't know.
I mean, we can choose to use it as a community for history's sake or simply because we like the geeky
sound of it. I don't care. But I was just wondering, is this the best there is on Linux?
And well, I don't know. Why not fumble around a bit and see what I will find?
So I downloaded a bunch of text to speech, TTS, software packages.
I downloaded some software from GitHub repositories and I started playing.
Well, to really compare things, we need to have a reference for our quest.
So the first one of course is to make a recording with the eSpeak program, and it is clearly the
one in use at HPR. It sounds like a robot but at the same time it's pretty easy to understand,
albeit that the sound is rather mechanical. So this is what eSpeak makes of the intro
string that I'm using. This is HPR episode 2792 entitled Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about
20 minutes long and carries a clean flag. Well, I think you've recognized this sound, right?
So what I did is I made a small shell script that you can find in the show notes. I'll paste it
there. And I started to test out other programs as well. I defined the introductory string to this
podcast as my reference text and made several text to speech programs say it.
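The script itself lives in the show notes and is not reproduced in this transcript. As a rough stand-in, here is a minimal Python sketch of the idea: one fixed reference text, rendered to a WAV file once per voice. It assumes the `espeak` command-line tool with its `-v` (voice) and `-w` (write WAV file) options. The function names, the output file naming, and any voice names you pass in are illustrative assumptions, not the host's actual script.

```python
import shutil
import subprocess

# Reference text as reconstructed from the episode intro (an assumption:
# the exact string used in the host's script is only in the show notes).
REFERENCE_TEXT = (
    "This is HPR episode 2792 entitled Playing Around with Text to Speech "
    "Synthesis on Linux and is part of the series Soundscapes. "
    "It is hosted by Jeroen Baten and is about 20 minutes long "
    "and carries a clean flag."
)


def espeak_command(text, wav_path, voice=None):
    """Build the argument list for one eSpeak run that writes a WAV file."""
    cmd = ["espeak"]
    if voice is not None:
        cmd += ["-v", voice]       # -v selects a voice or voice variant
    cmd += ["-w", wav_path, text]  # -w writes the audio to a file instead of playing it
    return cmd


def render_all(voices, text=REFERENCE_TEXT):
    """Render the text once per voice; return [] if eSpeak is not installed."""
    if shutil.which("espeak") is None:
        return []
    outputs = []
    for voice in voices:
        # Hypothetical naming scheme: one WAV file per voice.
        wav_path = "espeak_" + voice.replace("+", "_") + ".wav"
        subprocess.run(espeak_command(text, wav_path, voice), check=True)
        outputs.append(wav_path)
    return outputs
```

Calling `render_all(["en-sc", "en-us+m1"])` would try a Scottish and a US male variant, assuming those voice names exist in the installed eSpeak version; `espeak --voices` lists what is actually available.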
It seems that eSpeak can also do some Scottish, and since I fell in love with the city of Edinburgh,
I wanted to give it a shot. So here goes. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by
Jeroen Baten and is about 20 minutes long and carries a clean flag. Well, what's there to say?
I mean, yeah, it does. It does have a distant sounding Scottish accent but it's way in the
distance. It's it's at the horizon. So it's well, I wouldn't exactly call it exercise. Okay.
Anyway, is there more? Well, yeah, eSpeak also has three male US voices and well,
I'm just putting them one after the other. So and so I won't blabber a long time about it. But
again, here is eSpeak with the male US voices one, two, three.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen
Baten and is about 20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing
Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted
by Jeroen Baten and is about 20 minutes long and carries a clean flag. Well, I guess it's best
not to dwell too much on this, I think. Clearly from the eSpeak point of view, HPR has chosen
the best alternative there is, and all the others are, okay, yeah, pathetic is the word that comes
to mind, and I'm Dutch. So I'm not even native. I guess you can think of other words to
to value this this quality sound. Okay. Well, the next program I found was Flite, F-L-I-T-E,
and as you can hear, it's not really an improvement but you know, for comparison's sake,
here goes again. This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis
on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20
minutes long and carries a clean flag. Yeah, I know. Well, the upside to this is I hope that someday
I will be just as old as this guy. I mean, 180, you know, so I can spend every day after retirement
hacking into computers for the rest of my life for another 100 years. I mean, what's not to like?
Okay, but although this sounds pretty, well, okay, not so good, there's also a female alternative
and it's well, although better, still a long way from pleasant and well, being the sucker for a
nice womanly voice that I am, I'm going to treat you to this one as well. This is HPR episode 2792
entitled Playing Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
I know. This is definitely not somebody I would fall in love with but it's, well, you can hear
that it's a woman at least. Okay, so after this, I wanted to try out a program called Festival.
Everybody says it's the best around so let's try this out. It clearly is an advanced system where
you can play with speech synthesis to your heart's content and it's, well, it can be pretty complex
if you want it to. But I don't want to do that. I don't have the knowledge of how to tweak such a
system, nor do I have the inclination. I just want to compare some programs and
hear how they sound. Now Festival comes with several voices. I tried out a bunch of them
and this is what I got. The first voice, well, it does not bode well. So again, here goes.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. It's in a way it's reminiscent of the old guy's voice
we heard earlier, but it's a little slower. It's a little clearer and well, it doesn't do it for
me yet. Anyway, the second one of Festival is a female voice and I'll tell you this in
advance, it sounds a whole lot better. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by
Jeroen Baten and is about 20 minutes long and carries a clean flag. I know. It's still not
this female voice that you that would tickle your fancy, I believe they say, or maybe it doesn't
tickle. I don't know. Anyway, using the voice selection, I also tried out four different US
male voices and just for the sake of brevity, I will put them here again one after the other.
So I'll shut up for a while and let you enjoy these four voices.
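For readers who want to try the voice selection themselves, the following is a minimal Python sketch of one way to do it, assuming Festival's `text2wave` helper with its `-o` (output file) and `-eval` (Scheme expression) options. Festival voices are switched by evaluating an expression of the form `(voice_NAME)`. Which voices the host actually used is not stated in the episode, so any voice name passed in here is a placeholder, and the function names are hypothetical.

```python
def festival_voice_expr(voice_name):
    """Return the Scheme expression that switches Festival to a given voice."""
    return "(voice_" + voice_name + ")"


def text2wave_command(text_file, wav_path, voice_name=None):
    """Build the argument list for rendering a text file to WAV with text2wave."""
    cmd = ["text2wave", "-o", wav_path]
    if voice_name is not None:
        # -eval runs a Scheme expression before synthesis; here it picks the voice.
        cmd += ["-eval", festival_voice_expr(voice_name)]
    cmd.append(text_file)  # the text to speak is read from this file
    return cmd
```

The resulting list can be handed to `subprocess.run(...)`. A voice only works if the matching voice package is installed on the system.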
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux and is
part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long and
carries a clean flag. This is HPR episode 2792 entitled Playing Around with Text to Speech
Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about
20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing Around with
Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. This is HPR episode 2792 entitled Playing
Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted
by Jeroen Baten and is about 20 minutes long and carries a clean flag.
Yeah, for all this Festival program is cracked up to be, I find it boring out of my skull,
the way they sound. But there is a funny thing to this because you can even make Festival sing.
Well, don't get your hopes up. I mean, even I sing better, but mechanical singing is
possible. So just a little skill of festivities coming your way.
Yeah, it's it's the life of the party and the greatest hits album is just around the corner
almost done. Well, then so that's it for Festival. Well, then I stumbled upon this next gem. It's
it's from the GNUstep project and the program is simply called say. That's it. Say. Downside is
that it has no command line argument to say to save a file, but it's open source. So if you like
to save to a file, it's pretty easy to implement. But I found an Ubuntu PPA called Sound Recorder.
And that is what it does. It just records whatever you send out through your speaker line
and puts it into a WAV file. So I started Sound Recorder, hit the record button,
and then issued a say command on the command line and voila. After hitting stop recording,
I had managed to grab myself the audio. And the result, as you might expect, sounds just like this.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. So it's clearly not perfect, but well, like I said, for simple say,
it's I mean, it's it's a three-letter command. What else did you expect?
So next I started, well, next I started looking using a search engine because I was at the end
of my rope looking at whatever was available in the standard distribution of Linux I was using.
Ubuntu in this case. So I started using a search engine. Was there maybe something not in the
standard repositories? That's how I found, for instance, the Merlin project on GitHub. It needs
G++ and Python and after compilation, it even takes over six hours to have Python and NumPy
do the machine learning of a model to use. After all this time, expectations were high. I mean, it took
I believe some whatever sound voice sounds and put them into a machine learning model and
after that, wow, I would expect a pretty decent sound. So I tried to get it to speak my own strings,
but I just couldn't get it to work. And after fumbling around with it for several hours,
I finally gave up. Yeah, there is a command line command there that you should just feed a text file
and it would speak, but it didn't work. So here's one of their own standard tests,
well, demonstration WAV files. And well, you judge for yourself.
He read his fragments aloud. Typhoid did I tell you, but she had become an automaton.
At the best, they were necessary accessories. You were making them talk shop, Ruth charged him.
Yeah, what can I say? I know, I was equally unimpressed. Well, that's when I ran into Mary
Text to Speech. Mary Text to Speech is written in Java. So after a download from GitHub and reading
the readme, I was able to build the thing. After that I had to start a GUI tool and select
the languages and voices I wanted to download. So for English, UK and US voices, that meant downloading
almost one gig of data. After that, I could fire up the Mary application's web server. I opened
a browser window and I got me a screen and started typing in a text box. Hitting the speak button
generated an audio file or audio stream. And a right click on that link, save as, got me a
WAV file. So I was fumbling a bit, but anyway, so it's got a built-in web server. You start it,
you open a browser window, point it towards the web server. There is a text box on the screen,
you type in some text, you hit the speak button and it generates an audio stream, right?
And if you right click, you can save as, and that's the way to get a WAV file.
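The browser steps above can probably be skipped by talking to the same web server directly. The following is a minimal Python sketch, not something shown in the episode. It assumes a MaryTTS server on its usual default address `http://localhost:59125` with a `/process` endpoint that accepts the query parameters `INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `AUDIO`, `LOCALE` and `VOICE`. Treat the port, the parameter names, and any voice name you pass as assumptions to check against the version you built. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def mary_process_url(text, locale="en_GB", voice=None,
                     base="http://localhost:59125"):
    """Build the request URL that asks the server to speak `text` as a WAV file."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice is not None:
        params["VOICE"] = voice  # must name a voice installed on the server
    return base + "/process?" + urlencode(params)


def save_wav(text, wav_path, **kwargs):
    """Fetch the synthesized audio and write it to wav_path; needs a running server."""
    with urlopen(mary_process_url(text, **kwargs)) as response:
        data = response.read()
    with open(wav_path, "wb") as handle:
        handle.write(data)
    return len(data)  # number of bytes written
```

Building the URL separately from fetching it makes it easy to paste the URL into a browser first and confirm that the server answers before scripting a whole batch of voices.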
Well, the first voice that you can choose from is an English, as in UK, female voice and well,
you won't believe what you're going to hear now. This is HPR episode 2792 entitled
Playing Around with Text to Speech Synthesis on Linux and is part of the series Soundscapes.
It is hosted by Jeroen Baten and is about 20 minutes long and carries a clean flag.
Well, now, that's a voice that I could listen to. I'm not going to say this is the woman I want to
marry, except for the fact that she's digital and lives there somewhere in in in in Bitland,
but it's it's I do like the sound. It's it's almost human. It's clear and it's got some human
intonation into it. The second one, well, it's not really better, but for the sake of completeness,
let's hear this one as well. So this is the second female English women, well, female so women voice.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. It's it's not perfect, but at this stage, we are going to
discuss more the the fact that it's maybe too expressive than anything else. And compared to what
we've heard so far, I think we we we're becoming spoiled. So there's more in Mary Text to Speech.
Like I said, downloadable from GitHub. Let wait, you know, I'll just let you hear some some
other files that I made and without me blabbering in between. So after a few seconds, here we go.
This is HPR episode 2792 entitled Playing Around with Text to Speech Synthesis on Linux
and is part of the series Soundscapes. It is hosted by Jeroen Baten and is about 20 minutes long
and carries a clean flag. Yeah, I know. It just sounds like a grandmother, a young grandmother,
a grandmother and a grandmother nonetheless. Well, there's also a UK male voice, but
well, you decide for yourself. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. Well, so much for the UK male voice.
Then there's one last one because we're coming to the end and it's a US female voice and
well, it sounds pretty business oriented. I don't know what it is, but that's just how I think she
sounds. And it's at the same time it's the last WAV file that I have for you.
So let's do this one more time. This is HPR episode 2792 entitled Playing Around with Text
to Speech Synthesis on Linux and is part of the series Soundscapes. It is hosted by Jeroen Baten
and is about 20 minutes long and carries a clean flag. At the same time, she sounds really,
really depressing. So using this kind of speech synthesis in a voice recognition system,
or a mental health hotline, I don't think that's a really good idea. Probably best to use
real humans anyway. You know, the ones with a beating heart and some sense and feeling and stuff.
The funny thing is when I made this podcast, I had to devise a string to test all the voices.
And I guessed it would take me about 20 minutes to make this recording. And well, guess what?
I even managed to make this one approximately 20 minutes long. So that's more luck than wisdom, as
we say in the Netherlands. This is it for this time. I hope you liked it. I hope you liked all the
voices. The script I used is in the show notes. Take care. Have a nice day.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog
Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated, today's show is released under
a Creative Commons Attribution-ShareAlike 3.0 license.