hpr-knowledge-base/hpr_transcripts/hpr0021.txt

Episode: 21
Title: HPR0021: The Festival Speech Synthesis System
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0021/hpr0021.mp3
Transcribed: 2025-10-07 10:22:48

---


Hello, this is Hacker Public Radio and my name is Dave and I'll be your host for today.
And I am going to talk about Festival today.
Festival is a multilingual speech synthesizer package.
It was developed at the Centre for Speech Technology Research at the University of Edinburgh
and at Carnegie Mellon University, amongst other places; there are a couple of other sites
that probably had something to do with it as well.
And it's licensed under a BSD-style license.
There are other pieces of the puzzle, not just Festival, that get lumped in in conversations
like this or topics like this.
Festvox is another suite of tools that makes the building of synthetic voices for Festival
more systematic and better documented.
And there are, I guess, voice packages that come with Festival or that are designed for
Festival, and additional voices developed by different entities that work
with Festival.
Two come to mind, the MBROLA Project and the HTS Project, and these can be used as back
ends for Festival, although both of these will require the use of a separate engine
to work.
And then, I guess, lumped in with the mix, we have some front ends for Festival.
Flite, F-L-I-T-E, I think is how that's spelled, is a small-footprint speech synthesis
engine based on Festival, developed at Carnegie Mellon University, and it's built with Festvox.
And then there are some GUI front ends: tkfestival, of course, built with Tcl/Tk, and Carnival,
which is one I have not used, but the screenshots are really nice.
It is a really nice-looking graphical front end for Festival.
So that's sort of the family of applications I'm going to be talking about.
Now, most people, when they think of computerized speech synthesis, think of one of two
or three things. They will think of accessibility and sightless
computer users, people that need to have screen readers and need to have text read to them
because they can't see it; or they will think of the 1968 masterpiece 2001 and HAL 9000;
or they will think of, probably some of them will think of this, the 1961 IBM
717 series, I don't know what it was, 740 series, computer singing Daisy Bell, "Bicycle
Built for Two," which was what HAL 9000 sang.
I am not a sightless computer user, but I am one that enjoys having his computer talk
to him. The very first computer I ever bought was a Windows computer, one that I bought and
paid for; my dad bought me a TRS-80 when I was an early adolescent, but in the early
90s I bought a PC and it came with Windows 3.0 or 3.1 or something, and there was an application
that would read text, and I just remember thinking it was really funny to hear a computerized
voice say the word banana. That's neither here nor there, but I find it hard to believe
that I am so unlike most people that I would be alone in enjoying hearing my computer speak
to me. If you are anything like me, you probably have a personal relationship with your computers
anyway. I have been known to have pictures of my computers hanging up in my office at
work, snapshots, so to speak. Anyway, it's something that geeks do, I think, talk to their
computers, or especially have their computers talk back to them. It's just another neat
thing we can do with our computers, and a very useful thing too. The reason I started
using Festival on a routine basis was because of the podcast I do from my car. I am recording
this from my car. Recording audio for stuff like this from my car is most convenient
for me. What's not convenient about that is I don't have a computer in front of me, and
so invariably I will forget something. I will have a minimal amount of notes in front
of me and I will leave something out or I will get something wrong. Instead of re-recording
it all or having to record something extra when I get home, I can type something up and
then have Festival translate the text to a WAV file that I can import into Audacity
and include in the podcast. In addition to corrections and stuff, I can pre-prepare for
the podcast. I can pre-prepare the show notes, I can pre-prepare the opening and closing,
that kind of thing, or I can correct something, like I said, or make an addendum to the podcast
after I have recorded it. It has been a real time saver for me, using Festival. Before
I started doing any kind of podcast, which wasn't that long ago anyway, when I first started
using Linux, I remember there was a program called saydate, and there is one, saytime is
the one I use; you would type in saytime and it would tell you what
time it was, you know, audibly. I remember thinking that was pretty neat. But before
getting into what you can do with Festival, other than some of the things I already alluded
to, I guess a question a lot of people would have is how do you install it. Well, it's
really easy to install. It comes with your distribution anyway, more than likely. Ubuntu
comes with version 1.4.3. I think there is a beta version, not available in the repos
that I know of for Debian or Ubuntu, but it's version 1.95 or so, a release candidate or beta
version of what the Festival people are calling version 2, and that's what is available
from source, as a tarball. But as far as Debian and Ubuntu go, version 1.4.3 is
what's currently in the repos. And I know for a fact it's in, you know, the Fedora
repos, and the SUSE repos, and Mandrake, Mandriva, Gentoo, pick your distribution, it's
there. If it's not, like I said, you can install from source. It's not that big a deal.
But the packages I have installed on my Ubuntu laptop are festival; festlex-cmu, this is
the dictionary file that was developed at Carnegie Mellon University; festlex-poslex, and that
is a Festival lexicon file for part of speech, POSLEX, part-of-speech lexicon; and
festvox-kdlpc16k, which is one of two American male voice packages that you can install. There
are, well, it's one of two different voices, and there are different sample rates, so you can
have it at 8 kilohertz or at 16 kilohertz, for both versions of this.
I think there's a kdlpc16k as well as a kallpc16k, I think. Anyway, those are just the
standard voices, and of course there are voices for lots of languages. It's a multilingual
program, so more than likely you can find your language. But out of most of the
repos, you're only going to get American male voices, or English, UK, male
voices. I know there's an Italian female voice, but there's not a lot of female voices in
the Debian and Ubuntu repos at least. Anyway, so what can you do with Festival? Well,
on an Ubuntu system, and probably a Debian system too, not much unless you add a line
or two to the .festivalrc file in your home directory. This will be in the show notes;
I'm hesitant to read it, but it's two parenthetical statements that start with Parameter dot
set followed by a space. The first one is, in parentheses, Parameter dot set, space, single
quote Audio underscore Command, space, then, inside double quotes, the command that you
want Audacity, excuse me, Festival, to use to play the audio. So if you want to
use ALSA, which I recommend doing if you want to be able to have a music player open
and use Festival at the same time, you'd put in something like aplay, I guess it's ALSA
play, followed by the appropriate parameters. You can go to the Gentoo Wiki and do a search
for the speechd HOWTO, and you will find the parameters to put in. I will put them in
the show notes so you will know how to get this to work. It's not good radio to read it.
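Show-notes sketch: the two parenthetical statements discussed here could look something like
the following. The aplay arguments are an assumption (a commonly documented ALSA setup for
Festival), not something read out in the episode, and the helper name
write_festival_alsa_config is made up for illustration. It appends the two lines to
~/.festivalrc by default, or to whatever path you pass in.

```shell
#!/bin/sh
# Append ALSA playback settings to a Festival init file.
# write_festival_alsa_config is a hypothetical helper name used only in this sketch.
# Usage: write_festival_alsa_config [path]   (defaults to ~/.festivalrc)
write_festival_alsa_config() {
    rc_file="${1:-$HOME/.festivalrc}"
    # The quoted 'EOF' keeps $SR and $FILE literal in the file;
    # Festival fills in the sample rate and temporary file name itself.
    # ASSUMPTION: these aplay flags (quiet, mono, raw 16-bit samples) suit your sound setup.
    cat >> "$rc_file" <<'EOF'
(Parameter.set 'Audio_Command "aplay -q -c 1 -t raw -f s16 -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
EOF
}
```

Calling write_festival_alsa_config with no argument edits your real ~/.festivalrc, so it may be
worth trying it on a scratch file first and reading the result before committing to it.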
In the second parenthetical statement, it would just be Parameter dot set, space, single
quote Audio underscore Method, space, single quote Audio underscore Command, that's the
parameter you set in the previous statement, all in parentheses, I guess I should say. Not the
best radio, but it's very necessary if you want to be able to use Festival with ALSA. Now that you've got it
working, what can you do with it? You can, for instance, make Festival read instant messages
to you from within Gaim. I am not sure if there is a Pidgin plugin for this; I'm sure there
is, but it's not in the Ubuntu repos, I know that. There is a Gaim plugin for Festival,
it's probably called something like gaim dash festival, and it is literally a five minute
setup once you install it. You just go into Gaim and you enable the plugin. More or less
like it, under KDE, there is the K text-to-speech manager, which includes a little parrot that
sits down in your status bar, system bar, and anything you copy to the clipboard, it
will read back to you. There is also an app called KSayIt, and one called KMouth
that I have never played with. Then of course there is the command line, festival. You
type festival, you get presented with a festival prompt where you can set the default voice
or set the speech rate or the volume and then have it echo back commands or say things. That's
one way to use it; I don't often use it, I haven't used it that way in years, so I'm
not going to talk about that. But just from the command line, you can pipe the output of
a command to Festival. The command and switch we'll use for Festival is festival space dash
dash tts, where TTS stands for text to speech. You could echo, and then in double quotes some
clever text, pipe, festival space dash dash tts, and whatever
clever text you put inside the double quotes will get spoken to you, or
you'll hear it spoken by Festival over your speakers. You can cat a file that way; I think you
need the capital A switch: cat, space, dash capital A, file.txt, pipe, again festival space dash
dash tts, and it will read to you the text in the file.txt file. A really useful thing to do,
I know most of you have read a man page or two, but if you would like the man page to be read to you
while you do something else, while you multitask, you can actually listen to the man page
tell you what to do. You have your fingers at the ready in the terminal, ready to type what
you hear. For instance, you could type man, space, cron, for instance, pipe, festival
space dash dash tts again. So you can more or less pipe any command to this
that's going to output stuff to standard out. You could pipe the date command with the appropriate
parameters to have it tell you the date. Not much need to do that when you have
programs like saydate or saytime. I think the saydate program will even tell you your uptime
in addition to the date. So that's pretty handy. Well, maybe not handy, but pretty neat. What else can you
do? You could create a shortcut if you use a desktop manager that allows you to create desktop
shortcuts. I don't, but if you do, you could create an icon on your desktop that points to
the festival space dash dash tts command and you could drag a text file to it,
conceivably drag an email to it with some tweaking, maybe even an HTML page. I don't know,
some of this stuff I've not tried, but that's the kind of stuff you can do with it. One thing I
have done with Festival, other than as a time-saving device for my podcast, is I took a book,
a book that was in the public domain, or a book that was considered to be in the public domain;
it's literally freeware. The book is called Underground, written by Suelette Dreyfus and
researched by Julian Assange. I think he works at Harvard, I think. Anyway, it's a book about some
hackers in Australia in the, I guess, late 80s, early 90s. And it's a very good book. It's a
true story. It's a documentary-style book. And I enjoyed reading it. They made the text of
the book available online in text files, divided up into chapters. And I used text2wave,
which is another portion of Festival, to create an audio book, a file for every chapter. I
did it in MP3 and it was very time consuming. I actually did it on vacation in the Dominican
Republic the year before last, I think. I would take the text of a chapter and I would use text2wave
and the command, off the top of my head, for text2wave, I'm thinking it's something like
text2wave... I've forgotten. You can do this with, you know, text2wave
dash dash help will tell you, but it is something along the lines of text2wave,
then the name of the text file, and a dash o for output and the name of the WAV file that you're
going to create. And then you follow this with a dash eval and then inside quotes, double quotes,
inside parentheses, you put the name of the voice you want to use. For my podcast and
for the reading of the book I used one of the HTS voices; cmu underscore us underscore slt underscore
arctic underscore hts is the voice I used. Hi, this is voice CMU US SLT Arctic HTS. Just in case
Dave was in error trying to quote the command that created this audio, here is the command once again:
text2wave myfile dot txt dash o myfile dot wav dash eval open double quote open parenthesis
voice underscore cmu underscore us underscore slt underscore arctic underscore hts close parenthesis
close double quote. On my podcast I call it voice one, I've named that voice one,
but that's just me, that's not the official name of the voice. But that's the command I used to
create an audio book from a bunch of text files. This is something that you want to be sort of
careful with if you decide to create an audio book using Festival. It's best if you have the book
broken up into chapters, because text2wave is going to load the entire text file into memory
and it's going to create a WAV file using that, and it's going to use a large
amount of memory if the file is too big. I was lucky that Underground is a book that was
already in chapters. Had it been the whole book, and I tried to load, you know, six or 700 pages
worth of text, it would have probably taken up all two gig of RAM on my system. I'm just guessing.
But between just the simple text-to-speech command and switch and the text2wave program that comes
with Festival, the sky's the limit as to what you can envision doing with this,
especially the text-to-speech part. I mean, there are demo pages on the web where,
you know, people have, at the Festival site and some of these other sites, pages where you
can demo some of the voices. There's a dialog box you type text into, you pick a voice
from the drop-down list, and you can hear that text spoken in a different voice. That's just
PHP. I just stumbled upon a site the other day, the regular way, not using StumbleUpon,
that would read books to you. I think it would take a book and
turn it into an audio file that you could subscribe to through an RSS feed. It's a pretty neat
site. I forgot the name of it. But you could, like, listen to Project Gutenberg books
translated from text to speech, and you could subscribe to those audio files with RSS feed
readers, so like a podcast. That was pretty neat. I'm sure they use something like Festival;
it's not Festival, it was a commercial text-to-speech package. So there's a lot that you can do
with it. One of the questions I get asked a lot is how do you change voices. Festival comes with
a handful of voices and, like I said, most of the American voices are male, and most people
are probably like me, you know, they find it somewhat intriguing to have their computer talk
in the first place. But if you're married like me, there's something,
what's the word, there's something really nice about having your computer talk
in a female voice, knowing that you can tell her exactly what to do and she'll do it. She's
a computer, but she's talking in a female voice. It's something that married men don't get to do very often.
Anyway, I'll get to how you change voices in a bit. Another thing you can do is, I've
never tried this but I imagine it would be pretty easy, the command line
text-based browser Lynx, L-Y-N-X, has a dump feature, D-U-M-P, so you could open up a web
page with Lynx using the dump argument, lynx space dash dump and the name of the HTML file, and then
redirect that with the greater-than sign to a file. What that's supposed to do is take that
HTML file and translate it to ASCII text, which I'm pretty sure Festival can handle pretty well,
except for maybe some of the non-standard characters. So that's a way, maybe, to convert, you know,
a web page that is in HTML into something Festival could read to you. I like text-based
weather forecasts because they open up better in Lynx in a console window, and that's something
that could easily be parsed, and I could have Festival read me the weather forecast. There's
lots of things you can do with this, and I'm pretty sure Asterisk has an option to use Festival
instead of a recorded voice. Speaking of Asterisk, there is a commercial text-to-speech package
for this, available for Linux, that I think Asterisk uses, and I've forgotten the name of it; it begins
with a C, I think. And Allison Smith, the woman that does the voices for Asterisk, she is in there, her
voice too. I think it's called a diphone voice; I mean, they've synthesized her voice, so
her voice, synthesized, is a computer-like voice that sounds like her, that you can use with this
commercial package, and I've forgotten the name of it. Sorry for the interruption. What
Dave is struggling to remember is the commercial text-to-speech system called Cepstral, C-E-P-S-T-R-A-L.
But there's two or three commercial alternatives to Festival, and none are really expensive. There's,
I guess, TTSynth, which used to be IBM text-to-speech, it used to be ViaVoice,
for around $40. There's a couple more that I don't think I can remember, but
they're in that $29 price range as well. But that tangent threw me off. Oh yeah,
before I get to changing voices, there's other things you can envision doing with sed and
awk and friends; you know, you could remove non-standard characters using sed, or you could
take the output of a log file and use awk to print, you know, like
one column of it. You know, if you want to know who was online, you could use awk to figure
that out and have it read to you. So that's pretty neat. But like I said, one of the main questions I
get frequently is how do you get the voice that on my podcast is called voice one, which is the
CMU US SLT Arctic HTS voice, that's cmu underscore us underscore slt underscore arctic
underscore hts, and that's Arctic as in tundra. This voice file can be found at hts.sp.nitech.ac.jp.
Once you're at that web page you want to look for the release archive, because what's on their main
page are voice files for the latest version of Audacity, I keep saying that, the latest
version of Festival, which is version 2.0 beta or 1.95. What comes with most distributions, I think, is
1.4.3, so you go to the release archive at that web page and you'll find the CMU US SLT Arctic HTS
file, and the SLT file is the US female voice. There's another one, but that's the one
I think sounds the best, and it's about a two-meg download, and it will include the HTS
engine that you'll need to run this voice as well as the voice files themselves, and
installing this, it isn't that easy; it's not hard, though.
You download the tarball, and if you unzip it or untar it and look at it you'll see
that there's, I forget the directory structure off the top of my head, but you go down a couple of levels
and you've got, like, it may be blah slash blah slash cmu underscore us underscore slt
underscore arctic underscore hts, so it won't be the top-level directory; the first two levels
will be empty, but like three levels down or so, at least two levels down, you'll see that directory.
The cmu blah blah blah one is the one that you will want to copy, if you have
Festival installed, into, I think it's usr share festival voices english, so that's slash usr
slash share slash festival slash voices slash english. That's where you'll want to put
that directory. Once you do that, you'll be able, using text2wave, to select that voice with
the command line switch, that's dash eval, and in parentheses the name of that voice,
and that may have to be in quotes. I'm in the car, forgive me, but that's how you do that.
I think what maybe makes that voice sound better is the way that it's built.
It's built with a different engine; it's built with what is called HMM, which stands for
hidden Markov model, and it's a statistical method that, I've read, is like the simplest
form of a dynamic Bayesian system. It's a statistical system, and I won't try to explain
it because I'm not completely up on it, but it's often used in creating voice files.
It's called that for the hidden part of it; you know, there are inputs or
outputs, but there are states that can change that you're not aware of. I can't
explain it. Like I said, I'm in the car, and it's been a while since I took statistics, and my
wife is calling. Hey. Uh, I have to give the name of this road, it's road to chase live on in the
premises. Yeah. No. Okay, okay, okay, okay.
What invariably happens when you record audio in a car on your way home from work is your wife calls you,
and I've sort of forgotten where I was. But yeah, the hidden Markov model, I don't know
if it's superior or not, but I know this SLT voice sounds really good, and again you can
find that at hts.sp.nitech.ac.jp, I think, and Nitech is, like, the
Nagoya Institute of Technology, a college in Japan, evidently. Anyway, I guess something I've
got left to say is, using that voice, using that CMU US SLT Arctic HTS voice with
text2wave, I get, you know, I get the result I want, I get an audio file that's been translated
from text, but I also get a seg fault and a core dump every time. I don't know why, I've not investigated.
The core dump is cleaned up afterwards, so I'm not accumulating a bunch of
core dumps and wasting disk space or anything, and it's not doing any damage at all; it's
still creating my file for me and everything works, so I guess your mileage may vary. Anyway,
I've rambled on long enough and that's going to wrap it up for this episode of HPR. Have a good day.
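Show-notes sketch: the command-line uses described in this episode, collected into two small
shell helpers. This is only an illustration under a few assumptions: festival and text2wave
are on your PATH, the function names speak and chapter_to_wav are invented here, and
FESTIVAL_BIN and TEXT2WAVE_BIN are hypothetical override variables added so the helpers can
be tried out without Festival installed.

```shell
#!/bin/sh
# Helpers for the pipelines described in the episode.
# FESTIVAL_BIN / TEXT2WAVE_BIN are hypothetical overrides, not Festival settings.

# Speak whatever arrives on standard input, as in: echo "some text" | festival --tts
speak() {
    "${FESTIVAL_BIN:-festival}" --tts
}

# Render one text file to a WAV file with the same base name.
#   $1 = text file (for example chapter01.txt)
#   $2 = voice name without the "voice_" prefix (defaults to the HTS voice from the episode)
chapter_to_wav() {
    txt="$1"
    voice="${2:-cmu_us_slt_arctic_hts}"
    # Mirrors the command read out in the episode:
    #   text2wave myfile.txt -o myfile.wav -eval "(voice_cmu_us_slt_arctic_hts)"
    "${TEXT2WAVE_BIN:-text2wave}" "$txt" -o "${txt%.txt}.wav" -eval "(voice_${voice})"
}

# Example uses (commented out so that sourcing this file has no side effects):
#   echo "some clever text" | speak
#   man cron | speak
#   lynx -dump page.html | speak
#   for f in chapter*.txt; do chapter_to_wav "$f"; done
```

Converting one chapter per call, as in the commented loop, follows the advice above about
keeping each input file small so that a whole book is not loaded into memory at once.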
Hello, this is voice CMU US BDL Arctic HTS. Hello, this is voice US1 MBROLA. Dave forgot
to mention, but the MBROLA engine binary and voice files can be found at the following URL:
tcts.fpms.ac.be slash synthesis slash mbrola dot html.
Hello, I am the default Festival voice. Thanks for downloading Hacker Public Radio. Have a nice day.
Thank you for listening to Hacker Public Radio.
HPR is sponsored by caro.net, so head on over to C-A-R-O dot N-E-T for all of those of you