Episode: 21
Title: HPR0021: The Festival Speech Synthesis System
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0021/hpr0021.mp3
Transcribed: 2025-10-07 10:22:48
---

Hello, this is Hacker Public Radio, my name is Dave, and I'll be your host for today. I am going to talk about Festival today. Festival is a multilingual speech synthesis package. It was developed at the Centre for Speech Technology Research at the University of Edinburgh and at Carnegie Mellon University, amongst other places; there are a couple of other sites that probably had something to do with it as well. And it's licensed under a BSD-style license.

There are other pieces of the puzzle, not just Festival, that get lumped in in conversations or topics like this. Festvox is another suite of tools that makes the building of synthetic voices for Festival more systematic and better documented. And there are, I guess, voice packages that come with Festival or that are designed for Festival, and then additional voices developed by different entities that work with Festival. Two come to mind, the MBROLA project and the HTS project, and these can be used as back ends for Festival, although both of these require a separate engine to work.

And then, I guess, rounding out the mix, we have some front ends for Festival. Flite, F-L-I-T-E, I think is how that's spelled, is a small-footprint speech synthesis engine based on Festival, developed at Carnegie Mellon University, and it's built with Festvox. And then there are some GUI front ends: TkFestival, of course, built with Tcl/Tk, and Carnival, which is one I have not used, but the screenshots are really nice. It is a really nice looking graphical front end for Festival. So that's sort of the family of applications I'm going to be talking about.
Now, most people, when they think of computerized speech synthesis, think of one of two or three things. They will think of accessibility and sightless computer users, people who need screen readers and need to have text read to them because they can't see it. Or they will think of the 1968 masterpiece 2001 and HAL 9000. Or, as some of you probably did, they will think of the 1961 IBM 717 series, I don't know what it was, 740 series, computer singing "Daisy Bell (Bicycle Built for Two)", which was what HAL 9000 sang.

I am not a sightless computer user, but I am one who enjoys having his computer talk to him. The very first computer I ever bought was a Windows computer. That is, the first one I bought and paid for; my dad bought me a TRS-80 when I was an early adolescent. But in the early 90s I bought a PC, and it came with Windows 3.0 or 3.1 or something, and there was an application that would read text. I just remember thinking it was really funny to hear a computerized voice say the word "banana". That's neither here nor there, but I find it hard to believe that I am so unlike most people that I would be alone in enjoying hearing my computer speak to me.

If you are anything like me, you probably have a personal relationship with your computers anyway. I have been known to have pictures of my computers hanging up in my office at work, snapshots, so to speak. Anyway, it's something that geeks do, I think: talk to their computers, or especially have their computers talk back to them. It's just another neat thing we can do with our computers, and a very useful thing too.

The reason I started using Festival on a routine basis was the podcast I do from my car. I am recording this from my car. Recording audio for stuff like this from my car is most convenient for me. What's not convenient about that is I don't have a computer in front of me, and so invariably I will forget something.
I will have a minimal amount of notes in front of me, and I will leave something out or get something wrong. Instead of re-recording it all, or having to record something extra when I get home, I can type something up and then have Festival translate the text to a WAV file that I can import into Audacity and include in the podcast. In addition to corrections, I can pre-prepare for the podcast: I can pre-prepare the show notes, the opening and closing, that kind of thing. Or I can correct something, like I said, or make an addendum to the podcast after I have recorded it. Using Festival has been a real time saver for me.

Before I started doing any kind of podcast, which wasn't that long ago anyway, when I first started using Linux, I remember there was a program called saydate, and there is one called saytime, which is the one I use. S-A-Y-T-I-M-E: you would type in saytime and it would tell you what time it was, audibly. I remember thinking that was pretty neat.

But before getting into what you can do with Festival, other than some of the things I already alluded to, I guess a question a lot of people would have is how you install it. Well, it's really easy to install. It comes with your distribution anyway, more than likely. Ubuntu comes with version 1.4.3. I think there is a beta version, not in the repos that I know of for Debian or Ubuntu; it's version 1.95, a release candidate or beta version of what the Festival people are calling version 2, and that's what is available from source as a tarball. But as far as Debian and Ubuntu go, version 1.4.3 is what's currently in the repos. And I know for a fact it's in the Fedora repos, and the SUSE repos, and Mandrake/Mandriva, Gentoo; pick your distribution, it's there. If it's not, like I said, you can install from source. It's not that big a deal. The packages I have installed on my Ubuntu laptop are festival, then festlex-cmu.
This is the dictionary file developed at Carnegie Mellon University. Then festlex-poslex, and that is a Festival lexicon file for part of speech, POSLEX, part-of-speech lexicon. And festvox-kdlpc16k, which is one of two American male voice packages that you can install. Well, it's one of two different voices, and there are different sample rates, one at 8 kHz and one at 16 kHz, for both of them. So there's a festvox-kdlpc16k as well as a festvox-kallpc16k, I think. Anyway, those are just the standard voices, and of course there are voices for lots of languages. It's a multilingual program, so more than likely you can find your language. But out of most of the repos, you're only going to get American male voices, or English or UK male voices. I know there's an Italian female voice, but there are not a lot of female voices in the Debian and Ubuntu repos at least.

Anyway, so what can you do with Festival? Well, on an Ubuntu system, and probably a Debian system, not much, unless you add a line or two to the .festivalrc file in your home directory. This will be in the show notes, and I'm hesitant to read it, but it's two parenthetical statements that start with Parameter.set followed by a space. The first one is, in parentheses, Parameter.set, space, single quote, Audio_Command, space, and then, in double quotes, the command that you want Audacity, excuse me, Festival to use to play the audio. So if you want to use ALSA, which I recommend doing if you want to be able to have a music player open and use Festival at the same time, you'd put in something like aplay, I guess it's ALSA play, followed by the appropriate parameters. You can go to the Gentoo Wiki and do a search for the speechd HOWTO, and you will find the parameters to put in. I will put them in the show notes so you will know how to get this to work. It's not good radio to read it.
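As a rough sketch of what those two statements might look like when written out, the helper below appends them to a Festival init file. The function name `write_festivalrc` is hypothetical, and the aplay flags are an assumption based on the form commonly circulated for ALSA setups, not something read out in the episode; check the show notes or the speechd HOWTO for the parameters Dave actually used.

```shell
# Append the two ALSA playback settings to a Festival init file.
# ASSUMPTION: the aplay flags are a commonly seen form, not the
# episode's own; $SR and $FILE are placeholders Festival fills in.
write_festivalrc() {
    rc_file="${1:-$HOME/.festivalrc}"
    # The quoted 'EOF' keeps the shell from expanding $SR and $FILE.
    cat >> "$rc_file" <<'EOF'
(Parameter.set 'Audio_Command "aplay -q -c 1 -t raw -f s16 -r $SR $FILE")
(Parameter.set 'Audio_Method 'Audio_Command)
EOF
}
```

Calling `write_festivalrc` with no argument appends to `~/.festivalrc`; passing a path lets you try it on a scratch file first.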
In the second parenthetical statement, it would just be Parameter.set, space, single quote, Audio_Method, space, single quote, Audio_Command, which is the parameter you set in the previous parentheses, I guess I should say. Not the best radio, but it's a necessity if you want to be able to use Festival with ALSA.

Now that you've got it working, what can you do with it? You can, for instance, make Festival read instant messages to you from within Gaim. I am not sure if there is a Pidgin plugin for this. I'm sure there is, but it's not in the Ubuntu repos, I know that. There is a Gaim plugin for Festival, it's probably called something like gaim-festival, and it is literally a five-minute setup once you install it. You just go into Gaim and you enable the plugin.

Moreover, under KDE, there is the KDE Text-to-Speech Manager, which includes a little parrot that sits down in your status bar, system bar, and anything you copy to the clipboard it will read back to you. There is also an app called KSayIt, and one called KMouth, that I have never played with.

Then of course there is the command line. You type festival, and you get presented with a festival prompt where you can set the default voice, or set the speed or the volume, and then have it echo back commands or say things. That's one way to use it. I don't often use it; I haven't used it that way in years, so I'm not going to talk about that. But just from the command line, you can pipe the output of a command to Festival. The command-line switch we'll use for Festival is festival --tts, where TTS stands for text to speech. You could echo, then in double quotes some clever text, then pipe, then festival --tts, and whatever clever text you put inside the double quotes will get spoken to you by Festival over your speakers.
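The pipe described here can be wrapped in two small shell functions. This is only a sketch: the names `say` and `speak_cmd` are made up for illustration, and it assumes festival is installed and on your PATH.

```shell
# Speak the arguments as one line of text, the same way as
# echo "some clever text" | festival --tts
say() {
    printf '%s\n' "$*" | festival --tts
}

# Run any command and speak whatever it writes to standard output.
speak_cmd() {
    "$@" | festival --tts
}
```

For example, `say "some clever text"` speaks the quoted text, and `speak_cmd date` reads out the current date.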
You can cat a file that way. I think you need the capital A switch: cat -A file.txt, pipe, again festival --tts, will read to you the text in the file.txt file. A really useful thing to do: I know most of you have read a man page or two, but maybe you would like the man page to be read to you while you do something else, while you multitask, while you actually listen to the man page tell you what to do. You have your fingers at the ready in the terminal, ready to type what you hear. For instance, you could type man cron, pipe, festival --tts again. So you can more or less pipe any command to this that's going to output stuff to standard out. You could pipe the date command, with the appropriate parameters, to have it tell you the date. Not much need to do that when you have programs like saydate or saytime. I think the saydate program will even tell you your uptime in addition to the date. So that's pretty handy. Well, maybe not handy, but pretty neat.

What else can you do? You could create a shortcut, if you use a desktop manager that allows you to create desktop shortcuts. I don't, but if you do, you could create an icon on your desktop that points to the festival --tts command, and you could drag a text file to it, conceivably drag an email to it with some tweaking, maybe an HTML page. I don't know, some of this stuff I've not tried, but that's the kind of stuff you can do with it.

One thing I have done with Festival, other than use it as a time-saving device for my podcast, is I took a book, a book that was in the public domain, or a book I considered to be in the public domain. It's literally freeware. The book is called Underground, written by Suelette Dreyfus and researched by Julian Assange. I think he works at Harvard. Anyway, it's a book about some hackers in Australia in the, I guess, late 80s, early 90s. And it's a very good book.
It's a true story. It's a documentary-style book. I enjoyed reading it, and they made the text of the book available online in text files, divided up into chapters. And I used text2wave, which is another portion of Festival, to create an audiobook, one file for every chapter. I did it in MP3, and it was very time consuming. I actually did it on vacation in the Dominican Republic, year before last, I think.

I would take the text of a chapter and I would use text2wave. The command, off the top of my head, for text2wave, I'm thinking it's something like, well, I've forgotten. text2wave --help will tell you, but it is something along the lines of text2wave, the name of the text file, a -o for output, and the name of the WAV file that you're going to create. And then you follow this with a -eval, and then inside double quotes, inside parentheses, you put the name of the voice you want to use. For my podcast and for the reading of the book I use one of the HTS voices; cmu_us_slt_arctic_hts is the voice I used.

Hi, this is voice CMU US SLT Arctic HTS. Just in case Dave was in error trying to quote the command that created this audio, here is the command once again: text2wave myfile.txt -o myfile.wav -eval "(voice_cmu_us_slt_arctic_hts)".

On my podcast I call it Voice One. I've named that voice One, but that's just me; that's not the official name of the voice. But that's the command I used to create an audiobook from a bunch of text files. This is something you want to be sort of careful with if you decide to create an audiobook using Festival.
It's best if you have the book broken up into chapters, because text2wave is going to load the entire text file into memory and create a WAV file from that, and it's going to use up all your memory if the file is too big. I was lucky that Underground was a book that was already divided into chapters. Had it been the whole book, and had I tried to load, you know, six or seven hundred pages worth of text, it would probably have taken up all two gigs of RAM on my system. I'm just guessing.

But between just the simple text-to-speech command-line switch and the text2wave program that comes with Festival, the sky's the limit as to what you can see doing with this, especially the text-to-speech part. I mean, there are demo pages on the web, at the Festival site and some of these other sites, where you can demo some of the voices. There's a dialogue box you type text into, you pick a voice in the drop-down list, and you can hear that text spoken in a different voice. That's just PHP.

I stumbled upon a site the other day, the regular way, not using StumbleUpon, that would read books to you. I think it would take a book and turn it into an audio file, and then you could subscribe to the RSS feed. It's a pretty neat site. I forgot the name of it. But you could listen to Project Gutenberg books translated from text to speech, and you could subscribe to those audio files with an RSS feed reader, so, like a podcast. That was pretty neat. I'm sure they use something like Festival; if it's not Festival, it was a commercial text-to-speech package. So there's a lot that you can do with it.
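The chapter-at-a-time approach can be sketched as a small loop around the text2wave command quoted earlier in the episode. The function names `text_to_wav` and `convert_chapters` are hypothetical, and the sketch assumes each chapter is a separate .txt file in one directory and that the named voice is installed.

```shell
# Convert one text file to a WAV file next to it, using the command
# form quoted in the episode. The default voice is the one Dave names.
text_to_wav() {
    txt="$1"
    voice="${2:-cmu_us_slt_arctic_hts}"
    text2wave "$txt" -o "${txt%.txt}.wav" -eval "(voice_${voice})"
}

# Convert every chapter in a directory, one file per text2wave run,
# so that only a single chapter is loaded into memory at a time.
convert_chapters() {
    dir="$1"
    for chapter in "$dir"/*.txt; do
        [ -e "$chapter" ] || continue   # the glob matched nothing
        text_to_wav "$chapter" "${2:-}"
    done
}
```

Running `convert_chapters ./underground` would leave a .wav beside each .txt in that (hypothetical) directory; converting the results to MP3 is a separate step with whatever encoder you prefer.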
One of the questions I get asked a lot is how you change voices. Festival comes with a handful of voices, and like I said, most of the American voices are male. Most people are probably like me: they find it somewhat intriguing to have their computer talk in the first place. But if you're married like me, there's something, what's the word, there's something really nice about having your computer talk to you in a female voice, knowing that you can tell her exactly what to do and she'll do it. She's a computer, but she's talking in a female voice. It's something that married men don't get to do very often anyway. More on how you change voices in a bit.

Another thing you can do, and I've never tried this but I imagine it would be pretty easy: the command-line text-based browser Lynx, L-Y-N-X, has a dump feature, D-U-M-P. So you could open up a web page with Lynx using the dump argument, lynx -dump and the name of the HTML file, and then redirect that with the greater-than sign to a file. What that is supposed to do is take that HTML file and translate it to ASCII text, which I'm pretty sure Festival can handle pretty well, except for maybe some of the non-standard characters. So that's a way, maybe, to convert a web page that is in HTML into something Festival could read to you. I like text-based weather forecasts because they open up better in Lynx in a console window, and that's something that could easily be parsed, and I could have Festival read me the weather forecast.

There are lots of things you can do with this, and I'm pretty sure Asterisk has an option to use Festival instead of a recorded voice. Speaking of Asterisk, there is a commercial text-to-speech package available for Linux that I think Asterisk uses, and I've forgotten the name of it. It begins with a C, I think. And Allison Smith, the woman who does the voices for Asterisk, she has her voice in it too.
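A minimal sketch of the Lynx idea, untested against any particular site just as Dave's version is. The function names are hypothetical, and using tr to drop the non-standard characters is this sketch's choice, not something from the episode; `-dump` and `-nolist` are standard lynx options.

```shell
# Keep tabs, newlines, carriage returns and printable ASCII; drop
# every other byte, so Festival only sees plain text.
strip_nonstandard() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Render a page to plain text with lynx, clean it, and speak it.
# -nolist leaves out the numbered list of links at the end of the dump.
read_page() {
    lynx -dump -nolist "$1" | strip_nonstandard | festival --tts
}
```

`read_page forecast.html` would then read a saved page aloud; the file name is only an example. Piping straight into Festival skips the intermediate file that the redirect would create.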
I think it's called a diphone. I mean, they've synthesized her voice, so her voice, synthesized, is a computer-like voice that sounds like her, which you can use with this commercial package, and I've forgotten the name of it.

Sorry for the interruption. What Dave is struggling to remember is the commercial text-to-speech system called Cepstral, C-E-P-S-T-R-A-L.

But there are two or three commercial alternatives to Festival, and none are really expensive. There's, I guess, TTSynth, which used to be IBM text-to-speech, used to be ViaVoice, for around $40. There are a couple more that I can't remember right now, but they're around the $29 price range as well.

But that tangent threw me off. Oh yeah, before I get to changing voices, there are other things you can envision doing with sed and awk and friends. You could remove non-standard characters using sed, or you could take the output of a log file and use awk on it to print, you know, just one column of it. If you want to know who was online, you could use awk to figure that out and have it read to you. So that's pretty neat.

But like I said, one of the main questions I get frequently is how you get the voice that I call Voice One on my podcast, which is the CMU US SLT Arctic HTS voice. That's cmu_us_slt_arctic_hts, and that's Arctic as in tundra. This voice file can be found at hts.sp.nitech.ac.jp. Once you're at that web page, you want to look for the release archive, because what's on their main page are voice files for the latest version of Audacity, I keep saying that, the latest version of Festival, which is version 2.0 beta or 1.95. What comes with most distributions, I think, is 1.4.3. So you go to the release archive at that web page and you'll find the CMU US SLT Arctic HTS file. The SLT file is the US female voice. There's another one, but
that's the one I think sounds the best. It's about a two-meg download, and it will include the HTS engine that you'll need to run this voice, as well as the voice files themselves.

Installing this isn't that easy. Well, it's not hard. You download the tarball, and if you unzip it or untar it and look at it, you'll see that there's, I forget the directory structure off the top of my head, but go down a couple of levels and you've got, like, maybe blah/whatever/cmu_us_slt_arctic_hts. So it won't be the top-level directory; the first two levels will be empty, but three levels down or so, at least two levels down, you'll see that directory, the cmu blah blah blah. That is the one you will want to copy, if you have Festival installed, into, I think it's /usr/share/festival/voices/english. So that's slash usr slash share slash festival slash voices slash english. That's where you'll want to put that directory. Once you do that, you'll be able, using text2wave, to select that voice with the command-line switch, that's -eval and, in parentheses, the name of that voice, and that may have to be in quotes. Again, forgive me, but that's how you do that.

I think what maybe makes that voice sound better is the way that it's built. It's built with a different engine; it's built with what is called HMM, which stands for hidden Markov model. It's a statistical method that, I've read, is like the simplest form of a dynamic Bayesian network. I won't try to explain it, because I'm not completely up on it, but it's often used in creating voice files. It's called that because of the hidden part of it: there are inputs and outputs you can see, but there are states that can change that you're not aware of. I can't explain it. Like I said, I'm in the car, you
know, it's been a while since I took statistics, and my wife is calling. Hey. Uh, I have to get the name of this road. [unintelligible] Yeah. No. Okay, okay.

What invariably happens when you record audio in a car on your way home from work is your wife calls you, and I've sort of forgotten where I was. But yeah, the hidden Markov model: I don't know if it's superior or not, but I know this SLT voice sounds really good. And again, you can find that at hts.sp.nitech.ac.jp, I think. Nitech is, like, the Nagoya Institute of Technology, a college in Japan, evidently.

Anyway, I guess the one thing I've got left to say is that using that CMU US SLT Arctic HTS voice with text2wave, I get the result I want, an audio file that's been translated from text, but I also get a segfault and a core dump every time. I don't know why; I've not investigated. It seems the core dump gets cleaned up afterwards, so I'm not accumulating a bunch of core dumps and wasting disk space or anything, and it's not doing any damage at all. It's still creating my file for me and everything works. So I guess your mileage may vary. Anyway, I've rambled on long enough, and that's going to wrap it up for this episode of HPR. Have a good day.

Hello, this is voice CMU US BDL Arctic HTS.

Hello, this is voice us1 MBROLA. Dave forgot to mention that the MBROLA engine binary and voice files can be found at the following URL: tcts.fpms.ac.be/synthesis/mbrola.html.

Hello, I am the default Festival voice. Thanks for downloading Hacker Public Radio. Have a nice day.

Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head on over to caro.net for all of those of you