Episode: 401
Title: HPR0401: web2speech
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0401/hpr0401.mp3
Transcribed: 2025-10-07 19:50:31
---
Whee! Hello everybody, my name is Ken Fallon and today's episode is going to be on converting text into speech, particularly from URLs. The reason I wanted to do this was that there were a lot of Wikipedia articles that I wanted to look up and I was going to be outside, so I thought, well, I can put them on my MP3 player and listen to them outside. But since I've had this idea, I've used it to read long man pages, read articles that people have posted on their blogs and just generally do some background reading on the CIA World Factbook and that sort of thing.
Some websites work better than others. If there's going to be a lot of graphics, obviously, and a lot of tables, it's not going to be a lot of use to you. However, if it's simply commentary and text, it'll come out quite nice. One good tip would be to go to the website using a fairly basic browser like ELinks or Dillo, which is more of a graphical viewer, and you'll see what the page is going to look like. If it's a WordPress blog, for instance, all the menus will be put down at the bottom and, yeah, it looks quite nice, which means that all the menus and stuff, when they are spoken back to you, will be at the end, so you've already heard the article in the audio file and you can flick to the next one.
Okay, let's get down to it. The reason I want to talk to you about this is, first of all, the philosophy, the Unix philosophy of having small things that do a particular task and chaining them together. That's the first thing. And secondly, it's to explain the practical uses of standard input, standard output, redirecting and that sort of stuff.
Now, for Wikipedia I could have used a website called Pediaphon, which will actually do this, but since I started this Wikipedia text-to-speech blog, which I did a few weeks ago, I've modified the script heavily so that I can use it for any web page and convert it into different formats. I've put in command line switches, you know, so I can specify the file type and that sort of thing, the file name and whether I want to override the file name and that sort of stuff. Everything starts off quite simple and then you can expand it out. Anyway, the code, as it stands now, will also be in the show notes for this episode, or at least a link to it.
Let's begin. First of all, I want to talk about standard input, standard output and standard error. Standard input is typically your keyboard and your mouse. Standard output tends to be your screen or perhaps a printer, and standard error usually tends to be your screen, if there's an error message. So this is kind of normal Unix stuff. What is kind of cool is that you can take the output of one and redirect it into another, and you can take the error too, or you could also take the output and pipe it into another. There's a slight difference here.
Redirecting is the greater than sign. So if you do an ls, which is a directory listing, and you use the greater than sign and then you specify a file name, so ls, greater than, ls.txt, for instance, then instead of sending the output to your screen, it's going to send it to the text file called ls.txt. Fair enough. Now, say you have another directory and you want to do ls on that directory as well. You can change to that directory and you want to append it to the file. You simply type ls, greater than, greater than, ls.txt and it will be appended to the end of the file. So that's redirecting. What we are going to do here is pipe the standard output of one program into the standard input of the next.
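As a rough illustration of the redirecting and piping just described, here is a small sketch. The directory and file names (first, second, missing, errors.txt) are made up for the example, and everything happens in a temporary directory.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)
mkdir "$workdir/first" "$workdir/second"
touch "$workdir/first/a.txt" "$workdir/second/b.txt"

# '>' sends the listing to ls.txt instead of the screen
# (creating the file, or truncating it if it already exists).
ls "$workdir/first" > "$workdir/ls.txt"

# '>>' appends the second listing to the end of the same file.
ls "$workdir/second" >> "$workdir/ls.txt"

# '2>' redirects standard error: listing a missing directory fails,
# and the error message lands in errors.txt rather than on the screen.
ls "$workdir/missing" 2> "$workdir/errors.txt" || true

# '|' skips the file and connects one program's standard output
# straight to the next program's standard input.
ls "$workdir/first" | wc -l

# Show what ended up in the file.
cat "$workdir/ls.txt"
```

The difference to notice is that redirecting always involves a file, while the pipe hands the data directly to another program.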
Most programs like ls and grep or whatever use standard input and standard output by default: they take the file in from standard input and send the output to standard output. However, some of the programs that we're going to be using, like wget or whatever, don't do this and you need to specify it. So if you open up the man pages for these programs and search for stdout or standard out, you'll see the switches if they're necessary.
The Unix philosophy is that you have a lot of small programs that you can put together to make a more complex one. Your first task when you have an idea, when there's an itch that you want to scratch, is to work out how you're going to break that big task up into smaller sub-tasks that you can then work on, and find a tool that will help you accomplish each of those. In our task here, we want to go to a web page and we want to convert that into an Ogg file, for instance, that we can play on a portable media player. So task number one is we want to get the web page. When we download the web page, that's going to be in HTML format, so we need to convert the HTML into standard text. Then we want to run that standard text through a speech synthesizer, and usually a speech synthesizer will only give you the option to output to a WAV file. So then we want something else that will convert the WAV file into an Ogg file.
Now under Linux there are many programs, and for any particular task you're probably going to have a choice of two or three different programs that you can use. For example, for getting the website, I can use wget or I can use curl or I could even use telnet with some expect commands, but I'm going to use wget because that's the one I'm most familiar with. I'm selecting html2text as my command to convert the HTML into text. For the speech, I'm going to use eSpeak instead of Festival because I found eSpeak to be easier to install, and I found it works better with standard input and standard output.
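Before any piping, the four sub-tasks can be tried one at a time with a file between each step. The following is only a sketch: the function name web_to_ogg_stepwise and the intermediate file names are invented here, and the html2text options assume the classic C++ html2text rather than the Python tool of the same name.

```shell
#!/bin/sh
# Hypothetical helper: one sub-task per command, with a file between each step.
web_to_ogg_stepwise() {
    url=$1
    outfile=$2
    tmp=$(mktemp -d) || return 1

    # 1. Fetch the web page and save it as HTML.
    wget -O "$tmp/page.html" "$url" &&
    # 2. Convert the HTML into plain text.
    html2text -o "$tmp/page.txt" "$tmp/page.html" &&
    # 3. Run the text through the speech synthesizer, producing a WAV file.
    espeak -f "$tmp/page.txt" -w "$tmp/page.wav" &&
    # 4. Convert the WAV file into the requested output file.
    ffmpeg -i "$tmp/page.wav" "$outfile"
    status=$?

    # Clean up the intermediate files whether or not the chain succeeded.
    rm -rf "$tmp"
    return $status
}
```

It could be called as web_to_ogg_stepwise "http://example.com/" example.ogg. Each step can be tested on its own this way, which makes it easier to see which tool is misbehaving before the steps are chained together.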
For the conversion from WAV to Ogg, I could use SoX, but I'm actually going to use ffmpeg because SoX has a known bug where it doesn't support MP3 files out of the box for legal reasons. So I've got my commands: wget, html2text, eSpeak and ffmpeg. And what I'm going to do is have these programs pipe from one command into the next. So I'm going to need to look into the man pages for all of these commands and make sure that they all send their output to standard output and that they can accept input from standard input.
By default, wget will save a downloaded file instead of displaying it on standard output. So we need to use the dash O option, that's a capital O, space, dash, to tell it to send the page to standard output as opposed to just saving it in a file. The same with html2text: we need to specify dash lowercase o, space, minus sign. With eSpeak, the format is dash dash stdout. With ffmpeg, we actually want to save the result to a file, but we need to tell it to listen on standard input, and that is dash i, space and the minus sign. All the other programs will listen on standard input by default, so we're good to go there.
When I chained the commands together, the only other special thing that I needed to do was in the html2text command. By default, when you download a page, if there's bold or italics, it will add some special encoding characters that are understood by pager programs like less and more, and they sound very choppy when you play them through eSpeak. So some of the other options I added were dash nobs, space, dash ascii, to strip out those special characters and to convert everything into ASCII.
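Put together, the options described above might look like the following sketch. The wrapper function web2ogg is a name invented for this example, not the author's script, and the -nobs, -ascii and -o options assume the classic C++ html2text.

```shell
#!/bin/sh
# Hypothetical wrapper around the four-stage pipeline.
web2ogg() {
    url=$1
    outfile=$2

    # wget -O -        : write the downloaded page to standard output
    # html2text -o -   : write the plain text to standard output,
    #   -nobs -ascii   : without backspace sequences, as plain ASCII
    # espeak --stdout  : read text on standard input, write WAV to standard output
    # ffmpeg -i -      : read the WAV from standard input, save to $outfile
    wget -O - "$url" \
        | html2text -nobs -ascii -o - \
        | espeak --stdout \
        | ffmpeg -i - "$outfile"
}
```

A call such as web2ogg "http://example.com/" example.ogg would then fetch the page and write the spoken version to example.ogg, with no intermediate files left behind.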
So you run wget with the URL, dash capital O, space, minus, space, the pipe sign, space, html2text, space, dash nobs, space, dash ascii, space, dash lowercase o, space, dash, pipe that into espeak, space, dash dash stdout, pipe that into ffmpeg, space, dash i, space, dash, and then outputfile.ogg. When you do that, it will go get the web page, convert it to text for you, and output it to an Ogg file.
Now you can do that on the command line, which is what I did for quite a while, but you can also make a script around that, asking, you know, what do you want to look up on Wikipedia, and you can read that into line. Then you can put in a wget, http, colon, forward slash, forward slash, en.wikipedia.org, forward slash, wiki, forward slash, double quotes, dollar, open curly bracket, line, close curly bracket, double quotes again. That will go off to Wikipedia and send whatever you typed in as the answer, and Wikipedia will return with the correct URL.
That, I found, was very good for what I needed to do with Wikipedia, which was a whole list of abbreviations and terms that I wanted to look up. So I was able to put all of those into a text file, pipe the text file into this script, which I had put into a loop, and then I was able to convert all these things into Ogg and put them on my portable media player.
However, since then I've found that rather than going to Wikipedia, I'm more or less looking up a URL the whole time. It might be somebody's blog, it might be a URL to a man page, it might be a how-to document, it might be the CIA World Factbook on some country. So I've instead expanded it out so that it's now called web2speech, and you can specify the format, the options and the URL. By default, web2speech and the URL will just convert it into Ogg, or whatever is defined as the default format for your player. You'll find a link to that program in the show notes for this episode, and I'd appreciate your feedback and comments.
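The Wikipedia loop described above could be sketched roughly as follows. This is not the author's web2speech script: the function name wiki2ogg, the underscore substitution for spaces, and naming each output file after the term are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical loop: read one search term per line from standard input
# and produce one Ogg file per term in the current directory.
wiki2ogg() {
    while IFS= read -r line; do
        # Skip empty lines in the list.
        [ -n "$line" ] || continue

        # Assumption: replace spaces with underscores so the term fits
        # in a URL and in a file name.
        page=$(printf '%s' "$line" | tr ' ' '_')

        # wget gets /dev/null as its standard input so it cannot
        # swallow the remaining lines of the list.
        wget -O - "http://en.wikipedia.org/wiki/${page}" </dev/null \
            | html2text -nobs -ascii -o - \
            | espeak --stdout \
            | ffmpeg -i - "${page}.ogg"
    done
}
```

With a file of terms, one per line (terms.txt is a made-up name), it would be run as wiki2ogg < terms.txt.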
Okay, I hope you find this useful. If not, tune in tomorrow and expect to hear another exciting episode of Hacker Public Radio. Thank you very much and goodbye.