Episode: 1284
Title: HPR1284: Blather Speech Recognition for Linux: Interview with Jezra
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1284/hpr1284.mp3
Transcribed: 2025-10-17 22:56:46
---

Hey everybody, this is John Culp in Lafayette, Louisiana, and I am doing a special episode for Hacker Public Radio here. The last couple I recorded were solos, and the one before that was an interview, and the one before that was an interview, but I'm getting back to the interviews now. And with me is programmer extraordinaire Jezra.

Hi everybody, this is Jezra. I'm in Petaluma, California, and I'm talking to John Culp over Mumble, and it's pretty good.

Yeah, it sounds pretty good. We're mumbling and we're also going to be blathering.

Yeah.

So, Jezra, since you are the lead developer of Blather, why don't you tell everybody what the heck it is?

Blather is a Python application that uses GStreamer to wrap around pocketsphinx, which is a speech recognition engine created, I believe, by Carnegie Mellon University. And by doing this wrapping around the speech recognition engine, Blather is capable of running commands when someone who is running Blather speaks a certain sentence or string of words.

Yeah, that pretty much sums it up. You know, when you first mentioned something, I remember seeing a notice from you on Identi.ca or our StatusNet instances or something, saying just kind of in passing that you were doing something with speech recognition, and this piqued my interest immediately, because speech recognition is really, really important to me. A few years ago I had issues with my wrist, and I still do. I got carpal tunnel syndrome and I actually had surgery on my left wrist to fix it, and it's helped somewhat, but I still cannot be typing all the time.
It really hurts if I do too much typing, and so speech recognition has become crucial to me for any kind of dictation or anything like that, like long emails, documents I have to create at work, or anything like that. And up until you came out with Blather, I had to boot into Windows to use either Dragon NaturallySpeaking or the built-in Microsoft speech recognition program, which is actually not all that bad. I mean, the functionality is pretty good, and the same goes for Dragon NaturallySpeaking, but there are two problems with both of them. One, they both seem to bring the system to a grinding halt in terms of resources; they're real resource hogs. The other is that you're basically stuck with whatever they give you. If they say that to switch applications you have to use "switch to this, switch to that," then you're stuck with that. But the beauty of Blather is that if you know what you're doing a little bit, like I do for scripting and so forth, you can set up basic functionality to do all of that stuff that the other ones do, except you make the commands you actually want to say, not the ones they tell you that you have to say.

And I will say that what I've seen in the videos that you've posted of you using Blather is above and beyond anything I ever dreamed would be done with the software that I wrote.

So what was your original vision for this thing?

My original vision was really a joke that I've had with my brother for maybe 10 years, in that I just want to be able to come home from work, walk into my place and say, "Computer, play Black Sabbath," and then of course have the machine play some Black Sabbath. This has never been something that I actually just went ahead and did, because I never looked into speech recognition in Linux as much as I should have. I always thought it was sort of behind what's currently available on, say, a Macintosh or the Windows operating system.
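The idea John describes here, choosing your own spoken sentence for each command, can be sketched as a simple lookup table. This is a minimal illustration only, not Blather's actual code: the `SENTENCE: command` line format, the function names `parse_commands` and `command_for`, and the sample entries are all assumptions made for the example.

```python
def parse_commands(config_text):
    """Parse hypothetical 'SENTENCE: shell command' lines into a dict."""
    commands = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only, so commands may contain colons.
        sentence, sep, command = line.partition(":")
        if not sep:
            continue
        commands[sentence.strip().lower()] = command.strip()
    return commands


def command_for(recognized, commands):
    """Return the command mapped to a recognized sentence, or None."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(recognized.lower().split())
    return commands.get(key)


SAMPLE_CONFIG = """
# the sentences are whatever the user prefers to say
GO TO FIREFOX: wmctrl -a Firefox
WHAT TIME IS IT: date +%H:%M
"""

if __name__ == "__main__":
    table = parse_commands(SAMPLE_CONFIG)
    print(command_for("go to firefox", table))
```

Changing what you say then means editing only the left-hand side of a line; the command on the right stays the same.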
And then I saw a tutorial on, I believe it was the GStreamer site, about using pocketsphinx from both Vala and from Python. And I thought, oh wow, that is it, that is going to let me walk into my house and say, "Computer, play Black Sabbath," because I like Star Trek: The Next Generation. I watch a lot of Star Trek, and they just walk into the place and say, "Computer, play Black Sabbath. Computer, play music," and that's what I wanted. That hands-free "I'm home, play some damn music for me."

Love it. And is that working for you right now? I mean, the video that I saw you post was, I believe, you making your string of LEDs do various things with voice commands.

I've actually had a problem with the speech recognition picking up "Sabbath." And I don't call my computer "computer." Computers have names. Computers are anthropomorphized like everything else. Everything to me is a pet: my car, my motorcycle, my computers. I don't say, "Hey car, let's go to work." My car has a name, I talk to it. The motorcycle has a name, I talk to it. Musical instruments have names, we talk to them. Computers have names, we talk to them as if they are a child or a pet. And so I have a home network, and that home network is who I talk to, and I say "Niaoli," because the nickname of my network is Niaoli; it's not the name of my network. And so for me it's "Niaoli, play Black Sabbath," and "Sabbath" doesn't get picked up properly, and I don't know why.

You might be able to go into the dictionary file and tweak it a little bit.
I mean, that's kind of granular tweaking there. What I do with the word "blather": actually, it has trouble picking up that word. Both Blather itself and the dictation tool that I use, the Google Web Speech API tool, always think I'm saying "bladder." And so I've got a command where, if I want to turn off Blather, I just say "kill bladder." I've just learned that if I want to do something with Blather, I have to say "bladder." And actually one of my little text manipulation commands is called "fix bladder." If I am doing dictation in one of those little Google speech windows and I have to use the word "blather," almost always, probably 80% of the time, they think I said "bladder." So I will give a command that says "fix bladder," and it does a series of keystrokes that will select everything, copy it to the clipboard, and pipe that out through the stream editor, sed, which will then replace all instances of the word "bladder" with "blather" and put it back on the screen.

One thing I've tried doing when there's a command that is just not being recognized or picked up by whatever machine is running Blather is to change the string to be more phonemic. So if I were saying "enough," I wouldn't write E-N-O-U-G-H, I would type E-N-U-W-F, because those would be the phonemes that I want, and that would probably match up a bit better than the bastard language known as English.

You know, one thing I've found also with this thing is that sometimes I have to speak more quickly than I think I might need to. I will have to say the string of words fairly quickly.

Really quickly.

Yeah, or else it will think I've already stopped speaking and that it's a separate string that it's looking for.

And right there you need to restart Blather.

Restart Blather?

Yes. I've noticed that when Blather starts, sometimes it will, as you said, think you've ended your sentence even though you haven't, or it will think you've paused even though you haven't.
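The "fix bladder" correction described above amounts to a whole-word search and replace on the dictated text. John does it with sed; the following is only a rough Python equivalent, assuming the text has already been copied out of the dictation window. The function name `fix_blather` is made up for this sketch.

```python
import re


def fix_blather(text):
    """Replace the misrecognized word 'bladder' with 'blather'."""

    def swap(match):
        # Keep the capitalization of the word being replaced.
        word = match.group(0)
        return "Blather" if word[0].isupper() else "blather"

    # \b limits the match to whole words, so 'bladders' is left alone.
    return re.sub(r"\bbladder\b", swap, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(fix_blather("I use bladder daily. Bladder is great."))
```

Matching on word boundaries keeps the replacement from touching longer words that merely contain the same letters.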
So something like "Niaoli, lights on" would be "Niaoli lights on," and that is not the way humans speak, but that's the way that the pocketsphinx library is picking up the speech that I'm, well, speaking. It's the way it is, and I found that actually restarting it tends to put it a bit more in line with what I want it to do.

Now when you say restart, do you mean actually quit out of the whole thing and restart it, or just check the little check box on and off?

I mean pretend it's a Windows machine: turn it off, turn it back on again.

I don't have to do that very much, actually. I normally can keep it going for hours at a time and it works just fine.

Well, in that case, shit. I swore on HPR.

I was wondering how long it would take before we got to the explicit label there, so there it is.

You know, I thought of starting the episode with some swearing just to keep the pace going.

Yeah.

I guess it just didn't happen.

Well, you know, that's all right. There was something I was going to, I've lost my train of thought there. Man, are you driving?

It happens.

Are you standing out in the driveway or something?

No, I'm sitting in my living room, which is right next to one of the main thoroughfares in this little area of town.

Okay.

And so the threshold on Mumble is set just to the edge of my voice. So if a car goes by, it's most likely going to get picked up by Mumble. However, the microphone that I'm using to record with Audacity might not pick up that audio, which is a feature of this nice cheap karaoke microphone.

Oh, one of those things, that's funny. My daughter has a karaoke machine, and I'm looking right now at one of those little dynamic microphones like you're talking about, and yeah, those normally won't pick up stuff from much further away than two or three feet, I would think. I'm using a little clip-on condenser mic here, so it'll pick up various ambient sounds in my detached office.
I actually turned off one of my servers over there to try and reduce the amount of sound in the room. Turned off the air conditioner as well, so I hope it doesn't get too hot in here.

Yeah, I would think that turning off the air conditioner in Louisiana in summertime may not necessarily be the best idea.

Well, it's certainly not a good idea during the day, but after dark, like right now, it's probably okay. I think I'll survive to the end of our conversation.

Awesome. So I certainly hope that you will be posting the links to the videos that you've made of you using Blather.

I have at least the big long intro video where I've got the slides going and the picture of you and all that, and the music in the beginning, and I figure from there people can, you know, if they want to see more, click on related things or look at my list of videos. Did you get a chance to watch the one I made today yet?

You know, I haven't. I've been driving all day long.

Oh man, bummer. It's pretty amazing. The thing I did today, I don't know, it gets it where it's almost as good as Dragon NaturallySpeaking for my purposes. What I did was I created a command so that whenever I say "dictation box," it will open up a new instance of Chromium with a pixel size of 600 by 400, and it is opened up as an app. So it comes up in its own little window. It just pops out a little box, and I do a series of virtual mouse movements and clicks and so forth to turn on the microphone and start listening almost immediately. And so you talk and talk and talk, and it uses the Google Web Speech API, and you can see your words just spitting out onto the screen as you talk.
And then when you are done talking, you just say "stop talking," and that's the command that will tell it to do another virtual mouse click on the button that stops it, and so that stops listening. Then one more command called "transfer text" will copy all that, flip back to the previous app you were in when you said "dictation box," and paste all the text in there, and it works great. I mean, to me, it works better than the Dragon NaturallySpeaking one in terms of system performance. It's very fast. What it doesn't do is allow you to do the kind of very detailed editing of text with your voice. That's something that is not all that possible with this system. Although I do actually do quite a bit of text editing with my voice using the Speech Recognizer app. I don't remember who the guy is who made it, but it's a little add-on that you can get for Chromium. If I need to work on a big block of text, I will put it in there, and then I can just select a word and give it a command like "capitalize this" or "make uppercase," and that will capitalize it or make it all caps. I have some commands now where, what is the sequence, something like: I select a URL and copy it to the clipboard, and then I will go back to something I'm working on, let's say some HTML, and then I'll select the text in there that I want to be the link text, and then I'll give a command that says "insert hyperlink." What it will do is write out all the HTML code that you need, put in the URL, and then put the link text in between the two brackets. It simplifies a lot of that kind of stuff that I have to do frequently when I'm working on my syllabi for classes or doing various other things. I can't tell you how many keystrokes Blather has saved me. I mean, it must be in the hundreds of thousands by now.

Yeah, and that is amazing, because that was totally not my intention when I wrote the software.
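The "insert hyperlink" command John describes boils down to wrapping the selected link text in an anchor tag that points at the copied URL. A minimal sketch of that text transformation, assuming the URL and link text are already available as strings (John's real command works through clipboard and keystroke tools, which are not shown); `make_hyperlink` is a hypothetical name:

```python
import html


def make_hyperlink(url, link_text):
    """Build an HTML anchor element from a URL and its link text."""
    # Escape both parts so characters like & and < stay valid in HTML.
    safe_url = html.escape(url, quote=True)
    safe_text = html.escape(link_text, quote=False)
    return '<a href="{}">{}</a>'.format(safe_url, safe_text)


if __name__ == "__main__":
    print(make_hyperlink("http://example.com/", "my syllabus"))
```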
I wanted to do some home automation. I wanted to send commands to web APIs of machines about my house, and you've taken that and, I'm going to say, you totally hacked it, and that's totally awesome. That's fucking awesome is what it is. You took something that wasn't really meant, well, I guess it was meant to send commands. That's what Blather was meant to do: take someone's voice, convert that into a command, and run that command. My vision of it was something very simple, and you took that ball and you fucking ran with it and you made it this thing that is, I'm blown away. I'm absolutely blown away.

You know, you might have done the same thing with it if you had the problem that I have. My problem is repetitive strain injuries, repetitive stress injuries. If you were always looking for ways to reduce keystrokes the way I am, you probably would have seen the same possibilities here. I immediately saw this as an accessibility tool, not as a fun toy, and I use it for fun too. I've got a bunch of little scripts and commands where I ask it something like "what time is it" and it will run my "what time is it" script, and it'll do a virtual coin flip, and depending on whether it gets a zero or a one, it will either tell me what time it is or give me a smart-ass remark like "time to get a move on" or something like that. It chooses from a list of predefined responses; it shuffles them and chooses one randomly. I've got ones where I'll ask it "what's for dinner," and I've got a text file that has 15 or 20 possibilities of what might be for dinner, and it shuffles all those and chooses the top one, and then a text-to-speech engine will speak it to me.

And that is home automation, when you have the computer speaking to you.

Yeah, it's fun too. You know, I might do an HPR episode, like a real short one, where I have a conversation with my computer using Mumble, and I can keep it going a pretty good while because I've got it doing all kinds of things. I can ask
it "how's the weather," you know, "what's for dinner," "how are you today," "what time is it." I've got all these commands, and it puts back a different response almost every time.

Then I think you should go ahead and do that.

I almost certainly will. Yeah, you know, I gotta admit, when you first posted this I had heard of Sphinx before, because I'd done research on speech recognition. When I realized that my problem really was not going to go away and I knew that I would have to come up with some kind of speech recognition solution, I tried desperately not to have to resort to Windows, and so I looked into various things on Linux and I found Sphinx. But Sphinx is not something you can just use. You needed Blather; you needed something to give you a way to start it and handle the very complex, long command that has to be run to use it. That's why Blather was so important to me, because I knew about Sphinx, I just had no idea how to use it. And when I first saw what Blather was, and I saw the sample configuration file that said, I don't know, echo this, echo that, I thought, oh my gosh, this is not going to help me at all. But then I thought about it and said, you know what, I bet I can do some stuff with this. And so I started by figuring out how to make it switch back and forth between different windows. I already knew some of the command line tools to switch windows. The wmctrl command I've used in scripts before, and that's really good at flipping back and forth between various windows on your desktop, and it works either in GNOME or in Openbox, which is what I'm on right now. And so once I got that going I thought, well, I can at least start applications, I can switch to them, I can quit them, and before long I had ways to do series of keystrokes like select all, copy, paste, you know, switch to this window, put something there. Once you start the ball rolling, you start seeing possibilities where other people might not see them, so that
Blather has just been awesome to me. To me this really is a great accessibility tool for Linux, for speech recognition and dictation and all that, in conjunction with the Google Web Speech tool, which is really, really important for the dictation part of it. It makes it where I really don't think I will have to boot into Windows anymore. You do have to do quite a bit of configuration, but maybe all of the stuff that I've done could serve as some kind of sample configuration for someone else, and they can use those commands, or they can keep the command part but change the sentence part to suit them. To me, one of the greatest things about Blather is just the fact that I get to tell it what I want to say for something to happen. I find that when I did boot into Windows after having used Blather for a while, I got so annoyed that I couldn't make it do what I wanted to; I had to make it do what they said that I could make it do. Does that make sense?

It absolutely does make sense, and I may attribute that to vernacular, such that in Redmond, Washington, people may speak a certain way, and they would expect to speak to their computer in a certain way, and they would expect someone in another part of the world to speak to their computer the same way that people in Redmond, Washington, speak to their computer. And you're saying that Blather allows you to speak the way you speak to your machines.

Exactly. And it's not that the commands they have for, say, desktop navigation are not sensible; they are. I mean, to switch between one window and another you have to say, you know, "switch to Firefox," "switch to Thunderbird," but it's not the easiest thing to say, the "switch to," "itch to," "itch to." It's a very strange sound and it gets cumbersome. Instead of doing that I say "go to Thunderbird," "go to Firefox," "go to Chromium," "go to heybuddy." To me that's much easier to say and it works perfectly.

And you're okay with
using the "to"? The reason I ask is I was having problems controlling my lights. "Lights fade to blue" would be the command, but I found myself saying "lights fade blue" all the time, and I was wondering why the command wouldn't run, and then I realized that's not the command that I have, and I've just been sort of shortening it for speed reasons.

Yeah, most of the time when you have the word "to," T-O, in one of these commands, you have to say it very fast. Like if I say "go to heybuddy," that's basically how I have to say it: "go to heybuddy," "go to Thunderbird." You don't say "go ... to this," "go ... to that," because it won't recognize it. It'll think you said the word T-W-O or something.

And it also doesn't handle pauses, and that's one problem that I have with my anthropomorphized household and network, in that the first thing I say is the name of the device I'm talking to, and then I will pause and give it a command.

Yeah.

So more naturally I would say "Niaoli, lights fade red," which actually would do nothing, because it's "Niaoli lights fade to red," but that space between "Niaoli" and "lights" is going to be recognized as an end of a sentence.

Yeah, that doesn't work.

So in the unnatural way I would have to say "Niaoli lights fade to red," see, and I find that very unnatural.

And one of the problems that you're going to have with that is "Niaoli" is not the easiest thing to say.

Actually, for me it flows pretty well, and it gets picked up very easily from all of the devices I use that are running Blather, including my Nokia N900 running the sweet, sweet Linux Maemo operating system.

Nice.

Yeah, it's awesome, and I still use it, and a lot of people still do. I had to put that one in there.

I wish I had one of those things, man. Seeing you run these, what do you run, like Qt apps or GTK apps or something on it? You can run Python things on it, isn't that right?

Oh yeah, I've run GTK apps written in Python or Vala, and I'll run Qt apps written using PySide, which is
Python for Qt. I'm running Blather on it, and Blather has a UI for both Qt and for GTK, and I could use either of those.

Yeah, that's awesome. That would be so cool.

You could probably find one at the Goodwill for about 10 bucks, because you're John, you can do that.

I will certainly keep my eye out for it. Hey, one of the cool tricks that I figured out in launching Mumble is setting environment variables in the launching script.

Hold on, timeout, timeout. You mean Blather, not Mumble.

Sorry, Blather, my bad. Two completely different voice applications with a very generic name regarding voice.

My bad, I totally didn't catch that.

But yeah, so when I'm starting up Blather I have to use a script, because I have to set the GStreamer library location, and at first that was the only environment variable I set in there. But then I pretty soon realized that I could clean up my commands file quite a bit by setting some very frequently used long commands as environment variables in there. So I do that with the xvkbd command, where you have to have a whole string of options after it. So my environment variable KEYPRESS equals, like, xvkbd, space, hyphen secure, space, hyphen text, you know, all these things. And I also use it to set my text-to-speech engine, currently set, you'll be glad to hear, to the Arctic voice in, what is it, Festival.

Beautiful voice. Beautiful voice.

You know, she doesn't pronounce stuff right a lot of the time, but it's a beautiful voice.

Yeah, like my name, I know.

eSpeak will pronounce your name right every time.

eSpeak will pronounce my name right every time. eSpeak is also good for low-end machines like the Raspberry Pi.

Yeah.

However, I still prefer the voice of Festival. It sounds more like the computer on Star Trek.

I guess you're right there. It's a really nice voice, I'm not gonna argue with you there, and I'm using it right now because lately eSpeak will work for a while and then it'll deteriorate into a bunch of staticky buzzes, and I
really don't know why. Something to do with PulseAudio or something; I'm really not sure what happens, but it doesn't happen with Festival, so I just switched to that. But for some of my computers, what I do is when I change my commands file, I have a script that not only will update the dictionary and all that stuff, but also rsyncs the files with two different remote computers, one in my daughter's room and one in my office at work, and those computers don't necessarily have the Festival voice installed. Originally in my configuration file, or my commands file, I had Festival or eSpeak written in the command itself, but then as soon as I synced that commands file up with my work computer, it didn't work anymore, because that one didn't have Festival. So that's when I got the idea to set the voice in an environment variable, so I can have a different one on this computer than I do at work, and in the commands file just have the variable in there, that's, you know, dollar sign VOICE. I pipe it through that, and then whatever is set as the voice will be used when the command is run. That works much, much better for syncing the same command set over multiple computers.

Now, when I use text to speech, I wrote a script called speak string, and then I just simply run speak string, and all the other words that follow it are the string, and that's what gets spoken. And in speak string, the script, I will decide, okay, on this machine I'm going to use Festival, and on this machine I'm going to use eSpeak. That way, when I'm sending the command, I can just say, when some command is issued, run speak string followed by this series of words, and then it's up to the machine itself, or actually me as the programmer on the machine, to say, okay, when this script is run, use eSpeak. So if I'm on the Raspberry Pi I'm going to use eSpeak; if I'm on a machine that's nice I'll use Festival.

Okay, so in the script, does it check the host
name or something to decide, like if hostname equals this then voice equals that, or how do you do it?

Oh no, it's just a bash script, right, speak string, and in the script itself will be the command, either using eSpeak to speak the string that is the input, or Festival.

Okay.

And so if I don't have that script on a machine, then I obviously have to just create it, or I will get some sort of error that there's no script of that name.

Okay, yeah, that's the sort of problem that drove me to use the environment variable for choosing the voice, because I actually rsync my entire dot config slash blather directory with different computers. So when I add a new text file that has a bunch of responses, like if my son asks me, "What can we do? I'm bored," I can give the command "what can we do, I'm bored" and it will choose from a list of 10 things. Now if I add something to my list and I do my "update Blather" command, it will sync up that data file to all the computers that I have, and so my work computer can do all the same commands that this one can, but it'll use a different voice when I run the command.

Awkward silence. Jezra, you there?

I'm still here.

I don't know if we dropped out or what.

Oh no, that's what in radio is referred to as the awkward silence, or was it dead space, dead area?

Yeah, dead air.

I'm not really a podcaster.

Okay.

Nor an oggcaster, and being able to fill in that space can be interesting.

Yeah, well, this has never been a problem for me. You know, I'm trained as a historical musicologist and I'm a professor, and so I make my living by blathering on about all kinds of stuff, and my students are probably more than happy when I shut up at the end of the class period. Oh, and for people who don't know, blather means incoherent babble, right? Pretty much what my students think when I speak to them, I'm sure.

Well yeah, your students all get an F, right?

I know, yeah, you're very keen on failing all of
my students.

Well, it's not my fault they're doing a shitty job being shitty students and getting bad grades.

Boy, you're just brutal now.

Yes, I am.

Well, do we have anything else to talk about with respect to Blather? I mean, how would you suggest people get started with it?

I would say anyone interested in using Blather should go to the Gitorious code repository for Blather. Offhand I have no idea what it is; I think it's probably Gitorious slash blather.

I've already put a link in the show notes for that.

Oh, awesome.

Yeah, I've got a link to your site, to the Gitorious site, to the Sphinx site, to the lmtool on the CMU Sphinx site. I've got all kinds of links up there, and a link to my video. One of the things I've found about making podcasts for HPR is that I'm much more likely to go through with it and finish it if I make the show notes first, before I do any of the recording, because the recording is kind of the fun part and the show notes are kind of the tedious part. So before we did this interview, I spent the last couple of days putting together show notes, and so it's pretty much ready.

Well, that's awesome.

Yeah, one of the links is to the code, so you can download that. I put together a list of dependencies for, I think they must be the Debian packages, and it's not the easiest thing to get going. There's no single package to install to make Blather work. You do have to work at it a little bit, and especially the vader component on Debian can be a little problematic. Is that the lib GStreamer pocketsphinx library? I don't remember what it's called, but on a couple of machines where I've installed Blather I've had a little bit of a hard time getting it to find the pocketsphinx GStreamer libraries. But I wrote a little blog post on that problem, and so maybe that will help out.

Well, if you have a blog post with the solution to the problem, you should definitely put a link to that in the show notes, because anyone interested in running it on Debian is going to
need to know how to install it, especially if it's an issue. Otherwise, usually when I write code it's, "Go get the code and blah blah, don't ask me how to install it, blah blah, I just want to write code, blah blah blah."

Well, I'm glad that the first machine I tried it on was Arch, because Arch by default seems to install many more libraries than Debian does, like all the development libraries and stuff like that, and on Arch it almost just worked when I started out. On Debian I had to work a little harder, but it is possible. And I have on my Pastebin site, I think I have it on there, a list of all the dependencies that I'm pretty sure you need to make Blather work, and so maybe I should link to that as well. I guess people could just poke around my Pastebin site. I've got many iterations of my commands file there too, so there are tons of examples of different kinds of commands you can use, and some of them are essentially scripts that are written out in one great big long line. If the scripts get too complicated, then I will save them in a special scripts directory, but if they're not too complicated I like to keep them in the config files, so that way I can share it with people a little easier.

And I'd like to say, when you first asked me about doing a recording about Blather, all I could think was, man, I just wanted to listen to Black Sabbath and turn my lights on and off, that's it.

Well, in reality what you've done is create for me what's probably the most important tool on my desktop now, because it allows me to be so productive without having to use Windows, and so I gotta thank you for that, man. It seems like most of the code I run on my machine is yours. I use heybuddy for all my social networking, I use MuttonChop to play music, I use sap to play audio, and now the mother of them all is Blather. I mean, it's so important to me, you can't even imagine. If it went away right now I think I would
get down and cry. I would be so upset if I couldn't use Blather anymore.

Well, one, it's on a public repository. Two, it's GPLv3. Three, I'm deleting it as we speak.

Oh, you're running faster. All right, well, on that note, maybe we should call it quits for this episode. We're up to more than 35 minutes here.

All righty, well then, hey HPR listeners, thanks for listening, and do do do do do do do do do do do do do.

Oh sorry, one thing I need to say to the HPR listeners. This is a request: if you have a copy of the sheet music to the HPR theme song, don't give it to Jezra. Don't do it, because he's gonna play it on bagpipes. Give it to me, I want it. If you have it in LilyPond or in MXL or any digital format, send it to me. Don't give it to him, because he's gonna put it on bagpipes and then he's gonna make us listen to it.

Jezra, I can't thank you enough for talking to me, and especially for writing Blather. You are the man.

Oh, you're making me blush. Luckily I've got sideburns that hide that.

Very cool. All right, man.

You're very welcome. I will talk to you later.

See you online, man.

All righty, take care everybody.

Okay, bye.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever considered recording a podcast, then visit our website to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored by Lunarpages. From shared hosting to custom private clouds, go to lunarpages.com for all your hosting needs. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.