hpr-knowledge-base/hpr_transcripts/hpr1284.txt

Episode: 1284
Title: HPR1284: Blather Speech Recognition for Linux: Interview with Jezra
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1284/hpr1284.mp3
Transcribed: 2025-10-17 22:56:46

---

Hey everybody, this is John Culp in Lafayette, Louisiana.
And I am doing a special episode for Hacker Public Radio here.
The last couple I recorded were solos and the one before that was an interview and the
one before that was an interview, but getting back to the interviews now.
And with me is programmer extraordinaire Jezra.
Hi everybody, this is Jezra. I'm in Petaluma, California, and I'm talking to John Culp
over Mumble and it's pretty good.
Yeah, it sounds pretty good.
We're mumbling and we're also going to be blathering.
Yeah.
So, Jezra, since you are the lead developer of Blather, why don't you tell everybody what
the heck it is?
Blather is a Python application that wraps around, let's see, Blather is a Python application
that uses GStreamer to wrap around PocketSphinx, which is a speech recognition engine
created, I believe, by Carnegie Mellon University.
And by doing this wrapping around the speech recognition engine, Blather is capable of
running commands when someone who is running Blather speaks a certain sentence or string
of words.
Yeah, that pretty much sums it up.
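The idea described above, a recognized sentence triggering a command, can be sketched in a few lines of Python. This is only an illustration and not Blather's actual code: the `COMMANDS` table, the example sentences, and the function names are hypothetical, and the recognized text is assumed to arrive as a plain string from the speech engine.

```python
import shlex
import subprocess

# Hypothetical sentence-to-command table; the sentences and commands are
# illustrative assumptions, not taken from Blather's own configuration.
COMMANDS = {
    "play black sabbath": "echo playing Black Sabbath",
    "what time is it": "date",
}


def normalize(sentence):
    """Lower-case the sentence and collapse runs of whitespace."""
    return " ".join(sentence.lower().split())


def lookup(sentence):
    """Return the command string for a recognized sentence, or None."""
    return COMMANDS.get(normalize(sentence))


def run_recognized(sentence):
    """Run the command mapped to the sentence; return True if one was found."""
    command = lookup(sentence)
    if command is None:
        return False
    subprocess.run(shlex.split(command), check=False)
    return True
```

Calling `run_recognized("play black sabbath")` would run the mapped command, while an unknown sentence is simply ignored.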
You know, when you first mentioned something, I remember seeing a notice from you on
Identi.ca or one of our StatusNet instances or something, saying just kind of in passing that you
were doing something with speech recognition, and this piqued my interest immediately because
speech recognition is really, really important to me. A few years ago I had issues
with my wrist, and I still do. I've got carpal tunnel syndrome, and I actually had surgery on
my left wrist to fix it. It's helped somewhat, but I still cannot be typing all the time.
It really hurts if I do too much typing and so speech recognition has become crucial to
me for any kind of dictation or anything like that, like long emails, documents I have
to create at work or anything like that.
And up until you came out with Blather, I had to boot into Windows to use either Dragon
NaturallySpeaking or the built-in Microsoft speech recognition program, which is
actually not all that bad. I mean, the functionality is pretty good, and the same goes for Dragon
NaturallySpeaking. But there are two problems with both of them. One, they both seem
to bring the system to a grinding halt in terms of resources; they're real resource hogs.
The other is that you're basically stuck with whatever they give you.
If they say that to switch applications you have to use "switch to this", "switch to that", then
you're stuck with that. But the beauty of Blather is that if you know what you're doing
a little bit, like I do with scripting and so forth, you can set up basic functionality
to do all of that stuff that the other ones do, except you make the commands the ones you
actually want to say, not the ones they tell you that you have to say.
And I will say that what I've seen in the videos that you've posted of you using Blather
is above and beyond anything I ever dreamed would be done with the software that
I wrote.
So what was your original vision for this thing?
My original vision was really a joke that I've had with my brother for maybe 10 years in
that I just want to be able to come home from work, walk into my place and say computer,
play Black Sabbath and then of course have the machine play some Black Sabbath.
This has never been something that I actually just went ahead and did because I never looked
into speech recognition in Linux as much as I should have.
I always thought, oh, it's sort of behind what's currently available on, say, a
Macintosh or the Windows operating system.
And then I saw a tutorial on, I believe it was the GStreamer site, about using
PocketSphinx from both Vala and Python.
And I thought, oh wow, that is it, that is going to let me walk into my house and say "computer,
play Black Sabbath", because I like Star Trek: The Next Generation.
I watch a lot of Star Trek and they just walk into the place and say "computer, play Black Sabbath",
"computer, play music", and that's what I wanted.
That hands-free "I'm home, play some damn music for me."
Love it.
And is that working for you right now?
I mean the video that I saw you post was I believe you were making your string of LEDs
do various things with voice commands.
I've actually had a problem with the speech recognition picking up "Sabbath".
And I don't call my computer "computer". Computers have names. Computers are anthropomorphized
like everything else I have. Everything to me is a pet: my car, my motorcycle, my computers.
I don't say "hey car, let's go to work". My car has a name; I talk to it.
The motorcycle has a name; I talk to it. Musical instruments have names; we talk to them.
Computers have names; we talk to them as if they are a child or a pet.
And so I have a home network, and that home network is who I talk to, and I say "Niaoli",
because the nickname of my network is Niaoli; it's not the name of my network.
And so for me it's "Niaoli, play Black Sabbath", and "Sabbath" doesn't get picked up properly,
and I don't know why.
You might be able to go into the dictionary file and tweak it a little bit.
I mean, that's kind of granular tweaking there. What I do, the word "Blather" actually,
it has trouble picking up that word. Both Blather itself and the dictation tool
that I use, the Google Web Speech API tool, always think I'm saying "bladder", and so
I've got a command where, when I want to turn off Blather, I just say "kill bladder".
I've just learned that if I want to do something with Blather I have to say "bladder".
And actually one of my little text manipulation commands is called "fix bladder". So if
I am doing dictation in one of those little Google speech windows and I have to
use the word "Blather", probably 80% of the time it thinks I said "bladder",
and so I will give a command that says "fix bladder", and it does a series of keystrokes that
will select everything, copy it to the clipboard, pipe that out through the stream editor, sed, which
will then replace all instances of the word "bladder" with "Blather", and put it back on the
screen.
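The substitution step of that "fix" command can be illustrated with a small Python function. The speaker describes doing this with sed on clipboard text; the sketch below swaps in Python's `re.sub` for the same whole-word replacement, and the function name `fix_blather` is a hypothetical stand-in. The clipboard and keystroke handling are left out.

```python
import re


def fix_blather(text):
    """Replace the misrecognized word 'bladder' with 'blather'.

    Only whole words are replaced, and a leading capital is preserved.
    """
    def swap(match):
        word = match.group(0)
        return "Blather" if word[0].isupper() else "blather"

    return re.sub(r"\bbladder\b", swap, text, flags=re.IGNORECASE)
```

A plain `sed 's/bladder/blather/g'` would do roughly the same job in a shell pipeline, without the capitalization handling.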
One thing I've tried doing when there's a command that is just not being recognized
or picked up by whatever machine is set up to listen and is running Blather is to
change the string to be more phonemic.
So if I were saying "enough", I wouldn't write E-N-O-U-G-H, I would type
E-N-U-W-F, because those would be the phonemes that I want, and that would probably match up
a bit better than the bastard language known as English.
You know one thing I've found also with this thing is that sometimes I have to speak
more quickly than I think I might need to, I will have to say the string of words fairly
quickly.
Really quickly.
Yeah or else it will think I've already stopped speaking and that it's a separate string
that it's looking for.
And right there you need to restart Blather.
Restart Blather?
Yes.
So I've noticed that when Blather starts, sometimes it will, as you said, think you've ended
your sentence even though you haven't, or it will think you've paused even though you
haven't.
So something like "Niaoli lights on" would be heard as "Niaoli", then "lights on", and that is not the
way humans speak, but that's the way that the PocketSphinx library is picking up the
speech that I'm, well, speaking.
It's the way it is, and I found that actually restarting it tends to put it a bit more
in line with what I want it to do.
Now when you say restart do you mean actually quit out of the whole thing and restart it
or just check the little check box on and off?
I mean I mean pretend it's a Windows machine, turn it off, turn it back on again.
I don't have to do that very much, actually. I normally can keep it going for hours at
a time and it works just fine.
Well in that case, shit.
I swore on HPR.
I was wondering how long it would take before we got to the explicit label there so there
it is.
You know I thought of starting the episode with some swearing just to keep the pace
going.
Yeah.
I guess it just didn't happen.
Well you know, that's all right.
There was something I was going to, I'd lost my train of thought there.
Man are you driving?
It happens.
Are you standing out in the driveway or something?
No, I'm sitting in my living room which is right next to one of the main thoroughfares
in this little area of town.
Okay.
And so the threshold on Mumble is set just to the edge of my voice.
So if a car goes by, it's most likely going to get picked up by Mumble.
However, the microphone that I'm using to record with Audacity might not pick up that audio,
which is a feature of this nice cheap karaoke microphone.
Oh one of those things, that's funny.
My daughter has a karaoke machine and I'm looking right now at one of those little dynamic
microphones like you're talking about and yeah, those normally won't pick up stuff from
much further away than two or three feet I would think.
I'm using a little clip-on condenser mic here, so it'll pick up various ambient sounds
in my detached office.
I actually turned off one of my servers over there to try and reduce the amount of sound
in the room.
Turned off the air conditioner as well so I hope it doesn't get too hot in here.
Yeah, I would think that turning off the air conditioner in Louisiana in summertime may
not necessarily be the best idea.
Well, it's certainly not a good idea during the day but after dark like right now it's probably
okay.
I think I'll survive to the end of our conversation.
Awesome.
So I certainly hope that you will be posting the links to the videos that you've made of
you using Blather.
I have at least the big long intro video where I've got the slides going and the picture
of you and all that and the music in the beginning and I figure from there people can, you know,
if they want to see more they can click on related things or look at my list of videos.
Did you get a chance to watch the one I made today yet?
You know, I haven't.
I've been driving all day long.
Oh man, bummer.
It's pretty amazing.
The thing I did today, I don't know, it gets it where it's almost as good as Dragon
NaturallySpeaking for my purposes.
What I did was I created a command so that whenever I say the words "dictation box", it
will open up a new instance of Chromium with a window size of 600 by 400 pixels, and it is opened
up as an app.
So it comes up in its own little window.
It just pops out a little box, and I do a series of virtual mouse movements and clicks
and so forth to turn on the microphone and start listening almost immediately.
And so you talk and talk and talk and it uses the Google Web Speech API and you can see
your words just spitting out onto the screen as you talk.
And then when you are done talking, you just say, stop talking and that's the command that
will tell it to do another virtual mouse click on the button that stops it and so that
stops listening and then one more command called transfer text will copy all that and
then flip back to the previous app you were in when you said dictation box and paste all
the text in there and it works great.
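The first step of that workflow, opening a small app-mode browser window, could be sketched like this in Python. It is a guess at the shape of the command rather than the speaker's actual script: the function name is hypothetical, the browser binary may be called `chromium` or `chromium-browser` depending on the distribution, and the page URL is passed in rather than assumed. The virtual mouse clicks and the text transfer are not shown.

```python
def dictation_box_command(url, width=600, height=400, browser="chromium"):
    """Build the argument list for opening `url` in a small app-mode window.

    --app and --window-size are Chromium command-line switches; the browser
    binary name is an assumption and differs between distributions.
    """
    return [browser, f"--app={url}", f"--window-size={width},{height}"]
```

The resulting list could be handed to `subprocess.Popen`, for example `subprocess.Popen(dictation_box_command("https://example.org/speech-demo"))`, where the URL is a placeholder.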
I mean, to me, it works better than the Dragon NaturallySpeaking one in terms of system
performance.
It's very fast.
What it doesn't do is it doesn't allow you to do the kind of very detailed editing of
text with your voice.
That's something that is not all that possible with this system.
Although I do actually do quite a bit of text editing with my voice using the speech
recognizer app. I don't remember who the guy is who made it, but it's a little add-on
that you can get for Chromium. If I need to work on a big block of text,
I will put it in there, and then I can just select a word and give it a command like
"capitalize this" or "make uppercase", and that will capitalize it or make it all caps.
I have some commands now where, what is the sequence, something like:
I select a URL and copy it to the clipboard, and then I go back to something I'm working
on, let's say some HTML, and I select the text in there that I want to be the
link text, and then I give a command that says "insert hyperlink". What it will do
is write out all the HTML code that you need, put in the URL, and then put the link text
in between the two tags. It simplifies a lot of that kind of stuff
that I have to do frequently when I'm working on my syllabi for classes or doing various
other things.
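The text-generation part of an "insert hyperlink" command like the one described can be sketched as a small Python helper. The function name is hypothetical and the speaker's own version is driven by keystrokes and the clipboard, which are not reproduced here; this only shows how a copied URL and the selected link text combine into an anchor tag.

```python
from html import escape


def make_hyperlink(url, link_text):
    """Wrap link_text in an HTML anchor pointing at url.

    Both values are escaped so that characters such as & or < do not
    break the surrounding markup.
    """
    return f'<a href="{escape(url, quote=True)}">{escape(link_text)}</a>'
```

For instance, `make_hyperlink("https://example.org/", "course syllabus")` yields an anchor whose visible text is "course syllabus".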
I can't tell you how many keystrokes Blather has saved me. I mean, it must be in the hundreds
of thousands by now.
Yeah, and that is amazing, because that was totally not my intention when I wrote the software.
I wanted to do some home automation. I wanted to send commands to web APIs of machines about
my house, and you've taken that and, I'm going to say, you totally hacked it, and that's totally
awesome. That's fucking awesome is what it is. You took something that wasn't really meant, well,
I guess it was meant to send commands. That's what Blather was meant to do: take someone's
voice, convert that into a command and run that command. My vision of it was something very
simple, and you took that ball and you fucking ran with it and you made it this thing that
is, I'm blown away. I'm absolutely blown away.
You know, you might have done the same thing with it if you had the problem that I have.
My problem is repetitive strain injuries, repetitive stress injuries. If you were always
looking for ways to reduce keystrokes the way I am, you probably would have seen the same
possibilities here. I immediately saw this as an accessibility tool, not as a fun toy.
And I use it for fun too. I've got a bunch of little scripts and commands
that I use where I ask it, like, "what time is it", and it will run my what-time-is-it script,
and it'll do a coin flip, like a virtual coin flip, and depending on whether it gets
a zero or a one it will either tell me what time it is or it'll give me a smart-ass remark
like "time to get a move on" or something like that. It chooses from a list
of predefined responses, and it shuffles them and chooses one randomly. I've got ones
where I'll ask it "what's for dinner", and I've got a text file that has 15 or 20 possibilities
of what might be for dinner, and it shuffles all those and chooses the top one, and then
a text-to-speech engine will speak it to me.
And now that is home automation, when you have the computer speaking to you.
Yeah, it's fun too. You know, I might do an HPR episode, like a real short one, where I have
a conversation with my computer using Mumble, and I can keep it going a pretty good while
because I've got it doing all kinds of things. I can ask it "how's the weather", you know,
"what's for dinner", "how are you today", "what time is it". I've got all these commands,
and it puts back a different response almost every time.
Then I think you should go ahead and do that.
I almost certainly will.
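The coin-flip and shuffle-and-pick behaviour described for the "what time is it" and "what's for dinner" commands can be sketched as follows. This is a minimal illustration under assumptions: the remark list, the function names, and the time format are made up for the example, and the speaker's real scripts are shell commands whose output is piped to a text-to-speech engine.

```python
import random
from datetime import datetime

# Hypothetical canned remarks; the real list lives in the speaker's own files.
REMARKS = ["Time to get a move on.", "Time you bought a watch."]


def what_time_is_it(now, rng):
    """Flip a virtual coin: report the time on 0, a canned remark on 1."""
    if rng.randint(0, 1) == 0:
        return now.strftime("It is %H:%M.")
    return rng.choice(REMARKS)


def pick_response(options, rng):
    """Shuffle a copy of the options and return the one that lands on top."""
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled[0]
```

Passing a `random.Random` instance keeps the functions easy to test; in everyday use `what_time_is_it(datetime.now(), random.Random())` would be enough, with the returned string handed to whatever speech engine is installed.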
I've got to admit, when you first posted this I had heard of Sphinx before, because I'd done
research on speech recognition. You know, when I realized that my problem really was not going
to go away and I knew that I would have to come up with some kind of speech recognition solution,
I tried desperately not to have to resort to Windows, and so I looked into various things on
Linux and I found Sphinx. But Sphinx is not something you can just use. You needed Blather;
you needed something to give you a way to start it and handle the very complex, long command
that has to be run to use it. And so that's why Blather was so important to me, because I knew
about Sphinx, I just had no idea how to use it. And when I first saw what Blather was and I saw
the sample configuration file that said, I don't know, echo this, echo that, I thought, oh my
gosh, this is not going to help me at all. But then I thought about it and said, you know what,
I bet I can do some stuff with this. And so I started by figuring out how to make it switch back
and forth between different windows. I already knew some of the command line tools to switch
windows. The wmctrl command I've used in scripts before, and that's really good at flipping back
and forth between various windows on your desktop, and it works either in GNOME or in Openbox,
which is what I'm on right now. And so once I got that going I thought, well, I can at least
start applications, I can switch to them, I can quit them, and before long I had ways to do
series of keystrokes like select all, copy, paste, switch to this window, put something there.
And once you start the ball rolling, you start seeing possibilities where other people might
not see them. So Blather has just been awesome to me. I was wondering whether, you know, to me
this really is a great accessibility tool. It makes Linux and speech recognition and dictation
and all that, in conjunction with the Google Web Speech tool, that's really, really important
for the dictation part of it, but it makes it where I really don't think I will have to boot
into Windows anymore. You do have to do quite a bit of configuration, but maybe all of the stuff
that I've done could serve as some kind of sample configuration for someone else, and they can
use those commands, or they can keep the command part but change the sentence part to suit them.
Yeah.
To me that's one of the greatest things about Blather, just the fact that I get to tell it what
I want to say for something to happen. I find that when I did boot into Windows after having
used Blather for a while, I got so annoyed that I couldn't make it do what I wanted to. I had to
make it do what they said that I could make it do. Does that make sense?
It absolutely does make sense,
and I may attribute that to vernacular, such that in Redmond, Washington, people may speak a
certain way, and they would expect to speak to their computer in a certain way, and they would
expect someone in another part of the world to speak to their computer the same way that people
in Redmond, Washington speak to their computer. And you're saying that Blather allows you to
speak the way you speak to your machines.
Exactly. And it's not that the commands they have for, say, desktop navigation are not sensible.
They are. I mean, to switch between one window and another you have to say, you know, "switch to
Firefox", "switch to Thunderbird", but it's not the easiest thing to say, the "switch to",
"itch to", "itch to". I mean, it's a very strange sound and it gets cumbersome. Instead of
doing that I say "go to Thunderbird", "go to Firefox", "go to Chromium", "go to Heybuddy". To me
that's much easier to say, and it works perfectly.
And you're okay with using the "to"? The reason I ask is I was having problems controlling my
lights. "Lights fade to blue" would be the command, but I found myself saying "lights fade blue"
all the time, and I was wondering why the command wouldn't run, and then I realized that's not
the command that I have, and I've just been sort of shortening it for speed reasons.
Yeah, most of the time when you have the word "to", T-O, in one of these commands, you have to
say it very fast. Like if I say "go to Heybuddy", that's basically how I have to say it: "go to
Heybuddy", "go to Thunderbird". You don't say "go, to this", "go, to that", because it won't
recognize it. It'll think you said the word T-W-O or something.
And it also doesn't handle pauses, and that's one problem that I have with my anthropomorphized
household and network, in that the first thing I say is the name of the device I'm talking to,
and then I will pause and give it a command.
Yeah.
So more naturally I would say "Niaoli, lights fade red", which actually would do nothing,
because it's "Niaoli lights fade to red", but that space between "Niaoli" and "fade" is going
to be recognized as an end of a sentence.
Yeah, that doesn't work.
So in the unnatural way I would have to say "Niaoli lights fade to red". See, and I find that
very unnatural.
And one of the problems that you're going to have with that is Niaoli is not the easiest thing
to say.
Actually, for me it flows pretty well, and it gets picked up very easily from all of the devices
I use that are running Blather, including my Nokia N900 running the sweet, sweet Linux Maemo
operating system.
Nice.
Yeah, it's awesome, and I still use it, and a lot of people still do. I had to put that one in
there.
I wish I had one of those things, man.
Seeing you run these, like, what do you run, like Qt apps or GTK apps or something on it? You
can run Python things on it, isn't that right?
Oh yeah, I run GTK apps written in Python or Vala. I'll run Qt apps written using PySide, which
is Python for Qt. I'm running Blather on it, and Blather has a UI for both Qt and for GTK, and
I could use either of those.
Yeah, that's awesome. That would be so cool.
You could probably find one at the Goodwill for about 10 bucks, because you're John, you can do
that.
I will certainly keep my eye out for it. Hey, one of the cool tricks that I figured out in
launching Mumble is setting environment variables in the launching script.
Hold on, timeout, timeout. You mean Blather, not Mumble.
Sorry, Blather, my bad. Two completely different voice applications with a very generic name
regarding voice. My bad, I totally didn't catch that. But yeah, so when I'm starting up Blather
I have to use a script, because I have to set the GStreamer library location, and at first that
was the only environment variable I set in there. But then I pretty soon realized that I could
clean up my commands file quite a bit by setting some of the very frequently used long commands
as environment variables in there. And so I do that with the xvkbd command, where you have to
have a whole string of options after it. So my environment variable keypress equals, like,
xvkbd space hyphen secure space hyphen text, you know, all these things. And also I use it to
set my text-to-speech engine, currently set, you'll be glad to hear, with the Arctic voice in,
what's it, Festival.
Beautiful voice, beautiful voice.
You know, she doesn't pronounce stuff right a lot of the time, but it's a beautiful voice.
Yeah, like my name. I know eSpeak will pronounce your name right every time. eSpeak will
pronounce my name right every time. eSpeak is also good for low-end machines like the Raspberry
Pi. However, I still prefer the voice of Festival. It sounds more like the computer on Star Trek.
I guess you're right there. It's a really nice voice, I'm not going to argue with you there,
and I'm using it right now because lately eSpeak has, like, it'll work for a while and then
it'll deteriorate into a bunch of staticky buzzes, and I really don't know why. Something to do
with PulseAudio or something. I'm really not sure what happens, but it doesn't happen with
Festival, so I just switched to that. But some
of my computers, what I do is, when I change my commands file, I have a script that not only
will update the dictionary and all that stuff, but it also rsyncs with files on two different
remote computers, like one in my daughter's room and one in my office at work, and those
computers don't necessarily have the Festival voice installed. Originally in my configuration
file, or my commands file, I had Festival or eSpeak written in the command itself, but then as
soon as I synced that commands file up with my work computer it didn't work anymore, because
that one didn't have Festival. So that's when I got the idea to set the voice in an environment
variable, so I can have a different one on this computer than I do at work, and in the commands
file just have in there, you know, dollar sign VOICE. I pipe it through that, and then whatever
is set as the voice will be used when the command is run. That works much, much better for
syncing the same command set over multiple computers.
Now, when I use text to speech, I wrote a script called speak string, and then I just simply
run speak string, and all the other words that follow it are the string, and that's what gets
spoken. And in speak string, the script, I will decide, okay, on this machine I'm going to use
Festival, and on this machine I'm going to use eSpeak. That way when I'm sending the command
I can just say, when some command is issued, run speak string followed by this series of words,
and then it's up to the machine itself, or actually me as the programmer on the machine, to
say, okay, when this script is run, use eSpeak. So if I'm on the Raspberry Pi I'm going to use
eSpeak; if I'm on a machine that's nice, I'll use Festival.
Okay, so in the script, does it check the host name or something to decide, like, if host name
equals this then voice equals that? Or how do you do it?
Oh no, it's just a bash script, speak string, and in the script itself will be the command,
either using eSpeak to speak the string that is the input, or Festival.
Okay.
And so if I don't have that script on a machine, then I obviously have to just create it, or
I will get some sort of error that there's no script of that name.
Okay, yeah, that's the sort of problem that drove me to use the environment variable for
choosing the voice, because I actually rsync up my entire dot config slash blather directory
with different computers. And so when I add a new text file that has, like, a bunch of
responses, like if my son asks me "what can we do, I'm bored", I can give the command "what can
we do I'm bored" and it will choose from a list of 10 things. Now if I add something to my list
and I do my update Blather command, it will sync up that data file to all the computers that
I have, and so my work computer can do all the same commands that this one can, but it'll use
a different voice when I run the command.
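The per-machine voice selection discussed here can be sketched in Python, combining the two approaches: a single speak-string helper whose engine is chosen by an environment variable. This is an assumption-laden illustration, not either speaker's script. The variable name `VOICE` follows the "dollar sign VOICE" mentioned in the conversation, but here it holds an engine name rather than a full command, and the `ENGINES` table and function names are hypothetical. The `espeak --stdin` and `festival --tts` invocations both read the text to speak from standard input.

```python
import os
import subprocess

# Engine name -> command that speaks text read from standard input.
ENGINES = {
    "espeak": ["espeak", "--stdin"],
    "festival": ["festival", "--tts"],
}


def speech_command(environ=None, default="espeak"):
    """Choose the speech command from the VOICE environment variable."""
    environ = os.environ if environ is None else environ
    engine = environ.get("VOICE", default)
    if engine not in ENGINES:
        raise ValueError(f"unknown speech engine: {engine}")
    return list(ENGINES[engine])


def speak_string(text, environ=None):
    """Pipe text into whichever engine this machine is configured to use."""
    subprocess.run(speech_command(environ), input=text.encode(), check=False)
```

With this arrangement the synced commands file can stay identical everywhere, and each machine only needs `VOICE` set in its own launch script.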
awkward silence
Jezra, you there?
I'm still here.
I don't know if we dropped out or what.
Oh no, that's what in radio is referred to as the awkward silence. Or was it dead space? Dead
area?
Yeah, yeah, dead air.
I'm not like that. I'm not really a podcaster, okay, nor an oggcaster, and being able to fill
in that space can be interesting.
Yeah, well, this has never been a problem for me. You know, I'm trained as a historical
musicologist and I'm a professor, and so I make my living by blathering on about all kinds of
stuff, and my students are probably more than happy when I shut up at the end of the class
period.
Oh, and for people who don't know, blather means incoherent babble, right?
Pretty much what my students think when I speak to them, I'm sure.
Well yeah, your students all get an F, right?
I know, yeah, you're very keen on failing all of my students.
Well, it's not my fault they're doing a shitty job being shitty students and getting bad grades.
Boy, you're just brutal now.
Yes I am.
Well, do we have anything else to talk about with respect to Blather? I mean, how would you
suggest people get started with it?
I would say anyone interested in using Blather should go to the Gitorious code repository for
Blather. Offhand I have no idea what it is. I think it's probably Gitorious slash blather.
I've already put a link in the show notes for that.
Oh, awesome.
Yeah, I've got a link to your site, to the Gitorious site, to the Sphinx site, to the LM tool
on the CMU Sphinx site. Got all kinds of links up there, and a link to my video. One of the
things I've found about making podcasts for HPR is that I'm much more likely to go through with
it and finish it if I make the show notes first, before I do any of the recording, because the
recording is kind of the fun part and the show notes are kind of the tedious part. And so before
we did this interview I spent the last couple of days putting together show notes, and so it's
pretty much ready.
Well, that's awesome.
Yeah, one of the links is to the code, and so you can download that. I put together a list of
dependencies, I think they must be the Debian packages. It's not the easiest thing to get
going. There's no single package to install to make Blather work. You do have to work at it
a little bit, and especially the vader component on Debian can be a little problematic. Is that
the lib GStreamer PocketSphinx library? I don't remember what it's called, but on a couple of
machines where I've installed Blather I've had a little bit of a hard time getting it to find
the PocketSphinx GStreamer libraries. But I wrote a little blog post on that problem, and so
maybe that will help out.
Well, if you have a blog post with the solution to the problem you should definitely put a link
to that in the show notes, because anyone interested in running it on Debian is going to need
to know how to install it, especially if it's an issue. Otherwise, usually when I write code
it's "go get the code, blah blah, don't ask me how to install it, blah blah, I just want to
write code, blah blah blah".
Well, I'm glad that the first machine I tried it on was Arch, because Arch by default seems to
install many more libraries than Debian does, like all the
development libraries and stuff like that, and on Arch it almost just worked.
When I started out on Debian I had to work a little harder, but it is possible,
and I have on my Pastebin site, I think I have it on there, a list of all the
dependencies that I'm pretty sure you need to make Blather work, and so
maybe I should link to that as well. I guess people could just poke around my
Pastebin site. I've got like many iterations of my commands file
there too, so there are tons of examples of different kinds of commands you can
use, and some of them are essentially scripts that are written out in one great
big long line. If the scripts get too complicated then I will save them in a
special scripts directory, but if they're not too complicated I like to keep
them in the config files, so that way I can share it with people a little easier.
And I'd like to say, when you first asked me about doing a recording about
Blather, all I could think was, man, I just wanted to listen to Black Sabbath and
turn my lights on and off. That's it.
Well, in reality what you've done is create for me what's probably the most
important tool on my desktop now, because it allows me to be so productive
without having to use Windows, and so I've got to thank you for that, man. Seems like
most of the code I run on my machine is yours. Like, I use Heybuddy for all my
social networking, I use MuttonChop to play music, I use sap to play audio, and now
the mother of them all is Blather for me. I mean, it's so important to me,
you can't even imagine. If it went away right now I think I would get down and
cry. Like, I would be so upset if I couldn't use Blather anymore.
Well, one, it's on a public repository. Two, it's GPLv3. Three, I'm deleting it as we speak.
Oh you're running faster. All right, well, on that note maybe we should call it
quits for this episode. We're up to more than 35 minutes here.
All righty, well then, hey HPR listeners, thanks for listening, and do do do do do do do do do do do do
do.
Anyone who has the, oh sorry, one thing I need to say to the HPR listeners. This
is a question: if you have a copy of the sheet music to the HPR theme song, don't
give it to Jezra. Don't do it, because he's going to play it on
bagpipes. Give it to me, I want it.
If you have it in LilyPond or in MXL or any digital format, send it to me.
Don't do it, because he's going to put it on
bagpipes and then he's going to make us listen to it.
Maybe.
Jezra, I can't thank you enough for talking to me, and especially for writing Blather. You are
the man.
Oh, you're making me blush. Luckily I've got sideburns to hide that.
That's very cool. All right, man.
You're very welcome. I will talk to you later.
See you online, man.
All righty, take care everybody.
Okay, bye.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday,
Monday through Friday. Today's show, like all our shows, was contributed by an
HPR listener like yourself. If you ever considered recording a podcast, then visit
our website to find out how easy it really is. Hacker Public Radio was founded by
the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by
the Binary Revolution at binrev.com. All binrev projects are crowd-sponsored by
Lunarpages. From shared hosting to custom private clouds, go to lunarpages.com
for all your hosting needs. Unless otherwise stated, today's show is released
under a Creative Commons Attribution-ShareAlike 3.0 license.