Episode: 3219
Title: HPR3219: Linux Inlaws S01E18: Voice Recognition and Text to Speech
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3219/hpr3219.mp3
Transcribed: 2025-10-24 19:04:38
---
This is Hacker Public Radio episode 3,219 for Thursday, 3rd of December 2020.
Today's show is entitled Linux Inlaws S01E18:
Voice Recognition and Text to Speech, and is part of the series Linux Inlaws. It is hosted by monochromec
and is about 77 minutes long and carries an explicit flag.
The summary is, how to place fake prank calls into podcasts and what does TTS have to do with this.
This episode of HPR is brought to you by archive.org.
Support universal access to all knowledge by heading over to archive.org forward slash donate.
This is Linux Inlaws.
A podcast on topics around free and open source software, any associated contraband, communism,
the revolution in general and whatever else fancies your tickle.
Please note that this and other episodes may contain strong language, offensive humor,
and other certainly not politically correct language.
You have been warned.
Our parents insisted on this disclaimer.
Happy mum?
Thus the content is not suitable for consumption in the workplace, especially when played back on a speaker in an open plan office or similar environments,
any minors under the age of 35 or any pets including fluffy little killer bunnies,
your trusted guide dog unless on speed, and cute T-Rexes or other associated dinosaurs.
You may want to get in trouble now.
Welcome to Linux Inlaws.
Season 1 episode IT.
What?
Sorry.
Okay, cut.
This is Linux Inlaws.
Season 1 episode 18.
The one with the text to speech and the speech to text.
Martin, how are things?
Do I have a phone ringing Martin?
I think that's your phone actually.
Do you want to answer that?
Why don't you pick it up Martin?
I think it's for you.
It's I think it's got your name on me.
Thank you.
Hello.
Welcome to Rainbow Escorts.
Do you speak with the doctor?
Hello the doctor.
How are you?
Good evening, Chris.
How are you?
You're back again?
Well, I couldn't say no to the sun.
Thank you.
Thank you.
Thank you.
Hello.
Welcome to Rainbow Escorts.
Do you speak with the doctor?
Hello the doctor.
How are you?
Good evening Chris.
How are you?
You're back again?
Welcome to Rainbow Escorts.
See that again.
I couldn't say no to the sun.
What am I going to do for you?
Listen.
You're not going to blow that.
You were going to blow your fist.
That's it.
You were going to blow
That's it.
The What, the what?
You may blow after the what?
What can I do for you?
The what?
the What, the what?
Aren't you going to blow
The what?
The what?
I am going to blow.
You think Robert the thing is he German doesn't cut it right. That's not bad. I mean I get the I get the over on idea and that was actually quite good.
But you see now the beauty comes into play when you basically combine this when you combine this with the proper i.e. not something stitched up.
What do you mean by proper?
Well, she wasn't responding to my statements.
Okay, this is this is the next step.
Okay, before we go any further, listeners, maybe a little context: this episode is about text to speech and voice recognition.
A mindset subject of long talk. That's why you think.
Okay, I said it would be correct and fair enough.
Something that Martin has been looking forward to for what half a year or something.
We pretty much did the episode with the Terminator interview, as long-time listeners will recall. It was, I think, sometime in March of this year.
It's in the show notes, people, look it up.
So Martin has been nagging me ever since to do a special on text to speech and speech recognition.
And what you just heard was actually an impersonation of Emma, our beloved Emma. Emma, if you're listening to this...
No, it must be.
Sorry, the attention.
No, I wasn't paying attention to the fact.
The art if you're listening, this is if you're listening, this is not you.
This is just an impersonation. This is very important.
Let's see.
Okay, anyway, it's the only line.
So it's the only line you haven't finished with there.
Right.
That's what I would now cost.
I wish I must have a thousand euros in reach.
Do you speak English too?
For a long time, this is who cannot speak German.
I'm just checking.
We can consider English.
I'm sure we can find some other listeners.
You are about to call you.
Are we going to provide a transcript, Martin, because not everyone speaks German?
Yeah, yeah.
Perfect.
So listeners, yes, there will be transcripts.
I don't know why Martin picked German.
I said, maybe Martin, you want to explain this?
I can't help that the artist is German and wants to speak to you.
I can't be held responsible for your actions.
Well, yes, the rainbow escorts.
Actually, if you could entice the artist to speak English,
it wouldn't be bad for the rest of the audience.
Okay, we can sort that at some point.
In the meantime, I think, is that the phone again?
It is the phone again.
I think it's for you again.
I'll pass it to you.
Okay.
Shall we go again?
Are you not going to say hello?
Hello.
Sorry.
Where am I?
Where am I?
Where am I?
I don't know.
I, I, I.
Corner.
Is this corner?
You don't have telephone in Germany.
What's going on here?
It is welcome to the show, caller 19.
You have reached Linux Inlaws.
The show, the phone in show,
where anonymous callers can, can call in
and talk about their recent shenanigans.
I don't think this one is, is it one of those actually?
Anyway, here we go.
Here we go.
Hello, Chris.
This is Dennis, the CFO,
an invoice across my desk yesterday.
It is from a company called Rainbow Escorts.
Now you know this is clearly not related to your Linux
Inlaws work.
I suggest you post it to Redis Labs.
And here they have more of a party there.
Good fine.
I thought we wanted to leave
card and pass and ...
Pass employees out of this.
Redis Labs.
If we sing Love, it is that.
There's no one in there.
There's no one in there.
This is not a tutor work.
I don't know.
I'm not neglecting.
I'm not claiming expenses for dubious entertainment gigs.
Okay.
Have a suggestion from Dennis.
I think anyway feel free to do that. Oh fucking yeah post-production this is going to be cut out right very walking
Hello, this is Linux Inlaws, the call-in show. Oh, welcome caller 21, what can I do for you?
Hello Chris, this is Dennis the CFO again. I need some clarification on this purchase order from
a Mr D. Trump for election result improvement services. Now I appreciate you are trying to boost
the company funds, but five dollars is not going to balance the books. Come back when you can bring
me some proper six-figure purchase. Goodbye. Dennis, you want to hang up? Dennis, you want to take this
up with the Russians the Russian yes well why don't you elaborate on the Russian well it's
quite straightforward right they they did the previous well mastering of the electricity let's call
it this way so there might as well pony up now right so this this is a completely erroneously
posted invoice purchase order I see nothing related to your work gratitude and why does trump
feature all of a sudden on the show I thought was an open source show Martin I can't help
he's calling in it's just you know your your extra curricular activities obviously something to do
with this okay let's bring on the next call a bit you think there's more code oh hey I mean
maybe you're right yeah oh very goes again this is little's in the heart sorry you want to pick
it up now not I'll pick it up yeah hello hello always for christian okay there you go I'll
passing through this is this is living in law is called a 21 welcome to the show what can do for you
ah a mute caller excellent this is manual in production disorder for a hundred kilos
tomorrow you think I am a magician and I hope you have an actual customer for it this time
as the boss I'm gonna be pleased Pablo old friend I thought you were dead
this is my knowledge yeah I think it is so I didn't get that sorry man well you would be public
son right yeah so I don't know if it's referring to some sort of
man I don't sweat it the the cash is on the way don't worry if I understand Martin's email
it's correctly that is damn Martin you want to turn off the show for a minute I don't know
it's not uh yes we're not getting ran to do in my town we were doing it we're not reading
This is Linux Inlaws, the call-in show for weirdos and other co-hosts. This is caller 22, what
can I do for you, caller 22? I don't understand this. Okay, I thought you'd have some
experience with all your dealings with them by the sound of it maybe I'm the second
the working language of the cutter is actually English believe it or not
it's the international operation so don't sweat it right I was done with the phone calls or
what do they know you tell me um Martin you can disconnect this gadget you can actually
I think there's another one yes pick it up please oh here we go again
hello that was for Chris again okay yeah here he comes
Time to speak to the, uh, sorry, yeah, this is Linux Inlaws, the, um,
call-in show for weirdos and other people. You're caller 23, please go ahead.
oh this is Shannon you may not remember me but if I mentioned with sheen in social I think you
will bring any bells now no school of nursing and midwifery make it a little okay okay it was in 1994
and we're both old now but you need to know this you have a daughter and I've run out of money
to support her so send the ruler now I will send you my bank details I buy and thanks for the fun
shout shout and just say on the line I'm terribly sorry but you do have to get in line the trouble is
there are about 50 people waiting in front of you but no matter what you'll be served in about
20 years time so no sweat but thanks for the call anyway what I'm not impressed
and the other mystery call a smart though we are the subject I think Shannon's still in the line actually
all right Shannon go ahead sorry I didn't get that you are not paying me sorry I will pay you but as I
said you have to get in line I'm afraid as enjoying the queue I didn't get enough
that's enough for a chance Shannon I love you I've always loved you you want a special one but unfortunately
as I said they're quite a few women in front of you so no sweat did you all have daughters as well too
well for some I know for some I don't so that's the case okay you want to get to the bottom of this
or should I just take more random callers oh it's up to you I can turn the phone off I feel like
that would be a brilliant idea Mark all right let me let me do that otherwise yeah perfect
it may not get round to the well there you go thank you the call center of course can take
their details so they will be gotten back to them at some stage but they're sharing next week
or the month after but eventually yes right I'm sure to be very happy with this
well I mean I'm not sure about Pablo do you mentioned something about 10 tonight so you may
want to deal with him a bit quicker and see hole in onto your bits of anatomy okay for enough
Martin, that has been more than impressive. Care to elaborate on how you did it? Well, you know, I just
pick up the phone, right? Oh dear. That concludes episode 18. So I thought it was 20, but yes,
you're right, 18 it is. No, 20 would be some Christmas special. 19 is coming up soon, indeed, yes, and a
big teaser: that would be about Redis, funnily enough. But anyway, okay, now with this phone
gadgetry out of the way, do you want to get onto the main subject of the show? Yeah, okay,
which is speech recognition and synthesis, which are quite different subjects, right? I mean,
yes they are. Oh, we can touch on the various acronyms first if you like. There's obviously
speech to text, right? We, um, you may be familiar with this if you're using, say, um, Google Chat so
learn that, that has for example the option to, uh, transcribe what people are saying as you go
along, which is actually pretty decent, let's say. This is also known as speech recognition.
button there's also known as speech recognition yes yes but a speech record among friends
well it's it's yeah speech but yes recognizes first to change the text what do we need to do with it
But, um, with all the gadgets in your house, I don't know if you have Alexas and all these kinds of
nonsense. All right, that's obviously the first... You know, why should I? Well, exactly, exactly. I mean,
uh, for the listeners, yeah, for the listeners who are old enough, I mean, speech reco has been around
for at least 40 years. I've been around longer. It's not in an artificial form, no.
That would be a bit tricky. Computer-based speech recognition, now, man.
No, funnily enough, basically I first came in touch with the technology when I was working
at a company called Fidelity Investments about 20, 20-plus years ago, when I was project
managing, um, CTI technology, as in computer telephony integration, and the reco technology
of choice was then something called Nuance, which was quite advanced at the time, because
more than 20 years ago, actually, that Nuance technology could differentiate between
Texan and East Coast accents, which I found quite impressive. Yeah, I mean, if you, um, you know, English
as a language is obviously quite well-defined, but, um, even in the UK we have a million different
little, uh, ways of speaking differently. Of course the thing is, that wasn't English, that was
American. Yes, indeed. English, American, American English, American English, New Year.
Yeah, so yes, it has been around for a while, but, uh, more recently it has become more, um, I don't know, more
popularized, I guess, with, you know, uh, cars, devices in your home, all these other things, right?
Yeah, but I mean, if you are a search engine company that has got lucky and if you have a few billion
dollars at your disposal, you might as well pour it into fancy AI technology, and speech recognition
would be one of them, I suppose. Yeah, well, they are. So I mean, uh, are you referring to Google here?
Or... Yes, I am. So they just bought a company, right? Back in 2014 they bought DeepMind, which was a
UK startup, and, well, they, that... their stuff is based on... they keep buying,
they keep buying companies all the time. I mean, yeah. So, uh, where were we? Um,
we were discussing the origins of speech recognition, which of course go back to the first
projects, I think, done in something called the MIT Artificial Intelligence Lab, back, I think,
in the 60s we were understating but no no you were i mean i don't know if i may have mentioned
this on a previous show, but, um, my father-in-law, another in-law, was a, uh, speech researcher at a
UK, um, let's say government agency which we cannot mention, but yeah, it's been, you know...
What's it called, Bletchley? Is it Bletchley Park? It's Bletchley Park or something?
No, I don't know, I was the second one more one the... um, have you heard of Cheltenham?
Um, what have I, have I heard of what? Of Cheltenham, a town in the city...
British? Is it next to Milton Keynes? No.
Oh, this is, please note, Milton Keynes does exist, still. I mean, yes, I mean, if you
hear people saying it doesn't exist, no problem, get out the map, go there, they need the money,
especially after Brexit. Just make a point, visit Milton Keynes, that's the important thing.
Milton Keynes town council, if you want to get in touch about supporting the show,
yes by all means in the uh i thought that i thought that i was considering just celebrate to
feedback as you know in law starting you and we will continue your your donation no sweat alternatively
bring christ how do you here we go again um anyway so bring mom anyway yeah exactly sorry
Cheltenham town, UK, um, has a famous round building in it, which you may know of.
No? Okay, right, anyway, this is where the, um, UK listening happens on everything
a government might be interested in. Um, well-kept secret, nobody knows where it is, except there's a
large round building. So yeah, anyway, so why is this stuff around, right? Because
having humans is expensive and they don't work 24 hours a day, so if you can automate this,
great. Okay.
Um, so lots of different research has been done in this field.
But I think, I don't know what you have found, but I think Google are probably the
forerunner in all of this in terms of research, and not so much open sourcing everything.
But DeepSpeech is open source, right? Uh, DeepSpeech is open source, but DeepSpeech is
absolute rubbish in comparison. Okay, in that case, never mind. You will find the
link in the show notes. Yes. Yeah, so you know, if you want to use DeepSpeech to
transcribe your recordings, uh, good luck, but it won't be anything usable. In fact, what we
can do is do a transcription of the show and see. I'm tempted, yes. So,
um, and yeah, so advances in hardware: obviously these kinds of problems are
quite suitable to GPUs, and Nvidia have jumped on this quite heavily as well,
um, you know, because there is a large amount of learning required
if you want to do it from, um, from deback. That makes sense. If you want to do it from, uh, if you
just want to not buy a piece of software that does it for you, you want to run your own project,
um, there are some pre-trained models out there, but you will not get the same results as
a Google, right? That's the main answer. So you know, these Googles have the resources,
they have the hardware to do millions and millions of hours of training to get it to the stage that,
you know, they can do it at that kind of level, which is reasonably, uh, impressive. But so I'm still,
I mean, correct me if I'm wrong, but I think DeepSpeech is based on something called TensorFlow.
So, okay, so DeepSpeech, well, DeepSpeech is not actually text to speech. DeepSpeech is the
other way around. Um, and its speech synth is... essentially there are these features there. The project takes
an audio file and transcribes it, so it's speech recognition. Okay. Yeah, yeah, sorry, we were
talking about various acronyms, weren't we? So one thing is obviously, um, one way around, right: you
want to be able to, uh, turn speech into text, which is handy for, um, you know, your search engines.
Your algorithms, they are a bit tricky to run on sound bites, so if you transcribe into text
you can have tags and you can obviously search things easier and so on. So, so one of the reasons to
go. And, um, and also, I mean, you work in sales there at a certain company. Um, I think I do.
Yeah, and a lot of these organizations are starting to record conversations with customers and
transcribing them, okay, to, you know, feed into their Salesforces and whatever they use,
which obviously kind of, uh, yeah, begs the question of the legality of these kinds of practices, but
it is a practice that seems to be quite prevalent in your modern sales company.
Were you not familiar with that? Okay. Well, I heard about the concept, yes, but that's it.
I don't know, um, I suppose... Yeah, yeah.
I mean, speech recognition of course does have its merits if you're talking about bots and stuff,
because there are two principal ways you can interact with a bot: you can type in stuff
in a window or you can actually speak to it. Well, it's bots, it's, uh, you know, your things around
the house, that you want to turn your light on or stuff like that if you don't want to get up and press a
button. Um, but also in cars, right, where people are actually, um, driving, they should keep their
hands on the wheel, unless their car is driving itself, but that's a whole different question altogether.
So there are some advantages of, um... I mean, given the fact that DeepSpeech is actually,
if that's correct, but, um, let's assume for the moment it is, uh, based on TensorFlow, it would be
just another example of something called a domain-specific framework, like Keras for image
recognition, DeepSpeech for speech recognition. So sorry, um, for the listeners who don't know what
TensorFlow is: it's essentially a backpropagation, uh, network infrastructure where you can train
your backpropagation network in order to recognize patterns, because this is
basically, at the end of the day, what these, um, domain-specific frameworks boil down to.
You essentially recognize patterns, whether it's something in a picture that might be a human, that
might be a laptop, that might be a smartphone, or whether it's a waveform, or whether it's a,
sorry, um, I'm missing the theory here, whether it's a diphthong,
is that what I'm looking for? It's essentially a component of a sound, or a sound
that makes a language, and again that will be a pattern in a waveform that a backpropagation
network simply picks up. Yeah. So, uh, okay, so I mean, if you mention
TensorFlow you have to mention PyTorch as well, right? So there's those two: uh, TensorFlow
originating from Google, um, PyTorch being the other library that is used for these kinds of purposes. Um,
and, um, yeah, so the whole, uh... there's different ways of doing it. Uh, if we,
sorry, were we finished? But if we're going on to text to speech, for example, there are
different ways of doing it. Um, if you think about, um, say, uh, have you ever used an Apple phone?
To have a what? An Apple telephone. Oh no, you don't have telephones in Germany, sorry, I forgot.
Yes, um, no, we just use mobiles,
and a Handy, it's a Handy, yes. We did away with the fixed lines.
It's the Handy, yeah. Who came up with that name? I do... probably some dashkai, I can't remember.
We don't call them Handys in Holland. We don't know, it's a pure German thing, you know. You can't
blame us for that. Um, anyway, so, uh, German inventor of the name, German name for telephone,
please let's say here we go okay oh maybe here we'll bring into the
oh no here we go again anyway where will we so um
yeah that happened i would have to plug it back in don't mind don't
turn off any interest of uh finishing the episode
maybe i'll just be redirected to your number that's pretty much better idea than having me
and this calls for some automation and there won't be an extended burden of the show with all the
sound parts that we cut out because that's where it is post production or not
Yeah, um, okay, so if we're talking about the, uh, text-to-speech part, um, there's two ways
people have approached this, well, probably more than two, but the two most used ones,
it seems, are, um, uh... I mean, so my reference to, uh, Apple telephones was Siri, obviously, right?
So Siri, um, uses a way to generate sound by sticking individual syllables together, so it has
sounds, um, for pieces of words, right? So whichever language, you can cut it into
small little pieces of sound bites and then, uh, Siri just kind of sticks these together, right? So
that's one way of doing it, which is completely different to, um, WaveNet, which is really
what powers Google's, uh, TTS, um, on the synthesis side, so where the actual audio is being produced.
So you have, yeah, um, which is using neural networks instead,
um, to create this speech, uh, and it's really, um, creating something called a
mel spectrogram, which is really, uh, your representation of, um, your audio, um, in a, uh, in a
neuron, let's call it that. But, um, so it's a more, um, yeah, uh, a more crude approach compared to...
well, crude: it's using the power of machine learning rather than just, uh, doing a rule-based
type approach, where I'll have a lot of little bits of words and I'll stick them together and
I'll make sentences and stuff. Um, yeah, so, uh, Google are most advanced in this, it seems.
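
A minimal sketch of the "stick the little bits together" approach described above, in Python. This is only an illustration and not how Siri is implemented: the unit inventory, the syllable names and the sine tones standing in for recorded snippets are all made up for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (arbitrary choice for this sketch)


def fake_unit(freq_hz, duration_s=0.05):
    """Stand-in for a recorded snippet: a short sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory: each "syllable" maps to a stored waveform.
UNITS = {
    "hel": fake_unit(220.0),
    "lo": fake_unit(330.0),
    "world": fake_unit(440.0, duration_s=0.1),
}


def synthesize(syllables):
    """Concatenate the stored waveform of each syllable, in order."""
    samples = []
    for syl in syllables:
        samples.extend(UNITS[syl])  # raises KeyError if a unit was never recorded
    return samples


audio = synthesize(["hel", "lo", "world"])
print(len(audio), "samples,", len(audio) / SAMPLE_RATE, "seconds")
```

The point of the sketch is the design trade-off mentioned in the conversation: nothing is learned here, the output can only ever be a rearrangement of units that were stored in advance.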
The likes of, as I mentioned, Nvidia, they have a lot of, uh, you know, they're trying to sell
their GPUs, right, and they're trying to sell their GPUs not just to people playing games,
but also, because there's only so many, um, desktop computers or even laptops with Nvidia GPUs
they can sell, whereas in the enterprise, if you can apply or sell GPUs there, then you can stick
multiple in a machine and then you can make them bigger and faster and people can buy even more
of them. They, um, they even have a, uh, portable data center where you can plug
bunches of these GPU machines together and create a, uh, a great big GPU lamp, which is what they are,
sorry, were intending to do, uh, or what they're claiming they're going to be doing, uh, after the Arm
purchase goes through: to set up one of these in Cambridge. But not everybody listening
is probably an expert on TensorFlow and related technologies. Maybe we should explain a little bit
why GPUs come in handy if you want to take a look at backpropagation networks in general and
TensorFlow in particular. Essentially the way it works, TensorFlow, and the clue's actually in the name,
models a backpropagation network, which essentially is a layered net of neurons
that have certain characteristics and that are able to recognize patterns, because
this is what TensorFlow and PyTorch are all about. Um, essentially what TensorFlow and friends do,
they boil down this recognition to simple linear algebra operations, given the fact that a
tensor is, in layman's terms, something very similar to a matrix or vector.
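
To make the "it all boils down to linear algebra" point concrete, here is a minimal sketch in plain Python, deliberately not using TensorFlow or PyTorch. It assumes a single layer with one neuron, a toy pattern (logical OR) and a hand-picked learning rate; all names and numbers are illustrative only.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product: the core operation of a network layer."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, inputs):
    """One layer: linear algebra followed by an element-wise nonlinearity."""
    return [sigmoid(z + b) for z, b in zip(matvec(weights, inputs), bias)]


# Toy pattern to recognise: logical OR of two inputs (illustrative data only).
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

weights = [[0.0, 0.0]]  # one output neuron, two inputs
bias = [0.0]
rate = 1.0              # learning rate, chosen by hand for this sketch

for _ in range(2000):
    for inputs, target in samples:
        out = forward(weights, bias, inputs)[0]
        # Gradient of the squared error, propagated back through the sigmoid.
        delta = (out - target) * out * (1.0 - out)
        weights[0] = [w - rate * delta * x for w, x in zip(weights[0], inputs)]
        bias[0] -= rate * delta

predictions = [round(forward(weights, bias, x)[0]) for x, _ in samples]
print(predictions)
```

Every step is a multiply-and-add over rows of numbers, which is the kind of work that can be spread over many GPU lanes at once. Real frameworks do the same thing with many layers and far larger matrices.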
These operations can be very efficiently executed on a GPU, because if you take a close look
at how shaders, which are one of the central components of the GPU, work, all they do is essentially
linear algebra operations, but very fast, because these shaders have been designed with that
purpose in mind. This is the reason why you see companies like Nvidia, AMD and all the rest of them
moving away from simple graphics to actually GPGPUs, as in general-purpose GPUs, that are able to
power these artificial intelligence infrastructures left, right and center. You actually can see this
in the offerings of the hyperscalers: you have TPUs at Google that
do little else than just processing tensors, because this is what they're known for,
tensor processing units. GPUs... and Microsoft, I think, has something similar, and I reckon AWS,
I think, has too. And I reckon it's only, okay, it's only a matter of time, if Alibaba doesn't
have it already, before Alibaba catches on. Yeah, so the whole hardware field is quite interesting.
As you say, you have your general GPUs, then Google on the TPUs. There's a small startup in the UK
called Graphcore, links in the show notes, which have built their own,
what they call, IPU, which is specifically designed for AI purposes as well. So I mean, the biggest
difference between, you know, we're all used to CPU processing and stuff, and the biggest
difference is really the basis of what, at a very high level, these two technologies
are built for, which is: the CPU is built for latency and GPUs are built for throughput. If you can
parallelize your problem, you have a benefit of running on GPUs, whereas CPUs are all
about context switching and getting a faster response for whatever request needs to happen.
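
The latency-versus-throughput distinction can be sketched with a back-of-the-envelope model in Python. The lane counts and per-task times below are invented for illustration and do not describe any real CPU or GPU; the model also assumes the tasks are independent and equally sized.

```python
import math


def batch_time(num_tasks, lanes, seconds_per_task):
    """Time to finish num_tasks independent, equal-sized tasks on `lanes` parallel lanes."""
    rounds = math.ceil(num_tasks / lanes)
    return rounds * seconds_per_task


# Made-up numbers, purely for illustration.
CPU = {"lanes": 8, "seconds_per_task": 0.001}     # few lanes, each one fast
GPU = {"lanes": 4096, "seconds_per_task": 0.010}  # many lanes, each one slower

for n in (1, 100, 100_000):
    cpu = batch_time(n, **CPU)
    gpu = batch_time(n, **GPU)
    print(f"{n:>7} tasks: cpu {cpu:.3f}s, gpu {gpu:.3f}s")
```

Under these assumptions the few-fast-lanes device answers a single request sooner, while the many-slow-lanes device finishes a large parallel batch sooner, which is the trade-off described above.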
But yeah, it's, I mean, sort of digressing slightly into the hardware field, but
even things like FPGAs are... what I've come across over the last few years is
the whole surge of "oh, we need to have standard x86 boxes for everything
and just build the software around it". There are very specific benefits to be gained by using
hardware, you know, FPGAs: low-latency financial trading, even our friends at Amazon are
implementing FPGAs again for the Redshift database. We should probably explain what FPGAs
are, Martin. Field-programmable gate arrays. Yeah, everybody knows this. No? Okay, sorry, links in the
show notes. I'm sorry, a field-programmable gate array, yes. It's essentially a piece of silicon
that can be easily modified based on your specific requirements. Think of it like a poor man's CPU,
but in contrast to Intel, AMD and all the rest of them, you can actually modify the execution,
um, sorry, the, um, what's, what the instructions are that they're executing.
Yep, there you have it. So what normally CPUs do is essentially, and Intel is probably
the best example, Intel has moved away from the traditional CISC architecture to something called RISC,
but you don't see this because compendium it's all hidden under micro... it's all hidden under,
what's what I'm looking for, um, not microcode, but...
Microcode, right, yeah. So when you power up a CPU, essentially it executes at the very core like a
RISC architecture, but it looks from the outside like a CISC architecture. So the die understands
RISC instructions and the rest is software. So when you buy a Core i5, i7, i3 these days, or
an i9, you essentially buy a very sophisticated piece of RISC architecture, well hidden under a
CISC shell. That's how it works. Yeah, so the thing is, FPGAs take away this layer and give you the full
power of being able to program these somewhat simpler CPUs. I wouldn't even call them CPUs,
they're gate arrays, right, like the CPU is, but only in a much more custom way. Exactly,
this is the way it is. And FPGAs are not something new, they have been around at least for the last
30, 40 years, only these days much more, what's what I'm looking for,
uh, compressed, but more integrated, yes. Um, yeah, it's just surprising that the whole story
was always "oh, just scale up by adding more hardware". Um, there are benefits to be gained by using
specific hardware for certain problems, right? And of course the beauty is, uh, with using
specialist hardware, as Martin just mentioned, um, CPUs were designed for a particular purpose,
to execute software very efficiently, but due to that they have to cater for a wide variety
of use cases. You actually see this in CISC. CISC has a very enhanced name: complex instruction
set architect... uh, complex instruction set, what's the C, computing. Computing. You'll find it in the
show notes. Um, I think that comes to the end there, if you take it. I mean, and that goes back
to the old mainframe days: System/370, sorry, System/370 and System/36 had
quite a large instruction set. And I think it's back in a previous episode, I think I
elaborated on that before: in the seventies somebody took a very close look at how
many instructions a COBOL compiler of sorts actually executed from this
complex instruction set that was provided by IBM mainframes, and he came up with a
percentage of about 15 to 20. So hence RISC was born, and the first project that IBM did
was actually called ROMP, where they took a mainframe instruction set and slimmed it down
to a very narrow instruction set, and Arm of Acorn fame just took the concept further,
reduced it further, along with, Rockwell 6502 comes to mind, all the rest of them,
and at the end of the day something called Arm, Advanced RISC Machines, was born,
and the rest is history with regards to Android, Apple. Speaking of M1, that has just been launched,
we're recording this episode on the 13th of November, so M1, exactly, M1 has just hit the streets.
So this is essentially the idea behind RISC: you take a very reduced instruction set, put it
into silicon, but in contrast to CISC, because there are not that many instructions to execute,
you can do it very much faster. The idea is that you do the rest, in terms of the remaining
complexity, you shift this off to software. You see this actually in the code generators of
compilers like Clang or GCC, where a lot of effort has flown into the optimization stages
when you actually specify a RISC processor as the compilation target. The same goes for
much more specialist hardware like TPUs, like GPUs, like other specialist hardware that is only
able to execute a very narrow scope of instructions but does so very efficiently.
hence this whole craze about GPUs TPUs and all the rest of them well there's also the whole
underlying build of the processor right now if you look at the images of them you can see why they are
built for these purposes and absolutely but we digress I think
nice maybe that's fine we can always turn the telephone back on yeah
no we can't the one okay back to um um um
um uh text to speech hmm yeah so I think we were discussing WaveNet in front
yeah so so WaveNet is the final part right to produce the waveform after the or uh after
the spectrograms have been um generated before that so I mean there's a lot of good papers
around this subject um before before before we go further what's a spectrogram a spectrogram it's really
a an image of your well it's not an image you can represent it as an image but it's a it's a
representation of audio think of it as your over time you know so you can you can think about
okay so if you look at your Audacity or whatever it is whatever audio program there is you know
you can see your uh you can see the amplitude of your voice basically that's all you can see right
you can't see any other attributes that are associated with that you can just see the volume level
pretty much right uh if you want to add other dimensions to that then that is really what a
mel spectrogram is so you are adding more dimensions to uh a waveform than just um
I mean spectrograms obviously get around uh for other purposes right this is where the name
spectrum analyzer comes from and things like that but maybe maybe now is the time to explain
actually how you take a piece of text and you arrive at something called this fast speech
you want to take this Martin or should I elaborate on this uh why don't you have a go
no problem if you take a look at normal text normal text is composed of syllables these
syllables have a certain pronunciation i'm no linguist disclaimer very important essentially
the way it works is you take these syllables and turn them into something called phonemes and
these phonemes are the basic building blocks for something called words and if you string words
together you have ultimately a sentence the thing is of course if you just take a phoneme
in terms of if you take a piece of text and if you try to generate a waveform based on based
on the particular phoneme it may sound very metallic very very very artificial this is what you see
with traditional TTS approaches and then you have and this is where the magic comes in
and then you have essentially artificial intelligence which similar to other domain specific
frameworks provides this essential feedback loop i.e. what you do you train a back propagation
network by having a TTS generate waveforms essentially sentences in in audio and then you tell
the network what the delta is between a human pronunciation of a sentence i.e. how a human would
utter this in English and German and French and Spanish whatever and what the TTS initially generated
if you do this often enough and this is what WaveNet and these approaches are all about
at the end of the day you do arrive at a very natural sounding text to speech synthesis
and this is what the magic is essentially all about you tell the back propagation network
you tell the the neural network essentially how humans would pronounce that sort of thing
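
[Editor's sketch] The feedback loop described here (generate something, measure the delta against a human reference, adjust, repeat) can be illustrated with a deliberately tiny stand-in. This is not WaveNet or any real TTS model: the "synthesizer" is just a line y = w*x + b, and the names `inputs`, `human_reference`, `train` and `mean_squared_error` are made up for this illustration. Only the shape of the loop is the point.

```python
import random


def mean_squared_error(w, b, inputs, targets):
    """Average squared delta between generated and reference values."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


def train(inputs, targets, lr=0.5, epochs=2000, seed=0):
    """Fit y = w*x + b by plain gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # untrained starting point
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(inputs, targets):
            delta = (w * x + b) - y  # what was generated minus what a human produced
            grad_w += 2 * delta * x / n
            grad_b += 2 * delta / n
        # Nudge the parameters against the gradient: the "feedback" step.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: the reference follows y = 2x + 1, which the loop has to discover.
inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
human_reference = [2 * x + 1 for x in inputs]

initial_error = mean_squared_error(0.0, 0.0, inputs, human_reference)
w, b = train(inputs, human_reference)
final_error = mean_squared_error(w, b, inputs, human_reference)
print(f"error before: {initial_error:.4f}, after: {final_error:.2e}")
```

A real system swaps the line for a deep network and the five numbers for hours of recorded speech, which is why training takes weeks rather than milliseconds.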
okay yeah that's that's um it's good summary good summary you need to break down the text
but then yeah the intricacies of a human speech are more than just converting words and that
into sounds there are other attributes right which are in our hearts, volumes, speed,
intonation these kind of things and that's really what a mel spectrogram is is having many more
features associated with the inputs to a sound generator to create different voices to create
different waveforms out of that essentially so it's yeah in short you want to get to a stage where
you can have input to your output generator or your waveform generator which has many features
calculated by your network to give you you know different different voices, different
speeds, different pronunciations etc etc like you heard from our callers indeed
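
[Editor's sketch] To make the spectrogram idea a bit more concrete: the following assumes nothing beyond the Python standard library and computes a plain magnitude spectrogram (frequency content per time frame) of a synthetic 1 kHz tone, plus the commonly used formula for mapping hertz onto the mel scale. The frame size, hop, sample rate and function names are arbitrary choices for illustration; a real mel spectrogram would additionally pool the frequency bins through a bank of mel-spaced filters, which is left out here.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window, used to soften the edges of each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping time frame."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to Nyquist
            acc = sum(
                frame[t] * cmath.exp(-2j * math.pi * k * t / frame_size)
                for t in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def hz_to_mel(hz):
    """Widely used hertz-to-mel mapping (roughly linear low down, logarithmic higher up)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


SAMPLE_RATE = 8000  # assumed sample rate for this toy signal
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spec = magnitude_spectrogram(tone, frame_size=FRAME_SIZE, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(f"{len(spec)} frames, strongest bin at {peak_hz:.0f} Hz = {hz_to_mel(peak_hz):.0f} mel")
```

The amplitude view in an audio editor collapses all of this into one number per instant; the spectrogram keeps a whole column of numbers per instant, which is what the network predicts and what the vocoder turns back into a waveform.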
and care to elaborate Martin on how these callers came into existence
well you'd have to ask their parents I think and in fact maybe you should not go
okay guys what you heard of course were not really called there let me be and daddy be
Martin in about 20 seconds will elaborate on how he did it really I've
promised you as Martin yes okay there are a number of different ways to do this as I mentioned you
can buy commercial software to do this stuff in fact what's if you have a bit of a poke around
you'll find a number of companies that are actually based on open source projects that have
implemented these kind of techniques the main one that's behind a lot of this is Tacotron
Tacotron 2 in fact it moved on they moved it forwards a couple of years ago
um to Tacotron 2 and this is really the the most advanced project out there now the
the biggest issue with all this is really calculating the model right there are some pre-trained
models out there not many so but if you want to go and do your own training you obviously need a
data set you need a lot of sound bites to to do that and the more training you put into this
the better your results are going to be strangely enough so I did embark on training Tacotron
2 myself and soon found out that that was going to take a number of weeks on my on my my GPU
based laptop so you use a pre-trained model yeah pre-trained models are available not
yeah um they're not as you know Google is not going to release their pre-trained models but there
are pre-trained models out there that people have run several weeks of stuff on so you can use
some of those to give you something the output first you the link in the show notes there are some
really impressive results of what Google has done as outputs right so in I mean you may you may be
familiar with Google's text to speech right it it has an API you can it's obviously based on all
the same background code but just the model is better they also have an option which is in beta
to have custom voices so this is kind of the next evolution of this is really uh being able to
learn a new uh adopt a new voice to to this so rather than feeding it sound bites not many of them
and then it produces uh results based on those and and some people have said there are some
projects out there again links in show notes um uh Google is is trialling this as a beta in
text to speech but there's also companies like uh Resemble AI who have again taken
an open source project and are trying to monetize this there is um Lyrebird Lyrebird what's it called
um Lyrebird again which was an open source project uh commercialized so yeah there are there are
options right if you want to do it yourself uh quite a bit of um a bit of work and uh planning the
right models planning the right just sticking to Lyrebird for the moment but you you you'll find it
actually in your standard Ubuntu slash Debian repo simply install it i haven't
looked at the implementation but i thought it was actually just a waveform modulator does it actually
do a full speech recognition and then in turn a full tts i thought it was cha i thought it was just
yeah okay sorry uh correct the what the Lyrebird that you're talking about is is really uh the
one that you can uh deploy on Ubuntu download it whatever uh untar it install it
etc or just you can even install it from the repo as well yes you can at least on on 20 or
20.10 so that's uh that's a voice changer right if that's exactly it yes yeah uh however if you
if you look up Lyrebird itself you'll find that a company uh called Descript has taken on this
as a commercial model which does voice cloning for example um and there's other other ones out
there's also projects out there that that that you can use to do that um which i haven't
included in this episode but i'm sure Chris will be volunteering for this one next time
basically you take someone's voice and then apply text to speech using that voice
which is always quite fun similar to your deep fake
a i's that where you can project your um someone's face on your videos okay but maybe that's
a topic for another episode indeed actually it's quite a bit difficult doing a podcast but
a deep fakes because we don't have any video no we don't do videos now we know
yes yeah okay let's describe that idea yeah all right cool so um what's left Martin to discuss on
this because i think it's a very interesting topic and i i suppose we could spend hours on the end
on this hmm well i think this is left is really where is this going right with all uh all things
a i um what are the applications and uh where do we see the future of these things right apart from
change fake phone calls coming into podcasts yes are we going to have fake phone calls
oh we already did Martin it's been noticed yes and it happened about an hour ago
oh i see so um i don't worry about it by you obviously pre-warned otherwise you wouldn't have
known whatever well yeah so application of this stuff what do you think well i mean this
whole smartphone self-driving car thing comes to mind i mean Siri is little else than a cloud-based
platform right i mean you speak to it it recognizes your your your sentences it's uh does the
magic in the background and then it gets back to you same goes for Alexa same goes for
what's what's the Google thing called Google good question Home maybe Google something
the big bell yeah and i reckon in about 10 or 15 years i think Tesla has already
so don't don't don't worry i don't don't forget Cortana as well for Microsoft lovers out there
yes okay um yeah leaving desktops aside i reckon in about 10 to 15 years time when you
just simply enter your self-driving car you may actually have this on board not just running
in the cloud but rather have the on board processing power to do this locally just a matter of time
and the the prices for for the likes of TPUs GPUs coming down to enable mass production that
would be my well i mean your i guess because though what he has GPUs and then right so that's
exactly i mean this that's what this is what this is at it at the end of the day and we're just
seeing the beginning i mean your honest smartphone these days has a 64-bit architecture
and at least eight cores that's a lot of bang for the buck whether they actually use all these
cores that's a different question then yeah um yep sorry Martin these are just these are just CPU
cores GPUs totally different matter and never mind the surrounding SoC and if the M1 that has just
been launched by by Apple is anything to go by stay tuned i mean the M1 has you're gonna you're gonna
buy one well the thing is actually Apple if you're listening why don't you send us one maybe two actually
okay um do you want the wireless setup yeah i mean i haven't looked at the specs but it's
it's if current lore is anything to go by you're looking at an eight core ARM-based design with 16
threads and this is just the CPU GPU i think has about four threads if i'm not completely mistaken now
i might be corrected maybe eight plus and this is the interesting part apparently they put
image recognition into a special part of the SoC and Martin brace yourself you can get secure
enclaves enclave sorry enclave oh where did i hear that before uh well there was this Anjuna
company that partnered with a company called Redis Labs about half a year ago but you see
where where Intel is still in in the in the research phase i think Apple has done it in hardware and
ready to go now um oh sorry um for those this is secure enclaves consider it like a like an
MMU on speed on steroids sorry secure enclave okay an MMU is essentially protecting pages of memory
that's what a memory management unit is all about so you make sure that a process from a user from
from the operating system perspective cannot access the memory of another process that has been
around the block for like these 30 years i mean this is standard Unix textbook stuff right
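
[Editor's sketch] The page protection idea can be modelled in a few lines. This is a toy, not how an MMU is actually implemented: real MMUs are hardware that translates virtual to physical addresses through page tables set up by the kernel. The class `ToyMMU`, its methods and the process IDs below are invented purely to show the ownership check.

```python
class ToyMMU:
    """Toy model: remembers which process owns which page and refuses everyone else."""

    def __init__(self):
        self.page_owner = {}  # page number -> owning process id

    def map_page(self, page, pid):
        """Hand a page to a process (in reality the operating system does this)."""
        self.page_owner[page] = pid

    def access(self, page, pid):
        """Allow the access only if the requesting process owns the page."""
        if self.page_owner.get(page) != pid:
            raise PermissionError(f"process {pid} may not touch page {page}")
        return f"process {pid} accessed page {page}"


mmu = ToyMMU()
mmu.map_page(page=1, pid=100)  # hypothetical process 100 owns page 1
mmu.map_page(page=2, pid=200)  # hypothetical process 200 owns page 2

print(mmu.access(page=1, pid=100))  # own page: allowed
try:
    mmu.access(page=2, pid=100)  # someone else's page: refused
except PermissionError as err:
    print("blocked:", err)
```

Note that the check trusts whoever fills in the ownership table, which in practice means privileged code.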
so it goes for mainframes and all the rest of them so but with the advent of viruses
rootkits and all the rest of them that simply won't cut the mustard anymore so what secure
enclaves are actually actually providing is a layer kind of underneath this where you can only
access a certain piece of memory if you have the right credentials so to speak in layman terms
the technical implementation is slightly more complicated we won't go into the details but you will
find links in the show notes but think of it like a very secure MMU that only allows you to access
a certain piece of memory if you actually can prove that you are who you pretend to be which is
pretty amazing when you think about it because that will allow you to to do something called
proper secure computing yeah where viruses and the like would have a very hard time to penetrate
other processes' address spaces um yeah i'm just looking at the M1 and you can buy this
not from Apple Apple as smart as smart and tech if you're listening okay if you send us two machines
yes we will review them and this is a promise i'm not very impressed with the
GPU that's why it's only got 2.6 teraflops of throughput
i mean my older RTX does it's seven and a half so it's like which century are we in Apple come on
you can do better come back when you have at the M2 to us and if you look at the form factor
of this machine i thought we would chip no i'm talking about the form factor of something called
MacBook Pro 13 inch well i'll be 13 inch now you're missing something and the the graphics card
that you're referring to probably has the same volume as that whole as that whole laptop
no no no it's in my laptop sir yes it's in your laptop okay i have an RTX 2070 in my laptop
it's that laptop that you keep lugging around in your suitcase
well not at the moment no i'm uh i'm not trying to name it okay
okay well it has the added benefit that going going to the gym is also not required so
you need to lightweight Apple it's just you just wasting time you have to then go to the gym as
well it's like why are you doing man okay okay you have it you have you're fully your first
your first yes i made this it's not good yes okay um is there anything else we should mention about
tts or um i think the rest of the in this in this in this no yes in this just no no yes
along with your five grams of coke i might wear a few listening just send the just send the shipment
yes yes don't forget that 10 o'clock deadline by Pablo yeah you might be ringing in again
it's from the grey for something
okay very very uh two important um messages um an announcement yes yes feedback is always welcome
at of course feedback at linuxinlaws.eu by telephone or email we would prefer email
actually come to think of it because feedback at linuxinlaws.eu is not a valid phone
number in case you're wondering and of course then there's always a next episode and this next
episode will be about Martin uh Redis is that something you eat yes our famous our favorite NoSQL in
memory database so stay tuned for another episode of Linux Inlaws accident or they're going to
explain how it's a database uh stay tuned
that's okay okay one second oh and non non non non non phone feedback right so hang up the phone
please Martin very important okay we have written in feedback yes um yes um yes um the
the league about affirmative action on boxes wrote in and complained that we haven't done boxes and
anti-boxes for quite some time and of course they're spot on yeah so Martin that is this um this
this league is that something similar to the uh society for putting things on top of other things
something like that yeah so Martin what's your what's what's your box of the week
like box of the week and well it's not the Apple M1 because I'm very unimpressed
with the GPU for that's uh box of the week that would be then Martin's anti anti-box I suppose
hmm so uh box uh the single and only box of the week let me think for a second
there are quite a few um confine yourself to one in the interest of time
yeah so this is I need to pick on right um well yeah there's so many
which one okay let's have a think about our listeners what would they find most relevant
uh okay most relevant right here we go um box of the week for relevancy is a company called
Shell who has started an open source collaboration platform for the petrol company
indeed data science on geophysics indeed correct called the OSDU if I'm not mistaken
links I hope will be in the show notes yes yes I can remember this one yeah um so yeah that that kind of
first surprised me a company in that Shell was all about okay things of money but they're actually
trying to bring it back into the community so nice one cool it's okay anti-box or discussed uh my
box would be called On the Road by Kerouac oh hey what is this what is this this is not open source
related correct but then the boxes don't have to be open source related yeah did you know they don't
do now well um oh yes yes you did with the statues okay I've got you just have a hobby called
Redis Labs don't you yeah sorry boxes Martin per definition can be anything
well we yes yes yes whatever is my point uh having this yes the hobby leaves you with a little
spare time right okay my box of the week without going into this insert
is this a book called On the Road by Kerouac I think it's it's his name I don't know
I'm not sure about his first name let me double check well prepared for the box absolutely Jack
Kerouac you'll find the links in the show notes yes it's it's a masterpiece right what
else did he write he wrote plenty else oh I he's age old isn't he wait wait wait wait
that book was published in the 50s did he not write the book about wolves and stuff
I know the car I know the car I know the car was done by
hmm Steppenwolf Steppenwolf was done by
I remember jump over no it's actually no let's see book okay okay
now I'm just gonna look up the book that I was thinking of there's actually a bit something
Steppenwolf was done by Hesse actually so that's not what I'm referring to uh Jack Kerouac wrote
On the Road essentially it's a it's a it's a it's a it's a road movie in a book uh you have
this posse of characters that keep crisscrossing the States in the late 40s uh lovecraft there's a lot
of shenanigans going on I thought this very inspirational just read it my anti-box would
be all we wait wait wait wait what what inspirational why because it just could you make
me don't need to make me just read uh the vibes of the time I suppose uh what was it it was
Second World War was just over and these people just are experiencing things let's put it this way
okay a lot of boozing about a lot of womanizing about just read the book it's a classic
I can speak now it's okay okay okay and your anti-box yes my anti-box would be a guy called Joe Biden
not sure if that rings a bell um it's supposed to be the 40 whatever seven
friend that president if I'm not completely mistaken successor of a guy called Donald J Trump
ah yes ma'am we have to get this guy on the show right isn't as in Donald Trump
we do why not well I don't know I don't think his five dollar contribution to the um
uh manipulation of election results services by yourself it's really a reason to put it on the show
anyway why he's the anti-box of course I mean we've heard as I said we're recording this on the
13th of of of November and now I think Joe Biden is talking in a 29 the uh both in the in the
Electoral College of course that takes out that takes out all of the suspense about the further
process because by now it's very clear that never mind what the lion in in the White House tweets
tonight so why is Joe Biden anti-box? Because as I said he just takes out the suspense he just
of the whole thing because by now he's clear on and never mind what a certain guy called Donald
you were looking for his height right I got it no I'm not no okay not just from now it's just boring right
I mean it's clear who's wise before and before anyway that's not what it takes for you
but I'm not listening just call the chorus because you won't get anywhere full stop
and I spent this money on drugs still will mean that's other stuff yeah and don't know how many
U.S. presidents were assassinated again? four? four? I got two things first
Kennedy? Guelts? Oh loads no it can't be loads yeah and they're always knocking them off
you'll find the links in the show notes
I mean surely you remember Lincoln as well that makes it to I don't know what you are to
wear but I'm sure the number four wings of bell and then the binson oh no Reagan he didn't die
did he I think someone tried to he got lucky um but yeah you couldn't imagine that by then
this next one so why let me buy you know what Tom sports I like it's uh I bet you're right
off beer and I'm not talking about this lukewarm piss called beer in the well like the other
craic of beer you haven't produced yet I want this bet of the beer call well that's
he was running the bed of slabs uh the bed is project right now um it's a better
yeah no it's a better one I won uh only only in pure theoretical but I won morally
to quote the certain master master if you're listening when you're coming on when you're coming on the
chairman to quote a certain master with a technique you've won this bet so you want to
try to scrap the term technically because I've won this bet so technically it has to stay in there
I'm sorry so you owe me a crate of beer and I'm more than happy to to wager you a second
crate of beer I say it's Biden that's Biden during his first term and I'm just talking about the
first term will not be assassinated but about if he just sort of falls over and dies
that's a different story sorry but by assassinated I am of course not buying off a natural cause
well I know it's somewhat of a blurred line fair enough yes it is but if somebody shoots Biden
that's not a natural cause right you see this is what I mean by a blurred line exactly
if you can prove oh sorry if I can prove that the Russians have actually poisoned Biden
that would be an assassination so you would owe me a crate of beer yes now Martin if you
know me it's quite a beer actually no no I just said that Biden won't be assassinated uh yes
no Martin if you manage to cover this up oh yeah
what are you talking about if Biden is assassinated you're every beer no if if Biden dies
and if you're behind this and if you managed it behind it no no no no no no no no no
natural death, you get that beer, no worries, makes sense. I'm not how to show off the message now,
if you're listening, Joe, we hope you live long and prosper. Yes, we do.
And Joe, let's have the kind of beer between three of us. You're invited. Yes, that's
yes. Okay. Yes. So these would be my boxes and anti-boxes. Sorry, this would be my box and my
anti-box. We've discussed Martin's boxes and anti-box. And that actually now finally concludes
the show again. We'll be around next time. Yep. This is the Linux Inlaws. You come for the
knowledge. But stay for the madness. Thank you for listening. This podcast is licensed under the
latest version of the Creative Commons license, type Attribution-ShareAlike. Credits for the
intro music go to Bluesy Roosters for the song Salut Margot, to Twin Flames for their
piece called The Flow used for the segment intros. And finally to the Lesser Ground for the song
We Just Is used by the dark side. You find these and other ditties licensed under CC at Jamendo,
a website dedicated to liberate the music industry from choking copyright legislation
and other crap concepts.
You have to send me the email this is of it and post production so I can
right back there. Why don't you contact me? I'll make a call in as well. Okay.
Let's just reach out to HR just again. Oh, yes. We got an HR anyway.
Well, this is the boxes. We don't have an Amazon address or something. I don't know. I mean,
you had them. You should know. I know it. I just had Dennis to the cast your expenses.
Oh, dear. Oh, dear. Okay. Oh, hang on. We haven't done.
One day. Resume the recording. Oh, damn, it's another one.
The outtakes. Pick it up, Mark. Sorry, we haven't. Oh, you say we haven't done what?
What are we now? No, no, no, no, the outtakes. I mean, just just pick up the phone.
Okay. Hello, Linux Inlaws here. You're after Chris again. Okay. Here he comes.
This is the last I'm speaking. Yes. How can I help you? I don't know. You tell me.
How to put the corner through. Yes, but I thought you wanted them. There was a thing
that you wanted to do. Yeah, but without the phone. Without the phone. Okay. Sorry. Yes, sorry.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out how
easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the
Infonomicon Computer Club and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a
Creative Commons Attribution-ShareAlike 3.0 license.