Episode: 3219
Title: HPR3219: Linux Inlaws S01E18: Voice Recognition and Text to Speech
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3219/hpr3219.mp3
Transcribed: 2025-10-24 19:04:38
---
This is Hacker Public Radio episode 3,219 for Thursday, 3rd of December 2020. Today's show is entitled Linux Inlaws S01E18: Voice Recognition and Text to Speech, and is part of the series Linux Inlaws. It is hosted by Monochrome, is about 77 minutes long and carries an explicit flag. The summary is: how to place fake prank calls into podcasts and what does TTS have to do with this. This episode of HPR is brought to you by archive.org. Support universal access to all knowledge by heading over to archive.org forward slash donate. This is Linux Inlaws, a podcast on topics around free and open source software, any associated contraband, communism, the revolution in general and whatever else fancies your tickle. Please note that this and other episodes may contain strong language, offensive humor, and other certainly not politically correct language. You have been warned. Our parents insisted on this disclaimer. Happy, mum? Thus the content is not suitable for consumption in the workplace, especially when played back on a speaker in an open plan office or similar environments, by any minors under the age of 35 or any pets including fluffy little killer bunnies, your trusted guide dog unless on speed, and cute T-Rexes or other associated dinosaurs. You may want to get in trouble now. Welcome to Linux Inlaws. Season 1 episode IT. What? Sorry. Okay, cut. This is Linux Inlaws. Season 1 episode 18. The one with the text to speech and the speech to text. Martin, how are things? Do I hear a phone ringing, Martin? I think that's your phone actually. Do you want to answer that? Why don't you pick it up, Martin? I think it's for you. I think it's got your name on it. Thank you. Hello. Welcome to Rainbow Escorts. Do you speak with the doctor? Hello the doctor. 
How are you? Good evening, Chris. How are you? You're back again? Well, I couldn't say no to the sun. Thank you. Thank you. Thank you. Hello. Welcome to Rainbow Escorts. Do you speak with the doctor? Hello the doctor. How are you? Good evening Chris. How are you? You're back again? Welcome to Rainbow Escorts. See that again. I couldn't say no to the sun. What am I going to do for you? Listen. You're not going to blow that. You were going to blow your fist. That's it. You were going to blow That's it. The What, the what? You may blow after the what? What can I do for you? The what? the What, the what? Aren't you going to blow The what? The what? I am going to blow. You think Robert the thing is he German doesn't cut it right. That's not bad. I mean I get the I get the over on idea and that was actually quite good. But you see now the beauty comes into play when you basically combine this when you combine this with the proper i.e. not something stitched up. What do you mean by proper? Well, she wasn't responding to my statements. Okay, this is this is the next step. Okay, before we before we go any further listeners, maybe a little context this episode is about text speech and voice recognition. A mindset subject of long talk. That's why you think. Okay, I said it would be correct and fair enough. Something that Martin has been looking forward to for what half a year or something. Like we pretty much did the episode with the Terminator interview you a long time distance will recall it was I think someone in March of this year. It's in the show knows people look it up. So Martin has been nagging me ever since to do a special on text speech and speech recognition. And what you just heard was actually a impersonation by Emma or beloved Emma Emma if you're listening to this. No, it must be. Sorry, the attention. No, I wasn't paying attention to the fact. The art if you're listening, this is if you're listening, this is not you. This is just an impersonation. 
This is very important. Let's see. Okay, anyway, it's the only line. So it's the only line you haven't finished with there. Right. That's what I would now cost. I wish I must have a thousand euros in reach. Do you speak English too? For a long time, this is who cannot speak German. I'm just checking. We can consider English. I'm sure we can find some other listeners. You are about to call you. Are we going to provide a transcripts Martin because not everyone speaks German? Yeah, yeah. Perfect. So listeners, yes, they will be transcripts. I don't know why Martin picked German. I said, maybe Martin, you want to explain this? I can't help that the artist is German and wants to speak to you. I can't be a hold responsible for your actions. Well, yes, the rainbow escorts. Actually, if you could entice the artist to speak English, it wouldn't be bad for the rest of the audience. Okay, we can sort that at some point. In the meantime, I think, is that the phone again? It is the phone again. I think it's for you again. I'll pass it to you. Okay. Shall we go again? Are you not going to say hello? Hello. Sorry. Where am I? Where am I? Where am I? I don't know. I, I, I. Corner. Is this corner? You don't have telephone in Germany. What's going on here? It is welcome to the show, corner 19. You have reached Linux in-laws. The show, the phone in show, where anonymous callers can, can call in and talk about their recent shenanigans. I don't think this one is, is it one of those actually? Anyway, here we go. Here we go. Hello, Chris. This is Dennis, the CFO, an invoice across my desk yesterday. It is from a company called Rainbow Escorts. Now you know this is clearly not related to your Linux in law's work. I suggest you post it to reader slabs. And here they have more of a party there. Good fine. I thought we wanted to leave card and pass and ... Pass employees out of this. Redis claps. If we sing Love, it is that. There's no one in there. There's no one in there. 
This is not a tutor work. I don't know. I'm not neglecting. I'm not claiming expenses for dubious entertainment gigs. Okay. Have a suggestion from Dennis. I think anyway feel free to do that. Oh fucking yeah post-production this is going to be cut out right very walking hello this is little's in last the college oh welcome caller 21 what can I do for you? hello Chris this is Dennis the CFO again I need some clarification on this purchase order from a mrd trump for election result improvement services now I appreciate you are trying to boost the company funds but five dollars is not going to balance the books come back when you can bring me some proper six-figure purchase goodbye Dennis you want to hang up Dennis you want to take this up with the Russians the Russian yes well why don't you elaborate on the Russian well it's quite straightforward right they they did the previous well mastering of the electricity let's call it this way so there might as well pony up now right so this this is a completely erroneously posted invoice purchase order I see nothing related to your work gratitude and why does trump feature all of a sudden on the show I thought was an open source show Martin I can't help he's calling in it's just you know your your extra curricular activities obviously something to do with this okay let's bring on the next call a bit you think there's more code oh hey I mean maybe you're right yeah oh very goes again this is little's in the heart sorry you want to pick it up now not I'll pick it up yeah hello hello always for christian okay there you go I'll passing through this is this is living in law is called a 21 welcome to the show what can do for you ah a mute caller excellent this is manual in production disorder for a hundred kilos tomorrow you think I am a magician and I hope you have an actual customer for it this time as the boss I'm gonna be pleased Pablo old friend I thought you were dead this is my knowledge yeah I think it is so I didn't get 
that sorry man well you would be public son right yeah so I don't know if it's referring to some sort of man I don't sweat it the the cash is on the way don't worry if I understand Martin's email it's correctly that is damn Martin you want to turn off the show for a minute I don't know it's not uh yes we're not getting ran to do in my town we were doing it we're not reading this is limits in laws the college show for weirdos and other co-hosts this is called a 22 what can I do for you college 22 I don't understand this okay I thought you'd have some experience with all your dealings with them by the sound of it maybe I'm the second the working language of the cutter is actually English believe it or not it's the international operation so don't sweat it right I was done with the phone calls or what do they know you tell me um Martin you can disconnect this gadget you can actually I think there's another one yes pick it up please oh here we go again hello that was for Chris again okay yeah here he comes time to speak to the uh sorry yeah this is lose in laws the um college show for weirdos and other people then your college 23 go please go ahead oh this is Shannon you may not remember me but if I mentioned with sheen in social I think you will bring any bells now no school of nursing and midwifery make it a little okay okay it was in 1994 and we're both old now but you need to know this you have a daughter and I've run out of money to support her so send the ruler now I will send you my bank details I buy and thanks for the fun shout shout and just say on the line I'm terribly sorry but you do have to get in line the trouble is there are about 50 people waiting in front of you but no matter what you'll be served in about 20 years time so no sweat but thanks for the call anyway what I'm not impressed and the other mystery call a smart though we are the subject I think Shannon's still in the line actually all right Shannon go ahead sorry I didn't get that you are not 
paying me sorry I will pay you but as I said you have to get in line I'm afraid as enjoying the queue I didn't get enough that's enough for a chance Shannon I love you I've always loved you you want a special one but unfortunately as I said they're quite a few women in front of you so no sweat did you all have daughters as well too well for some I know for some I don't so that's the case okay you want to get to the bottom of this or should I just take more random callers oh it's up to you I can turn the phone off I feel like that would be a brilliant idea Mark all right let me let me do that otherwise yeah perfect it may not get round to the well there you go thank you the call center of course can take their details so they will be gotten back to them at some stage but they're sharing next week or the month after but eventually yes right I'm sure to be very happy with this well I mean I'm not sure about Pablo do you mentioned something about 10 tonight so you may want to deal with him a bit quicker and see hole in onto your bits of anatomy okay for enough Mark that has been more than impressive how to elaborate on how you did it well you know I just pick up the phone right oh dear that concludes episode 18 so I thought it was 20 but yes you're right 18 is no 20 would be some Christmas special 19 is coming up soon indeed yes and a big teaser that would be about relis fun enough but anyway about okay now with this phone get the tree out of the way do you want to get onto the main subject of the show yeah okay which is speech recognition and synthesis what are quite different subjects right I mean yes they are oh we can touch on the various acronyms first if you like there's obviously speech detects right we um you may be familiar with this if you're using say um google chat so learn that that has for example the option to uh transcribe what people are saying as you go along which is actually pretty decent let's say this is also known as speech recognition button 
that's also known as speech recognition, yes. Yes, but speech rec among friends. Well, it's, yeah, speech, but yes, recognition is the first step to change it into text; then what do we need to do with it? But with all the gadgets in your house, I don't know if you have Alexas and all this kind of nonsense. That's obviously the first, you know... Why should I? Well, exactly, exactly. I mean, for the listeners who are old enough, speech rec has been around for at least 40 years. I've been around longer, but not in an artificial form. No, that would be a bit tricky. Computer-based speech recognition now, man. Funnily enough, I first came in touch with the technology when I was working at a company called Fidelity Investments, 20-plus years ago, when I was project managing CTI technology, as in computer telephony integration. The speech rec technology of choice was then something called Nuance, which was quite advanced at the time, because more than 20 years ago that Nuance technology could already differentiate between Texan and East Coast accents, which I found quite impressive. Yeah, I mean, English as a language is obviously quite well defined, but even in the UK we have a million different little ways of speaking differently. Of course, the thing is that wasn't English, that was American. Yes indeed, American English. Yeah. So yes, it has been around for a while, but more recently it has become more popularized, I guess, with cars, devices in your home, all these other things, right? Yeah, but I mean, if you are a search engine company that has got lucky and you have a few billion dollars at your disposal, you might as well pour it into fancy AI technology, and speech recognition would be one of them, I suppose. Yeah, well, they are. So are you referring to Google here? Yes, I am. So they just bought a company, right, back in
2014 they bought DeepMind, which was a UK startup, and, well, their stuff is based on... They keep buying companies all the time. I mean, yeah. So where were we? We were discussing the origins of speech recognition, which of course go back to the first projects, I think done in something called the MIT Artificial Intelligence Lab, back, I think, in the 60s, unless I'm mistaken. No, no. I mean, I don't know if I may have mentioned this on a previous show, but my father-in-law, another in-law, was a speech researcher at a UK, let's say, government agency which we cannot mention. But yeah, it's been, you know... What's it called? Bletchley Park? It's Bletchley Park or something? No, that was the Second World War one. Have you heard of Cheltenham? Have I heard of what? Cheltenham, a town in Britain. Is it next to Milton Keynes? No. Oh, please note, Milton Keynes does exist. If you hear people saying it doesn't exist, no problem: look at the map, go there, they need the money, especially after Brexit. Just make a point to visit Milton Keynes, that's the important thing. Milton Keynes town council, if you want to get in touch about supporting the show, yes, by all means. Just send a mail to feedback at linuxinlaws dot eu and we will consider your donation, no sweat. Alternatively, ring Chris. Here we go again. Anyway. Yeah, exactly. Sorry, Cheltenham, a town in the UK, has a famous round building in it, which you may know. No? Okay, right. Anyway, this is where the UK listening happens on everything a government might be interested in. It's a well-kept secret; nobody knows where it is, except there's a large round building. So yeah, anyway, so why is this stuff around? Right, because having humans is expensive and
they don't work 24 hours a day, so if you can automate this, great. Okay. So lots of different research has been done in this field. I don't know what you have found, but I think Google are probably the frontrunner in all of this in terms of research, and not so much in open sourcing everything. But DeepSpeech is open source, right? DeepSpeech is open source, but DeepSpeech is absolute rubbish in comparison. Okay, in that case, never mind. You will find the link in the show notes. Yes. So, you know, if you want to use DeepSpeech to transcribe your recordings, good luck, but it won't be anything usable. In fact, what we can do is do a transcription of the show and see. I'm tempted, yes. And yeah, so advances in hardware: obviously these kinds of problems are quite suitable to GPUs, and Nvidia have jumped on this quite heavily as well, because there is a large amount of learning required if you want to do it from scratch, if that makes sense. If you don't just want to buy a piece of software that does it for you, if you want to run your own project, there are some pre-trained models out there, but you will not get the same results as a Google, right? That's the main answer. These Googles have the resources, they have the hardware to do millions and millions of hours of training to get it to the stage where they can do that kind of level, which is reasonably impressive. But correct me if I'm wrong, I think DeepSpeech is based on something called TensorFlow. Okay, so DeepSpeech is not actually text to speech; DeepSpeech is the other way around, it's not speech synthesis. Essentially, the project takes an audio file and transcribes it, so it's speech recognition. Okay. Yeah, yeah. Sorry, we
were talking about various acronyms, weren't we? So one thing is obviously one way around, right: you want to be able to turn speech into text, which is handy for, you know, your search engines. Your algorithms are a bit tricky to run on sound bites, so if you transcribe into text you can have tags and you can obviously search things more easily, and so on. So that is one of the reasons to go that way. And also, I mean, you work in sales, you know, at a certain company. I think I do, yeah. And a lot of these organizations are starting to record conversations with customers and transcribe them, okay, to feed into their Salesforces and whatever they use, which obviously begs the question of the legality of these kinds of practices, but it is a practice that seems to be quite prevalent in your modern sales company. Were you not familiar with that? Okay, well, I heard about the concept, yes, but that's it. I don't know, I suppose so. Yeah. I mean, speech recognition of course does have its merits if you're talking about bots and stuff, because there are two principal ways you can interact with a bot: you can type stuff into a window or you can actually speak to it. Well, it's bots, it's, you know, your things around the house, when you want to turn your light on or stuff like that and you don't want to get up and press a button. But also in cars, right, where people are actually driving, they should keep their hands on the wheel, unless their car is driving for them, but that's a whole different question altogether. So there are some advantages. I mean, given the fact that DeepSpeech is actually, if that's correct, but let's assume for the moment it is, based on TensorFlow, it would be just another example of something called a domain-specific framework, like Keras for image recognition and DeepSpeech for speech recognition. So, sorry, for the listeners who don't know what TensorFlow is,
it's essentially a back propagation network infrastructure where you can train your back propagation network in order to recognize patterns, because this is basically, at the end of the day, what these domain-specific frameworks boil down to: you essentially recognize patterns, whether it's something in a picture that might be a human, that might be a laptop, that might be a smartphone, or whether it's a waveform, or whether it's a, sorry, I'm missing the term here, whether it's a diphthong, is that what I'm looking for? It's essentially a component of a sound, or a sound that makes up a language, and again that will be a pattern in a waveform that a back propagation network simply picks up. Yeah. Okay, so I mean, if you mention TensorFlow you have to mention PyTorch as well, right? So there are those two: TensorFlow originating from Google, PyTorch being the other library that is used for these kinds of purposes. And yeah, so there are different ways of doing it. Sorry, were you finished? But if we're going on to text to speech, for example, there are different ways of doing it. If you think about, say, have you ever used an Apple phone? A what? An Apple telephone. Oh no, you don't have telephones in Germany, sorry, I forgot. Yes, no, we just use mobiles, and we call it a Handy. It's a Handy, yes. We did away with the fixed lines. It's the Handy, yeah. Who came up with that name? I don't know, probably some guy, I can't remember. We don't call them Handys in Holland. No, it's a purely German thing, you know. You can't blame us for that. Anyway, so, German inventor of the German name for telephone, please call in. Here we go. Okay, oh, maybe he will ring in. Oh no, here we go again. Anyway, where were we? So, yeah, that happened. I would have to plug it back in. Never mind, don't, in the interest of finishing the episode. Maybe they'll just be redirected to your
number. That's a much better idea than having me... And this calls for some automation. And there won't be an extended version of the show with all the sound bites that we cut out, because that's what post-production is for. Or not. Yeah. Okay, so if we're talking about the text-to-speech part, there are two ways people have approached this, well, probably more than two, but the two most used ones, it seems, are... So my reference to Apple telephones was Siri, obviously, right? So Siri uses a way to generate sound by sticking individual syllables together. So it has sounds for pieces of words, right? So whichever the language, you can cut it into small little pieces of sound bites, and then Siri just kind of sticks these together, right? So that's one way of doing it, which is completely different to WaveNet, which is really what powers Google's TTS on the synthesis side, where the actual audio is being produced. That is using neural networks instead to create this speech, and it's really creating something called a mel spectrogram, which is really your representation of your audio in a neural network, let's call it that. So it's a more, yeah, a more crude approach compared to... well, crude: it is using the power of machine learning rather than doing a rule-based type approach of "I'll have a lot of little bits of words and I'll stick them together and I'll make sentences" and stuff. Yeah, so Google is the most advanced in this, it seems. The likes of, as I mentioned, Nvidia, they're trying to sell their GPUs, right? And they're trying to sell their GPUs not just to people playing games, because there are only so many desktop computers or even laptops with Nvidia GPUs they can sell, whereas in the enterprise, if you can apply or sell GPUs there, then
you can stick multiple in one machine, and then you can make them bigger and faster, and people can buy even more of them. They even have a portable data center where you can plug bunches of these GPU machines together and create a great big GPU farm, which is what they are, sorry, intending to do, or what they're claiming they're going to be doing after the Arm purchase goes through: to set up one of these in Cambridge. But not everybody listening is probably an expert on TensorFlow and related technologies. Maybe we should explain a little bit why GPUs come in handy if you want to take a look at back propagation networks in general and TensorFlow in particular. Essentially, the way it works: TensorFlow, and the clue is actually in the name, models a back propagation network, which essentially is a layered net of neurons that have certain characteristics and that are able to recognize patterns, because this is what TensorFlow and PyTorch are all about. Essentially, what TensorFlow and friends do is boil down this recognition to simple linear algebra operations. Given the fact that a tensor is, in layman's terms, something very similar to a matrix or vector, these operations can be very efficiently executed on a GPU, because if you take a close look at how shaders, which are one of the central components of the GPU, work, all they do is essentially linear algebra operations, but very fast, because these shaders have been designed with that purpose in mind. This is the reason why you see companies like Nvidia, AMD and all the rest of them moving away from simple graphics to actually GPGPUs, as in general-purpose GPUs, that are able to power these artificial intelligence infrastructures left, right and center. You can actually see this in the offerings of the hyperscalers: you have TPUs at Google, which do little else than just processing tensors, because this is what they're named for, tensor processing units, TPUs, and Microsoft
also, I think, has something similar, and I reckon AWS, I think, has too, and it's only a matter of time, if Alibaba doesn't have it yet, before Alibaba catches on. Yeah, so the whole hardware field is quite interesting. As you say, you have your general GPUs, then Google and the TPUs. There's a small startup in the UK called Graphcore, links in the show notes, which have built their own what they call IPU, which is specifically designed for AI purposes as well. So I mean, we're all used to CPU processing and stuff, and the biggest difference, at a very high level, is really what these two technologies are built for: the CPU is built for latency and GPUs are built for throughput. If you can parallelize your problem, you have a benefit from running on GPUs, whereas CPUs are all about context switching and getting a faster response for whatever request needs to happen. But yeah, I mean, sort of digressing slightly into the hardware field, but even things like FPGAs... What I've come across over the last few years is that people are moving on from the whole idea of "we need to have standard x86 boxes for everything and just build the software around it". There are very specific benefits to be gained by using specialist hardware, you know: FPGAs in low-latency financial trading; even our friends at Amazon are implementing FPGAs again for the Redshift database. We should probably explain what FPGAs are, Martin. Field-programmable gate arrays. Yeah, everybody knows this, no? Okay, sorry. Yeah, links in the show notes. I'm sorry, a field-programmable gate array, yes. It's essentially a piece of silicon that can be easily modified based on your specific requirements. Think of it like a poor man's CPU, but in contrast to Intel, AMD and all the rest of them, you can actually modify the execution, sorry, what the instructions are that they're executing.
Yep, there you have it. So what CPUs normally do, and Intel is probably the best example: Intel has moved away from the CISC architecture to something called RISC, but you don't see this, because since the Pentium it's all hidden under, what am I looking for, microcode, right? Yeah. So when you power up a CPU, essentially it executes at the very core like a RISC architecture, but from the outside it looks like a CISC architecture. So the die understands RISC instructions and the rest is software. So when you buy a Core i3, i5, i7 or an i9 these days, you essentially buy a very sophisticated piece of RISC architecture, well hidden under a CISC shell. That's how it works. Yeah, so the thing is, FPGAs take away this layer and give you the full power of being able to program these somewhat simpler CPUs. I wouldn't even call them CPUs; they're gate arrays, right, like the CPU is, but only in a much more custom way. Exactly, this is the way it is. And FPGAs are not something new; they have been around at least for the last 30, 40 years, only these days much more, what am I looking for, compressed, or rather more integrated. Yes. Yeah, it's just surprising, given that the whole story was always to scale up by adding more hardware, that there are benefits to be gained by using specific hardware for certain problems, right? And of course the beauty of using specialist hardware is, as Martin just mentioned, that CPUs were designed for a particular purpose, to execute software very efficiently, but due to that they have to cater for a wide variety of use cases. You actually see this in CISC. CISC has a very telling name: complex instruction set architecture... complex instruction set, what's the C? Computing. Computing. You'll find it in the show notes. I think that comes to the end there, if you take it... I mean, and that goes back to the old mainframe days: the 370, sorry, System/370 and System/36 had
quite a large instruction set, and I think it was back in a previous episode that I elaborated on this before: in the seventies somebody took a very close look at how many instructions a COBOL compiler of sorts actually executed from this complex instruction set that was provided by IBM mainframes, and he came up with a percentage of about 15 to 20. So hence RISC was born, and the first project that IBM did was actually called ROMP, where they took a mainframe instruction set and slimmed it down to a very narrow instruction set, and Arm of Acorn fame just took the concept further and reduced it further. Along with that, the Rockwell 6502 comes to mind, and all the rest of them. And at the end of the day something called ARM, Advanced RISC Machines, was born, and the rest is history, with regard to Android, Apple... Speaking of which, the M1 has just been launched; we are recording this episode on the 13th of November. So M1, exactly, M1 has just hit the streets. So this is essentially the idea behind RISC: you take a very reduced instruction set and put it into silicon, but in contrast to CISC, because there are not that many instructions to execute, you can do it very much faster. The idea is that the rest, in terms of the remaining complexity, you shift off to software. You see this actually in the code generators of compilers like Clang or GCC, where a lot of effort has flowed into the optimization stages when you specify a RISC processor as the compilation target. The same goes for much more specialist hardware, like TPUs, like GPUs, like other specialist hardware that is only able to execute a very narrow scope of instructions but does so very efficiently. Hence this whole craze about GPUs, TPUs and all the rest of them. Well, there's also the whole underlying build of the processor, right? If you look at the images of them, you can see why they are built for these purposes. Absolutely. But we digress, I think. Nice.
Maybe. That's fine, we can always turn the telephone back on. Yeah. No, we can't. Okay, back to text to speech. Hmm, yeah, so I think we were discussing WaveNet, if I'm right. Yeah, so WaveNet is the final part, right, to produce the waveform after the spectrograms have been generated before that. So I mean, there are a lot of good papers around this subject. Before we go further, what's a spectrogram? The spectrogram is really an image of your... well, it's not an image, you can represent it as an image, but it's a representation of audio over time, you know. So you can think about it this way: if you look at your Audacity, or whatever audio program it is, you can see the amplitude of your voice, basically. That's all you can see, right? You can't see any other attributes associated with that; you can just see the volume level, pretty much. If you want to add other dimensions to that, then that is really what a mel spectrogram is, so you are adding more dimensions to a waveform than just amplitude. I mean, spectrograms obviously get used for other purposes too, right; this is where the name spectrum analyzer comes from, and things like that. But maybe now is the time to explain how you actually take a piece of text and arrive at something called fast speech. Do you want to take this, Martin, or should I elaborate on this? Why don't you have a go? No problem. If you take a look at normal text, normal text is composed of syllables. These syllables have a certain pronunciation. I'm not a linguist; disclaimer, very important. Essentially, the way it works is you take these syllables and turn them into something called phonemes, and these phonemes are the basic building blocks for something called words, and if you string words together you ultimately have a sentence. The thing is, of course, if you just take a phoneme, in terms of, if you take a piece of text and if you try
to generate a waveform based on the particular phoneme, it may sound very metallic, very artificial. This is what you see with traditional TTS approaches. And this is where the magic comes in: then you have essentially artificial intelligence, which, similar to other domain-specific frameworks, provides this essential feedback loop. That is, you train a backpropagation network by having a TTS generate waveforms, essentially sentences in audio, and then you tell the network what the delta is between a human pronunciation of a sentence, i.e. how a human would utter this in English, German, French, Spanish, whatever, and what the TTS initially generated. If you do this often enough, and this is what WaveNet and these approaches are all about, at the end of the day you do arrive at very natural-sounding text to speech, and this is what the magic is essentially all about: you tell the neural network how humans would pronounce that sort of thing. Okay, yeah, that's a good summary. You need to break down the text, but then the intricacies of human speech are more than just converting words into sounds. There are other attributes, right, such as pitch, volume, speed, intonation, these kinds of things. And that's really what a mel spectrogram is: having many more features associated with the inputs to a sound generator, to create different voices, to create different waveforms out of that. So in short, you want to get to a stage where the input to your waveform generator has many features calculated by your network, to give you different voices, different speeds, different pronunciations, et cetera, like you heard from our callers. Indeed. And care to elaborate, Martin, on how these callers came into existence? Well, you'd have to ask their parents.
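To make the spectrogram idea from the discussion concrete, here is a minimal sketch in plain Python. It is not the hosts' code and not what Tacotron or WaveNet actually do internally; it only shows how a waveform (amplitude over time) can be turned into frames of frequency content. The frame size, hop size, sample rate and test tone are all assumptions picked for illustration, and `hz_to_mel` uses one common textbook formula for the mel scale. A real mel spectrogram would additionally pool the frequency bins through a mel filter bank, which is left out here.

```python
import cmath
import math


def hann(n):
    """Hann window: tapers each frame so its edges do not smear the spectrum."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..n/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """Slice the waveform into overlapping frames and take the spectrum of each."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    return frames


def hz_to_mel(freq_hz):
    """One widely used mel-scale formula (an assumption; variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Assumed test signal: a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
samples = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(samples, frame_size=FRAME_SIZE, hop=32)

# The loudest frequency bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames,", len(spec[0]), "bins per frame, peak at", peak_hz, "Hz")
```

Each entry of `spec` is one time slice and each value inside it is the energy at one frequency, which is the extra dimension that a plain amplitude view in an audio editor does not show.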
I think. And in fact maybe you should not go there. Okay guys, what you heard of course were not real callers. Martin, in about 20 seconds, will elaborate on how he did it. Really? I promise you. Yes, Martin? Yes, okay. There are a number of different ways to do this. As I mentioned, you can buy commercial software to do this stuff. In fact, if you have a bit of a poke around you'll find a number of companies that are actually based on open source projects that have implemented these kinds of techniques. The main one that's behind a lot of this is Tacotron, Tacotron 2 in fact; they moved it forward a couple of years ago to Tacotron 2, and this is really the most advanced project out there. Now, the biggest issue with all this is really calculating the model, right? There are some pre-trained models out there, not many, but if you want to go and do your own training you obviously need a data set: you need a lot of sound bites to do that, and the more training you put into this, the better your results are going to be, strangely enough. So I did embark on training Tacotron 2 myself and soon found out that it was going to take a number of weeks on my GPU-based laptop. So you used a pre-trained model? Yeah, pre-trained models are available. You know, Google is not going to release their pre-trained models, but there are pre-trained models out there that people have run several weeks of training on, so you can use some of those to give you some output. You'll find the link in the show notes. There are some really impressive results that Google has produced as outputs, right? I mean, you may be familiar with Google's text to speech: it has an API, and it's obviously based on all the same background code, but the model is just better. They also have an option, which is in beta, to have custom voices. So the next evolution of this is really being able to learn
a new, adopt a new voice for this. So you just feed it sound bites, not many of them, and then it produces results based on those. And there are some projects out there, again, links in the show notes. Google is trialling this as a beta in its text to speech, but there are also companies like Resemble AI who have, again, taken an open source project and are trying to monetize this. There is Lyrebird, what's it called, Lyrebird, again, which was an open source project that got commercialized. So yeah, there are options, right? If you want to do it yourself, it's quite a bit of work, training the right models. Just sticking to Lyrebird for the moment: you'll find it actually in your standard Ubuntu slash Debian repo; simply install it. I haven't looked at the implementation, but I thought it was actually just a waveform modulator. Does it actually do a full speech recognition and then in turn a full TTS? I thought it was just... Yeah, okay, sorry, correct. The Lyrebird that you're talking about is really the one that you can download, untar, install, et cetera, or you can even install it from the repo as well. Yes, you can, at least on 20.04 or 20.10. So that's a voice changer, right? That's exactly it, yes. However, if you look up Lyrebird itself, you'll find that a company called Descript has taken this on as a commercial model, which does voice cloning, for example. And there are also other projects out there that you can use to do that, which I haven't included in this episode, but I'm sure Chris will be volunteering for this one next time. Basically you take someone's voice and then apply text to speech using that voice, which is always quite fun, similar to your deepfake AIs, where you can project someone's face onto your videos. Okay, but
maybe that's a topic for another episode. Indeed. Actually it's a bit difficult doing a podcast about deepfakes, because we don't have any video. No, we don't do videos. Now we know, yes. Okay, let's scrap that idea. Yeah, all right, cool. So what's left to discuss on this, Martin? Because I think it's a very interesting topic and I suppose we could spend hours on end on this. Well, I think what is left is really: where is this going, right, as with all things AI? What are the applications, and where do we see the future of these things, apart from fake phone calls coming into podcasts? Yes. Are we going to have fake phone calls? Oh, we already did, Martin, in case you hadn't noticed, and it happened about an hour ago. Oh, I see. Don't worry about it. You were obviously pre-warned, otherwise you wouldn't have known. Well, yeah. So, applications of this stuff: what do you think? Well, I mean, this whole smartphone and self-driving car thing comes to mind. I mean, Siri and the like are cloud-based, right? You speak to it, it recognizes your sentences, it does the magic in the background, and then it gets back to you. Same goes for Alexa, same goes for, what's the Google thing called? Good question. Home, maybe? Google something. Yeah. And I reckon in about 10 or 15 years... I think Tesla has already... And don't forget Cortana as well, for the Microsoft lovers out there. Yes, okay. Leaving the desktop aside, I reckon in about 10 to 15 years' time, when you simply enter your self-driving car, you may actually have this on board, not just running in the cloud, but rather have the on-board processing power to do this locally. It's just a matter of time and of the prices for the likes of TPUs and GPUs coming down to enable mass production. That would be my guess. Well, I mean, I guess Tesla already has GPUs in them, right? So that's exactly, I mean, that's what this is
at the end of the day, and we're just seeing the beginning. I mean, your modest smartphone these days has a 64-bit architecture and at least eight cores. That's a lot of bang for the buck. Whether they actually use all these cores is a different question, then. Yeah. Sorry, Martin, these are just CPU cores; GPUs are a totally different matter, never mind the surrounding SoC. And if the M1 that has just been launched by Apple is anything to go by, stay tuned. I mean, the M1 has... You're gonna buy one? Well, the thing is, actually, Apple, if you're listening, why don't you send us one, maybe two, actually. Okay. Do you want the wireless setup? Yeah. I mean, I haven't looked at the specs, but if the current lore is anything to go by, you're looking at an eight-core ARM-based design with 16 threads, and this is just the CPU. The GPU, I think, has about four threads, if I'm not completely mistaken; now, if I remember correctly, maybe eight. Plus, and this is the interesting part, apparently they put image recognition into a special part of the SoC, and, Martin, brace yourself, you get secure enclaves. Oh, where did I hear that before? Well, there was this Anjuna company that partnered with a company called Redis Labs about half a year ago. But you see, where Intel is still in the research phase, I think Apple has done it in hardware and it's ready to go now. For those who don't know what a secure enclave is: consider it like an MMU on steroids. An MMU is essentially protecting pages of memory; that's what a memory management unit is all about. You make sure that, from the operating system perspective, a process cannot access the memory of another process. That has been around the block for the last 30 years; I mean, this is standard Unix textbook stuff, right? The same goes for mainframes and all the rest of them. But with the advent of viruses, rootkits and all the rest of them, that simply
won't cut the mustard anymore. So what secure enclaves are actually providing is a layer kind of underneath this, where you can only access a certain piece of memory if you have the right credentials, so to speak, in layman's terms. The technical implementation is slightly more complicated; we won't go into the details, but you will find links in the show notes. Think of it like a very secure MMU that only allows you to access a certain piece of memory if you can actually prove that you are who you claim to be, which is pretty amazing when you think about it, because that will allow you to do something called hyper-secure computing, where viruses and the like would have a very hard time penetrating other processes' address spaces. Yeah, I'm just looking at the M1, and you can buy this now from Apple. Apple, if you're listening: if you send us two machines, yes, we will review them, and this is a promise. I'm not very impressed with the GPU, though: it's only got 2.6 teraflops of throughput. I mean, my older RTX does seven and a half, so it's like, which century are we in? Apple, come on, you can do better. Come back when you have the M2 for us. And if you look at the form factor of this machine... I thought we were talking about the chip. No, I'm talking about the form factor of something called the MacBook Pro 13 inch. Well, all right, 13 inch. Now, you're missing something: the graphics card that you're referring to probably has the same volume as that whole laptop. No, no, no, it's in my laptop. Yes, it's in your laptop? Okay. I have an RTX 2070 in my laptop. Is that the laptop that you keep lugging around in your suitcase? Well, not at the moment, no. Okay. Well, it has the added benefit that going to the gym is also not required. With a lightweight Apple you're just wasting time, because you then have to go to the gym as well. It's like, what are you doing, man? Okay, okay, you have it, you have, you're fully
your first, your first, yes. I made this. It's not good. Yes, okay. Is there anything else we should mention about TTS? I think the rest is in the show notes. Yes, along with your five grams of coke. If you're listening, just send the shipment. Yes, and don't forget that 10 o'clock deadline by Pablo. Yeah, he might be ringing in again. Okay, two very important messages, or announcements. Yes. Feedback is always welcome, of course, at feedback@linuxinlaws.eu, by telephone or email. We would prefer email, actually, come to think of it, because feedback@linuxinlaws.eu is not a valid phone number, in case you're wondering. And of course there's always a next episode, and this next episode will be about, Martin? Redis, something you eat. Yes, our famous, our favourite NoSQL in-memory database. So stay tuned for another episode of Linux Inlaws, where they're going to explain how it's a database. Stay tuned. Okay, one second. Oh, and non-phone feedback, right? So hang up the phone please, Martin, very important. Okay, we have written-in feedback. Yes. The League for Affirmative Action on Pox wrote in and complained that we haven't done pox and anti-pox for quite some time, and of course they're spot on. So Martin, is this league something similar to the Society for Putting Things on Top of Other Things? Something like that, yeah. So Martin, what's your pox of the week? Pox of the week? Well, it's not the Apple M1, because I'm very unimpressed with the GPU. That would be Martin's anti-pox, I suppose. So, the single and only pox of the week, let me think for a second. There are quite a few. Confine yourself to one, in the interest of time. Yeah, so I need to pick one, right? Well, yeah, there are so many,
which one? Okay, let's have a think about our listeners. What would they find most relevant? Okay, most relevant, right, here we go. Pox of the week, for relevancy, is a company called Shell, who have started an open source collaboration platform for the petroleum industry. Indeed. Data science on geophysics. Indeed, correct. It's called the OSDU, if I'm not greatly mistaken. Links, I hope, will be in the show notes. Yes, yes, I can remember this one. So yeah, that kind of surprised me at first: I thought a company like Shell was all about making loads of money, but they are actually trying to give back to the community, so, nice one. Cool. Okay, anti-pox already discussed. My pox would be On the Road by Kerouac. Oh, hey, what is this? This is not open source related. Correct, but then the pox don't have to be open source related. Did you know they don't? Oh yes, yes, you did one with the statues. Okay, I've got you. You just have a hobby called Redis Labs, don't you? Sorry, pox, Martin, per definition can be anything. Well, yes, yes, whatever. My point is, having this hobby leaves you with little spare time, right? Okay, my pox of the week, without going into this, is a book called On the Road by Kerouac, I think that's his name. I'm not sure about his first name, let me double-check. Well prepared for the pox. Absolutely. Jack Kerouac. You'll find the links in the show notes. Yes, it's a masterpiece, right? What else did he write? He wrote plenty else. Oh, he's age-old, isn't he? Wait, wait, that book was published in the fifties. Did he not write the book about wolves and stuff? Steppenwolf was done by... Okay, now I'm just gonna look up the book that I was thinking of. Steppenwolf was done by Hesse, actually, so that's not what I'm referring to. Jack Kerouac wrote On the Road. Essentially it's a
road movie in a book. You have this posse of characters that keep crisscrossing the States in the late 40s. There's a lot of shenanigans going on. I found this very inspirational. Just read it. My anti-pox would be... Wait, wait, wait. What? Inspirational? Why? Don't make me explain, just read it. The vibes of the time, I suppose. The Second World War was just over and these people are just experiencing things, let's put it this way. Okay. A lot of boozing about, a lot of womanizing about. Just read the book, it's a classic. Can I speak now? It's okay. Okay, and your anti-pox? Yes, my anti-pox would be a guy called Joe Biden, not sure if that rings a bell. He's supposed to be the forty-whatever-seventh president, if I'm not completely mistaken, successor of a guy called Donald J. Trump. Ah yes. Man, we have to get this guy on the show, right? As in Donald Trump? We do? Why not? Well, I don't know, I don't think his five-dollar contribution to the manipulation of election results services is really a reason to put him on the show. Anyway, why is he the anti-pox? Of course, I mean, as I said, we're recording this on the 13th of November, and by now I think Joe Biden is at about 290 votes in the Electoral College. Of course that takes out all of the suspense about the further process, because by now it's very clear, never mind what the liar in the White House tweets tonight. So why is Joe Biden anti-pox?
Because, as I said, he just takes the suspense out of the whole thing, because by now it's clear, never mind what a certain guy called Donald... You were looking for his height, right? No, I'm not. Okay. It's just boring by now, right? I mean, it's clear who has won. Anyway, Donald, if you're listening, just call off the courts, because you won't get anywhere, full stop, and spend this money on drugs, or other stuff. Yeah. And, don't know, how many US presidents were assassinated again? Four? Four? I've got two: first Kennedy... Oh, loads. No, it can't be loads. Yeah, they're always knocking them off. You'll find the links in the show notes. I mean, surely you remember Lincoln as well, that makes it two. I don't know from where, but I'm sure the number four rings a bell. And then there was... oh no, Reagan, he didn't die, did he? I think someone tried to; he got lucky. But yeah, you could imagine that Biden is the next one. So, you know what, I bet you a crate of beer, and I'm not talking about this lukewarm piss called beer. Well, like the other crate of beer you haven't produced yet? I won this bet of the beer. Well, that was about who was running Redis Labs, the Redis project, right now. Yeah, no, I won, only in pure theory, but I won morally. To quote a certain master, and master, if you're listening, when are you coming on the show? To quote a certain master: technically, you've won this bet. So you want to try to scrap the term technically, because I've won this bet? Technically has to stay in there, I'm sorry. So you owe me a crate of beer, and I'm more than happy to wager you a second crate of beer, I see, that Biden, during his first term, and I'm just talking about the first term, will not be assassinated. But what about if he just sort of falls over and dies?
That's a different story. Sorry, but by assassinated I of course don't mean dying of a natural cause. Well, I know it's somewhat of a blurred line. Fair enough, yes, it is. But if somebody shoots Biden, that's not a natural cause, right? You see, this is what I mean by a blurred line. Exactly. If you can prove, oh sorry, if I can prove that the Russians have actually poisoned Biden, that would be an assassination, so you would owe me a crate of beer. Yes. No, no, I just said that Biden won't be assassinated. Yes. No, Martin, if you manage to cover this up... Oh yeah, what are you talking about? If Biden is assassinated, you owe me a beer. No, if Biden dies and if you're behind this and if you managed to hide it... No, no, no. Natural death, you get that beer, no worries. Makes sense. And now, if you're listening, Joe, we hope you live long and prosper. Yes, we do. And Joe, let's share the crate of beer between the three of us. You're invited. Yes. Okay. So these would be my boxes and anti-boxes. We've discussed Martin's boxes and anti-boxes, and that actually now finally concludes the show. Again, we'll be around next time. Yep. This is the Linux Inlaws. You come for the knowledge, but stay for the madness. Thank you for listening. This podcast is licensed under the latest version of the Creative Commons license, type Attribution-ShareAlike. Credits for the intro music go to blue zeroesters for the songs of the market, to Twin Flames for their piece called The Flow, used for the segment intros, and finally to the lesser ground for the songs we just use by the dark side. You find these and other ditties licensed under CC at Jamendo, a website dedicated to liberate the music industry from choking copyright legislation and other crap concepts. You have to send me the email, this is a bit of post production, so I can write back there. Why don't you contact me?
I'll make a call in as well. Okay. Let's just reach out to HR again. Oh, yes. Have we got an HR anyway? Well, this is the boxes. We don't have an Amazon address or something? I don't know. I mean, you had them. You should know. I know it. I just had Dennis to the cast your expenses. Oh, dear. Oh, dear. Okay. Oh, hang on. We haven't done... One day. Resume the recording. Oh, damn, it's another one. The outtakes. Pick it up, Martin. Sorry, we haven't... Oh, you say we haven't done what? No, no, no, the outtakes. I mean, just pick up the phone. Okay. Hello, Linux Inlaws here. You're after Chris again? Okay. Here he comes. This is the last I'm speaking. Yes. How can I help you? I don't know. You tell me. Had to put the caller through. Yes, but I thought you wanted them. There was a thing that you wanted to do. Yeah, but without the phone. Without the phone. Okay. Sorry. Yes, sorry. You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.