Episode: 3315
Title: HPR3315: tesseract optical character recognition
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3315/hpr3315.mp3
Transcribed: 2025-10-24 20:38:04
---

This is Hacker Public Radio Episode 3,315 for Friday, the 16th of April 2021. Today's show is entitled tesseract optical character recognition. It is hosted by Ken Fallon, is about two minutes long, and carries a clean flag. The summary is: how to use this amazing tool.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hi everybody, my name is Ken Fallon and you're listening to another episode of Hacker Public Radio. Today I want to talk to you about a tool that I keep forgetting about. Tesseract is an optical character recognition tool that takes in an image file and converts it to text. The brilliant thing about it is that it has loads of language support. If you're scanning English documents, it works out of the box with English; otherwise you can install different language packs. For example, I've been running tesseract -l eng, then the name of the image file (a JPEG), then the prefix for the output file, and that's it. It'll just happily run, look at the characters in the image, and try to determine some text from them.

As you know, I've done a series here on scanning textbooks and stuff. At the moment I'm doing a course myself, so I scanned in the book, and to help me learn I've used this to convert the images into blocks of text, which I then feed to text-to-speech tools like eSpeak to read back to me. That way I hear what I'm learning and I see what I'm reading, so that's very good too.
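The workflow described above can be sketched in a few lines of Python. First, a scanned page is OCRed with tesseract. Then eSpeak reads the resulting text aloud. This is a minimal sketch, not the host's actual script. It assumes the `tesseract` and `espeak` binaries are on the PATH, and the helper names `tesseract_cmd`, `espeak_cmd`, `output_base` and `ocr_and_speak` are hypothetical. It relies on two documented behaviours. First, tesseract writes its result to `<output base>.txt`. Second, `espeak -f <file>` speaks the contents of a text file.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def tesseract_cmd(image: str, out_base: str, lang: str = "eng") -> List[str]:
    """Hypothetical helper: build `tesseract <image> <out_base> -l <lang>`.

    tesseract appends ".txt" to out_base when writing plain-text output.
    """
    return ["tesseract", image, out_base, "-l", lang]


def espeak_cmd(text_file: str) -> List[str]:
    """Hypothetical helper: build an eSpeak command that reads a text file aloud."""
    return ["espeak", "-f", text_file]


def output_base(image: str) -> str:
    """Use the image path without its extension as tesseract's output prefix."""
    return str(Path(image).with_suffix(""))


def ocr_and_speak(image: str, lang: str = "eng") -> str:
    """OCR one scanned page, read it aloud if eSpeak is installed, return the text."""
    base = output_base(image)
    # Run tesseract; check=True raises CalledProcessError on failure.
    subprocess.run(tesseract_cmd(image, base, lang), check=True)
    text_file = base + ".txt"
    # Only try text-to-speech when the espeak binary can actually be found.
    if shutil.which("espeak"):
        subprocess.run(espeak_cmd(text_file), check=True)
    return Path(text_file).read_text(encoding="utf-8")
```

Usage, assuming a scanned page called `page01.jpg`: `ocr_and_speak("page01.jpg")` writes `page01.txt`, speaks it, and returns the recognised text. For another language, pass its code (for example `lang="deu"` for German) once that language pack is installed. The commands are built as argument lists rather than shell strings, so file names containing spaces need no quoting.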
It also allows you to change languages, as I said, and it also figures out very complicated column formats. If you have, for example, a page with three different columns, it's intelligent enough to output the text from column 1 first, then the text from column 2 underneath that, and then the text from column 3 underneath that. So it's a very, very cool little tool, and hopefully, if you have need of it, you'll find it handy for converting stuff to text for editing. Okay, that's it. Talk to you later, and tune in tomorrow for another exciting episode of Hacker Public Radio.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the Binary Revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.