Episode: 3315
Title: HPR3315: tesseract optical character recognition
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3315/hpr3315.mp3
Transcribed: 2025-10-24 20:38:04
---
This is Hacker Public Radio Episode 3,315 for Friday, the 16th of April 2021.
Today's show is entitled Tesseract optical character recognition.
It is hosted by Ken Fallon and is about two minutes long and carries a clean flag.
The summary is how to use this amazing tool.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hi everybody, my name is Ken Fallon and you're listening to another episode of Hacker Public Radio.
Today I want to talk to you about a tool that I keep forgetting about and it is an optical
character recognition tool that takes in an image file and then will convert that to text.
The brilliant thing about it is it's got loads of language support so if you're scanning
English documents it works out of the box with English.
Otherwise you can install different language packs.
For example, I've been running tesseract -l eng,
the name of the image file, a JPEG, and the prefix for the output file that gets the extension, and then that's it.
It'll just happily run and look at the optical characters, the fonts, and try and
determine some text from that.
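
As a rough sketch of the command described above, the following Python builds the same tesseract invocation and runs it with subprocess. The function names and the file names page.jpg and page are hypothetical, and it assumes the tesseract binary is on the PATH with the requested language pack installed.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUT_BASE -l LANG."""
    # Tesseract appends ".txt" to output_base itself, so the name is
    # passed without an extension.
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the text file it writes."""
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return output_base + ".txt"
```

Calling run_ocr("page.jpg", "page") should leave the recognised text in page.txt. Running tesseract --list-langs shows which language packs are installed.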
As you know, I've done a series here on scanning textbooks and stuff.
At the moment I'm doing a course myself, so I scanned in the book, and in order to help me learn
I have used this to convert the images into blocks of text, which I then get
a text-to-speech tool, eSpeak, to read back to me, and that way I hear
what I'm learning and I see what I'm reading, so that's very good too.
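
The scan, OCR, then read-aloud workflow could be sketched along these lines. This is only an illustration: the function names and page file names are made up, and it assumes both tesseract and espeak are installed and on the PATH.

```python
import subprocess
from pathlib import Path


def ocr_command(image_path, output_base, lang="eng"):
    """Argument list that makes Tesseract write OUTPUT_BASE.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def speak_command(text_path):
    """Argument list that makes eSpeak read a text file aloud."""
    # espeak's -f option takes the text to speak from a file.
    return ["espeak", "-f", str(text_path)]


def plan_read_aloud(image_paths, lang="eng"):
    """Pair each scanned page, in sorted order, with its OCR and speech commands."""
    plan = []
    for image in sorted(Path(p) for p in image_paths):
        base = image.with_suffix("")      # page01.jpg -> page01
        text = image.with_suffix(".txt")  # page01.jpg -> page01.txt
        plan.append((ocr_command(image, base, lang), speak_command(text)))
    return plan


def read_aloud(image_paths, lang="eng"):
    """OCR each page, then have eSpeak read the resulting text."""
    for ocr, speak in plan_read_aloud(image_paths, lang):
        subprocess.run(ocr, check=True)
        subprocess.run(speak, check=True)
```

Building the command lists separately from running them makes it easy to print the plan first and check it before anything is executed.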
It also allows you to change languages as I said and it also figures out very complicated
column formats so that if you have, for example, a page with three different columns in it,
it's intelligent enough to know that column 1 is here, and then the text from column 2 goes underneath that,
and the text from column 3 goes underneath that.
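
The column handling comes from Tesseract's page layout analysis. The episode does not mention it, but the command line exposes this as page segmentation modes through the --psm option (Tesseract 4 and later), so the sketch below is an assumption about how one might experiment with it. The constant and function names are hypothetical.

```python
# A few page segmentation modes, as listed by `tesseract --help-psm`.
PSM_AUTO = 3           # fully automatic page segmentation, no OSD (the default)
PSM_SINGLE_COLUMN = 4  # assume a single column of text of variable sizes
PSM_SINGLE_BLOCK = 6   # assume a single uniform block of text


def build_layout_command(image_path, output_base, psm=PSM_AUTO, lang="eng"):
    """Return a tesseract argument list with an explicit page segmentation mode."""
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", str(psm)]
```

With the default mode a multi-column page should come out one column after another. If the text ends up interleaved across columns, trying a different mode is a cheap experiment.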
So a very, very cool little tool, and hopefully, if you have need of it, it's quite good
for converting stuff to text for editing.
Okay, that's it. Talk to you later, and tune in tomorrow for another exciting
episode of Hacker Public Radio.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons
Attribution-ShareAlike 3.0 license.