Episode: 3315
Title: HPR3315: tesseract optical character recognition
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3315/hpr3315.mp3
Transcribed: 2025-10-24 20:38:04

---

This is Hacker Public Radio Episode 3,315 for Friday, the 16th of April 2021.
Today's show is entitled Tesseract optical character recognition.
It is hosted by Ken Fallon and is about two minutes long and carries a clean flag.
The summary is how to use this amazing tool.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hi everybody, my name is Ken Fallon and you're listening to another episode of Hacker Public Radio.
Today I want to talk to you about a tool that I keep forgetting about, and it is an optical
character recognition tool that takes in an image file and then will convert that to text.
The brilliant thing about it is it's got loads of language support, so if you're scanning
English documents it works out of the box with English.
Otherwise you can install different language packs.
For example, I've been running tesseract -l eng,
the name of the image file (a JPEG), and the prefix for the output file, and then that's it.
It'll just happily run and look at the optical characters, the fonts, and try and
determine some text from that.
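The command described above can be sketched in Python as a thin wrapper around the Tesseract command line. This is a minimal sketch, not the host's own script: the function names and the file names `page001.jpg` and `page001` are hypothetical, and it assumes the `tesseract` binary and the English language data are installed. It uses the argument order from Tesseract's documented synopsis (image, output base, then options).

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the argv for: tesseract IMAGE OUTPUT_BASE -l LANG."""
    return ["tesseract", image, output_base, "-l", lang]


def ocr_image(image: str, output_base: str, lang: str = "eng") -> Path:
    """Run Tesseract on one image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base, lang), check=True)
    # By default Tesseract appends ".txt" to the output base name.
    return Path(output_base + ".txt")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_tesseract_cmd("page001.jpg", "page001")))
```

Scanning in another language would mean installing that language pack and passing its code as `lang`, for example `lang="nld"` for Dutch.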
As you know, I've done a series here on scanning textbooks and stuff.
At the moment I'm doing a course myself, so I scanned in the book, and in order to help me learn
I have used this to convert the images into blocks of text, which I then get
text-to-speech tools like eSpeak (not like eSpeak, eSpeak itself) to read back to me, and that way I hear
what I'm learning and I see what I'm reading, so that's very good too.
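The read-back step might look something like the following. Again this is only a sketch under assumptions: the function names, the default voice `en`, and the file name `page001.txt` are hypothetical, and the `espeak` binary is assumed to be installed. It relies on eSpeak's `-f` option to read text from a file and `-v` to select a voice.

```python
from __future__ import annotations

import shutil
import subprocess


def build_espeak_cmd(text_file: str, voice: str = "en") -> list[str]:
    """Build the argv for: espeak -v VOICE -f TEXT_FILE."""
    return ["espeak", "-v", voice, "-f", text_file]


def read_aloud(text_file: str, voice: str = "en") -> None:
    """Speak the contents of a text file through eSpeak."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak is not installed or not on PATH")
    subprocess.run(build_espeak_cmd(text_file, voice), check=True)


if __name__ == "__main__":
    # Hypothetical file name: the text file produced by an earlier OCR run.
    print(" ".join(build_espeak_cmd("page001.txt")))
```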
It also allows you to change languages, as I said, and it also figures out very complicated
column formats, so that if you have, for example, a page with three different columns in it,
it's intelligent enough to know that column 1 is here, and then the text from column 2 goes underneath that,
and the text from column 3 goes underneath that.
So, a very, very cool little tool, and hopefully, if you have need of it, it's quite cool
for converting stuff to text for editing.
Okay, that's it. Talk to you later, and tune in tomorrow for another exciting
episode of Hacker Public Radio.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link
to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons,
Attribution, ShareAlike, 3.0 license.