hpr-knowledge-base/hpr_transcripts/hpr2430.txt

Episode: 2430
Title: HPR2430: Scanning books
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2430/hpr2430.mp3
Transcribed: 2025-10-19 02:51:53

---

This is HPR episode 2,430 entitled Scanning Books. It is hosted by Ken Fallon and is about
12 minutes long, and carries an explicit flag.
The summary is: Ken explains how and why he is scanning books.
This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared
hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest
and fair at AnHonestHost.com.
Hi everybody, my name is Ken Fallon, you're listening to another episode of Hacker
Public Radio.
Today I want to talk to you about scanning schoolbooks and how I do it.
Some of you, if you have children, they probably have schoolbooks.
Schoolbooks are much preferred for the education system over PDFs or e-books, in my opinion.
The simple reason for this is that if you're doing something like geography and you want to
refer to a map or something, you have the map on one page, then flip the book over and refer
to the questions, and go back and forward continually between the word lists at the back,
translations at the front or whatever.
So physical schoolbooks are a thing, there's a reason they've been so successful and they
continue to be.
That said, you don't particularly want your child to be lugging home all the books every
day, and photocopying certain pages is a bit of a bore.
At my daughter's school they rent the books. By the way, in the Netherlands the children
get the books for free.
So the schoolbooks are provided by the school themselves, but in my daughter's
school they rent the books from a book supplier, and then they have to pay a 50 euro deposit,
and if the books are in good condition, they get the deposit back.
They also have an option where, if you're a customer with them, for another
50 euros you can rent a complete set of the books for home use, so that you don't need
to be lugging your books to school and back the whole time.
And that's also very useful, but that's not the case for my son.
There they have bought the books outright: a community of schools, or a co-op of schools,
has bought all the books outright, so we can't avail of the second-set rental system.
So what I've decided to do is scan all the books.
The books themselves, if you were to buy them, are quite expensive. They're between
70 and 150 euros per book, so it is fairly hefty.
So what to do, what to do? The answer, of course, is to scan the books. As you will
know, I have a printer scanner, and I've already done an episode on this and on its continuous
ink supply system. It is a Brother MFC-J5910DW and that has a scanning option.
As part of that, it comes with some tools that allow you to scan over the network using
scanimage, and you can use a device name, brother4:net1;dev0 in my case.
However, those tools, as I found out, are proprietary, I think,
and I think they're only available on Intel platforms. I actually wanted to scan them
to a Raspberry Pi, but that was not an option that I had.
Now, one thing I could do is connect a Raspberry Pi directly to the scanner and then scan
over USB into the Raspberry Pi.
So if you're going to do that, that's a definite option, and then scanimage will work.
This isn't really a technical show as such because it's more, this is what I've done,
there are a lot of manual steps involved, it's a process, a lot more than I'd like, mostly
due to the book publishers insisting on using non-standard formats for the shape of their
books.
Why can't all the books be A4 or similar? Actually, everything should be.
Once you're done transferring from Fahrenheit to Celsius, you might want to transfer to
the standardized ISO A4, A3, A8, A0 paper system.
But anyway, let the flame wars begin.
So what I'm doing here is setting up a variable. This is a bash script that basically
runs an infinite loop. I set the image path, and then I pick the file name of the image
based on the date command, where I specify the date format as plus percent, well,
it's the ISO 8601 date, which is year, month, day, then an underscore, then hours, minutes,
seconds.
That gives me the file name. Then I run the scanimage program, which comes by default with
the SANE package. I select the device and I set the resolution to 300 DPI.
For other things I use 600 DPI, but that's interpolated, so it's just a higher-resolution
guess.
And then dash dash format equals jpeg, output to the file name.
Then I open Gwenview with the file name so I can preview the file. If it's not a good
take, I delete it within Gwenview; otherwise it loops back and continues the process with
the next page.
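
The loop described above could be sketched roughly as follows. This is a minimal sketch, not the script from the show notes: the device name, the output directory, the exact date format string, and the function names next_image_name and scan_loop are all assumptions made for illustration. Run scanimage -L to find the real device name for your scanner.

```bash
#!/usr/bin/env bash
# Sketch of the scan loop. DEVICE and IMAGE_DIR are assumptions;
# replace them with your own values.
DEVICE="${DEVICE:-brother4:net1;dev0}"   # hypothetical device name, see: scanimage -L
IMAGE_DIR="${IMAGE_DIR:-$HOME/scans}"    # hypothetical output directory

# Build a sortable file name from the current date and time
# (year-month-day, underscore, hours-minutes-seconds).
next_image_name() {
    printf '%s/%s.jpeg\n' "$IMAGE_DIR" "$(date +%Y-%m-%d_%H-%M-%S)"
}

# Scan one page per press of Enter until interrupted with Ctrl-C.
scan_loop() {
    mkdir -p "$IMAGE_DIR"
    while true; do
        printf 'Press Enter to scan the next page (Ctrl-C to stop) '
        read -r reply || break
        image="$(next_image_name)"
        scanimage --device-name "$DEVICE" --resolution 300 --format=jpeg > "$image"
        gwenview "$image"   # preview; delete the file from here if the scan is bad
    done
}
```

Calling scan_loop starts the process. Because the names sort by time, the files end up in page order as long as the pages are scanned in order.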
Now, if all the books were A4 format, I could just simply sit here next to
the scanner with a Bluetooth keyboard and press Enter, turn the page, press Enter, turn
the page, press Enter, and watch the screen.
But in this case, because the book doesn't actually totally fit onto the scanner and
sometimes I need to bring it in a little bit on the left or a little bit on the right,
I've had to add the Gwenview step, and that is the standard KDE image viewer, to see whether
to move my image left or right.
So once I have all that done, what I do is I go in and find a representative page that
I want to crop down.
All my images will be saved with a date-time style name, so they will all be in order,
starting from when I began scanning the book, from the first page to the end of the book.
And it doesn't really matter what the time stamps are, it's just important that they're
in sequence.
And when you're scanning books, the first page will be
the right way around, the second page will be upside down, the next one will be the right
way around, the next one will be upside down.
When you open up all the images, there will always be an area of the
flatbed scanner that's excessive, so there will be a gray bar at the bottom to get rid of.
So the idea is the first thing, once you've finished all your scans, is make a backup
of the scans that you have.
And then pick one of the images, save it somewhere else and then crop it to get rid
of that black part at the side.
So what remains is the area of the image that you're interested in.
Now if you're scanning using XSane or something like that, that gives you the option per book,
or per scan, to identify the areas.
To be honest, I wanted to keep it a little bit more generic, and it's actually trivial
to post-process the whole thing. Rather than scanning and then accidentally truncating
the last two millimeters of the page, with some critical word on it, I scan the whole
flatbed. It's just a lot easier.
So then I use gm identify. Now gm is what's called GraphicsMagick.
As for ImageMagick, I heard on the TuxJam podcast that ImageMagick was not being maintained
and that GraphicsMagick is a plug-in replacement, and lo and behold,
I already had GraphicsMagick available to me, and I was blown away.
It's a lot easier to use: you type gm and then press Enter and you get some help.
So gm, space, identify will identify the image, which was previously done with the identify
command.
In ImageMagick, all the tools had different names and you had no idea which one to call.
With GraphicsMagick, you have gm and then you have the command that
you want. In my case, I want identify.
So that identifies that it's a JPEG image, and then it gives the dimensions of it.
In my case here, it's 2477x2600+0+0,
and then DirectClass, blah, blah, blah.
So that piece of information, 2477x2600+0+0, is actually what
I want to crop all the images to.
So you can do that using gm, mogrify, space, dash crop, space, just paste that number in,
and then space, asterisk dot jpeg. Boom, all your images get cropped to that size.
So now all your images are the right size, with that piece of the flatbed scanner gone.
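
The identify and crop step might look something like the sketch below. It assumes the hand-cropped sample image lives outside the scan directory, that the scans end in .jpeg, and that a backup already exists, since mogrify overwrites files in place. The function names geometry_from_identify and crop_all and the GM variable are made up for this illustration.

```bash
#!/usr/bin/env bash
GM="${GM:-gm}"   # GraphicsMagick binary; the variable is an assumption for easy overriding

# Pull the geometry field (for example 2477x2600+0+0) out of one line
# of `gm identify` output read from standard input.
geometry_from_identify() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+x[0-9]+\+[0-9]+\+[0-9]+$/) { print $i; exit }
    }'
}

# Crop every .jpeg in the current directory to the geometry of the
# hand-cropped sample given as the first argument.
crop_all() {
    sample="$1"
    geometry="$("$GM" identify "$sample" | geometry_from_identify)"
    if [ -z "$geometry" ]; then
        echo "could not read geometry of $sample" >&2
        return 1
    fi
    "$GM" mogrify -crop "$geometry" ./*.jpeg
}
```

From inside the scan directory, something like crop_all ../sample-cropped.jpeg would then crop every page to the sample's size, where the sample path is again only an example.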
And rather than losing anything, I have all the information, because I took a backup of all
the original images beforehand. If you didn't, that's an important step.
Take a backup.
Also, if I haven't mentioned it, in the Netherlands it is legal to copy books for your own
personal use.
What I'm doing may very well be illegal where you are, but it is perfectly legal here.
So now I've cropped all the images.
So the only problem now is that the first page is the right way around and the second page
is upside down, or vice versa.
So I've written a very small program that makes use of gm again: gm, mogrify, dash
rotate, space, 180 and then the image name.
What I do here is set skip to one, it's a variable, and then: for image in
asterisk dot jpeg, do, if skip is equal to one, then echo skipping image and set skip equal
to zero.
Then, when it loops, else: echo rotating image, gm, mogrify, space, dash rotate, 180, image,
and then set skip equal to one.
So when you loop through that, you're skipping this image, rotating that one, skipping this
image, rotating that one.
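
A sketch of that skip-and-rotate loop is shown here. It follows the description above but is not the script from the show notes; the function name rotate_alternate and the overridable GM variable are assumptions, and it expects to be run in the directory holding the cropped .jpeg files.

```bash
#!/usr/bin/env bash
GM="${GM:-gm}"   # GraphicsMagick binary; the variable is an assumption for easy overriding

# Rotate every second image by 180 degrees, leaving the first, third,
# fifth and so on untouched.
rotate_alternate() {
    skip=1
    for image in ./*.jpeg; do
        [ -e "$image" ] || continue      # no matching files: nothing to do
        if [ "$skip" -eq 1 ]; then
            echo "skipping $image"
            skip=0
        else
            echo "rotating $image"
            "$GM" mogrify -rotate 180 "$image"
            skip=1
        fi
    done
}
```

Setting GM=true before calling rotate_alternate gives a dry run that only prints which files would be skipped and which rotated.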
Now, depending on your flatbed scanner and the way you choose to scan the first page,
all the images will now be the same orientation.
They may all be upside down or they may all be the right way up.
If they're all upside down, you just go gm, mogrify, dash rotate, 180, asterisk, and
there you go. What you will notice, though, is that there might be a few pages at the end
that are the right and wrong way.
Now one thing I forgot to mention: after you've done the scan, the first thing
you should actually do is quickly
check to see that you've got them all alternating: right way round, upside down, right way
round.
If two in a row are the right way round, you've missed one. What I do is go back and scan it,
then find out what the name of the one before it was, add a little to that name,
and use that as the file name to save.
So make sure that you have them all: right way, wrong way, right way, wrong way, right way,
wrong way, right way, wrong way,
the whole way through, so that you haven't missed any pages and haven't scanned any pages
twice.
So this is a little bit of a laborious process.
So once you've done that, you zip them all up and save them as a backup. When you've done
that, identify the crop information from one of the images, crop all of the images, then run
the rotate-every-second-image bash command, and if you need to, you can run gm, mogrify,
rotate 180 on the lot.
And then the only thing that's left to do is convert all of these into a PDF.
Now I found that it got too big. The program complained, whether I used ImageMagick or
GraphicsMagick; both of them complained when creating a large PDF file. So what I ended up
doing was looking into each of the individual books and breaking it up per chapter, or
usually per section.
Each of these books usually has about five sections, a section per semester.
So I make a directory with the name of the book, move all the images
into that, and make subdirectories in there: chapter one, two, up to five or whatever.
And then I physically copy the images in there.
And then I run a simple script: for i in asterisk, do convert dollar i forward slash
asterisk dot jpg, dollar i dot PDF.
I'll put a copy of all of these things into the show notes.
What that does is, for every subdirectory that there is, it runs convert and creates
a PDF with the name of that subdirectory, so chapter one dot PDF, chapter two dot PDF.
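
The per-section conversion could be sketched like this. It is an illustration under assumptions rather than the script from the show notes: it uses gm convert (the episode only says convert), expects to be run inside the book's directory with one subdirectory per section, and assumes the images in those subdirectories end in .jpg. The function name pdf_per_section and the GM variable are hypothetical.

```bash
#!/usr/bin/env bash
GM="${GM:-gm}"   # GraphicsMagick binary; the variable is an assumption for easy overriding

# For every subdirectory, combine its .jpg pages into one PDF named
# after the subdirectory (chapter1/ becomes chapter1.pdf).
pdf_per_section() {
    for dir in */; do
        [ -d "$dir" ] || continue        # no subdirectories: nothing to do
        section="${dir%/}"
        echo "creating $section.pdf"
        "$GM" convert "$section"/*.jpg "$section.pdf"
    done
}
```

Because the shell expands the file names in sorted order, the timestamp-based names keep the pages in scan order inside each PDF.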
And that's pretty much it.
So for the most part, this is an easy enough thing to do.
And it takes me about two and a half per book, except for this one because this one, this
takes me about an hour and a half per book and I'll just sit there, usually watch a few
big live videos or YouTube videos and just sit beside the scanner, press enter on my
keyboard, flip the book, press enter.
And you get kudos that you're doing something for your kids while at the same time enjoying
an electronic video.
Anyway, that's it, hopefully Murphy has not messed this one up too much for me, so I will
go and post it.
And tune in tomorrow for another exciting episode of Hacker Public Radio!
You've been listening to Hacker Public Radio at Hacker Public Radio dot org.
We are a community podcast network that releases shows every weekday Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find
out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer
Club and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on
the website or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons Attribution-
ShareAlike 3.0 license.