Episode: 1939
Title: HPR1939: Collating Pages with pdftk
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1939/hpr1939.mp3
Transcribed: 2025-10-18 11:29:39

---

This is HPR episode 1939 entitled "Collating Pages with pdftk".
It is hosted by Jon Kulp and is about 16 minutes long.
The summary is: I describe how to collate the pages of two separate PDF files using pdftk.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair, at AnHonestHost.com.
Hey everybody, this is Jon Kulp in Lafayette, Louisiana.
It is Christmas vacation and so I've got a little bit of time to record a show.
I'm probably not going to have as much time going forward to record shows as I used to
because I'm starting a new job.
I've mentioned a couple of times in previous episodes that I am now the director of the School
of Music and Performing Arts here at UL Lafayette and as such, I'm learning new stuff
and so it's going to take probably more time than I'm used to.
It's a good thing, I'm ready for a new challenge and hopefully I'll do a good job.
However, this is probably going to mean curtailing my HPR activities somewhat.
But I've got a little bit of time right now and this episode relates to something that's going on
as I'm moving into my new office.
So you can imagine, having stayed in my previous office for nearly 15 years, that I've accumulated
quite a lot of stuff, and I actually brought a lot of stuff with me when I first moved in there.
The thing that prompted this episode is all of my old class notes from graduate school.
I saved all of these things thinking, well, maybe someday I'm going to want to refer to all of
these things that my wise instructors told me when I was in graduate school, because I might need
to pass this knowledge along to my own students, or use it to formulate my own coursework and
the class notes for the classes I'm teaching, and so forth. I'm not saying this very well.
Anyway, I thought I might need them so I kept them, and these are, I mean, many, many files full
of notes and handouts and all kinds of stuff that my teachers back in Texas gave me when I was
in graduate school, and notes that I took by hand and so forth. A lot of stuff. I kind of want to
get rid of all these things because they are heavy, they take up space, and honestly I have not
looked at most of them. There are a handful of classes where I've gone back and referred to notes
many times and then a lot of other classes where I've not really referred to them at all.
But I think I might want to someday. This is the academic hoarder in me coming out, thinking, well,
you never know. Sure, it's been 20 years since I took that class and I haven't looked at the
notes, but you never know when the occasion might arise where you need them. So what I'm doing is
I'm starting to scan these things, and I'm doing it because I got a scanner that came with my new job.
I've got this printer, a Brother DCP all-in-one, 7065DL maybe; I think that's the model number.
Don't quote me on that. I can't look at it right now. But it's a really nice laser printer
and flatbed scanner and it has a feeder. The feeder is what makes all of this stuff possible. So
you put a pile of pages in the feeder, press scan, and off it goes. It just scans one page after
another and so it can do a whole stack of 25 or 30 pages in just a couple of minutes. Whereas if
I had to feed one page at a time or put them on a flatbed or something, it would have taken too long
and I never would even have started doing this. It would just be too hard.
So I'm scanning notes. It works great if your notes are all on one side of the page because you
can scan it straight to a PDF, save it, you're done. However, many of my notes are double-sided,
and this scanner will not do double-sided scanning. So I put the stack in and scan
the front sides, and after that's done I turn the entire stack over and scan the back sides.
And that means I end up with two separate PDFs: one that has all of the front pages and one that
has all of the back pages. Now this is better than nothing, but it would be better, wouldn't it,
if all of the pages were together in the same file in the right order? So there are two problems
here. One is that when I scan the back sides I have to scan in reverse order, so last page to first.
And then the other problem of course is that the pages are not collated. So I'm pretty handy
with scripting and I've used the PDF toolkit, PDFTK, many times to do all kinds of manipulation of
PDF files. And so I thought, well, I can probably sort this out using PDFTK again. If you have never
listened to it you might want to go back and listen to episode 1760, my first episode about PDFTK,
where I go over a lot of the basic commands and stuff. In this episode I just want to talk about
what I think is a pretty clever solution to this problem of having two separate PDFs, one with
all the front sides and the other with all of the back sides in reverse order. So I wrote a script
that will take a couple of steps to do all the things we need to do to get it in the right order.
The first thing that needs to be done is to reverse the order of the pages for file number two,
the one that has all the back sides. And thankfully this is super easy with PDFTK. I actually gave an
example of this in my episode 1760. I had mentioned that my wife came home one day annoyed after
having scanned an entire article, I guess at the library or somewhere, and she had done it
in reverse order, because when we were graduate students, in order to get the paper to come out
the right way, we always copied our articles back to front, and when they came out of the
photocopier that way the first page was on top and the last page was on the bottom. But if you're
scanning then that just means you've scanned it in the wrong order. And so she was kind of upset
and I told her, don't worry, I think I can sort this out. And I found that you can just run a PDFTK
command that will concatenate from the last page to the first and then output it to a new file name
and it's done. So the first command is pdftk followed by the name of the file that has the
reversed pages, followed by, well, I'm just going to read this: pdftk, space, backwardsfile.pdf, space,
cat, space, end-1, space, output, space, newfilename.pdf, and that's that. And "end" is
spelled E-N-D, and then you just use a hyphen and then the number one. And so that tells it to take the
file that you're inputting and read it back to front and output with a new file name where all
the pages are in the right order. Great. Okay, so the pages in the second file are now in the correct
order. This is good. So what we've got to do next is figure out a way to take the first page of the
first file and the first page of the second file and put them next to each other and do this
consecutively for all of the pages in both files. Well, the way to do this that I came
up with is to burst both of the files. One of the commands for the PDF
toolkit is the burst command. When you burst a PDF it separates the file into all of its component
pages and numbers them consecutively, 0001, 0002 and so forth. And so I knew about the
burst command. I had never really had much occasion to use it but this is a perfect opportunity.
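Written out rather than read aloud, the page-reversal step might look like the sketch below. The file names are the placeholder names from the narration, and the `reverse_pages` wrapper function is a hypothetical helper added only for illustration; the `pdftk ... cat end-1 output ...` invocation itself is the one described in the episode.

```shell
#!/bin/sh
# reverse_pages IN OUT  (hypothetical helper name)
# Writes OUT containing the pages of IN in reverse order.
# "end-1" is a pdftk page range running from the last page down to page 1.
reverse_pages() {
    pdftk "$1" cat end-1 output "$2"
}

# Example call, using the placeholder file names from the narration.
# It only runs when pdftk is installed and the input file exists.
if command -v pdftk >/dev/null 2>&1 && [ -f backwardsfile.pdf ]; then
    reverse_pages backwardsfile.pdf newfilename.pdf
fi
```

The original file is left untouched; the reordered pages go to the new file name.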
So what I do is burst both of these files so that all of the pages are separated.
And to make this work out, now, I could, I suppose, just do that and then run the PDFTK cat command.
And I mean, in theory it's not that hard at all to take and put a whole bunch of PDF files
together into one big one in any order you want. You just go pdftk, space, first file name,
second file name, third file name and so forth. You just keep telling it which files you want to
put together, and when you've done all that, do space, cat, space, output, and the new file name, and it will
put them all together. But this is extremely tedious. This has to be automated because I'm talking
about here sometimes up to 40 different pages. And there is no way I'm going to type in 40 command
line arguments for something. So this has to be automated. The way I found to do this was
to use the file naming option. PDFTK allows you to specify certain file naming conventions
for the burst command. And so what I've done is choose to burst with the file name starting with
%03d. And what that tells it is to use three digits in the numbering of these pages
and to put that at the front of the file name. So the command then is pdftk,
space, frontpages.pdf, space, burst, space, output, space, percent, zero, three, d, underscore,
front.pdf. What that one is going to do is burst the first PDF into all of its component pages
and the file names will read 001_front.pdf, 002_front.pdf, 003_front.pdf
and so forth. Now I do the same thing with the other PDF that has all the back pages,
only this time instead of underscore front, I use underscore reverse. Now I don't say underscore
back because that would mess up what I'm about to do. What I want to be able to do is have the
list command, ls, give me all of the files that are going to be the arguments when I'm concatenating
everything. And if I use the word back instead of reverse, then each back page is going to be
listed before its matching front page because it lists alphabetically. So I use the word reverse
instead. So the next command, to burst the other file, would then be pdftk, space, backpages.pdf,
space, burst, space, output, space, percent, zero, three, d, underscore, reverse.pdf.
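The two burst commands read out above can be sketched as follows. The `burst_pages` wrapper and its suffix argument are hypothetical conveniences, not something from the episode, and the `printf` line is only there to preview the names that a `%03d` pattern produces.

```shell
#!/bin/sh
# burst_pages IN SUFFIX  (hypothetical helper name)
# Splits IN into one PDF per page, named 001_SUFFIX.pdf, 002_SUFFIX.pdf, ...
burst_pages() {
    pdftk "$1" burst output "%03d_$2.pdf"
}

# Preview of the names that the %03d pattern yields for pages 1 to 3.
preview=$(printf '%03d_front.pdf\n' 1 2 3)
echo "$preview"

# Example calls with the file names from the narration; skipped unless
# pdftk is installed and both input files exist.
if command -v pdftk >/dev/null 2>&1 && [ -f frontpages.pdf ] && [ -f backpages.pdf ]; then
    burst_pages frontpages.pdf front
    burst_pages backpages.pdf reverse
fi
```

Because the page number comes first in each name, page 1 of both files sorts ahead of page 2 of either file, which is what makes the later listing step work.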
And after you've done this to each of the files, if you look in your directory, you'll see all
those burst pages in there. And then the next thing to do is concatenate them. And the way to do that
is by the following command. Instead of specifying each file individually, I use the list command and
filter out any file that does not begin with a number. So here's the way that works: pdftk, space,
dollar sign, open parenthesis, ls, pipe, grep, space, caret, left square brace, zero, dash, nine,
right square brace. And then for good measure, I do another one, just to make sure it starts with
two digits. So after the first zero-through-nine right square brace: left square brace, zero,
dash, nine, right square brace, close parenthesis. So that whole thing, starting with the dollar sign
through the close parenthesis, tells it to use the output of this list command, filtering out everything that
doesn't start with a number, as the command line arguments for pdftk. And after that, space, cat,
space, output, space, new file name, combined.pdf or whatever you want to call it.
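To see why the listing comes out already collated, here is a small sketch that uses empty placeholder files in a scratch directory, so no real PDFs are needed. The scratch directory, the `notes.txt` file, and the `collate_pages` helper name are all assumptions made for the demonstration; the `ls | grep` filter is the one described above.

```shell
#!/bin/sh
# Demonstrate the listing order with empty placeholder files in a scratch
# directory (no real PDFs are needed to see how the names sort).
demo_dir=$(mktemp -d)
touch "$demo_dir/001_front.pdf" "$demo_dir/001_reverse.pdf" \
      "$demo_dir/002_front.pdf" "$demo_dir/002_reverse.pdf" \
      "$demo_dir/notes.txt"

# Keep only names that start with two digits, as in the narration.
page_list=$(cd "$demo_dir" && ls | grep '^[0-9][0-9]')
echo "$page_list"

# collate_pages OUT  (hypothetical helper name)
# Joins every digit-prefixed file in the current directory into OUT.
# The command substitution is left unquoted on purpose so that each
# file name becomes a separate argument (names must not contain spaces).
collate_pages() {
    pdftk $(ls | grep '^[0-9][0-9]') cat output "$1"
}
```

The listing alternates front and reverse pages and skips `notes.txt`. In a directory holding real burst pages, `collate_pages combined.pdf` would then join them in that order.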
And voila, it's done. It has concatenated everything in the right order. And it's done so very,
very quickly. Now I've put all these into a script, and in the show notes you can see the script
that I use. And so the script I call front back PDF. And so the command that I run will be front
back PDF, space, front page file name, space, back page file name. And it does all the rest.
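The author's actual script is in the show notes and is not reproduced here. Purely as an illustration of how the three steps could be chained, a sketch might look like the following. The function name `collate_front_back`, the optional third argument, and the temporary working directory are all assumptions. It also swaps the `ls | grep` listing for a shell glob, which expands in sorted order and so gives the same interleaving.

```shell
#!/bin/sh
# collate_front_back FRONT BACK [OUT]  (hypothetical name and interface)
# FRONT holds the front sides in order; BACK holds the back sides in
# reverse order. OUT defaults to combined.pdf.
# The body runs in a subshell, so its variables stay local to the call.
collate_front_back() (
    front=$1
    back=$2
    out=${3:-combined.pdf}
    workdir=$(mktemp -d) || exit 1

    # 1. Put the back sides into first-to-last order.
    # 2. Burst both files into numbered single-page PDFs.
    # 3. Join the pages; the glob expands in sorted order, so
    #    001_front, 001_reverse, 002_front, ... are interleaved.
    pdftk "$back" cat end-1 output "$workdir/back_in_order.pdf" &&
        pdftk "$front" burst output "$workdir/%03d_front.pdf" &&
        pdftk "$workdir/back_in_order.pdf" burst output "$workdir/%03d_reverse.pdf" &&
        pdftk "$workdir"/[0-9][0-9][0-9]_*.pdf cat output "$out"
    status=$?

    # Remove the single-page files whether or not the steps succeeded.
    rm -rf "$workdir"
    exit "$status"
)
```

A call such as `collate_front_back frontpages.pdf backpages.pdf notes.pdf` would then produce one collated file. Working in a temporary directory keeps the burst pages away from any other digit-prefixed files in the current directory.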
All right. I hope that's not too confusing, and I hope you found that somewhat useful.
And I think that's going to be about it. Be sure to check out the show notes if you're
interested in trying this, because I fear I may have messed things up when I was speaking out
the commands. This doesn't come all that naturally to me. All right. Oh, by the way, I've been doing
a series of episodes where I try different audio configurations. And the last few of them,
I've used my phone plus the $2 microphone. And today I'm using a different thing. This is a
microphone that I inherited when I saw it sitting on my boss's counter several months ago.
This is a microphone that looks like a pretty high-tech studio mic, but it's actually a USB
microphone made by M-Audio. That's M-Audio. It's called Producer USB. And so I'm using this M-Audio
Producer USB mic, just plugged into my laptop, recording to Audacity. I don't normally record
to Audacity because I'm afraid it's going to crash while I'm recording. I prefer to record
either to my phone or to my Zoom H1. But today, since this is a USB microphone, I can't really
plug it into the Zoom. So I'm doing it on my laptop. And I hope it sounds pretty good. I don't know
how much this microphone costs because it's a little bit old and it was just given to me.
But I think it's one of those in about the $50 range or something. So a little bit more than the
$2 microphone, but not really out of reach for normal people either. Anyway, thanks a lot. I will
talk to you guys later. Bye.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday, Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the Binary Revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a
comment on the website or record a follow-up episode yourself. Unless otherwise stated,
today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.