hpr-knowledge-base/hpr_transcripts/hpr1939.txt

Episode: 1939
Title: HPR1939: Collating Pages with pdftk
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1939/hpr1939.mp3
Transcribed: 2025-10-18 11:29:39

---

This is HPR episode 1939 entitled Collating Pages with pdftk.
It is hosted by John Kulp and is about 16 minutes long.
The summary is: I describe how to collate the pages of two separate PDF files using pdftk.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair, at AnHonestHost.com.
Hey everybody, this is John Kulp in Lafayette, Louisiana.
It is Christmas vacation and so I've got a little bit of time to record a show.
I'm probably not going to have as much time going forward to record shows as I used to
because I'm starting a new job.
I've mentioned this a couple of times in previous episodes that I am now the director of the School
of Music and Performing Arts here at UL Lafayette and as such, I'm learning new stuff
and so it's going to take probably more time than I'm used to.
It's a good thing, I'm ready for a new challenge and hopefully I'll do a good job.
However, this is probably going to mean curtailing my HPR activities somewhat.
But I got a little bit of time right now and this episode relates to something that's going on
as I'm moving into my new office.
So you can imagine having stayed in my previous office for nearly 15 years that I've accumulated
quite a lot of stuff and I actually brought a lot of stuff with me when I first moved in there
and the thing that prompted this episode is all of my old class notes from graduate school.
I saved all of these things thinking well maybe someday I'm going to want to refer to all of
these things that my wise instructors told me when I was in graduate school because I might need
to pass this knowledge along to my own students, to use it to formulate my own coursework and
for the class notes for the classes I'm teaching and so forth. I'm not saying this very well.
Anyway, I thought I might need them so I kept them and these are I mean many, many files full
of notes and handouts and all kinds of stuff that my teachers back in Texas gave me when I was
in graduate school and notes that I took by hand and so forth. A lot of stuff. I kind of want to
get rid of all these things because they are heavy, they take up space and honestly I have not
looked at most of them. There are a handful of classes where I've gone back and referred to notes
many times and then a lot of other classes where I've not really referred to them at all.
But I think I might want to someday. This is the academic hoarder in me coming out thinking well
you never know. Sure, it's been 20 years since I took that class and I haven't looked at the
notes but you never know when the occasion might arise where you need them. So what I'm doing is
I'm starting to scan these things and I'm doing it because I got a scanner that came with my new job.
The printer is a Brother DCP all-in-one, 7065dL maybe. I think that's the model number.
Don't quote me on that. I can't look at it right now. But it's a really nice laser printer
and flatbed scanner and it has a feeder. The feeder is what makes all of this stuff possible. So
you put a pile of pages in the feeder, press scan and off it goes. It just scans one page after
another and so it can do a whole stack of 25 or 30 pages in just a couple of minutes. Whereas if
I had to feed one page at a time or put them on a flatbed or something it would have taken too long
and I never would even have started doing this. It would just be too hard.
So I'm scanning notes. It works great if your notes are all on one side of the page because you
can scan it straight to a PDF, save it, you're done. However, many of my notes are double-sided
and this scanner will not do double-sided scanning. So I put it in and scan
the front side and after that's done I turn the entire stack over and scan the back side.
And that means I end up with two separate PDFs. One that has all of the front pages and one that
has all of the back pages. Now this is better than nothing but it would be better wouldn't it
if all of the pages were together in the same file in the right order. So there are two problems
here. One is that when I scan the back sides I have to scan in reverse order, so last page to first.
And then the other problem of course is that the pages are not collated. So I'm pretty handy
with scripting and I've used the PDF toolkit, PDFTK, many times to do all kinds of manipulation of
PDF files. And so I thought well I can probably sort this out using PDFTK again. If you have never
listened to it you might want to go back and listen to episode 1760, my first episode about PDFTK
where I go over a lot of the basic commands and stuff. In this episode I just want to talk about
what I think is a pretty clever solution to this problem of having two separate PDFs. One with
all the front sides and the other with all of the back sides in reverse order. So I wrote a script
that will take a couple of steps to do all the things we need to to get it in the right order.
The first thing that needs to be done is to reverse the order of the pages for file number two,
the one that has all the back sides. And thankfully this is super easy with PDFTK. I actually gave an
example of this in my episode 1760. I had mentioned that my wife came home one day annoyed after
having scanned an entire article, I guess at the library or somewhere, and she had done it
in reverse order because, when we were graduate students, in order to get the paper to come out
the right way we always copied our articles back to front, and when they came out of the
photocopier that way the first page was on top and the last page was on the bottom. But if you're
scanning then that just means you've scanned it in the wrong order. And so she was kind of upset
and I told her don't worry I think I can sort this out. And I found that you can just run a PDFTK
command that will concatenate from the last page to the first and then output it to a new file name
and it's done. So the first command is PDFTK followed by the name of the file that has the
reverse pages followed by I'm just going to read this. PDFTK space backwards files. PDF, space,
cat, space, end-1, space, output, space, new file named.PDF and that's that. And in this
spelled END and then just use a hyphen and then the number one. And so that tells it to take the
file that you're inputting and read it backwards to front and output with a new file name where all
the pages are in the right order. Great. Okay so the pages on the second file are now in the correct
order. This is good. So what we've got to do next is figure out a way to take the first page of the
first file and the first page of the second file and put them next to each other and do this
consecutively for all of the pages in both files. Well, the way to do this that I came
up with is to burst both of the files. One of the commands for PDFTK, the PDF
toolkit, is the burst command. When you burst a PDF it separates the file into all of its component
pages and numbers them consecutively: 0001, 0002 and so forth. And so I knew about the
burst command. I had never really had much occasion to use it but this is a perfect opportunity.
So what I do is burst both of these files so that all of the pages are separated.
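As a rough sketch of the two steps described so far (reversing the back-side file, then bursting), the commands could be wrapped in small shell functions like the ones below. The function names and the file names in the usage comments are placeholders chosen for illustration; only the pdftk operations themselves (cat with the end-1 page range, and burst) come from the episode.

```shell
#!/bin/sh
# Sketch only: function and file names are illustrative, not from the episode.

# Reverse the page order of a PDF (last page becomes first).
# $1 = input PDF, $2 = output PDF
reverse_pages() {
    # "end-1" is a pdftk page range running from the last page down to page 1.
    pdftk "$1" cat end-1 output "$2"
}

# Split a PDF into one file per page.
# With no output pattern, pdftk names the pages pg_0001.pdf, pg_0002.pdf, ...
# and also writes a doc_data.txt metadata file.
burst_pages() {
    pdftk "$1" burst
}

# Example usage (uncomment to run against real files):
# reverse_pages backpages.pdf backpages_in_order.pdf
# burst_pages frontpages.pdf
```

Nothing runs until one of the functions is called, so the block can be pasted into a shell session and tried one step at a time.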
And to make this work out now I could I suppose just do that and then run the PDFTK cat command.
And I mean in theory it's not that hard at all to take and put a whole bunch of PDF files
together into one big one in any order you want. You just go PDFTK, space, first file name,
second file name, third file name and so forth. You just keep telling it which files you want to
put together and when you've done all that, do space, cat, space, output, and the new file name, and it will
put them all together. But this is extremely tedious. This has to be automated because I'm talking
about here sometimes up to 40 different pages. And there is no way I'm going to type in 40 command
line arguments for something. So this has to be automated. The way I found to do this was
to use the file naming option. PDFTK allows you to specify certain file naming conventions
for the burst command. And so what I've done is chosen to burst with the file name starting with
percent zero three D. What that tells it is, in the numbering of these pages, to use
three digits and to put that at the front of the file name. So the command then is PDFTK,
space, frontpages.pdf, space, burst, space, output, space, percent, zero three D underscore
front.pdf. What that one is going to do is burst the first PDF into all of its component pages
and the file names will read 001 underscore front, 002 underscore front, 003 underscore front
and so forth. Now I do the same thing with the other PDF that has all the back pages,
only this time instead of underscore front, I use underscore reverse. Now I don't say underscore
back because that would mess up what I'm about to do. What I want to be able to do is have the
list command LS give me all of the files that are going to be the arguments when I'm concatenating
everything. And if I use the word back instead of reverse, then all the back pages are going to be
listed before the front pages because it lists alphabetically. So I use the word reverse
instead. So the next command, to burst the other file, would then be PDFTK, space, backpages.pdf,
space, burst, space, output, space, percent, zero three D underscore reverse.pdf.
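Written out rather than spoken, the two burst commands might look like the sketch below. The helper names burst_named and collation_order are made up for this example. The second helper only sorts a list of names, to show why the suffix reverse interleaves correctly with front while back would not.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Burst a PDF into single pages named 001_SUFFIX.pdf, 002_SUFFIX.pdf, ...
# $1 = input PDF, $2 = suffix such as "front" or "reverse"
burst_named() {
    # %03d is replaced by pdftk with the zero-padded page number.
    pdftk "$1" burst output "%03d_$2.pdf"
}

# Print the given file names in plain alphabetical (byte) order,
# which is the order they would be handed to pdftk when collating.
collation_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Example usage (uncomment to run against real files):
# burst_named frontpages.pdf front
# burst_named backpages.pdf reverse

# "front" sorts before "reverse", so each page's front side comes first:
collation_order 001_reverse.pdf 002_front.pdf 001_front.pdf 002_reverse.pdf
```

With a suffix of back instead, 001_back.pdf would sort ahead of 001_front.pdf and every sheet would come out back side first.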
And after you've done this to each of the files, if you look in your directory, you'll see all
those burst pages in there. And then the next thing to do is concatenate them. And the way to do that
is with the following command. Instead of specifying each file individually, I use the list command and
filter out any file that does not begin with a number. So here's the way that works: PDFTK, space,
dollar sign, open parenthesis, LS, pipe, grep, space, caret, left square brace, zero dash nine,
right square brace. And then for good measure, I do another one, just to make sure it starts with
two digits. So after the first zero through nine right square brace, left square brace, zero
dash nine right square brace, close parenthesis. So that whole thing starting with the dollar sign
to the close parenthesis tells it use the output of this list command filtering out everything that
doesn't start with a number as the command line arguments for PDFTK. And after that, space, cat,
space output, space, new file name combined dot PDF or whatever you want to call it.
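The spoken command is easier to follow written down. The sketch below wraps it in a function whose name, combine_pages, is invented for this example. It assumes the burst pages are in the current directory and that none of the file names contain spaces.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Concatenate every file in the current directory whose name starts
# with two digits into a single PDF.
# $1 = name of the combined output file
combine_pages() {
    # $(ls | grep ...) expands to the sorted list of matching names,
    # e.g. 001_front.pdf 001_reverse.pdf 002_front.pdf ...
    # It is left unquoted on purpose so that each name becomes its own
    # argument; this breaks if a file name contains whitespace.
    pdftk $(ls | grep '^[0-9][0-9]') cat output "$1"
}

# Example usage (uncomment to run in a directory of burst pages):
# combine_pages combined.pdf
```

The grep filter also keeps out the doc_data.txt file that burst leaves behind, since that name does not start with a digit. A shell glob such as [0-9][0-9]*.pdf would be another way to build the same argument list.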
And voila, it's done. It has concatenated everything in the right order. And it's done so very,
very quickly. Now I've put all these into a script and in the show notes, you can see the script
that I use. And so the script I call front back PDF. And so the command that I run will be front
back PDF space front page file name, space back page file name. And it does all the rest.
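The script from the show notes is not reproduced in this transcript, so the following is only a guess at how the steps described above could be strung together. It is not the host's script. The function name, the temporary file name, and the default output name are all assumptions, and it should be run in an otherwise empty working directory because it deletes the intermediate files it creates there.

```shell
#!/bin/sh
# Hypothetical reconstruction, NOT the script from the show notes.
# Usage: sh thisfile.sh FRONT.pdf BACK.pdf [OUTPUT.pdf]

collate_front_back() {
    front=$1
    back=$2
    out=${3:-combined.pdf}   # assumed default output name

    # 1. Back sides were scanned last-to-first; put them in reading order.
    pdftk "$back" cat end-1 output back_in_order.tmp.pdf || return 1

    # 2. Burst both files into numbered single pages.
    pdftk "$front" burst output %03d_front.pdf || return 1
    pdftk back_in_order.tmp.pdf burst output %03d_reverse.pdf || return 1

    # 3. Concatenate in alphabetical order: 001_front, 001_reverse, ...
    #    Assumes no whitespace in the generated names (true for these patterns).
    pdftk $(ls | grep '^[0-9][0-9]') cat output "$out" || return 1

    # 4. Remove the intermediate files, including burst's doc_data.txt.
    rm -f [0-9][0-9][0-9]_front.pdf [0-9][0-9][0-9]_reverse.pdf \
          back_in_order.tmp.pdf doc_data.txt
}

# Only run when both input files were given on the command line.
if [ "$#" -ge 2 ]; then
    collate_front_back "$@"
fi
```

The sketch does no checking that the two files have the same number of pages, and the output name should not itself begin with two digits, or an existing copy would be picked up by the filter in step 3.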
All right. I hope that's not too confusing. And I hope you found that somewhat useful.
And I think that's going to be about it. Be sure to check out the show notes if you're
interested in trying this because I fear I may have messed things up when I'm speaking out
commands. This doesn't come all that naturally to me. All right. Oh, by the way, I've been doing
a series of episodes where I try different audio configurations. And the last few of them,
I've used my phone plus the $2 microphone. And today I'm using a different thing. This is a
microphone that I inherited when I saw it sitting on my boss's counter several months ago.
This is a microphone that looks like a pretty high-tech studio mic, but it's actually a USB
microphone made by M-Audio. That's M-Audio. It's called Producer USB. And so I'm using this M-Audio
Producer USB mic, just plugged into my laptop, recording to Audacity. I don't normally record
to Audacity because I'm afraid it's going to crash whenever I'm recording. I prefer to record
either to my phone or to my Zoom H1. But today since this is a USB microphone, I can't really
plug it into the Zoom. So I'm doing it on my laptop. And I hope it sounds pretty good. I don't know
how much this microphone costs because it's a little bit old and it was just given to me.
But I think it's one of those in about the $50 range or something. So a little bit more than the
$2 microphone, but not really out of reach for normal people either. Anyway, thanks a lot. I will
talk to you guys later. Bye.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a
comment on the website or record a follow-up episode yourself. Unless otherwise stated,
today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.