Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr2708.txt
@@ -0,0 +1,243 @@
Episode: 2708
Title: HPR2708: Ghostscript
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2708/hpr2708.mp3
Transcribed: 2025-10-19 07:54:37

---

This is HPR episode 2,708 titled Ghostscript and is part of the series Privacy and Security.
It is hosted by Klaatu and is about 23 minutes long and carries a clean flag.
The summary is: Klaatu talks about manipulating PDFs with gs and pdf-stapler.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello folks, Kay Wisher here to remind you that it's that time of year again.
Time for the Hacker Public Radio New Year's Eve Show.
For those who don't know, on New Year's Eve, December 31, 2018, at 10am UTC,
that is 5am Eastern Standard Time,
we will have a recording going on the HPR Mumble server for anyone to come on and say happy
New Year and talk about whatever they want.
We will leave the recording going until January 1, 2019, 12pm UTC.
That will be 7am Eastern Standard Time, or until the conversation stops.
Please visit hackerpublicradio.org to find all the details and links
about how to set up the PC Mumble client, your favorite mobile app,
the Mumble server connection details,
our Etherpad show notes, and the live audio stream if you only prefer to listen in on the
lively banter. So please stop by and say hi and maybe join in the conversation with other HPR
listeners and contributors. It's always a good time.
You're listening to Hacker Public Radio. My name is Klaatu.
In this episode, I want to talk a little bit about PDFs.
Specifically, how I manage to live with them.
And I've done an episode, pretty sure, with Lostnbronx about why PDFs are some of
the most important pieces of code to ever come your way.
And I feel that way very strongly.
However, that doesn't change the fact that I deal with them all the time, whether I'm
purchasing them online under the guise of, oh, these are ebooks, which, PDFs are not ebooks at all,
or whether it's because I'm using them at work,
outputting to PDF at work. Whatever the case, I have to deal with PDFs a lot.
And I just kind of want to talk about some of the random observations and tricks that I've come
up with when having to do things with PDFs. So the first thing that I want to talk about,
and I've talked about this on my show, GNU World Order, before, but I think it deserves
not really another mention, but some additional information. And that is the Ghostscript command,
or as it is typed, the gs command. So Ghostscript is the free and open-source version of
PostScript, PostScript being the syntax and code used to generate how a printer is going to
produce whatever it produces. So you might have dealt with PostScript directly as an EPS.
That's an Encapsulated, I think, PostScript file. PostScript is the back end for PDFs,
and it is the back end for many printers. So the vectorized versions and the code that goes into
ensuring that what you print is the same thing as the stuff that you see in your PDF,
that's PostScript. And you can manipulate that a little bit. We'll look in a moment at just how
ugly PDFs are and how difficult it makes it to really do anything useful to it after it's been
generated. But there are a couple of quick hacks that you can do to help yourself manage some of
the PDFs in your life. So the first problem that I often have to solve, and this I've covered
on my show before, but not on Hacker Public Radio, so I might as well talk about it. So the first
thing is that a lot of PDFs are really, really large. And that is because PDFs are intended as
printer input, really. You send a PDF to a printer, and that produces that PDF as a physical
thing, as a physical document. That's what a PDF is, which means that a lot of times when people
create a PDF, they go, for instance, if I'm in Scribus, I'll go to Export, Save as PDF. And if I
go to Color, Output Intended For Screen/Web, okay, that's one thing. Now I could go to Printer.
The printer output typically, all the defaults, actually that didn't reset the defaults, but anyway,
I can set these defaults. So the resolution for graphics, let's say 300 DPI. Maximum image resolution,
300 DPI. Compression method, lossy or lossless. Yeah, let's go lossless. Compression quality,
let's go maximum quality. So the defaults get set very high for the typical output of a PDF.
The resulting file size is, indeed, for instance, the sample PDF that I did for my episode on
Scribus is about nine megabytes for the printer version. And that's quite a hefty file size for
one page. It's a one-page document. It's nine dot one megabytes. Then the smaller version of that
is like 900 kilobytes, less than a megabyte. And that's output for the web. So there are a
couple of different profiles that PostScript, or Ghostscript at least, I don't know exactly what
the PostScript terminology is, but Ghostscript can accept a couple of different profiles for
its output. And you can manipulate that yourself for something that already exists. So for instance,
if I have this example file from my Scribus episode, I can do gs for Ghostscript. And then dash
s, lowercase s, DEVICE, all capitals, equals pdfwrite. So I'm just outputting back out to
the PDF writer. I'm not actually printing. Dash d CompatibilityLevel equals, and I'm going to
set it really low because I like backward compatibility, so that's 1.4. Then dash d PDFSETTINGS
equals, and this is the profile. There are five different profiles that Ghostscript
can understand. One is forward slash screen, which is intended for screen viewing only. So it's
72 DPI maximum for images, so anything greater than that gets down-resed. Slash ebook, that's forward slash
ebook, is a low quality 150 DPI image. So that's not bad, but you probably wouldn't
want to print from it. I mean, you honestly probably could, but let's say, you know,
you wouldn't send it to a professional printer probably. Forward slash printer is high quality,
300 DPI. Forward slash prepress is 300 DPI images with the color space being managed. And then
forward slash default is something else, apparently super similar to screen. I'm not clear on
the difference there. So those are the different profiles that you can leverage. So
if we just go for forward slash screen for a nine megabyte file, that should have a pretty dramatic
result, which is what I'm looking for for the sake of this proof of concept. So then I'm going
to do another option called dash d and then BATCH. And these options, I've never seen it
typed any other way, so I'm assuming the options can have no space between the option and the
attribute, or the argument. So dash d BATCH, all one word, with BATCH being all capitals.
And then dash s OutputFile equals output dot pdf. And then I'm going to point it at this example
plus bleed dot pdf, which is in the current directory. The dash d BATCH makes sure that
Ghostscript doesn't drop down into an interactive prompt, which it does by default
otherwise. So don't leave that out. And yes, so here's an output dot pdf at 142
kilobytes, which, I mean, down from nine megabytes is orders of magnitude, literally. And the difference
is really only in the images. So the only optimization that it has available to
it is to down-res images. That's really all we can do. Well, there's something else. But
in this case, in what we're speaking about right now, it's just the images.
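The command read out above can be hard to follow by ear, so here is a minimal sketch that assembles it as an argument list. The flags are the ones named in the episode; the function name `shrink_command`, the `PROFILES` table, and the `input.pdf` placeholder are illustrative assumptions, and the exact name of the speaker's input file is not recoverable from the audio.

```python
import shlex

# PDFSETTINGS profiles as described in the episode (descriptions paraphrase the speaker).
PROFILES = {
    "/screen": "screen viewing only, images capped at 72 DPI",
    "/ebook": "low quality, 150 DPI images",
    "/printer": "high quality, 300 DPI images",
    "/prepress": "300 DPI images with colour management",
    "/default": "general purpose profile",
}


def shrink_command(source, output="output.pdf", profile="/screen", compat="1.4"):
    """Build the gs argument list for re-writing a PDF with a smaller profile.

    Hypothetical helper: it only builds the list, it does not run Ghostscript.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",                # write a PDF instead of printing
        f"-dCompatibilityLevel={compat}",   # target PDF version
        f"-dPDFSETTINGS={profile}",         # quality profile
        "-dBATCH",                          # exit instead of dropping to a prompt
        f"-sOutputFile={output}",
        str(source),
    ]


# Show the equivalent shell line for a placeholder input file.
print(shlex.join(shrink_command("input.pdf")))
```

To actually run it, the list can be handed to `subprocess.run(..., check=True)` on a machine where `gs` is installed. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.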
And you know, the text is still text. So you can zoom in on that forever. And it will recalibrate
how it's aliasing the text. And it'll look great no matter what. You could print that and be
perfectly happy with it. It's just the graphics that got down-resed. Not a big deal really. So now the
other thing that I've done in the past to shrink the size and complexity of PDFs, and that's
kind of a big one to be honest. Sometimes I can kind of handle a PDF on several
devices, whether it's my little e-ink ebook reader or whether it's a mobile phone or
something like that. It's a pain because you still have to scroll around to try to read. And you know,
it doesn't really do that well. But the real problem for me is a lot of
times that it'll spend so much time trying to render these graphics on this slow device that it
slows down the reading process to just being too annoying. So half the time my issue is not
even necessarily the resolution of graphics. It is the presence of graphics. I just don't
need to spend any cycles on generating the graphics half the time. That's not always true.
Sometimes the graphics are integral to what you're reading. So you need them there. But other times
you don't. And as it turns out, this is I guess a common enough problem because there is a filter
in Ghostscript to filter out images. And the filter is dash d and then FILTERIMAGE, all
capitals, FILTERIMAGE. Now that filters out very specifically raster images. So if you need to get
rid of vector images as well, there's a separate filter for that. I find in practice that I don't
really have to deal with the vector images very often. It's almost always raster images
that are in PDFs and they are huge. So adding the dash d FILTERIMAGE to the same command. So I
guess I'll read that out again. So that's Ghostscript, or gs, space, dash s DEVICE equals pdfwrite.
That's where we're going to. Dash d CompatibilityLevel equals one dot four. That's the
version of PDF readers that will be able to open this, which is I think as far back as you can go.
I've never seen anything earlier than that. I mean, never. I haven't seen recently, in recent years,
anything farther back than that. Dash d PDFSETTINGS equals forward slash screen. I'm just keeping
it small, especially because we're not going to even have images in here. Anyway,
it doesn't really matter. Dash d BATCH, dash d FILTERIMAGE, dash s OutputFile equals output dot
pdf, and then the example plus bleed dot pdf, which is the big nine megabyte file that we're
going off of here. So you do that and it processes and dumps output dot pdf into the
current directory. Now I'm doing ls dash lh on output dot pdf and it is down from nine megabytes
to 40 kilobytes. That's a lot more reasonable. And if I open the thing up, then I see on my screen
a perfect representation of that PDF except there's just no graphic there. So we're not spending any
file size on the graphics and we're not spending any CPU cycles trying to render those
graphics for no good reason. So that's a huge one for me. That's really taken me from
not being able to read a PDF on some device to actually being able to read the
PDF on a device. It's made all the difference. Now the place where that's also made a difference
is when printing. Like sometimes I'll have a PDF and I want to print something for reference
on actual paper. It does happen sometimes. And a lot of times they'll have background images, you know,
for whatever reason, like, for style really. I mean, it's a background image to evoke
some kind of mood or just to look cool, and then some other images here and there. And maybe
the images I could usually stand, but I mean, to print 50 pages of background floral prints over my
text, or behind the text ostensibly, it just doesn't make any sense. So if you do this command, the
Ghostscript command, and filter out all those images, that gets rid of those background images. I mean,
it gets rid of the foreground ones too, which, that's a little bit annoying, but really the
background images for me are the ones that really matter. But I mean, I don't even mind printing without
the foreground images usually. I usually don't want the foreground images, or if I do it's just
a couple of them, and those I could like screenshot and print separately, or maybe not print at all
and just have them on a screen as a single file and that sort of thing. So Ghostscript FILTERIMAGE,
really really useful if you, like me, need to sometimes print a PDF and don't want to spend all
of your ink on fanciful background images, or if PDFs are simply too large for you. Now in the past,
in a past episode, I've talked about bookmarks: retaining and editing and applying bookmarks to a
PDF file. I've also done an episode on pdftk, which is the program that I generally use to
chop up PDFs when I need to just extract, you know, a page from a PDF just here or there for
whatever reason, or maybe I need to extract a couple of pages and then merge them back together,
you know, so basically taking a subset of a larger PDF. And I realized that I probably
should mention a separate, or a related, program, because I don't think I mentioned it. I may have,
but it's called pdf-stapler. And pdf-stapler is an application that sort of takes the place of
pdftk. Not exactly; it doesn't have one-to-one parity of features. It doesn't quite have everything
that pdftk does, but it's got that magical, you know, 80 or 90% of stuff. And what it doesn't
do all that well is the bookmarking stuff. Actually that's pdftk, really. But pdf-stapler, and I have
seen it generally called pdf dash stapler, pdf dash stapler is, I think it's Python based as far
as I remember, and its syntax is similar. It's not the same. It's actually just similar enough to confuse
me half the time, but it's kind of similar to pdftk. So for instance, if you're
going to cat a bunch of files into one big PDF, and I think for me a common use case
for this, at one point I used to have to do this a lot, I would take a collection of images and then
convert them to PDF and then concatenate them into a big PDF. That was a fairly typical thing
to do for some artists. They would want their things in a PDF, but
they couldn't figure out the easy and quick way to get, you know, 100 photos or whatever into one
file. And that was very frequently doing a convert command on all, you know, PNGs or whatever
in the current directory, resize them and output them as like JPEGs,
and then run some command to then concatenate all those things into a big PDF. So for instance, if
I was doing that with pdf-stapler, it would be pdf dash stapler, space, cat, that's the
command, and then space, and then I guess I would just do a wildcard dot pdf, yeah, because I would
have done a convert on all those JPEGs to PDF. And then I would have done wildcard dot pdf and then
space and, I don't know, output dot pdf. And it puts all of the files that you pointed at into
one big PDF that will open and people can flip through. So it's cat or sel, for some reason. I'm
not really sure why they do that. I'm not sure if there's a difference, but there's cat
to concatenate pages. There's also something called sel, S-E-L, for select the given page range.
And again, I'm not 100% sure if there's going to be some other function
for that or if it's just the same thing. I'm not sure, but as far as I can tell it's the
same thing. But anyway, there's also del for delete, D-E-L. You can delete a page or a range
of pages. There's burst, or split, which is creating one file per page for an input PDF, which is
something that I think people probably would need to do. I've definitely heard people
needing to do that. I personally, I can't imagine having to do that. No, I can. For a printer spread,
totally, I can see doing that. And then there's also zip, which is merge or collate the given
input files interleaved, so it's, you know, odds and evens, that sort of thing. There's also info,
which displays PDF metadata, but as far as I've been able to
find in the command, there's nothing to reapply that metadata to a PDF. So
you can get the data from something, but whether you can reapply it to your new PDF or
to another PDF for some reason, as far as I can tell there is no way in pdf-stapler for that to
happen. The site that you can download that at is github.com slash hellerbarde slash stapler, and I
will put a link to that in the show notes. H-E-L-L-E-R-B-A-R-D-E is the username and it's just called
stapler there. I don't know if I'm using an older version or if the command simply has
remained pdf dash stapler. I don't really remember where I got this thing. It's just
one of those things that I have on my work computer and have been using as is with great success.
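The `cat` invocation described above, a wildcard of input PDFs followed by an output name, can be sketched as an argument list. The helper name `stapler_cat` is hypothetical. The program name defaults to `pdf-stapler` as used in the episode, while the upstream project is simply called stapler, so the executable name on a given system is an assumption to check.

```python
def stapler_cat(inputs, output="output.pdf", program="pdf-stapler"):
    """Build a 'cat' command joining several PDFs into one.

    Hypothetical helper: inputs are sorted so pages land in file-name order,
    roughly as a shell wildcard would expand them.
    """
    pdfs = sorted(str(p) for p in inputs)
    if not pdfs:
        raise ValueError("no input PDFs given")
    if output in pdfs:
        # A bare *.pdf wildcard would also pick up an existing output file.
        raise ValueError("output file is also listed as an input")
    return [program, "cat", *pdfs, output]
```

For a directory of converted images, `stapler_cat(Path(".").glob("*.pdf"))` would collect every PDF in the current directory. Zero-padded names such as `page001.pdf` keep the sorted order equal to the intended page order.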
So that's another tool that I use. It's really interesting, if you look at PDF files,
it's kind of shocking. You can look in a PDF. It's kind of interesting. If you go to
emacs, space, and then output dot pdf, I'm just doing output dot pdf because that's what I just did
with my Ghostscript thing that removed the images, then I hit return. Now in Emacs, it actually renders
the PDF for me, which I don't actually want at this particular moment, so we're going to
hit Control-C Control-C and that gets me to the source view, if you will. And you can see what goes
into making a PDF a PDF, and it is horrible to look at. It really is. It's honestly just dismal.
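For readers without Emacs to hand, a rough way to look at the same raw structure is to scan the file's bytes. A PDF body is a series of numbered objects (`N G obj` ... `endobj`), some of which carry a binary payload between `stream` and `endstream`. The sketch below only counts those markers; the function name `summarize_pdf` is hypothetical, and the counts are approximate because the markers can also occur by chance inside binary data.

```python
import re

# "12 0 obj" style headers that open each numbered object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# A stream keyword (not the tail of "endstream"), its payload, then endstream.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)


def summarize_pdf(data: bytes) -> dict:
    """Return rough counts of objects and stream payload bytes in raw PDF data."""
    payloads = STREAM_RE.findall(data)
    return {
        "objects": len(OBJ_RE.findall(data)),
        "streams": len(payloads),
        "stream_bytes": sum(len(p) for p in payloads),
        "total_bytes": len(data),
    }
```

Calling `summarize_pdf(Path("output.pdf").read_bytes())` on a real file gives a quick sense of how much of it is opaque stream data. Files written at PDF 1.5 or later may pack objects inside compressed streams, which this scan will not see.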
You really can't make heads or tails of it. But what's funny is that you kind of get
this cadence, and there's this line here called stream, S-T-R-E-A-M, and that
seems to begin a block of binary data that, you know, it's nothing that you
can actually read. And then at the end of all that there's an endstream tag, I guess you could
call it, or declaration, and then an endobj, and then a declaration of the object number, which,
I don't know where the object numbers come from. I don't know what's generating those. It's
pretty mysterious to look at. But what's really funny is if you go into these
streams and just start deleting things, it's kind of entertaining to see exactly how little
effect you have on the PDF output. Like, I just deleted a bunch of stuff from a stream and it took
away the v in the word gave and the m in the word fanaticism in the PDF that I generated,
and that's all it did. And it was like this huge chunk of code that I just got rid of. And you can do
that and the PDF still opens. It's really kind of frightening in a way,
because you think, what could someone just put into a PDF file and post online for
people to download? Because apparently the PDF would just open and you have no idea, you know,
really what's in the PDF. It's really, really strange. I don't think I've ever
quite seen, now, I have broken it enough at one point where it wouldn't open, but
it isn't really something that you find, you know, there's a lot of
flex. It's not very strict, is what I'm trying to say. You can delete all kinds of things.
Sometimes there will be no visible change whatsoever. Other times there'll be
just little quirks, you know, like maybe a font will disappear so you're just left with
a normal font instead of something that was supposed to be italicized or whatever. So yeah, it just
kind of depends on what you're deleting. But it is quite interesting to have a look behind the
scenes, and like I say, you can do that in Emacs. When you open Emacs it'll render the PDF for you,
so just hit Control-C Control-C to get to the text view, and you can kind of poke around
and see what's in a PDF. And yeah, it's surprising what you can just
put into PDFs. It really is. It's very, very shocking, and it kind of makes me think that maybe
a file format with a little bit more transparency and also a stricter kind of
syntax checking would be a good idea, because these PDFs, as far as I can tell, you could just put
whatever you wanted into them and then send them around and no one would ever really know. I mean,
I guess it would depend. I mean, maybe you'd have to put, for instance, a GPG-encoded something or
another in there, you know, maybe you'd want to encode it, but certainly it wouldn't be the
first place for people to look, I wouldn't imagine. Now, could you do that, you know, if there are
MD5 sums being taken and so on? No, obviously not. But it is fascinating to see just how lazy
the PDF format really is and how bloated apparently it is, because I kid you not, I've deleted
screenfuls of information and reopened the PDF with no apparent change in display. It's pretty
shocking. So there you go, that's PDFs for you. Hopefully I've given you some ways to reduce
their size, to simplify them, to make them a little bit more portable, which is funny because I think
that's what it used to stand for, portable. Maybe it was paperless all along, I forget. Either way,
that's PDFs, that's Ghostscript, it's pdf-stapler. Hope it's helpful. Talk to you next time.
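As a recap of the "simplify" trick from the episode, here is a minimal sketch of the image-stripping variant of the Ghostscript command. The helper names are hypothetical. `-dFILTERIMAGE` is the flag named in the episode. `-dFILTERVECTOR` is the separate vector filter as I understand current Ghostscript releases to name it; the episode does not name it, so treat it as an assumption to verify against your installed version.

```python
import shutil
import subprocess


def strip_images_command(source, output="output.pdf", also_vectors=False):
    """Build a gs argument list that drops raster images from a PDF."""
    cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/screen",
        "-dBATCH",
        "-dFILTERIMAGE",              # remove raster images (named in the episode)
    ]
    if also_vectors:
        cmd.append("-dFILTERVECTOR")  # assumption: not named in the episode
    cmd += [f"-sOutputFile={output}", str(source)]
    return cmd


def run_gs(cmd):
    """Run a prepared gs command, failing early if gs is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

A typical call would be `run_gs(strip_images_command("input.pdf"))`, where `input.pdf` is a placeholder. Text stays as text in the result, which is why the stripped file remains readable and printable.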
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday, Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on
the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released
under a Creative Commons Attribution-ShareAlike 3.0 license.