|
|
Episode: 2767
|
||
|
|
Title: HPR2767: Djvu and other paperless document formats
|
||
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2767/hpr2767.mp3
|
||
|
|
Transcribed: 2025-10-19 16:34:38
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
This is HPR episode 2767 entitled DjVu and other paperless document formats.
|
||
|
|
It is posted by Klaatu and is about 32 minutes long and carrying a clean flag.
|
||
|
|
The summary is a tutorial on how to read and generate DjVu files.
|
||
|
|
This episode of HPR is brought to you by archive.org.
|
||
|
|
Support universal access to all knowledge by heading over to archive.org forward slash donate.
|
||
|
|
Hey everybody, you're listening to Hacker Public Radio. This is Klaatu and this is an episode
|
||
|
|
about DjVu.
|
||
|
|
Hey everybody, you're listening to Hacker Public Radio. This is Klaatu and this is an episode
|
||
|
|
about an interesting file format called DjVu.
|
||
|
|
With a brief mention of CBZ as well. As it happens, I was looking for a file format that would
|
||
|
|
allow me to take a series of, for instance, scanned images and dump them into a single file.
|
||
|
|
So I wanted to bundle these things up. I didn't want them just to be in a directory.
|
||
|
|
I wanted them in a single file that was sort of user-facing, if you will. In other words,
|
||
|
|
I didn't want to just put them into a document and then send the person the document full of images.
|
||
|
|
I wanted the document to be a collection of images.
|
||
|
|
The first thing that came to mind was the CBZ format, which is pretty popular among comic books.
|
||
|
|
I think CBZ actually stands for comic book archive. It's a great format and it's super easy to make.
|
||
|
|
So for instance, if you've never made one, this will be an easy demo.
|
||
|
|
So if you go into a folder with a bunch of images in it, which I happen to have,
|
||
|
|
because that's a project that I've been working on lately, and you do a zip,
|
||
|
|
and then you create, let's say, mybook.zip.
|
||
|
|
And I'm doing a .zip because the zip terminal command will complain if you're not going out to zip.
|
||
|
|
It doesn't understand that you might want to go to a different file extension.
|
||
|
|
So we'll rename that later and then we'll do -r and then we'll just do,
|
||
|
|
or actually I don't even think we have to do a -r, we'll just do asterisk.jpeg,
|
||
|
|
or whatever file format, whatever image format you're putting into this thing.
|
||
|
|
Hit return, let it bring all of those files into the zip archive, and once it is finished,
|
||
|
|
do a move: mv, space, mybook.zip, space, mybook.cbz.
|
||
|
|
And now you have a comic book archive that you can open with a comic book reading application,
|
||
|
|
or sometimes just any random document viewer on Linux.
|
||
|
|
Okular, for example, opens up comic book archives quite easily in KDE.
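The steps just described might look something like this in a shell. This is only a minimal sketch, assuming the page images are .jpeg files in the current directory; the function name make_cbz and the archive name mybook are illustrative, not part of any tool.

```shell
#!/bin/sh
# make_cbz NAME: bundle every .jpeg in the current directory into NAME.cbz.
# make_cbz is a hypothetical helper name used only for this sketch.
make_cbz() {
    name="$1"
    # Build an ordinary zip archive first; the shell expands *.jpeg in sorted order.
    zip "${name}.zip" *.jpeg || return 1
    # A CBZ is a zip archive with a different extension, so a rename finishes the job.
    mv "${name}.zip" "${name}.cbz"
}

# Usage: make_cbz mybook
```

Because the glob is expanded alphabetically, numbering the scans with leading zeros (01.jpeg, 02.jpeg, and so on) is one way to keep the pages in reading order.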
|
||
|
|
So I thought of that at first, and then I made a couple of archives, and I realized that the archives
|
||
|
|
were almost always as large as the sum of their parts, which I know sounds obvious.
|
||
|
|
But I was, I think, I guess I was looking for something with a little bit more compression.
|
||
|
|
And to be fair, I could manually compress the files myself, and I did that a couple of times.
|
||
|
|
And that worked a little bit better for me.
|
||
|
|
But I was still missing certain features, specifically with regards to metadata.
|
||
|
|
So for instance, if you have, say, an ePub, you can have a table of contents.
|
||
|
|
You can make annotations in some, in certain clients, you can do annotations in the ePub, and so on.
|
||
|
|
Whereas the comic book archives, they tend not to really specialize in that.
|
||
|
|
They really are just a very convenient way of looking at a zip file in a way.
|
||
|
|
So I was looking for something with a little bit more features towards finding stuff
|
||
|
|
within a potentially large document with a lot of text in it.
|
||
|
|
But not necessarily searchable text, because my use case has been so far, at least,
|
||
|
|
to simply scan pages of whatever it might be.
|
||
|
|
It might be an old Unix manual from AT&T, or it might be a comic book,
|
||
|
|
or it might be a historical document, I guess.
|
||
|
|
So a book published at the turn of the century, the previous century,
|
||
|
|
that's getting pretty old, and probably could use some preservation, that sort of thing.
|
||
|
|
I'm not going to sit there and transcribe all that text.
|
||
|
|
I'm not going to have the ASCII text of the content, but I do have the pages,
|
||
|
|
and I might want to refer to it.
|
||
|
|
So I might want to find chapter 5, for instance.
|
||
|
|
And I might not want to scroll through a bunch of thumbnails,
|
||
|
|
or just flip through a bunch of pages until I find chapter 5.
|
||
|
|
Seems like since it's on a computer, I should have that kind of data available to me more easily,
|
||
|
|
than actually manually going through it and looking for it.
|
||
|
|
So I thought, well, why not use an EPUB?
|
||
|
|
That seems like a good, this seems like a great idea.
|
||
|
|
It's not a bloated format, it's quite nice.
|
||
|
|
It's a good format, I like it, I've never had a problem with it.
|
||
|
|
It is a little bit weird, though, to take a bunch of images,
|
||
|
|
and put it into an EPUB, and then get all the overhead of the EPUB.
|
||
|
|
When the EPUB itself, I think, generally, expects to contain text.
|
||
|
|
That's not to say that I couldn't abuse it, and just put images into it.
|
||
|
|
I'm sure it's probably been done, but I felt like that wasn't the best,
|
||
|
|
potentially the best use of the EPUB format,
|
||
|
|
and I didn't feel, even more importantly, that the clients,
|
||
|
|
the EPUB applications that I'm using to view the resulting documents,
|
||
|
|
will really expect me to have a bunch of images in it.
|
||
|
|
And this isn't, like, I don't want to offend someone, you know, it's not,
|
||
|
|
I don't care what the clients, the applications expect,
|
||
|
|
I don't care what EPUB is intended for.
|
||
|
|
I'm just saying, if I want to zoom in quickly on an image,
|
||
|
|
because I want to inspect some detail of some art,
|
||
|
|
then I don't want to have to work around the fact that this EPUB
|
||
|
|
really thought all I was ever going to ask for was a larger font,
|
||
|
|
and so I have to sort of hack around it to zoom in on an image easily and conveniently.
|
||
|
|
Well, the answer that I eventually fell upon was DjVu.
|
||
|
|
DjVu is a digital document format designed with compression included
|
||
|
|
and the ability to contain metadata.
|
||
|
|
In other words, it is exactly what I was looking for,
|
||
|
|
so I sat down to figure out how I could leverage DjVu
|
||
|
|
in my everyday computing life, and this is what I came up with.
|
||
|
|
So first of all, I will warn you that DjVu is highly dependent upon your use case.
|
||
|
|
It's a pretty flexible format in a sense,
|
||
|
|
because what you need to put into it
|
||
|
|
will determine how you generate them.
|
||
|
|
But before we get into generating a DjVu file,
|
||
|
|
let's talk a little bit about reading a DjVu file
|
||
|
|
to make sure that it's a format that's well enough supported for us
|
||
|
|
to resort to on a daily basis.
|
||
|
|
Turns out it's actually quite well supported,
|
||
|
|
so if you're on a Linux desktop and why wouldn't you be,
|
||
|
|
then you have a DjVu reader probably installed already.
|
||
|
|
Certainly, the GNOME desktop ships with, I think, Evince.
|
||
|
|
It's their default document reader.
|
||
|
|
And it reads DjVu files.
|
||
|
|
KDE ships with Okular, which I've already said reads CBZ files,
|
||
|
|
comic book files.
|
||
|
|
It also reads DjVu files.
|
||
|
|
If you don't have either of those available to you for some reason,
|
||
|
|
then you can go get DjView, that is DJ and then VIEW.
|
||
|
|
It's part of a package called DjVuLibre,
|
||
|
|
which is the tool set for all the DjVu stuff
|
||
|
|
that you'll be doing if you start using that as a format.
|
||
|
|
Should be in your repository, and if not, it's on SourceForge.
|
||
|
|
DjView should probably also be in your repository,
|
||
|
|
maybe not, again, it's on SourceForge.
|
||
|
|
It is cross-platform, so if you're not on Linux,
|
||
|
|
then this might be perfect for you.
|
||
|
|
It's easy to compile.
|
||
|
|
It does require the Qt framework.
|
||
|
|
Either Qt 4 or Qt 5.
|
||
|
|
I compiled it on Slackware with Qt 5 with no problems.
|
||
|
|
And it works quite well.
|
||
|
|
Now, if none of those things are available to you for whatever reason,
|
||
|
|
maybe you're on a computer that just doesn't permit
|
||
|
|
that level of application management for you for whatever reason,
|
||
|
|
then there are options for your web browser as well.
|
||
|
|
There is, well, there are online sites
|
||
|
|
where you can upload documents and look at them.
|
||
|
|
There's a JavaScript library called DjVu.js
|
||
|
|
that you can check out, djvu.js.org.
|
||
|
|
And then, finally, there's a Firefox browser plugin
|
||
|
|
or rather, yeah, plugin and add-on extension,
|
||
|
|
whatever they call them these days,
|
||
|
|
called DjVu.js, which is a local copy of DjVu.js
|
||
|
|
so that it runs as a plugin in your browser.
|
||
|
|
You just, you click on the icon.
|
||
|
|
It presents you with an empty tab.
|
||
|
|
You drag your DjVu file onto the tab.
|
||
|
|
Then it opens it up in your browser.
|
||
|
|
It doesn't upload it to the internet or anything.
|
||
|
|
It's local. It's just using your browser as the engine.
|
||
|
|
So that's pretty easy as well.
|
||
|
|
On mobile, there are document viewers for your mobile phone as well.
|
||
|
|
I don't really know anything about the iPhone platform
|
||
|
|
so I can't really even, I can't begin to guess
|
||
|
|
what might be available for it.
|
||
|
|
But certainly on Android, from F-Droid,
|
||
|
|
even, you can get an application called Document Viewer,
|
||
|
|
which is a viewer for many document formats.
|
||
|
|
And it supports DjVu, it supports ePub,
|
||
|
|
it supports comic book, the CBZ,
|
||
|
|
fiction book, FB2, and a couple of others.
|
||
|
|
In other words, there are lots of options
|
||
|
|
for reading DjVu files.
|
||
|
|
And no matter what kind of device you're on,
|
||
|
|
the chances are really high
|
||
|
|
that there is a DjVu viewer for you.
|
||
|
|
You should go get a DjVu file
|
||
|
|
and test some of these things out to see if you like it.
|
||
|
|
I think you probably will.
|
||
|
|
If you need a good demo DjVu file,
|
||
|
|
you can go to djvu.org,
|
||
|
|
go to the Downloads and Resources section.
|
||
|
|
And at the bottom of that page,
|
||
|
|
they have some white papers and tech documents.
|
||
|
|
And any of those you can download
|
||
|
|
and look at, they're all in the DjVu format.
|
||
|
|
I thought it was kind of cool of the project actually
|
||
|
|
to put their reference document
|
||
|
|
and their specification document in DjVu.
|
||
|
|
So you have to have DjVu
|
||
|
|
in order to read about DjVu, it's quite slick.
|
||
|
|
So there you go, that's the consumption side of things.
|
||
|
|
It's pretty easy, it's a lot more available
|
||
|
|
than you might first have thought.
|
||
|
|
If you're not really aware of DjVu,
|
||
|
|
these probably kind of just passed you by unnoticed.
|
||
|
|
But they are there, they're there,
|
||
|
|
and they work quite well,
|
||
|
|
and they're highly compatible
|
||
|
|
with lots of different platforms and devices.
|
||
|
|
So do check them out.
|
||
|
|
So now let's talk about how to create a DjVu file,
|
||
|
|
because certainly if you're going to use this,
|
||
|
|
and certainly the reason I'm using DjVu
|
||
|
|
is because I want to put documents into that format.
|
||
|
|
Now I'll admit, this can be a little bit tricky
|
||
|
|
in some ways, and by that I mean
|
||
|
|
that the process isn't actually difficult,
|
||
|
|
but there are certain conveniences
|
||
|
|
that just don't exist.
|
||
|
|
For instance, if you're trying to quickly export something
|
||
|
|
from, I don't know, Google Docs or something,
|
||
|
|
you're not gonna go up to the export menu
|
||
|
|
and find a DjVu format,
|
||
|
|
at least I don't think you are, I don't know,
|
||
|
|
but I'm assuming you're not gonna go to DjVu
|
||
|
|
or Google Docs and find an export format of DjVu.
|
||
|
|
You'll probably find other formats like ODT,
|
||
|
|
I know that's in there, PDF, that's definitely in there,
|
||
|
|
and maybe some other stuff,
|
||
|
|
but DjVu isn't gonna just,
|
||
|
|
you're not going to just inherit the capability
|
||
|
|
to export as a DjVu, generally speaking.
|
||
|
|
It's gonna depend on the application obviously,
|
||
|
|
but I'm just saying, in the real world out there,
|
||
|
|
obviously a lot of us had never even heard of DjVu
|
||
|
|
before this episode, so it's kind of self-evident
|
||
|
|
that it's not just, it's not gonna fall into your lap.
|
||
|
|
You will have to decide, I'm going to be a DjVu user,
|
||
|
|
and then you have to go get the tools
|
||
|
|
to generate DjVu,
|
||
|
|
and then you may have to work around some workflow
|
||
|
|
that you have already established
|
||
|
|
to create DjVu files.
|
||
|
|
Luckily, I have some answers for that,
|
||
|
|
but it's still, it's gonna be a little bit different, right?
|
||
|
|
There's not gonna be very rarely
|
||
|
|
are you going to find a file menu
|
||
|
|
where you can go to file, print, print to DjVu.
|
||
|
|
That just doesn't exist,
|
||
|
|
whereas if you go to file, print to PDF, that exists.
|
||
|
|
The difference is, of course,
|
||
|
|
that PDF is a horrible format,
|
||
|
|
and DjVu is actually quite nice.
|
||
|
|
Let's look at it, shall we?
|
||
|
|
So the DjVu toolset, as I've said,
|
||
|
|
is DjVuLibre,
|
||
|
|
that's the open source implementation
|
||
|
|
of the DjVu spec.
|
||
|
|
DjVuLibre is, as its name suggests,
|
||
|
|
free and open source software.
|
||
|
|
So you can grab it and use it,
|
||
|
|
and it is completely open.
|
||
|
|
You can learn all about everything that you need to know
|
||
|
|
about DjVu from both djvu.org and DjVuLibre.
|
||
|
|
There's some really good documentation
|
||
|
|
in the DjVuLibre source package,
|
||
|
|
or maybe it's the DjVu source package.
|
||
|
|
One of those two, it has some good documentation
|
||
|
|
that kind of, it gives you an overview
|
||
|
|
of all the different commands
|
||
|
|
that come with DjVuLibre.
|
||
|
|
And I have to say the commands, there are many.
|
||
|
|
And that is, again, because the way that you want
|
||
|
|
to build a DjVu file will control
|
||
|
|
or will dictate rather how you do that,
|
||
|
|
what tools you use for the job.
|
||
|
|
I'm not gonna go through all of them.
|
||
|
|
I will go through some of the major ones.
|
||
|
|
First of all, we need a series of documents
|
||
|
|
that we want to convert to DjVu.
|
||
|
|
Now DjVu is interesting.
|
||
|
|
Now remember, I'm saying that it's a document format
|
||
|
|
into which you can put lots of images, for instance.
|
||
|
|
And then you'll have this file that seems like a book,
|
||
|
|
and so it'll be like a paperless book.
|
||
|
|
And that's great, but that's only one use of DjVu.
|
||
|
|
DjVu itself is perfectly happy to be a single file,
|
||
|
|
like a single thing, a single entity.
|
||
|
|
So for instance, if I have any random photo
|
||
|
|
from a phone or something, then I can convert that.
|
||
|
|
I'm gonna go over to my pictures, graphics,
|
||
|
|
whatever it's called, graphic folder here.
|
||
|
|
And yeah, here's a TIFF.
|
||
|
|
So I'm gonna go, I'm gonna, so a TIFF is a file,
|
||
|
|
it's a pretty high quality, or potentially high quality,
|
||
|
|
graphic file, and it is not necessarily,
|
||
|
|
but generally speaking, it's like a color document,
|
||
|
|
probably fairly high detail.
|
||
|
|
So for that, we would want to use
|
||
|
|
the sort of the high end converter for DjVu,
|
||
|
|
which is called c44.
|
||
|
|
If I type in c44 -h, that's c44, space, dash h for help,
|
||
|
|
then I get a little bit of a blurb about it.
|
||
|
|
It says it's an image compression utility
|
||
|
|
using IW44 wavelets.
|
||
|
|
Now I don't know what that means.
|
||
|
|
There are a couple of different options here.
|
||
|
|
The only one I care about is -dpi,
|
||
|
|
because it sets the image resolution.
|
||
|
|
So I'm gonna do c44, and c44 again,
|
||
|
|
is included with the DjVuLibre package
|
||
|
|
that you presumably downloaded and installed,
|
||
|
|
or got from your repository. On Slackware,
|
||
|
|
it's already installed.
|
||
|
|
So c44, and then it says in the help,
|
||
|
|
it says to do options, okay, so that's -dpi,
|
||
|
|
and I'm gonna keep this at, let's say 300 DPI,
|
||
|
|
and then it says to give a PNM or JPEG file.
|
||
|
|
Okay, so it only accepts PNM or JPEG.
|
||
|
|
This is a quirk about the toolset
|
||
|
|
that I never really understood or got used to,
|
||
|
|
but apparently for higher quality documents,
|
||
|
|
the PNM or JPEG formats are supported,
|
||
|
|
but for lower quality documents,
|
||
|
|
the formats traditionally associated
|
||
|
|
with high quality graphics are also supported, so TIFF.
|
||
|
|
So for this, I cannot use a TIFF file,
|
||
|
|
so I'm gonna zero in on a different,
|
||
|
|
well actually I don't have a different one,
|
||
|
|
so I'm just gonna convert this thing to a JPEG,
|
||
|
|
I'll convert it actually to a PNM,
|
||
|
|
so I'm gonna do a convert, that's an ImageMagick command.
|
||
|
|
If you don't have ImageMagick or GraphicsMagick installed,
|
||
|
|
just install that, and then you can do convert,
|
||
|
|
or gm convert if you've got GraphicsMagick.
|
||
|
|
So convert in MK1, that's the name of this image,
|
||
|
|
I should probably look at this image
|
||
|
|
to see what on earth it is.
|
||
|
|
Okay, so it is a, actually it is a really basic black
|
||
|
|
and white logo, that's funny.
|
||
|
|
Okay, so I'm gonna actually convert a different one,
|
||
|
|
this penguin picture that I have.
|
||
|
|
So I'm gonna do a convert penguin dot PNG
|
||
|
|
to penguin dot, what am I saying, oh PNM, right?
|
||
|
|
Okay, and that happened, that's done, that was very quick.
|
||
|
|
And now I'm gonna use the c44 tool,
|
||
|
|
c44 -dpi 300, and I'm gonna feed it,
|
||
|
|
the penguin dot PNM file, and then it tells me
|
||
|
|
to define a DjVu file into which this,
|
||
|
|
the conversion should be placed,
|
||
|
|
so I'll just do penguin dot, DJVU is the file extension
|
||
|
|
by default, and that's finished, that's done.
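Put together, the conversion just walked through could be sketched as below. This is a hedged example, not a definitive recipe: png_to_djvu is a made-up helper name, the 300 DPI value simply follows the demo, and it assumes ImageMagick's convert and DjVuLibre's c44 are on the PATH (ImageMagick 7 calls the command magick, GraphicsMagick uses gm convert).

```shell
#!/bin/sh
# png_to_djvu IN.png OUT.djvu: convert a colour image to a single-page DjVu file.
# png_to_djvu is a hypothetical helper name used only for this sketch.
png_to_djvu() {
    src="$1"
    out="$2"
    pnm="${src%.*}.pnm"
    # c44 reads only PNM or JPEG, so produce an intermediate PNM first.
    convert "$src" "$pnm" || return 1
    # Wavelet-compress the PNM into DjVu, recording a resolution of 300 DPI.
    c44 -dpi 300 "$pnm" "$out"
}

# Usage: png_to_djvu penguin.png penguin.djvu
```

The intermediate PNM is uncompressed and can be deleted once the DjVu file exists.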
|
||
|
|
So now I'll open up a graphical file manager here,
|
||
|
|
just for testing to see what happens
|
||
|
|
when I click on things, and it looks like I've got,
|
||
|
|
actually I'm gonna look at file sizes as well,
|
||
|
|
so the source of this penguin was 43.8 kilobytes.
|
||
|
|
When I converted it up to the PNM format,
|
||
|
|
I got a 1.8 megabyte file,
|
||
|
|
so that's obviously going sort of in the wrong direction,
|
||
|
|
right, if compression is one of the things
|
||
|
|
that we care about going from 43 kilobytes to 1.8 megabytes,
|
||
|
|
not a good thing, but wait a minute,
|
||
|
|
the DjVu version, looking at that,
|
||
|
|
is 18.1 kilobytes, that's quite a lot smaller
|
||
|
|
than the original PNG, 43.8, so I'll click on that
|
||
|
|
and try and take a look at it, it looks good,
|
||
|
|
looks really nice, no problems really,
|
||
|
|
no complaints about this, except a couple of different things,
|
||
|
|
and that is that the background is black,
|
||
|
|
and that's because I brought it in from a PNG,
|
||
|
|
so I'm gonna go back up to my convert command,
|
||
|
|
and rather than doing the convert from PNG to PNM,
|
||
|
|
I'm gonna, or not rather, but in addition,
|
||
|
|
I'm gonna add -background, quote white, and then -flatten.
|
||
|
|
So that will take any kind of alpha channel
|
||
|
|
that I inherit from the PNG, it will cause it to be,
|
||
|
|
it will cause the background behind that alpha channel,
|
||
|
|
if you think of it that way, it to be white,
|
||
|
|
and then dash flatten, flatten the image,
|
||
|
|
so that there is no alpha channel, so now we'll do that,
|
||
|
|
and then we'll do the c44 -dpi 300 penguin.pnm,
|
||
|
|
actually you know what I'm gonna even do,
|
||
|
|
I'm gonna drop the -dpi 300 and let the c44 thing
|
||
|
|
go with its defaults.
|
||
|
|
Okay, so now we have a DjVu file,
|
||
|
|
which when I open is, yeah, it has white in the background,
|
||
|
|
so that's good, that's a little bit better.
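The transparency fix described above might be written like this. It is a sketch under the same assumptions as before (ImageMagick's convert available; flatten_to_pnm is an illustrative name, not a real tool).

```shell
#!/bin/sh
# flatten_to_pnm IN.png OUT.pnm: replace transparency with white before conversion.
# flatten_to_pnm is a hypothetical helper name used only for this sketch.
flatten_to_pnm() {
    # -background white sets the colour placed behind transparent pixels;
    # -flatten merges the image onto that background, dropping the alpha channel.
    convert "$1" -background white -flatten "$2"
}

# Usage, letting c44 fall back to its default resolution as in the demo:
#   flatten_to_pnm penguin.png penguin.pnm && c44 penguin.pnm penguin.djvu
```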
|
||
|
|
So that works, now of course that's only a single file,
|
||
|
|
that's one single image, and it's not really a document,
|
||
|
|
it's just an image, but we can create digital books,
|
||
|
|
we can create sort of e-books out of DjVu
|
||
|
|
by combining several DjVu files into one bigger DjVu file.
|
||
|
|
Now for that, we'll need another DjVu file,
|
||
|
|
and I do have this TIFF, and it is a black and white logo
|
||
|
|
completely coincidentally.
|
||
|
|
It turns out that if you have a low quality image,
|
||
|
|
or I should say a simple image,
|
||
|
|
it doesn't have to be a low quality,
|
||
|
|
but it has to be, it is expected to be simple,
|
||
|
|
and in fact it is expected to be by,
|
||
|
|
what do they say, bitonal I think,
|
||
|
|
and the tool that DjVuLibre provides for that
|
||
|
|
is cjb2. I realize that neither of these commands,
|
||
|
|
c44 or cjb2, make any sense,
|
||
|
|
or have anything apparently to do with DjVu,
|
||
|
|
and that doesn't annoy me,
|
||
|
|
but it's just one of those things
|
||
|
|
that you kind of remember after a while,
|
||
|
|
or you put it into a script,
|
||
|
|
and you never have to remember it yourself.
|
||
|
|
So cjb2 -h gives me a couple of options,
|
||
|
|
and again, I can specify my DPI.
|
||
|
|
It says it defaults to 300.
|
||
|
|
That seems reasonable to me.
|
||
|
|
There's some cleaning up that you can do.
|
||
|
|
A -clean apparently cleans up the image
|
||
|
|
by removing small flyspecks.
|
||
|
|
You can make it lossy, and you can set the loss level.
|
||
|
|
I'm not gonna do anything that fancy.
|
||
|
|
I'm just gonna give it an input file,
|
||
|
|
and it says that the input that it accepts
|
||
|
|
is either a PBM or a TIFF.
|
||
|
|
Okay, so cjb2 in MK1.TIFF,
|
||
|
|
and then I'll do in MK1.DJVU,
|
||
|
|
and that converted it pretty quickly as well.
|
||
|
|
So once again, the TIFF, the source TIFF was 182 kilobytes.
|
||
|
|
The DjVu version of that is 3.8,
|
||
|
|
so quite the difference.
|
||
|
|
I'll open it up here in Okular, it looks fine,
|
||
|
|
it looks like a very accurate representation
|
||
|
|
of the simple graphic, and that's good.
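For a two-colour scan like this one, the equivalent command could be sketched as follows. The file names logo.tiff and logo.djvu are placeholders rather than the names used in the demo, bitonal_to_djvu is an invented helper name, and the explicit -dpi 300 only restates the default that the help text reports.

```shell
#!/bin/sh
# bitonal_to_djvu IN OUT.djvu: compress a black-and-white PBM or TIFF to DjVu.
# bitonal_to_djvu is a hypothetical helper name used only for this sketch.
bitonal_to_djvu() {
    # cjb2 is the DjVuLibre encoder for bitonal images; it accepts PBM or TIFF input.
    cjb2 -dpi 300 "$1" "$2"
}

# Usage: bitonal_to_djvu logo.tiff logo.djvu
# Optional clean-up mentioned in the help text: cjb2 -clean logo.tiff logo.djvu
```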
|
||
|
|
Okay, so now if we wanna make a DjVu file
|
||
|
|
that contains both of these images as a page one and a page two,
|
||
|
|
we can do that with the command djvm,
|
||
|
|
and I'll just do a dash H again,
|
||
|
|
and it spells it out for me.
|
||
|
|
So it says to compose a multi-page document,
|
||
|
|
you can do djvm -c for create,
|
||
|
|
and then the file, the destination file.
|
||
|
|
So I'm just gonna call this output.djvu,
|
||
|
|
and then finally you end with all of the pages
|
||
|
|
that you want to put into this document.
|
||
|
|
So alphabetically, it looks like in MK should come first,
|
||
|
|
so I'm gonna just, I'm gonna do something crazy here,
|
||
|
|
and set Penguin first, and then in MK1.djvu.
|
||
|
|
So I've got output.djvu as my target,
|
||
|
|
and then penguin.djvu, and then in MK1.djvu.
|
||
|
|
Return, and it produces an output file for me,
|
||
|
|
and just because it's always fascinating to look at file sizes,
|
||
|
|
it does look like this is about 25.6 kilobytes.
|
||
|
|
So again, just keeping track of these things,
|
||
|
|
I've got this penguin that was 43 kilobytes,
|
||
|
|
and I've got this logo that was 182 kilobytes
|
||
|
|
in one document at 25 kilobytes.
|
||
|
|
So literally both of them combined in a DjVu file
|
||
|
|
is smaller than either of them separate, pretty cool.
|
||
|
|
And it does look as though the images are in the order
|
||
|
|
that I defined, so the penguin comes first,
|
||
|
|
and then the in MK1 comes second.
|
||
|
|
Now if I had just done like a wild card,
|
||
|
|
the DjVu files in the directory
|
||
|
|
would have just done in MK1, and then penguin
|
||
|
|
because that's alphabetical.
|
||
|
|
But I did wanna demonstrate that you could set that,
|
||
|
|
you can manually set the order of the pages in your command.
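The bundling step could be sketched like this. The page names after output.djvu are placeholders (logo.djvu stands in for the second page of the demo), and bundle_pages is an illustrative helper name.

```shell
#!/bin/sh
# bundle_pages OUT.djvu PAGE1.djvu PAGE2.djvu ...: build a multi-page document.
# bundle_pages is a hypothetical helper name used only for this sketch.
bundle_pages() {
    out="$1"
    shift
    # djvm -c creates OUT from the remaining files, in exactly the order given.
    djvm -c "$out" "$@"
}

# Explicit page order, penguin first:
#   bundle_pages output.djvu penguin.djvu logo.djvu
# With a wildcard instead, the shell would sort the pages alphabetically:
#   djvm -c output.djvu page-*.djvu
```

If a wildcard is used, the output file should not match the same pattern, otherwise a second run would try to include the bundle in itself.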
|
||
|
|
Okay, so now we've got this document,
|
||
|
|
and it's a self-contained document,
|
||
|
|
you can open it up in Okular, or in DjView,
|
||
|
|
or in DjVu.js, and read it, and look at it,
|
||
|
|
and it's great, but how can you find stuff in that document?
|
||
|
|
Well, it turns out that doing metadata is pretty easy,
|
||
|
|
and we can create a bookmarks file for this.
|
||
|
|
I'm just gonna make one called book.marks in the same folder.
|
||
|
|
It's just a text file, and you open it up with a parentheses,
|
||
|
|
with an opening parentheses, or a bracket, whatever you call it.
|
||
|
|
It's a circle, half circle, so that thing.
|
||
|
|
And then bookmarks, that's the word bookmarks.
|
||
|
|
Next line, I'm gonna do another parentheses,
|
||
|
|
and I have not closed the parentheses.
|
||
|
|
So we've got an open parentheses, bookmarks,
|
||
|
|
and then next line, open another parentheses,
|
||
|
|
and I'm gonna put in, let's do the word penguin.
|
||
|
|
Quote penguin, closed quote space, quote hash one,
|
||
|
|
closed quote, closed parentheses once.
|
||
|
|
Okay, next line, open parentheses, quote,
|
||
|
|
what was the other one, oh yeah, a logo,
|
||
|
|
closed quote space, quote hash two,
|
||
|
|
closed quote, closed parentheses,
|
||
|
|
and then finally closing the main parentheses,
|
||
|
|
the big parentheses.
|
||
|
|
So it's bookmarks, penguin logo,
|
||
|
|
or it's bookmarks, and then penguin,
|
||
|
|
and then the page number, or the DjVu page number,
|
||
|
|
and then the next line, the next thing that you want to locate,
|
||
|
|
and then the DjVu page number,
|
||
|
|
and then you close out the whole parentheses.
|
||
|
|
The parentheses delineate the level of everything.
|
||
|
|
So bookmarks is your main, that's the main entity, right?
|
||
|
|
So you don't close that bookmark,
|
||
|
|
you don't close that parentheses
|
||
|
|
until the very end of your bookmarks.
|
||
|
|
That makes sense.
|
||
|
|
Then with each line itself is a new entry,
|
||
|
|
and it needs to have a human readable title,
|
||
|
|
and then the reference to the DjVu page number.
|
||
|
|
Now if you don't know what that is off hand,
|
||
|
|
you can just open up the DjVu file in a viewer and look,
|
||
|
|
because you're doing this separate.
|
||
|
|
You're doing this in a text file.
|
||
|
|
So you look at that and you say, oh, the penguin is on page one.
|
||
|
|
OK, so quote hash one, closed quote, closed parentheses.
|
||
|
|
Now if this was a more complex document,
|
||
|
|
and we wanted sub headings, then we wouldn't,
|
||
|
|
then just don't close penguin, and have like logo page two,
|
||
|
|
and then close the parentheses, and then close the bookmark.
|
||
|
|
So if you leave a parentheses open,
|
||
|
|
then everything below it, or everything within that,
|
||
|
|
becomes sub headings, which is handy,
|
||
|
|
because if you have a chapter and then a section,
|
||
|
|
and then maybe a subsection, and then you close, close, close,
|
||
|
|
and then you go back to a chapter level.
|
||
|
|
So that's your level setting.
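Written out, the bookmarks file dictated above would look roughly like the two variants below: one flat, one with Logo nested under Penguin. The titles and page numbers come from the demo; the helper names and the use of a here-document to write the file are assumptions made for this sketch.

```shell
#!/bin/sh
# write_flat_bookmarks FILE: two top-level entries, one per page.
# Both function names are hypothetical helpers used only for this sketch.
write_flat_bookmarks() {
    cat > "$1" <<'EOF'
(bookmarks
 ("Penguin" "#1")
 ("Logo" "#2"))
EOF
}

# write_nested_bookmarks FILE: "Penguin" is left open, so "Logo" becomes its child.
write_nested_bookmarks() {
    cat > "$1" <<'EOF'
(bookmarks
 ("Penguin" "#1"
  ("Logo" "#2")))
EOF
}

# Usage: write_flat_bookmarks book.marks
```

In both variants every opening parenthesis is eventually closed; the only difference is where the entry for Penguin closes.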
|
||
|
|
OK, once you have your bookmarks defined in a text file,
|
||
|
|
you use a command called djvused,
|
||
|
|
or maybe it's, who knows, DjVu-used.
|
||
|
|
Maybe it's DjVu-sed, I don't know.
|
||
|
|
It's djvused, and then -e, and then quote set-outline,
|
||
|
|
and then the name of the bookmarks file.
|
||
|
|
So that's book.marks, close quote, and then -s for save.
|
||
|
|
If I don't have a -s, it's a dry run.
|
||
|
|
It'll apply the outline.
|
||
|
|
It'll sort of validate the outline, really,
|
||
|
|
is what it's doing.
|
||
|
|
And then it will not save what it has just done,
|
||
|
|
and your output.djvu will not have bookmarks.
|
||
|
|
So -s means save.
|
||
|
|
So -s output.djvu, because that's
|
||
|
|
the name of the file that I gave it, output.djvu.
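As a single command, the step just dictated might be sketched as below. The file names follow the demo; apply_outline is an invented helper name, and the sketch assumes the bookmarks file name contains no spaces, since it is embedded in the quoted djvused command.

```shell
#!/bin/sh
# apply_outline BOOKMARKS DOCUMENT.djvu: embed the outline and save the document.
# apply_outline is a hypothetical helper name used only for this sketch.
apply_outline() {
    # -e runs the quoted djvused command; -s writes the change back to the file.
    # Without -s the outline is parsed but the document on disk is left untouched.
    djvused -e "set-outline $1" -s "$2"
}

# Usage: apply_outline book.marks output.djvu
# To check the result afterwards: djvused -e print-outline output.djvu
```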
|
||
|
|
All right, so now let's take a look.
|
||
|
|
It doesn't take any time to apply an outline.
|
||
|
|
That's really a fast one.
|
||
|
|
There we go.
|
||
|
|
And now, yeah, it looks like in Okular,
|
||
|
|
I've got a table of contents on the left with penguin
|
||
|
|
and a logo, logo being a child of penguin.
|
||
|
|
I left the parentheses open so I can click on the little
|
||
|
|
disclosure symbol there and get to see logo.
|
||
|
|
And it's got the page number that it corresponds to
|
||
|
|
over on the right.
|
||
|
|
Now, that's just been a very basic example.
|
||
|
|
In real life, I have found that the file size savings
|
||
|
|
amount has varied pretty wildly.
|
||
|
|
It really depends on where your images are coming from,
|
||
|
|
what you're converting from, and how much you're
|
||
|
|
willing to compress them in the DjVu document.
|
||
|
|
I would say, typically, I see about maybe a 20% savings.
|
||
|
|
That's what I would guess.
|
||
|
|
It's just a little bit shaved off the top.
|
||
|
|
It starts to add up the more you do it.
|
||
|
|
But I wouldn't expect, for instance, to take a PDF that
|
||
|
|
you downloaded from somewhere, and then you convert it
|
||
|
|
to DjVu, I wouldn't expect it to be 50% smaller,
|
||
|
|
or 60% smaller, anything like that.
|
||
|
|
It would more likely be either the same size or 10% or 20% smaller.
|
||
|
|
The benefit, possibly, for you, is that DjVu is a sane
|
||
|
|
and open format.
|
||
|
|
It's fun to manipulate, and it's easy to find information on
|
||
|
|
because all of its specs and information are open and online.
|
||
|
|
There aren't really any hidden glitches disguised as features
|
||
|
|
or features disguised as glitches in DjVu.
|
||
|
|
Not at least in the way that there are in PDFs, which
|
||
|
|
sometimes just are so confusing that even when you figure
|
||
|
|
something out, you can't really figure out
|
||
|
|
if it should even be opening.
|
||
|
|
Like, should it still be working?
|
||
|
|
Shouldn't I have just broken this?
|
||
|
|
So yeah, I'm really enjoying DjVu.
|
||
|
|
Now, you can also embed text, and I've never done that yet.
|
||
|
|
I have not had the occasion to, I've not converted a document,
|
||
|
|
for instance, to which I have embedded text,
|
||
|
|
like the way that PDFs have.
|
||
|
|
Those are not documents that I have bothered converting.
|
||
|
|
Or if I have, it's been for quick reference on the go.
|
||
|
|
And it's not one of those things where I'm thinking,
|
||
|
|
oh, I need to select this text.
|
||
|
|
I need to search for this exact string in the document.
|
||
|
|
Obviously, for that sort of thing, I would
|
||
|
|
want to have the text there, but so far for the way
|
||
|
|
that I'm using it, the text just isn't available
|
||
|
|
in the things that I'm converting to DjVu.
|
||
|
|
If I wanted text to be embedded in the DjVu,
|
||
|
|
I would have to transcribe it, looking at the screen,
|
||
|
|
typing it all out, and that would be silly.
|
||
|
|
So I've not gone in that direction yet.
|
||
|
|
I'm not saying I never will.
|
||
|
|
I may well do that.
|
||
|
|
And maybe I'll look into easy and quick ways
|
||
|
|
to take a PDF with embedded text and convert it
|
||
|
|
to DjVu, while retaining the embedded text.
|
||
|
|
Who knows, maybe, I've done crazier things.
|
||
|
|
So if I ever do that, I'm sure you will hear more
|
||
|
|
about it on Hacker Public Radio.
|
||
|
|
But until such a time, this has been an introduction
|
||
|
|
to DjVu.
|
||
|
|
Hopefully it's been informative to you.
|
||
|
|
Maybe it's even useful.
|
||
|
|
I suggest you try it out.
|
||
|
|
If you've never used DjVu, give it a go.
|
||
|
|
Make a document or get one from online.
|
||
|
|
See what it's like.
|
||
|
|
It's actually quite nice.
|
||
|
|
I think you'll probably like it.
|
||
|
|
And if you intend to use it seriously,
|
||
|
|
then sit down and kind of think about your workflow too.
|
||
|
|
Because I know that for some people,
|
||
|
|
PDF is a very easy, well, for most everyone.
|
||
|
|
PDF is a very easy output target.
|
||
|
|
Because as I said earlier, it's probably
|
||
|
|
in your file menu.
|
||
|
|
It's like two clicks away.
|
||
|
|
So if that stands in your way of using DjVu,
|
||
|
|
then sit down and kind of think of what ways
|
||
|
|
you might be able to pull a couple of commands together
|
||
|
|
to make that process easier.
|
||
|
|
And frankly, I don't know that it is
|
||
|
|
a great format.
|
||
|
|
It might not be your first format for a paper
|
||
|
|
that you've just written in LibreOffice.
|
||
|
|
Would it make sense to go out to DjVu?
|
||
|
|
I mean, arguably it would, but arguably not at all.
|
||
|
|
Because you're really, you would just
|
||
|
|
be basically generating a raster file
|
||
|
|
of a representation of text and then embedding text.
|
||
|
|
And that seems really odd.
|
||
|
|
So there are probably better formats
|
||
|
|
if you're just typing stuff up and you
|
||
|
|
want something on the go.
|
||
|
|
Maybe EPUB is the best answer for you.
|
||
|
|
But if you're doing archival work or you're scanning documents,
|
||
|
|
I mean, archival makes it sound fancy.
|
||
|
|
If you're scanning stuff in, because you like them,
|
||
|
|
but you want to throw the physical copy of it out.
|
||
|
|
Or maybe you like it and you see that the physical copy
|
||
|
|
is decaying, so you want to preserve it,
|
||
|
|
scan it in, throw it into a DjVu file,
|
||
|
|
and see how it treats you.
|
||
|
|
Thank you for listening.
|
||
|
|
I will talk to you next time.
|
||
|
|
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
|
||
|
|
We are a community podcast network that
|
||
|
|
releases shows every weekday Monday through Friday.
|
||
|
|
Today's show, like all our shows, was contributed
|
||
|
|
by an HPR listener like yourself.
|
||
|
|
If you ever thought of recording a podcast,
|
||
|
|
then click on our contribute link to find out
|
||
|
|
how easy it really is.
|
||
|
|
Hacker Public Radio was founded by the Digital Dog
|
||
|
|
Pound and the Infonomicon Computer Club.
|
||
|
|
And it's part of the binary revolution at binrev.com.
|
||
|
|
If you have comments on today's show, please email the host directly.
|
||
|
|
Leave a comment on the website or record
|
||
|
|
a follow-up episode yourself.
|
||
|
|
Unless otherwise stated, today's show is released
|
||
|
|
under Creative Commons, Attribution, ShareAlike, 3.0 license.
|