Episode: 3384
Title: HPR3384: Page Numbers in EPUB eBook Files
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3384/hpr3384.mp3
Transcribed: 2025-10-24 22:28:54

---

This is Hacker Public Radio Episode 3384 for Thursday, the 22nd of July 2021. Today's show is entitled "Page Numbers in EPUB eBook Files". It is hosted by John Kulp, is about 28 minutes long, and carries a clean flag. The summary is: Response to HPR3367. I describe how to specify page numbers in an EPUB ebook.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hey everybody, this is John Kulp in Lafayette, Louisiana. Yes, I'm still alive. It's been a long time since I recorded an episode, although I do think I have one in calendar year 2021; it seems like I recorded one right at the beginning of the year about something or other. I don't remember what it was. Anyway, like the last time, I've been away from HPR for a pretty good while, not only as a contributor but also, sadly, as a listener. I just haven't had time to listen every day the way I probably should.

But I did check back in a couple of days ago and saw that there was an episode by Dave Morriss and Andrew Conway where they talked about ebooks. And of course, I'm a degenerate for ebooks and ebook readers and formatting. I love all that stuff and totally geek out on it, so I had to listen to that one. They brought up some things that I thought would be worth exploring, and based on what I found, I decided to go ahead and record a response episode.

So what I'm doing here is responding to HPR3367, which I definitely encourage everyone to go listen to if they haven't already. In that one, Andrew was talking about his process for creating ebooks, and Dave apparently is going to have a follow-up episode where he talks about his process. I'm always interested in how people create these things, the tools they use, and all that kind of stuff. So first of all, thanks, guys, for the name check. Andrew, I think, said that he was somewhat inspired by HPR1512, which I did a few years ago, about creating a digital edition of an old counterpoint textbook. So thanks for that.

I haven't done that kind of work in a while. Most of the ebook work I do lately is just fixing whatever ebooks I purchase to read, or ebooks that I get for free from Project Gutenberg, or something like that. I like to do a little bit of tweaking and reformatting to suit my preferences. Also, if they're in Kindle format, I have to convert them to EPUB before I can put them on my wonderful Kobo Aura One ebook reader. But most of the time it's just a couple of tweaks to the CSS; I haven't really gotten into the nitty-gritty of ebook internals in a while. So it was kind of fun for this topic to come up and give me an excuse to get back in there and poke around.

The issue that I really wanted to focus on was page numbers. If I recall right, it came up when Andrew and Dave were discussing the notion of an index. An index in an ebook is something I'm not sure I've really seen before.

I haven't seen a lot of academic titles in... actually, I should look. I have a couple of leadership books and things like that that I've read, and I might check at the very back to see if they've got indices. They definitely have bibliographies, but I don't know that they have indices the way a paper book would. In academic and technical books, though, it's super important to have a good index to make the book much more useful. Of course, in an ebook the index becomes a little less necessary than in a paper book, because you can quickly search through the text and find what you need: all the instances of a person's name, a topic you're looking for, a term, or something like that.

And it's not hard at all. But what you can't do with an ebook that has no index is just browse the index, and for an academic, that's one of the things we do. The first time I get a new book, I'll flip right to the back to the bibliography to see what sources they used, and also check through the index to see what topics they cover. This might sound kind of weird to people who just read books for pleasure, but I assure you that in scientific fields and academia it's perfectly normal to jump right to the index and start looking around. So I can certainly understand Andrew's concern with making an index for an ebook that has that kind of functionality: you want to have your list of search terms, be able to tap on one and go right to that spot in the book, and have good page numbers that refer specifically to the places in the original paper version.

Page numbers also matter in academia when we do our own research and write papers and books, because we always have to cite our sources. Part of that is saying not only what book you got something from, but what page it was on in that book. Ebooks kind of throw this into confusion.

So of course it would be wonderful if there were a predictable, reliable way to have the same pagination in an ebook that you have in a paper book. By that I don't mean that I want every page to look the same. To me, it's critical that an ebook be able to reflow to fit the screen you're looking at. When I'm reading an ebook on my phone, which has, what, a six-inch screen or something, or on my Kobo Aura One with about a seven- or eight-inch screen, or my iPad with a ten-inch screen, or my shiny new Kobo Mini with a five-inch screen, that book should reflow to fit all those screens. I should be able to change the font size and have the text still fill up the screen just fine. What I don't want is an image of every page, where I end up reading really tiny words. The text needs to flow. But in academia we also kind of need to know where the page numbers fall.

All of this is to say that I perfectly understand the issue that Andrew and Dave were talking about. It's something that has concerned me a little bit, but I've never really tried to follow up on it.

Incidentally, Dave mentioned that his son told him about indexing using LaTeX, and I can confirm it's very easy to make an index in LaTeX. But of course, LaTeX is normally meant to end up with a print product, or I mean a PDF, and to me a PDF is barely better than paper because it's completely inflexible. It doesn't have the reflowing capability that a true ebook format does. But it is very easy to make an index. I remember because several years ago I made a cookbook for my wife of all of her favorite recipes, so that she'd have them in one place in a book, and I actually printed it out. I made an index for it that has the names of all of the recipes, but also the names of certain ingredients, so that you could look up an ingredient and see which recipes it shows up in. Whenever you have a word in the text that you want to appear in the index, you just tag it, and then you run an indexing command and, voila, it generates the index for you. It's wonderful.

No such thing exists for EPUBs.

So after listening to their episode, I decided I wanted to try to figure this out, because I thought I remembered hearing or reading somewhere that the EPUB 3 specification includes support for page numbers: in other words, a way for publishers to put in the actual page numbers that correspond to the paper versions of their books. I did some reading and found that, yes, that's true. There was also some limited support under EPUB 2, but I couldn't make it work there. To be honest, I didn't really know much about the difference between EPUB 2 and EPUB 3, but essentially all of the EPUB files in my library, and there are thousands, are in EPUB 2 format, unless I'm unaware of it. The main difference is in the navigation file.
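
Rather than guessing which files in a library are still EPUB 2, the version can be read straight out of the book. Below is a minimal sketch using only the Python standard library (not a tool from the episode; the function and script names are hypothetical). It follows META-INF/container.xml to the OPF package file, reports the package's version attribute, and looks in the manifest for the navigation document marked properties="nav" (EPUB 3 style) and for an NCX file (EPUB 2 style). An EPUB 3 book may still carry an NCX for older readers, so seeing both is normal.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the OCF container file and the OPF package document.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def inspect_epub(epub_file):
    """Return (version, nav_href, ncx_href) for an EPUB path or binary file object.

    nav_href is the manifest item marked properties="nav" (EPUB 3 style);
    ncx_href is the item with the NCX media type (EPUB 2 style).
    Either may be None; an EPUB 3 book can legitimately carry both.
    """
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml says where the OPF package file lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        if rootfile is None:
            raise ValueError("container.xml has no rootfile element")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    nav_href = ncx_href = None
    for item in package.iter(f"{OPF_NS}item"):
        if "nav" in (item.get("properties") or "").split():
            nav_href = item.get("href")
        if item.get("media-type") == NCX_MEDIA_TYPE:
            ncx_href = item.get("href")
    return package.get("version"), nav_href, ncx_href


if __name__ == "__main__":
    # Usage: python epubinfo.py book1.epub book2.epub ...
    for path in sys.argv[1:]:
        version, nav, ncx = inspect_epub(path)
        print(f"{path}: EPUB {version}, nav={nav}, ncx={ncx}")
```

Run over a folder of books, a report like this shows which ones would need converting before a page list can go into nav.xhtml.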

There's a way to convert your EPUB 2 ebook into EPUB 3, and that's the first step in putting page numbers into your ebook. I did that to a reading that I like to have my music history students do: a 19th-century German critic writing about the music of Beethoven. It's only about six pages long, so I decided to start with a short reading like this that comes from an academic book, where I do want my students to have the page numbers handy. Although it's only six pages long, the page numbers in the paper copy are something like 776 to 782. Of course, when you open it up in an ebook reader, it's going to display page numbers like one, two, three, four, five, and six instead of the actual page numbers in the 700s. So I thought that would be a pretty good proof of concept. The first thing to do was to figure out how to convert it into EPUB 3.

What I ended up using was Calibre, which is what I use to manage my entire ebook collection and also to edit EPUB files. Before I could use it, though, I had to uninstall the repository version of Calibre. I'm on Ubuntu 16.04, and please don't @ me. I know it's an old version, but it's the one that still has compatibility with Blather speech recognition, which is really important for me, so I have not upgraded. I uninstalled Calibre from the repository and then downloaded the latest version from the Calibre website. Let's see what version this is: 5.23. This is Calibre 5.23 that I've got here.

In the newer versions (I think since somewhere in version 4 point something), there's a very easy way to convert from EPUB 2 to EPUB 3. You open whatever book you want to work on (I have one of my books selected here) and press T, or right-click and choose Edit book. Once it's open in the editor, go to the Tools menu, which is the third one from the upper left. The very bottom item on that menu says Upgrade book internals. That's not the most discoverable name for an EPUB 2 to EPUB 3 conversion, but it was actually the first thing I tried, and it did the job just fine.

What it does is create a different kind of navigation file. The default navigation file in EPUB 2 is called toc.ncx. It's an XML file, and it's fairly cumbersome and difficult to navigate and understand. When you upgrade to EPUB 3, what you get is a new file called nav.xhtml, which, for me anyway, is much easier to read. It's a lot less cluttered and easier to work with. Once you've done that, you've got one of the key pieces in place: your book is upgraded to EPUB 3 and ready for you to start inserting pages.

After that, you've got to insert page anchors. You put an anchor everywhere that you want a page break to be, and you tell it what page number it should be. For some of the books that I've either edited or recreated, I already had a rudimentary form of this. In the one I was working on for my music history students, I had already put page numbers in square brackets right inline, visible in the text. So the students would be reading along, and in the middle of a sentence it would say something like [777] for a new page, which is not very elegant, but it did tell them what page they were on. That made it easy for me to go through and find, first of all, where the page breaks were, and then what page number to assign to each one.

Once you have that, what you want to put in is an empty span, so a span tag. I will have an example in the show notes; if you want to follow along, it might be easier.
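
A minimal Python sketch of that markup, assuming the attribute values from the show-notes example read out below (epub:type="pagebreak", id="page57", title="57"). The function names are hypothetical and this is not the voice-command script from the episode, just the same templating idea, plus a regular-expression pass that turns hand-typed [777]-style markers like the ones described above into spans in bulk. For epub:type to be valid, the content file's html element also needs the EPUB namespace declared: xmlns:epub="http://www.idpf.org/2007/ops".

```python
import re

# Hand-typed inline markers such as "[777]" (digits only).
BRACKET_MARKER = re.compile(r"\[(\d+)\]")


def pagebreak_span(page):
    """Return the empty page-break span for one printed page number."""
    return f'<span epub:type="pagebreak" id="page{page}" title="{page}"></span>'


def convert_bracketed(xhtml):
    """Replace every [NNN] marker with the matching invisible page-break span."""
    return BRACKET_MARKER.sub(lambda m: pagebreak_span(m.group(1)), xhtml)


if __name__ == "__main__":
    print(pagebreak_span(57))
    print(convert_bracketed("<p>in the middle of a [777] sentence</p>"))
```

The pattern would also catch bracketed footnote references such as [1], so it is worth reviewing the result before saving; Calibre's editor offers regex search and replace if you would rather do the same substitution there.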

The example reads: an opening span tag, and right after the word span there's a space and epub:type="pagebreak", then a space and id="page57" (in the one I have in the show notes it's page 57), then a space and title="57". Then you close the opening span tag and immediately put the closing span tag. That probably doesn't make sense the way I'm saying it; you really need to see it. Anyway, it's kind of a cumbersome bunch of text to put in there just to get a single page number.

Of course, I like to try to automate any tedious, repetitive tasks, so I made a Blather voice command that does this for me. All I have to do is type the page number in my file, in this case 57, and then select it.

Then I speak the words "page break." When it hears that command, it copies that number into the clipboard and runs a Python script that I wrote, which builds the entire bit of HTML span markup, inserts the number 57 at the two appropriate spots, and then pastes it into the ebook. So it's a pretty quick way to do that.

Now, my counterpoint book, the subject of HPR... what, 1512? Yeah. I actually had the foresight to specify page numbers all the way through as part of the infrastructure of that book, in invisible page anchors. They're not formatted the way you would need them for EPUB 3, but they're formatted very consistently, and the page numbers are all in there. So I could very easily do a search and replace to swap the anchors I put in for the correct ones that will work. I haven't done that yet, but I probably will very soon. While I was working on the book, part of the reason I did it was that I thought, well, at some point I'm probably going to want to know where the pages are, and maybe there'll be a way to have them show correctly in an ebook reader. More practically, I was making a digital version of a paper book, and the anchors helped me find my place in the HTML file: I could go up to the address field, type a hash sign followed by a page number, and press Enter, and it would take me directly to the spot in the HTML file that corresponded to a certain page in the book. So it just helped me navigate things a lot more easily. It's all still in there, invisible, and ready to be called into service.

So once you've got your page anchors, you put them right inline, in the middle of a sentence if need be, wherever there's a page break: just the page number, as an empty span. It won't be visible in the middle of the sentence while you're reading. But when everything works correctly, and you look at it in the right reader (well, the only reader that seems to work with it), over in the margin it'll say what page you're on based on your specified page numbers.

Once you've got your page anchors, the next thing you need to do is create a page list. That goes in the navigation file, the new one that's generated when you convert from EPUB 2 to EPUB 3. I've got an example of a page list in the show notes. It's kind of a minimal example that just goes from page 122 to 126.

As I say there, that's the kind of thing that would happen if, let's say, you wanted to make an ebook out of a five-page article from an academic journal. If that article appears toward the end of the volume, it's going to have pages in the hundreds; it won't start with page one. This would enable you to specify that these are pages 122 to 126 from that journal, and then you'd be able to use them appropriately to cite your sources, blah, blah, blah, whatever.

So there's a navigation block with a very simple ordered list inside it. The ordered list is just a series of list items with hyperlinks to the page anchors that you've created.

It's a much simpler and more elegant way to deal with it than the old NCX XML kind of thing. I actually tried doing it that way too, and it failed: when I tried to open the book, my ebook readers choked and said there was something wrong with the file. I don't know that it matters very much where you put this navigation block in your nav.xhtml file, but I decided to put mine between the table of contents block and what they call a landmarks block. I don't even know what the landmarks block does, but I stuck it between those, and when I saved it and opened it up in an ebook app, it worked.

I've got an example in the show notes of a script I wrote to automate some of the process of creating your page list. If you've got hundreds of pages, making an ordered list that's hundreds of list items long would be very tedious, so that definitely needs to be scripted. So I wrote a little Bash script. Forgive me in advance, Dave, for writing a script that's probably going to make you choke a little bit. I just use Bash; you can probably make a better one in Perl or Python or something, but this is what I know best, and I figured I could do it. I call the script pagelist.sh, and it takes two command-line arguments, both numbers. The first is the opening page number, and the second is the closing page number.
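
Taking up the Perl-or-Python suggestion, here is a hedged Python sketch of the same idea as pagelist.sh as described in the episode: loop from the first page to the last, inclusive, and emit one list item per page inside the page-list nav block. It is not the original script. The third argument, the content file name, is an added assumption so the links point at the right file, and the id="pageN" targets assume anchors written like the show-notes example.

```python
import sys


def page_list(first, last, xhtml_file):
    """Build an EPUB 3 page-list nav block for printed pages first..last.

    Every link points at xhtml_file#pageN, so this suits books whose page
    anchors all live in one content file, like the minimal examples.
    """
    # hidden="" keeps the list out of sight if the nav file is ever displayed
    # as part of the book; an optional choice, not something from the episode.
    lines = ['<nav epub:type="page-list" hidden="">', "  <ol>"]
    for page in range(first, last + 1):  # inclusive, like `seq first last`
        lines.append(f'    <li><a href="{xhtml_file}#page{page}">{page}</a></li>')
    lines += ["  </ol>", "</nav>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python pagelist.py 42 61 index_split_000.xhtml
    if len(sys.argv) == 4:
        print(page_list(int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]))
    else:
        print("usage: pagelist.py FIRST LAST XHTML_FILE", file=sys.stderr)
```

The printed block can be pasted into nav.xhtml, for instance between the table of contents and the landmarks block as in the episode.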

In my example in the HPR show notes, the command you'd run would be pagelist.sh 42 61. That would create the navigation block for something where you wanted pages 42 through 61. It just grabs those command-line arguments and passes them in. Then there's a for loop: for i in, dollar sign, open parenthesis, seq, and then the beginning and ending numbers, and then it does the stuff inside the loop. It's way easier to look at this; I should not be trying to read scripts in your ear. But it iterates through all the numbers between 42 and 61, creates a list item for each one, and keeps adding it to a temporary file. At the end of my own script, I actually have it open the result in my editor, although I left that part out of the example.

The one thing you'll need to do is make sure that the URLs in your page list are correct. I didn't really incorporate that part very well into my script, so after it was done running, I opened the result in the editor and just did a search and replace to put in the correct HTML file name. You get that when you open your ebook in the editor in Calibre: if you look at the file browser pane on the left-hand side, under the Text block, it lists the file names for all of the files. On the one I've got open right now, the file name is index_split_000.xhtml, and there are a bunch more after that: 001, 002, 003, and so forth. In my minimal examples, there was only one file that had all of that stuff in it, so it was fairly easy to get the URLs correct in the page list. But you've just got to make sure they're all pointing to the right place. So once you've got your page list in your navigation file, just save the book and try opening it in an ebook reader.
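
One way to avoid the manual search-and-replace on URLs is to build the page list from the anchors themselves. The sketch below is hypothetical (the function name and input shape are assumptions): it takes the content files as a dictionary of href to XHTML text, in reading (spine) order, finds each page-break span, and links to whichever file actually contains it. Matching tags with a regular expression is a shortcut that assumes double-quoted attributes like the show-notes example, not a general HTML parser. The dictionary keys are used verbatim as link targets, so they must be paths relative to nav.xhtml.

```python
import re

SPAN_TAG = re.compile(r"<span\b[^>]*>")
ATTRIBUTE = re.compile(r'([\w:-]+)="([^"]*)"')


def page_list_from_files(files):
    """Build a page-list nav block from {href: xhtml_text} given in reading order.

    Each entry links to the file that actually holds the anchor, so books
    split into index_split_000.xhtml, index_split_001.xhtml, ... come out right.
    The title attribute (if present) becomes the visible label.
    """
    items = []
    for href, text in files.items():  # dicts preserve insertion order
        for tag in SPAN_TAG.findall(text):
            attrs = dict(ATTRIBUTE.findall(tag))
            if "pagebreak" in attrs.get("epub:type", "").split() and "id" in attrs:
                label = attrs.get("title", attrs["id"])
                items.append(f'    <li><a href="{href}#{attrs["id"]}">{label}</a></li>')
    return "\n".join(['<nav epub:type="page-list" hidden="">', "  <ol>",
                      *items, "  </ol>", "</nav>"])
```

The file texts could be read from the unzipped book or with Python's zipfile module; because the hrefs come from where the anchors really are, a book split across many files needs no manual fix-up.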

Now here's where some of the problems start. There's not widespread support for displaying publisher page numbers. When I opened the book on my Kobo, for example, there was no difference at all in what page numbers were displayed. The Kobo displays page numbers based on an algorithm in its internals. I think it just counts 250 words and then starts a new page; there might be a way to go in and adjust the word count to make it divide up the pages a little differently, but it does something like that. It doesn't look for page numbers that you have specified in your book.
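
To show why a word-counting reader and the printed book disagree, here is a toy comparison. It assumes the speaker's guess of a fixed words-per-page rule (250 is his figure, not a documented Kobo value) and page-break spans written like the show-notes example; the function name is hypothetical, and this is an illustration, not the Kobo's actual algorithm.

```python
import re

# Page-break spans written like the show-notes example; group 1 is the printed page.
PAGEBREAK = re.compile(r'<span\b[^>]*epub:type="pagebreak"[^>]*\btitle="([^"]*)"[^>]*>')
TAG = re.compile(r"<[^>]+>")


def compare_paging(xhtml, words_per_page=250):
    """Pair each printed page label with a naive word-count page number.

    The word-count rule mirrors the speaker's guess about the Kobo
    (a new page every N words); it is an illustration, not Kobo's algorithm.
    """
    pairs = []
    for match in PAGEBREAK.finditer(xhtml):
        # Count the words before this page break, ignoring markup.
        words_before = len(TAG.sub(" ", xhtml[:match.start()]).split())
        pairs.append((match.group(1), words_before // words_per_page + 1))
    return pairs
```

For the Beethoven reading, the first pair would be ("776", 1): the reader's own count starts at one no matter what the printed page was.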

The only application I found that will display your shiny new custom page numbers is iBooks. I know that in the crowd I'm talking to here, Apple is not one of the most favorite companies, and I only have one Apple device, just a regular iPad. I like the device fairly well, but I like to have at least one iOS device so I can test things and see what my students are looking at, because so many of them use these things. Anyway, in the iBooks app, when you open up your book with the new page numbers embedded, tap on the table of contents menu item, and at the very bottom it will say Show publisher page numbers. If you tap that, then when you go back to reading, it will show the page numbers that you've told it to show instead of the ones it generates automatically.

So it works very well there. I also tried it in OverDrive on my Android phone, and in Marvin, which is an EPUB reading app for iOS that I like quite a lot. It didn't work in either of those, and it didn't work on my Kobo. I have not tried converting it to Kindle format and opening it on a Kindle. I'm also curious whether it might work in one of the open-source alternative ebook readers like KOReader; if you hack your Kindle and put an alternate reader on it, it might work there. I haven't tried that either, but maybe that's something for the future. Hopefully, at some point the firmware for all these ebook devices will be upgraded to support displaying these page numbers.

But even if you can't see them in the page number area at the bottom of your ebook, they could still be useful for the purpose Andrew and Dave were talking about, which was making an index. In your index, you could put whatever search term you're trying to show, followed by a series of page numbers linked to the page anchors you've put in your file, and tapping one will jump right there. So for that purpose it might be very useful. For actually displaying the page numbers, though, the only app that does it that I've found is iBooks.
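
For the index use case just described, here is a hedged sketch that renders index entries whose page numbers are links to the page anchors. The index data, the page-to-file mapping, and the function name are hypothetical inputs; in practice the mapping could be collected while scanning the content files for page-break spans. The href values are used verbatim, so they must be relative to wherever the index file sits in the book.

```python
from html import escape


def index_entries(index, page_files):
    """Render back-of-book index entries whose page numbers are links.

    index maps a term to its printed page numbers; page_files maps each
    printed page number to the content file holding its id="pageN" anchor.
    A page with no known anchor raises KeyError, which doubles as a check.
    """
    lines = ["<ul>"]
    for term in sorted(index, key=str.casefold):
        links = ", ".join(f'<a href="{page_files[p]}#page{p}">{p}</a>'
                          for p in sorted(index[term]))
        lines.append(f"  <li>{escape(term)}, {links}</li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Because the links target the same anchors as the page list, tapping a page number in the index jumps to the right spot even in readers that ignore publisher page numbers for display.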

Anyway, that's probably enough of that; you've probably had enough of me talking about ebook stuff. But I've had fun learning about it and enabling it in a couple of books, and I've definitely got a few more books in the queue that I want to do it to. If I learn anything more about it, I'll do another episode. Anyway, it's been fun. Glad to be talking to y'all again, and I hope I'll have time to listen to some more episodes very soon. I've actually got a couple of ideas for follow-up episodes of my own: one about my Kobo Mini ebook reader and another about watermarks in LibreOffice. But those will be left for another day. That's all for now. It's been fun. I will talk to you later. This has been John Kulp in Lafayette, Louisiana. Bye, y'all.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the Binary Revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.