Episode: 3384
Title: HPR3384: Page Numbers in EPUB eBook Files
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3384/hpr3384.mp3
Transcribed: 2025-10-24 22:28:54
---
This is Hacker Public Radio Episode 3384 for Thursday, the 22nd of July 2021. Today's show is entitled, Page Numbers in EPUB eBook Files. It is hosted by John Culp and is about 28 minutes long and carries a clean flag. The summary is: in response to HPR 3367, I describe how to specify page numbers in an EPUB ebook. This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest and fair at AnHonestHost.com. Hey everybody, this is John Culp in Lafayette, Louisiana. Yes, I'm still alive. It's been a long time since I recorded an episode. Although I do think I have one in calendar year 2021. It seems like I recorded one right at the beginning of the year about something or other. I don't remember what it was, but anyway, like the last time, I've been away from HPR for a pretty good while, not only as a contributor, but also as a listener, sadly. I just haven't had time to listen every day the way I probably should. But I did check back in a couple days ago and saw that there was an episode by Dave Morris and Andrew Conway where they talked about e-books. And of course, I'm a degenerate for e-books and e-book readers and formatting. And I love all that stuff. I totally geek out on it, so I had to listen to that one anyway. And they brought up some things in there that I thought would be worth exploring. And then based on what I found, I decided to go ahead and record a response episode. So what I'm doing here is responding to HPR 3367, which I definitely encourage everyone to go listen to if they haven't already. And in that one, Andrew was talking about his process for creating e-books. 
And Dave apparently is going to be having a follow-up episode where he talks about his process. And so I'm always interested in how people create these things, the tools they use and all that kind of stuff. So first of all, thanks guys for the name check on me. Andrew, I think, said that he was somewhat inspired by the episode HPR 1512 that I did a few years ago about creating a digital edition of an old counterpoint textbook. And so thanks for that. I haven't done that kind of work in a while. Most of the e-book work I do lately is just fixing whatever e-books I purchase to read or e-books that I get from Project Gutenberg for free or something like that. I like to do a little bit of tweaking and reformatting to suit my preferences. And also I have to convert them to EPUB format. If they're in Kindle format, I have to convert them to EPUB before I can put them on my wonderful Kobo Aura ONE e-book reader. But most of the time it's just doing a couple of tweaks to the CSS. I haven't really gotten into the nitty gritty of the e-book internals in a while. So it was kind of fun for this topic to come up and have an excuse to get back in there and poke around. The issue that I really wanted to focus on was that of page numbers. Now if I recall right, it came up when Andrew and Dave were discussing the notion of an index. An index in an e-book is something... I'm not sure I've really seen it before. I haven't seen a lot of academic titles in... Actually I should look. I have a couple of, like, leadership books and stuff like that that I've read. I might check at the very back and see if they've got indices in them. They definitely have bibliographies. But I don't know that they have indices the way a paper book would have. But in academic books and technical books, it's super important to have a good index in there to make the book much more useful. And of course in e-books, the index becomes a little bit less necessary than in a paper book. 
Because of course you can quickly search through the text of an e-book and find what you need. You can find all the instances of a person's name or a topic that you're looking for or a term or something like that. And it's not hard at all. But what you can't do with an e-book that has no index is you can't just browse the index, which is for an academic, that's one of the things they do. The first time I get a new book, I'll kind of flip right back to the bibliography to see what sources they used and also check through the index to see what topics they cover. And this might sound kind of weird to people who just read books for pleasure. But I assure you in scientific fields and academia and stuff, it's perfectly normal to jump right to the index and start looking around. And so I can certainly understand Andrew's concern in making an index for the e-book that has a certain functionality like going, you want to be able to have your list of search terms and be able to tap on something and have it go right there in the book and having good page numbers that refer specifically to the places in the original paper version would be kind of important. And also in academia, it's important when we are doing research ourselves, writing papers and books and we always have to cite our sources. And part of that is not only saying what book you got it from, but what page it was on in that book. And with e-books, it kind of throws this into confusion. And so of course it'd be wonderful if there were a predictable, reliable way to have the same pagination in an e-book that you do in a paper book. And by that I don't mean that I want every page to look the same. I mean to me, it's critical that an e-book be able to flow to fit the screen that you're looking at. 
So when I'm reading an e-book on my phone, which has, what, a six inch screen or something, or on my Kobo Aura ONE that has about a seven or eight inch screen, or my iPad with a ten inch screen, or my shiny new Kobo Mini with a five inch screen, that book should reflow to fit all those screens. And I should be able to reliably change the font size and have it still fill up the screen just fine and not end up, you know, reading really tiny words to try to fit all the... you know, what I don't want is an image of every page, right? The text needs to flow. But we also, in academia, we kind of need to know where the page numbers fall. So all of this is to say, I perfectly understand the issue that Andrew was talking about and that Dave was talking about. And it's something that has concerned me a little bit, but I've never really tried to follow up with it. Incidentally, Dave mentioned that his son told him about indexing using LaTeX, and I can confirm it's very easy to make an index in LaTeX. But of course, LaTeX is something that's normally meant to end up with a print product, or I mean a PDF, but to me, a PDF is barely better than paper because it's completely inflexible. It doesn't have the reflowing capability that a true ebook format does. But it's very easy to make an index. And I remember because several years ago, I made a cookbook for my wife of all of her favorite recipes so that she'd have them in one place in a book, and I actually printed it out. And I made an index for it, so that in the index it has the names of all of the recipes, but also names of certain kinds of ingredients, so that you could look at an ingredient and see which recipes it shows up in and that kind of thing. But anyway, whenever you have a word in the text that you want to appear in the index, you just tag it with a certain thing and then you run an indexing command and it, voila, generates it for you. It's wonderful. 
No such thing exists for EPUBs. So after listening to their episode, I decided I wanted to try to figure this out, because I thought I remembered hearing at some point or reading somewhere that as part of the EPUB 3 specification there was support for page numbers. In other words, for publishers to put in there the actual page numbers that correspond to the paper versions of their books. And so I did some reading and found that yes, that's true. And there was some limited support under EPUB 2, but I couldn't make it work under EPUB 2. And I mean, to be honest, I didn't really know much about the difference between EPUB 2 and EPUB 3, but essentially all of the EPUB files that I've got in my library, and there are thousands, are in EPUB 2 format unless I'm unaware of it. But the main difference is in the navigation file. And there's a way to convert your EPUB 2 book into an EPUB 3, and that's the first step in putting page numbers into your e-book. And so I did that to one of my... there's a reading that I like to have my music history students do, a 19th century German critic writing about the music of Beethoven. And it's only about six pages long. And so I decided, well, I'm going to start with a short reading like this that comes from an academic book where I kind of do want them to have the page numbers handy. And, you know, it's only six pages long, but the page numbers in the paper copy are like 776 to 782 or something like that. And so of course when you open it up in an e-book reader it's going to display page numbers like one, two, three, four, five and six instead of the page numbers that are actually in the 700s. So I thought that would be a pretty good proof of concept thing. So the first thing to do was to figure out how to convert it into EPUB 3. And what I ended up using was Calibre. Calibre is what I use for management of my entire e-book collection, but also for editing EPUB files. 
And before I could use it though, I had to uninstall the repository version of Calibre. I'm on Ubuntu 16.04, and don't @ me. I know it's an old version, but it's the one that still has compatibility with Blather speech recognition, which is really important for me. So I have not upgraded. So I uninstalled Calibre from the repository and then just downloaded the latest version from the Calibre website. I think... well, let's see what version this is. 5.23. This is Calibre 5.23 that I've got here. And in the newest versions, I think anything after version 4 point something, there's a way in there to very easily convert from EPUB 2 to EPUB 3. And so what you do is you open up whatever book it is that you want to work on. So I have here selected one of my books and I just press T, or you can right click and choose Edit book. And then once it's open in the editor, you go to the Tools menu. That's the third one from the upper left. And the very bottom item on the menu says Upgrade book internals. Now that's not the most discoverable name for an EPUB 2 to EPUB 3 conversion, but that was actually the first one I tried and it did it just fine. So what it does is it creates a different kind of navigation file. The default navigation file in EPUB 2 is called toc.ncx. So, NCX. And it's an XML file. And it's kind of fairly cumbersome and difficult to navigate and understand. And when you upgrade to EPUB 3, what you get is a new file called nav.xhtml, which is much easier to read, for me anyway. It's a lot less cluttered and easier to work with. And so anyway, once you've done that, you've got one of the key pieces in place. You've got your book upgraded to EPUB 3 and it's ready for you to start inserting pages. Now, after you do that, you've got to insert page anchors. You just put an anchor everywhere that you want a page break to be, and you tell it what page number it should be. 
Now, for some of the books that I've either edited or recreated or whatever, I already had a rudimentary form of this. Like in the one that I was working on for my music history students, I had already put right in line, visible in the text, just page numbers in square brackets. So they'd be reading right along and in the middle of a sentence, it would say 777 in square brackets for a new page number, which is not very elegant, but it did tell them what page they were on. And so that made it easy for me to go through and find, first of all, where the page breaks were and then what page number to assign to those. And once you have that, what you want to do is put in an empty span. So it's a span tag. And I will have an example in the show notes. If you want to follow along, it might be easier. So: open the span tag, and then right after the word span there's a space, and then epub colon type equals quote pagebreak end quote, space, id equals quote page57 end quote (well, in the one I have here in the show notes, it's page 57), space, title equals quote 57 end quote, and then you close the opening span tag and then immediately you put the closing span tag. But that probably doesn't make sense the way I'm saying it; you really need to see it to make better sense of it. Anyway, it's kind of a cumbersome bunch of text that you've got to put in there just to get a single page number. And of course, I like to try to automate any tedious repetitive tasks. And so I made a Blather voice command that would do this for me. So all I have to do is, in my file, I type in the page number. In this case, it would be 57. And then I select it. And then I speak the words "page break". And when it hears that command, it copies that number into the clipboard and then runs a Python script that I wrote, which builds the entire bit of HTML span stuff, inserts the number 57 at the two appropriate spots, and then pastes it into the ebook. 
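Since the span is hard to follow by ear, here is a minimal sketch of the anchor written out, plus a helper that turns visible bracketed page numbers such as [777] into anchors. This is not the Python script mentioned in the episode (that one works through the clipboard and a voice command); the function names `page_break_span` and `replace_bracketed_pages` are made up for illustration, and the `page57` id format simply follows the example described above.

```python
import re
from xml.sax.saxutils import quoteattr


def page_break_span(page: str) -> str:
    """Return an empty EPUB 3 page-break anchor for one page number.

    The page number appears twice: once in the id (prefixed with
    "page") and once in the title, as described in the episode.
    """
    return (
        f'<span epub:type="pagebreak" id={quoteattr("page" + page)} '
        f'title={quoteattr(page)}></span>'
    )


# Hypothetical convention: page numbers already sit inline as [777].
BRACKETED_PAGE = re.compile(r"\[(\d+)\]")


def replace_bracketed_pages(xhtml: str) -> str:
    """Swap every visible [N] marker for an invisible page-break span."""
    return BRACKETED_PAGE.sub(lambda m: page_break_span(m.group(1)), xhtml)


print(page_break_span("57"))
```

One thing to check in the content file: the `epub:` prefix only works if the root `html` element declares the namespace `xmlns:epub="http://www.idpf.org/2007/ops"`. Also note the regular expression would match any bracketed number, so it is only safe on a text where square brackets are used for nothing else.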
So it's a pretty quick way to do that. Now, my counterpoint book, the subject of HPR, what, 1512? Yeah. I actually had the foresight, as part of the kind of infrastructure of that book, to specify page numbers all the way through in kind of invisible page anchors. Now, they're not formatted the way you would need for EPUB 3. But they're formatted very consistently. And the page numbers are all in there. And so I could very easily do a search and replace to replace the anchors that I've put with the correct ones that will work. And I haven't done that yet, but I probably will very soon. And while I was working on the book, the reason I did it was in part because I thought, well, at some point I'm probably going to want to know where the pages are, and maybe there'll be a way to have it show correctly in an ebook reader. But in a more practical way, I was dealing with making a digital version of a paper book. And it just helped me find my place in the HTML file to be able to go up into the address field and put a hash sign followed by a page number and press Enter. And it would take me directly to the spot in the HTML file that corresponded to a certain page in the book. So it just helped me navigate things a lot easier. But it's all still in there, invisible, but there. And it's ready to be called into service. Okay, so once you've got your page anchors, you put those just right in line, like right in the middle of a sentence. Wherever there's a page break, just put the page number there as an empty span. And it won't be visible while you're reading, like in the middle of the sentence. But when everything works correctly, and if you look at it in the right, well, the only reader that seems to work with it, over in the margin it'll say what page you're on based on your specified page numbers. Okay, so you've got your page anchors. The next thing you need to do is create a page list. And that goes in the navigation file. 
That's the new navigation file that's generated when you convert from EPUB 2 to EPUB 3 format. And I've got in the show notes an example of a page list. And the example is kind of a minimal one where it just goes from page 122 to 126. And as I say here, that's the kind of thing that would happen if, let's say, you wanted to make an ebook out of a five-page article from an academic journal. And that article appears kind of toward the end of the volume. It's going to have pages in the hundreds. It won't start with page one. And so this would enable you to specify that these are pages 122 to 126 from that journal. And then you'd be able to use that appropriately to cite your sources, blah, blah, blah, whatever. So there's a navigation block that has a very simple ordered list inside it. And the ordered list is just a series of list items with hyperlinks to the page anchors that you've created. It's a much more simple and elegant way to deal with it than the old NCX XML kind of thing. I actually tried doing it that way too and it failed. When I tried to open the book, my ebook reader choked and said there's something wrong with this file. I don't know that it matters very much where you put this navigation block in your nav.xhtml file. But I decided to put mine between the table of contents block and what they call a landmarks block. I don't even know what the landmarks block does. But I stuck it between those. And when I saved it and opened it up in an ebook app, it worked. Now, for creating this list, I've got an example of a script I wrote to automate some of the process of creating your page list. Because of course, if you've got hundreds of pages, making an ordered list that's hundreds of list items long would be very tedious. So that definitely needs to be scripted. And so I wrote a little bash script. Forgive me, Dave, in advance, for writing a script that's probably going to make you choke a little bit. But I just use bash. 
You can probably make a better one in Perl or Python or something. This is what I know best and I figured I could probably do it. So I wrote a script that I call pagelist.sh. And this script takes two command line arguments, and they're both numbers. The first is the opening page number. And the second is the closing page number. So in my example in the HPR show notes, I just say the command that you'd run would be pagelist.sh space 42 space 61. So this would create the navigation block for something where you wanted pages 42 through 61. It just grabs those command line arguments and passes them in there. What it does is there's a for loop. It says for i in dollar sign open parenthesis seq space, and then I've got the beginning and ending numbers. And then it has it do the stuff. And it's way easier to look at this. I should not be trying to read scripts in your ear. But it iterates through all the numbers between 42 and 61 and creates a list item for each one and just keeps adding it to the temporary file. And then at the end of my own script, I actually have it opening up in my editor. Although I left that part out of the example here. The one thing that you'll need to do is make sure that the URLs in your page list are correct. I didn't really incorporate that part very well into my script. And so after it was done running, I opened it up in the editor and just did a search and replace to put in the correct HTML file name, which you get when you open up your ebook in the editor in Calibre. If you look over at the file browser pane on the left hand side, under the Text block, it will have the file names for all of the files. And so on the one that I've got open right now, the file name is index_split_000.xhtml. And then, you know, there are a bunch more after that: 001, 002, 003 and so forth. 
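For anyone who would rather see the page list than hear it, here is a rough Python equivalent of what the script described above does. It is a sketch, not the pagelist.sh from the show notes: the function name `page_list_nav` is invented, and it takes the content file name as a third argument (an addition, to avoid the search and replace step). The `#page42` fragment format assumes anchors with ids like `page57`, as described earlier in the episode.

```python
import sys


def page_list_nav(first: int, last: int, href: str) -> str:
    """Build an EPUB 3 page-list nav block for pages first..last.

    href is the content file that holds the page anchors, for example
    index_split_000.xhtml. Every link points at an anchor with the id
    "page<N>" in that one file, so a book split across several files
    would need the file name adjusted per page.
    """
    items = "\n".join(
        f'    <li><a href="{href}#page{n}">{n}</a></li>'
        for n in range(first, last + 1)
    )
    return (
        '<nav epub:type="page-list">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )


def main(argv: list[str]) -> int:
    # Usage: python pagelist.py 42 61 index_split_000.xhtml
    if len(argv) != 4:
        print("usage: pagelist.py FIRST LAST CONTENT_FILE", file=sys.stderr)
        return 2
    print(page_list_nav(int(argv[1]), int(argv[2]), argv[3]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The output can be pasted into nav.xhtml, for instance between the table of contents block and the landmarks block, which is where it went in the episode.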
On my minimal examples that I did, there was only one file that had all of that stuff in it. So it was fairly easy to get the URLs correct in the page list. But you've just got to make sure they're all pointing to the right place. So once you've got your page list in your navigation file, then just save the book and try opening it up in a book reader. Now here's where some of the problems start to come. There's not widespread support for displaying the publisher page numbers in these things. So when I opened it up on my Kobo, for example, there was no difference at all. It made no difference in what page numbers were displayed. The Kobo displays page numbers based on an algorithm that it's got in its internals. I think it just counts 250 words and then puts a new page. And there might be a way to go in and adjust the word count to make it divide up the pages a little bit differently. But it does something like that. It doesn't look for page numbers that you have specified in your book. The only application I found that will display your shiny new custom page numbers is iBooks. And I know that in the crowd that I'm talking to here, Apple is not one of the most favorite companies, and I only have one Apple device. It's just a regular iPad. I like the device fairly well. But I like to have at least one iOS device to be able to test things and be able to see what my students are looking at, because so many of them use these things. Anyway, in the iBooks app, when you open up your book with the new page numbers embedded in there, if you tap on the table of contents menu item, then at the very bottom it will say Show Publisher Page Numbers. If you tap that, then when you go back to reading, it will suddenly show the page numbers that you've told it to show instead of the ones that it generates automatically. And so it works very well. Now, I also tried it in OverDrive on my Android phone. I tried it in Marvin, which is an EPUB reading app for iOS that I like quite a lot. 
It didn't work in either of those. It didn't work on my Kobo. I have not tried converting it to a Kindle format and then opening it on a Kindle. I haven't tried that yet. And I'm curious whether it might work in one of those open source alternative ebook readers like KOReader. If you hack your Kindle and put an alternate reader on it, it might work in there. I haven't tried that either, but maybe that's something for the future. But anyway, hopefully at some point the firmware for all these ebook devices will be upgraded so that they will support the display of these page numbers. But even if you can't see them displayed in the page number area down at the bottom of your ebook, it could still be useful for the purpose that Andrew and Dave were talking about, which was to make an index. Because in your index, you could put whatever search term you are trying to show. And you could put a series of page numbers that are linked to the page anchors you've put in your file. And it will jump right there. So for that purpose, it might be very useful. For actually displaying the page numbers, the only one that will do it that I found is iBooks. Anyway, that's probably enough for that. You guys have probably had enough of me talking about ebook stuff. But I've had fun learning about it and enabling it in a couple of things. And I've definitely got a few more books in the queue that I want to do it to. So if I learn anything more about it, I will write... I'll do another episode, I mean. Anyway, it's been fun. Glad to be talking to y'all again. And I hope I'll have time to listen to some more episodes very soon. And I've actually got a couple of ideas for follow-up episodes for myself. One about my Kobo Mini ebook reader and another about watermarks in LibreOffice. But those will be left for another day. That's all for now. It's been fun. I will talk to you later. This has been John Culp in Lafayette, Louisiana. Bye, y'all. 
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.