Episode: 1657
Title: HPR1657: Hacking Gutenberg eBooks
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1657/hpr1657.mp3
Transcribed: 2025-10-18 06:28:23

---

It's Tuesday, the 9th of December 2014. This is HPR episode 1657, entitled "Hacking Gutenberg eBooks". It is hosted by John Kulp and is about 27 minutes long. Feedback can be sent to JohnlandChickelp at mail.com or by leaving a comment on this episode. The summary is: I talk about ebook formatting and how to customize an ebook from Project Gutenberg.
This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Get your web hosting that's honest and fair at AnHonestHost.com.
Hey everybody, John Kulp in Lafayette, Louisiana here, and it's been quite a long time since I recorded an episode for HPR. I went back and looked and it was in May, so it's high time that I did another one, especially since apparently shows are running short.
So I'm going to talk for a few minutes today about something that's really interested me a lot lately, and that is ebooks. Now, I've been a book lover for most of my life, and in fact there was quite a while when I was in my 20s when I collected rare books, and I really prize the book as an artifact. However, in the last couple of years I've really grown to love ebooks almost as much as, if not more than, regular books. Part of this is the convenience, and part of it is the fact that they are so much more accessible than physical books in terms of things like font size and cross-platform availability, and also accessible in my pocket.
I mean, with ebooks, everywhere I go I have a book that I can read if I get bored. It's on my phone, it's on my laptop, it's on my tablet. And the thing that really got me interested in ebook formatting was my purchase, at the end of the spring semester, I think last year, of a Kindle.
And a Kindle is a wonderful device. It's not the only really good ebook reader, but it's the only one I have. Well, my kids have the Nook Color, which to me is not as good an ebook reading experience. The thing that's great about the Kindle is the E Ink technology, which is a really wonderful-looking, I don't even know what to call it, but it's a way of displaying text on a screen that is not using a glowing screen. When you use an E Ink device you can take it right out in direct sunlight and see it perfectly. In fact, you can see it better in direct sunlight than you can in the dark, which is exactly the opposite of a smartphone or a tablet, which you cannot possibly read if you're out in the sun.
The Kindle that I got is the Kindle Paperwhite, and it's got built-in LED backlighting if you have to read in low-light situations, and most of the time I keep those lights on. The battery life is incredible; it'll last a long time. It does not have expandable storage, but it holds enough books. I use the Calibre ebook management program to manage my ebook library and transfer books over to the Kindle when I want to.
Now, what got me interested in hacking ebooks was the fact that the Kindle, wonderful as it is, has one really serious flaw, which is that it is not able to do decent justified text. Almost every ebook comes with text fully justified, in other words the left and right margins are straight, and while that looks wonderful in a printed book, it looks awful on a Kindle, because the Kindle is not able to break words in a sane way. In fact, it does not try to break words at all, and so a book that has justified text ends up, when you read it on the Kindle, with all these giant spaces between words, which is extremely annoying to me. And so I decided I'm going to learn how to get into these ebooks and fix that, so that every ebook I read has left justification instead of full justification. Left alignment, maybe I should say.
So the only margin I care about then is the left one. Everything lines up on the left and the right is a ragged margin, which I don't mind so much. Maybe it looks a little bit prettier if the right margin is all nice and straight, but I would prefer to have the ragged right margin and equal spacing between words instead of having both margins nice and straight but really irregular, wildly erratic spacing between words.
Okay, so my workflow when I get a new book. I read a lot of books from Gutenberg. I thankfully have a terrific appreciation of 19th-century literature, and that means that I can get tons and tons of stuff to read for free from Project Gutenberg, and I will have a link in the notes for Gutenberg. If you've never gone there, then you should. If you're a reader and you like public domain fiction, Project Gutenberg is awesome.
As a test case I'm going to use a book that I read recently from there called Washington Square, by the American author Henry James. Now, I normally will go right to the Gutenberg website and download the book, and I'm actually going to put a link to this book in the show notes as well. And I download the ePub version of the book even though the Calibre ebook manager cannot, sorry, the Kindle does not read ePub format. The Kindle reads a different format, AZW3 or MOBI, either one of those. I normally download the ePub anyway and then I work on it and convert it to the AZW3 format.
So I'm going to download the ePub file, and I'm on Firefox on Linux. Everything I'm doing is using the Linux versions of everything. So I download it and it puts it into my downloads folder, and then I go to my Calibre ebook management program. That's Calibre, spelled C-A-L-I-B-R-E. It looks like "Ca-libre", which would make sense. I mean, the word "libre" implies books, but whatever, I think it's supposed to be pronounced "caliber". And I will have a link to the Calibre website also.
There are versions of Calibre for Linux, Windows and Mac, and I have used it on all three. Works beautifully. Calibre is a great tool for organizing your library and keeping track of everything. You can add tags, you can sort things by title, author, date and so forth. And you can use it to sideload books over to your reading device. So far I've only used it with the Kindle and with the Nook Color, but for both of those devices, as soon as I plug it in, it recognizes that a device has been attached and it will load up the library on that device, and you can easily transfer books back and forth to it.
So I've downloaded the Henry James book and it's in my downloads folder right now. What I need to do is add it to my Calibre library, and I will do that by clicking the upper-left-hand button in the Calibre interface that says "Add books". When I do that it opens up a file selector window, and I'll go and find the file, in this case it's pg2870.epub, and it is adding it to my library. I used to have this, I actually deleted it from my library, and then it says it's already here, so I'm just going to select "add it anyway". Not sure what's going to happen here. Okay, so it's in my library now.
When I select it, it shows a funny-looking ebook reader device image over there on the right-hand side. There are a few things that you can do with it. One thing I like to do is go find a picture for the cover, because the Project Gutenberg books do not come with cover images, they just have plain text. So often, if it's a book I know I want to keep around, I will go and find a picture of some edition of that book on an image search and then add it in the metadata editing window.
For now I'm just going to open up the book and start poking around with the style sheet to make the adjustments that I like to make. The most important adjustments for me are the justification, changing it from full justification to left, and also the line height, and if there has been any kind of indication about font size, I remove that, at least from the body text of the book.
In general, ebooks should be formatted as simply as possible so that they can just adapt naturally to whatever ebook device is being used to view them. Like, in my own style sheets for ebooks I never indicate a specific font for the main body text, because I want to be able to use the embedded fonts or the built-in fonts on my devices for that. I think by specifying certain fonts you're kind of interfering with a user's ability to choose what fonts he or she wants, and I'm all about choice. So the style should be fairly simple, and normally the books that I get from Project Gutenberg are pretty good in that respect.
Sorry, I just took a look at my recorder to make sure it was still recording. One time I did this and I got finished talking half an hour later and realized that I had not been recording, so that's why I took a moment and looked there.
I'm going to open up Washington Square by right-clicking on it and choosing "Edit book", and it opens up the ebook editor that is part of Calibre. When you open that up you can see a great big blank gray spot in the middle, a file browser on the left-hand side, and then over on the right side there's a live preview area. This one appears to be done in one giant HTML file. Best practice would be for each chapter to have a separate HTML file, and that's something that will happen when I run the conversion to make an AZW3 here in a couple of minutes.
By the way, a little knowledge of HTML goes a long way in editing an ebook, because ebooks are essentially HTML files that are packaged up in a certain way.
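The "packaged up in a certain way" part is a ZIP archive: an ePub is a ZIP file holding the HTML, the style sheet and some metadata. A minimal Python sketch that lists what is inside one follows. The toy archive it builds, including the `OEBPS/` folder and the file names in it, is made up for illustration and is not the layout of the Gutenberg file.

```python
import io
import zipfile


def list_epub_members(epub_file):
    """Return the names of the files packed inside an ePub (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def build_toy_epub():
    """Build a tiny in-memory archive laid out like an ePub, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        # Hypothetical content files; real books name these differently.
        archive.writestr("OEBPS/chapter1.html", "<h3>I</h3><p>Text</p>")
        archive.writestr("OEBPS/pgepub.css", "p { margin: 0 }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_epub_members(build_toy_epub()):
        print(name)
```

Passing the path of a downloaded file such as pg2870.epub to `list_epub_members` instead of the toy buffer should show the same kind of listing for a real book.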
In this one it appears that every chapter heading is done with an H3, and I would prefer to have it done with H2, because my conversion settings in Calibre are set so that whenever it detects an H2, or heading level 2, it will insert a page break there to make sure that the new chapter starts on a new page.
The first thing I'm going to do, now that I have opened this up and I'm looking at it, is change all of the H3s to H2s. What I did first was, under the text area in the left-hand file browser, I selected the second of the two HTML files. The first one normally is just some random front matter. The second one in this case is where the whole book is. Actually, you know what, it looks like I was wrong about that, I'm sorry. They've got two HTML files. The first one has maybe the first half of the book and the second one has the second half, and as I look through it I see a few things that I want to change.
First of all, it does not have any indentation of paragraphs. This one is basically done like it would be if you were going to read it on the web rather than as a book, so it has a good bit of space between every paragraph and no indentation. What I want to do is remove most of the space between the paragraphs and then do a first-line indent on all of those. And as I mentioned, the chapter headings are done as heading level 3 and I want to change those to heading 2.
So underneath the source code there's a little search-and-replace thing, or if you don't see that you can do Ctrl+F and it will appear, Ctrl+F for find. I'm going to find H3 and replace it with H2. There are a couple of options here. There's a mode: I'm going to use normal mode; you can also use regex mode, which allows you to use regular expressions. And I'm going to have it search through all text files; you can also search through just the current file, or all of the style files, or whatever. I'm going to use all the text files. In the find field I put H3 and in the replace field H2, and I'm going to click "Replace all", and it did it 68 times. So that looks like there are 34 chapters; it does an opening and a closing tag for each chapter. So now all of the headers are H2, and that's what I want.
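The same heading change can be sketched outside the editor in Python. This is not what Calibre's "Replace all" does internally; it is a stricter stand-in that only touches `h3` when it appears as a tag name, and the sample markup is invented.

```python
import re


def promote_headings(html, old="h3", new="h2"):
    """Rename opening and closing heading tags; return (new markup, hit count)."""
    # Match "<h3" or "</h3" only when followed by ">" or whitespace,
    # so ordinary text that happens to contain "h3" is left alone.
    pattern = re.compile(r"(</?)" + re.escape(old) + r"(?=[\s>])", re.IGNORECASE)
    return pattern.subn(lambda match: match.group(1) + new, html)


if __name__ == "__main__":
    sample = '<h3>I</h3>\n<p>First.</p>\n<h3 id="c2">II</h3>\n<p>Second.</p>'
    updated, count = promote_headings(sample)
    print(count)  # two chapters, so four tags are renamed
    print(updated)
```

As in the episode, the count comes out at two per chapter, one for the opening tag and one for the closing tag.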
Now let's look at the style sheet. The style sheet will be in the file browser on the left-hand side. This one is called pgepub.css; that would stand for, I assume, Project Gutenberg ePub. I'm going to select it and then press Enter, and I can see the style settings that they have here. This is a very, very simple style sheet, which in general I like. I appreciate that. I don't like it when they get too fancy. It has a few settings for body; it has a couple of settings for H2, oddly, because it didn't have any H2s in the whole thing, it only had H3; and then it has a couple of settings for the Project Gutenberg disclaimers and various things.
So the first thing I'm going to do is delete all of this, select all and backspace, because I have my own basic ebook style sheet that I always start with. I call it basic ebook.css. I'm going to copy and paste my style sheet into the little style sheet source code window, and I have a link in the show notes to my Pastebin site where I put the style sheet.
Now suddenly everything is different. The line height is set at 1.25em. I set the margins to have 0.1em above and 0.1em below on each paragraph, and then I set the text indent at 1em. Em is a unit of measurement that's used in CSS. You could also use pixels as a unit of measurement, but I normally use either em or a percentage.
I also have in my style sheet a setting for H2 and H1. This is one place where I do sometimes change the font family. I changed it to sans, and that's certainly not necessary, but I like to do it for my own ebooks. If I were publishing this I probably would not do that. I would leave it undefined and let people's ebook readers determine what font is shown there. For my headings I also have a good bit of margin below, and that allows a little bit of separation between the chapter heading and the text of the paragraph.
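A rough reconstruction of those settings, written as a small Python script that prints the style sheet text. This is not the style sheet linked in the show notes. The numbers for line height, paragraph margins and text indent are the ones mentioned in the episode; which selector carries the line height, the `sans-serif` keyword, and the 1.5em heading margin are assumptions made for the sake of a complete example.

```python
# Values taken from the episode: 1.25em line height, 0.1em paragraph margins
# above and below, 1em first-line indent, left alignment, sans headings.
# Assumptions: the selector grouping and the 1.5em heading margin.
BASIC_RULES = {
    "body": {"text-align": "left", "line-height": "1.25em"},
    "p": {"margin-top": "0.1em", "margin-bottom": "0.1em", "text-indent": "1em"},
    "h1, h2": {"font-family": "sans-serif", "margin-bottom": "1.5em"},
}


def render_css(rules):
    """Turn a {selector: {property: value}} mapping into style sheet text."""
    blocks = []
    for selector, declarations in rules.items():
        lines = [f"    {prop}: {value};" for prop, value in declarations.items()]
        blocks.append(selector + " {\n" + "\n".join(lines) + "\n}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_css(BASIC_RULES))
```

Note what is left out on purpose: no font family and no font size on the body text, so the reading device's own settings apply.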
What other settings do I have? Right now all of the paragraphs have a first-line indent of 1em. Now, that's not ideal, because in normal books, you may never have noticed this, but the first paragraph of a chapter normally is not indented and then all subsequent paragraphs are. So what I'm going to do is fix this so that every first paragraph of a chapter will not have an indent. What I do is look for the closing header 2 tag, so it's less-than, slash, H2, greater-than, followed by a new line, followed by less-than P. So that would be the closing H2 tag, followed by a blank line, followed by the opening paragraph tag.
I'm going to search for that by pressing Ctrl+F, and that string automatically appears in the find field. I'm going to copy it too, and then in the replace field I'm going to replace it with the same thing, except add a class to it, and that is my class equals "no indent". I have a class in my style sheet called "no indent" which has a first-line indent of 0. I'm going to click "Replace all", and it did it 35 times, so that should be correct. Now when I go through there, the first paragraph of each chapter has no indent and then every subsequent paragraph is indented 1em.
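A minimal Python sketch of that replacement, assuming the class is spelled `noindent` (the exact spelling is not given in the episode) and that the paragraph tag is a bare `<p>` with no attributes. The sample markup is invented.

```python
import re

# Assumed class name; the matching rule would be ".noindent { text-indent: 0; }".
NO_INDENT_CLASS = "noindent"

# A closing </h2>, any whitespace (including the line break), then a bare <p>.
FIRST_PARAGRAPH = re.compile(r"(</h2>\s*)<p>", re.IGNORECASE)


def mark_first_paragraphs(html):
    """Add the no-indent class to each paragraph directly after a chapter heading.

    Returns (new markup, number of paragraphs changed).
    """
    return FIRST_PARAGRAPH.subn(rf'\1<p class="{NO_INDENT_CLASS}">', html)


if __name__ == "__main__":
    sample = "<h2>I</h2>\n<p>One.</p>\n<p>Two.</p>\n<h2>II</h2>\n\n<p>Three.</p>"
    updated, count = mark_first_paragraphs(sample)
    print(count)  # one hit per chapter heading
    print(updated)
```

Because the pattern allows any run of whitespace between the two tags, it covers both a single line break and a blank line after the heading.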
So part of my style sheet is to align everything on the left, and I do that in the body part of the style sheet. What else?
If you want to get really fancy with this, if it's a favorite book or one that you are going to want to share with other people or something, and you want to make it look really nice, you can do a drop cap, which is something I think I did when I was reading this book the first time. I'm not looking at my own copy of this right now; I'm looking at one that I'm doing on the fly for this podcast. But with a drop cap, the very first letter of a chapter will sometimes be big enough to span about two or three lines vertically, and the way you do that is to go into the source code and find the first letter of the paragraph there.
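In markup terms a drop cap comes down to wrapping that first letter in a span with a class that the style sheet floats left and enlarges. A minimal Python sketch that does this for every chapter at once, assuming the class is named `dropcap` and that each chapter's opening paragraph starts with a plain letter (a paragraph that opens with a quotation mark or another tag is skipped):

```python
import re

# Assumed class name; the style rule would float the letter left and enlarge it.
DROP_CAP_CLASS = "dropcap"

# The opening <p ...> tag that follows a closing </h2>, plus the first letter after it.
CHAPTER_OPENING = re.compile(r"(</h2>\s*<p(?:\s[^>]*)?>)([A-Za-z])", re.IGNORECASE)


def add_drop_caps(html):
    """Wrap the first letter of each chapter's opening paragraph in a span."""
    return CHAPTER_OPENING.sub(rf'\1<span class="{DROP_CAP_CLASS}">\2</span>', html)


if __name__ == "__main__":
    sample = '<h2>II</h2>\n<p class="noindent">When the child was about ten years old</p>'
    print(add_drop_caps(sample))
```

Doing it by hand in the editor, one chapter at a time, gives the same markup; the script only saves the repetition.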
In this case it says "When the child was about 10 years old", and so on the word "When" I can select the W, and then there's a little tool here, actually, I can't use that. What you have to do is put span tags around that W, so span, and then after the W put a closing span tag, and then you have to give that letter a class. I call it the drop cap class, I think. Yeah, in my style sheet I have a "dot drop cap". So my drop cap class will tell that letter to float left, it has a font size of 2.8em, and then it sets a couple of margin settings. When you do that, that one letter is going to be much bigger than all the others and it will span a couple of lines, and it looks kind of nice. It makes it look a little bit more like a real book.
One more thing I typically do with Project Gutenberg books is to smarten up the punctuation, because they use all straight quotes and straight single quotes, and I like the look of the smart quotes. They have a little tool called "Smarten punctuation". If you look at your set of buttons across the top there, one of them has a pair of right-hand quotes, and if you hover over it, it says "Smarten punctuation". So I'm going to click that now, and it will turn all of those straight quotes into smart quotes, and it will also take things like double hyphens and make em dashes out of them, so it's a nice touch.
When you're done with these things, or whatever else you want to do, you want to save the file by doing Ctrl+S, and at that point you can exit out of the ebook editor and transfer the book over to your reading device, or email it to yourself, or something like that. Now, this one is still an ePub, and I would convert it over to AZW3 to be able to read it on my Kindle, and that might be information for another episode: how to optimize an ebook in the conversion process. What essentially will happen is, when I convert this, it will chop those two giant HTML files up into probably 35 HTML files, one for each chapter, plus some front matter and so forth, and that way it will always have a new page for each new chapter.
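The conversion itself is done in the Calibre interface in this episode, but Calibre also ships a command-line converter, `ebook-convert`, whose `--page-breaks-before` option takes an XPath expression. The Python sketch below only assembles such a command and runs it if the tool and the input file are both present. The XPath shown is one way to target H2 headings and is not necessarily the setting used here; the output file name is made up.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target, break_xpath="//*[name()='h2']"):
    """Assemble an ebook-convert command that starts a new page at each match."""
    return [
        "ebook-convert",
        str(source),
        str(target),  # the output format follows from the file extension
        "--page-breaks-before",
        break_xpath,
    ]


if __name__ == "__main__":
    source = Path("pg2870.epub")
    command = build_convert_command(source, "pg2870.azw3")
    print(" ".join(command))
    # Only attempt the conversion when Calibre's tools and the book are available.
    if shutil.which("ebook-convert") and source.exists():
        subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to inspect the command before anything touches the book.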
Anyway, hope you guys have enjoyed that. All of this relates to editing books that are not covered by DRM. Now, you can open up books with DRM on them if you've got certain plugins installed. I'm not going to go into how to do that, but there is ample information online on how to make Calibre do that. I've done it on my laptop, because even with books that I buy that are published and have DRM, I don't want to have them fully justified. I want the left justification, so I fix it.
So anyway, hope you've enjoyed that. Go grab yourself an ebook, hack it, and then read it. It's fun. Talk to you all later. Bye.
You've been listening to Hacker Public Radio at hackerpublicradio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.