Episode: 2372
Title: HPR2372: Docbook
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2372/hpr2372.mp3
Transcribed: 2025-10-19 01:52:37

---

This is HPR episode 2,372 entitled DocBook. It is hosted by Klaatu and is about 55 minutes
long, and carries a clean flag. The summary is: how to DocBook.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Get your web hosting that's honest and fair at AnHonestHost.com.
Hi everyone, this is Hacker Public Radio, my name is Klaatu, and I'm going to talk about DocBook.
I'm going to talk about DocBook in probably two episodes.
One, this one, will be an overview of how to use DocBook, because I feel like DocBook
has a kind of mystery about it, so that people are afraid to even approach it at times.
So I want to talk about how to actually use DocBook and make it work for you.
In the second episode, I'll be talking about why you should use DocBook,
because I do feel that DocBook has a mild reputation for being old.
It's kind of like the old documentation style.
We all use Markdown now, or reStructuredText, or AsciiDoc, because Linus Torvalds said it was cool.
And I don't want to talk about DocBook at the expense of Markdown,
or reStructuredText, or AsciiDoc, but I do want to talk about it in a very positive way, kind of like,
hey, this is really useful, and maybe you should look at it, depending on what your needs are.
So I was going to actually do that in reverse order.
I was going to talk about why you should use DocBook, and then tell you how to use DocBook.
And I found that that was really boring, because even as I was talking, trying to describe all these great things about DocBook,
and in the back of my mind, I'm thinking people don't even know what you're talking about yet,
because you haven't shown them.
So I'm going to show you DocBook first, and then I'm going to tell you about DocBook and why I love it later.
So let's get started.
It's pretty easy, actually, to start with DocBook.
It takes a text editor.
That's all.
It's a plain text format, you know what I mean? That's what it is.
DocBook is, for instance, like HTML.
I'll probably use similes or examples of HTML quite frequently, because it's quite a lot like HTML.
In fact, there was a time when HTML was a lot like XML, because they were actually trying to invent this thing called XHTML.
Well, they did invent it.
It exists.
Luckily, HTML5 clobbered it and took over, but XHTML was HTML within an XML namespace, and it was horrible to use.
I hated it.
That was actually when I was trying to start learning HTML for the first time.
It was XHTML, and I couldn't understand why I couldn't understand it.
And then HTML5 hit, and it was just like, oh, this is so easy. This is so beautiful.
But anyway, I digress.
The point is, XML and HTML have a lot of similarities, so we're going to talk about one using the other frequently,
because I think most of us know at least enough HTML that it's a useful common point of discussion.
So, DocBook: a plain text editor, that's all you need.
I use Emacs, but you can use KWrite, or Kate, or gedit; whatever text editor you could use for HTML, you can use for DocBook.
Just make sure that it's not something really weird that, when you go to save, secretly saves, I don't know, an RTF version of it for you.
Even though you think, oh, I'm just saving plain text. It's like, nope, I'm going to style it for you. Don't do that. Use a real text editor.
So, the first line of a DocBook document is the declaration of what kind of document it is.
So, that's pretty standard. And, I mean, you could think of it as the first line in a well-structured HTML document where you're kind of telling everyone, hey, this is an HTML document, and it uses this character set.
So, in XML, that is angle bracket, question mark, XML space version equals 1.0, space encoding equals UTF-8, question mark, angle bracket, and of course, around the attributes of version and encoding, I use quotes.
So, that's the declaration of what kind of document we're dealing with. The next line is the DocBook declaration, telling it which DocBook schemas and namespaces we should be using.
So, this is the complex line, and this is, I would say, the most complex part of DocBook. Well, I'm sorry, of writing DocBook.
So, I actually think of this as the HTML head section, right? Because nobody really knows the HTML head section. No one knows what to put in there.
We just know head, title, some stuff about my CSS, I think, and then close head, and then I'm in the body, and I'm safe.
So, this is kind of like that. It's one of those things that you'll memorize it eventually if you really think about it, but more often than not, you'll be copying and pasting. And that's fine.
So, the opening tag that you're using is one of three different ones, at least that I can think of. There's one called article, there's one called book, and there's one called set.
So, an article assumes that you're writing an article. So, let's say one page to, I don't know, 30 pages, 40 pages, something like that. Certainly not a document that you're going to think of putting chapters into.
Now, maybe you'll have sections, like an introductory statement, and then a lecture, and then an exercise, but you won't have chapters in it.
So, the other kind of document that you might write in DocBook is a book, and that would presume that you're going to have things like a preface, and chapters, and bibliographies, and an index, and things like that.
And then the third is a set, which assumes that you're writing a set of books. So, you'll open it with set, and then you'll eventually start your first book, and you'll write it, and then you'll close your first book, and then you'll start your second book, and so on.
I think I've actually used all three of those, but I've used two of them well: article and book. Set, technically, yeah, probably stay away from that one.
But I mean, I don't know you; if you want to write a five-book fantasy series in DocBook, go for it, that would be the set.
So, let's assume that we're doing, let's say, a book for now. So, we'll do book, and then space, xmlns, so that's XML namespace, xmlns equals, and in quotes, http://docbook.org/ns/docbook, ns for namespace.
Close quote. That's pretty simple. If you really think about it, that's not that hard. So, you're defining what kind of XML you're writing.
So, I don't know how much you know about namespaces, but just if you think about namespaces in programming, it's the same thing here. If you don't know that, think about a namespace in real life.
Things differ depending on what you're actually talking about. For instance, a table is a table in the real world, right?
But if you go to a database person and say, hey, I want to make a new table, they're not going to build you a physical table. They're going to make you a database table.
So, that's a namespace: a table, same word, two completely different meanings in different contexts. So, in this case, XML is super, super flexible.
And so, you're saying, hey, the XML namespace that I want you to use here is the DocBook one. So, whenever I use a given tag, I'm talking about the DocBook version of that tag, not any other tag of the same name in any other XML that you might have ever read.
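To make the table analogy concrete for XML, here is a small sketch in Python (chosen only for illustration; nothing in the episode uses Python) with the standard library's xml.etree.ElementTree. It parses the same para tag once inside the DocBook namespace and once with no namespace, and shows that the parser treats them as two different names.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# The same local name, "para", once inside the DocBook namespace
# and once with no namespace declared at all.
in_docbook = ET.fromstring(f'<para xmlns="{DOCBOOK_NS}">Hello</para>')
no_namespace = ET.fromstring("<para>Hello</para>")

# ElementTree reports namespaced tags as "{namespace-uri}localname".
print(in_docbook.tag)    # {http://docbook.org/ns/docbook}para
print(no_namespace.tag)  # para

# Same word, different context: to the parser these are different tags.
print(in_docbook.tag == no_namespace.tag)  # False
```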
Simple, right? The next one is a DTD. And a DTD is a document type definition. And what that has to do with is legality: strict, unforgiving legality.
So, if you have a tag, for instance, let's say it's a paragraph tag, para in DocBook, P-A-R-A.
And then you decide that in the middle of your paragraph, you want to start a new chapter. XML, or DocBook rather, will fail. It will absolutely stop you.
It will say, no, well, it won't stop you while you're writing it. But when you try to process it later, it will say, hey, you can't do that. You've got a paragraph, and you've started a whole new chapter inside of the paragraph. That's just not, that doesn't make any sense.
So, you'd have to go back in and move that chapter outside of the paragraph, and then write your paragraph. And I think that's pretty fair. That's one of the fairest rules in the DocBook DTD.
There are others that are a little bit tougher to justify, and then, conversely, others that I wish it was a little bit stricter about. But I won't get into that right now.
Point being, there's a certain order to the universe of DocBook, and if you go against that, it will fail, because it doesn't know what to do with that. It's not that it doesn't like you. It just doesn't know how to process the commands that you're giving it.
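The idea of legal parents and children can be sketched as a lookup table. The rule table below is a hypothetical, heavily trimmed stand-in invented for this example, not the real DocBook DTD. Python's standard library parses XML but does not validate it, so for real checks you would run a validating tool (xmllint is one) against the actual DocBook schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed rule table invented for this example:
# which child elements each parent may contain. The real DocBook
# schema has far more elements and far more rules.
ALLOWED_CHILDREN = {
    "book": {"info", "chapter"},
    "chapter": {"title", "para", "section"},
    "section": {"title", "para"},
    "para": {"emphasis", "envar", "varname"},
}

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag

def find_violations(root):
    """Return (parent, child) pairs that the toy rule table forbids."""
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(local_name(parent.tag), set())
        for child in parent:
            if local_name(child.tag) not in allowed:
                problems.append((local_name(parent.tag), local_name(child.tag)))
    return problems

# A chapter started in the middle of a paragraph: well-formed XML,
# but not a legal DocBook structure.
bad = ET.fromstring(
    '<book xmlns="http://docbook.org/ns/docbook">'
    "<chapter><title>One</title>"
    "<para>Some text <chapter><title>Two</title></chapter></para>"
    "</chapter></book>"
)
print(find_violations(bad))  # [('para', 'chapter')]
```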
So, in order for DocBook to know what those legal tag orders are, you need to tell it the DTD, the document type definition. Personally, I like to point that at where the DTD lives on my computer system.
Now, that's not necessarily the best way to do it, I will admit, because if I save this DocBook document to a thumb drive and hand it to a friend, and I'm using Slackware and they're using Fedora, the DTD might be located in a different place; actually, it's definitely located in a different place on the system.
And then, when they go to process the document, they'll get an error about, there's no DTD in the place that you told me there is a DTD.
So, that's a little bit of a problem, but you can get around it pretty easily by either bundling the DTD with your source and making its path relative to the document, or by setting it to a network location.
But I'm going to defiantly give you the location I use for the DTD on Slackware, because that's where I do all of my DocBook work.
Well, it's not all of my DocBook work, that's for sure. I do this a lot at work, and that is on a Fedora system.
So, the DTD location is, quote, /usr/share/xml/docbook5/schema/dtd/5.0/docbook.dtd, close quote.
Space, version equals, quote, 5.0, close quote, space, xml:lang equals, quote, en, close quote, close angle bracket.
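Put together, the front matter dictated above looks roughly like the output of this sketch. One correction to keep in mind: in standard XML a DTD is referenced through a <!DOCTYPE> declaration placed before the root element, not through an attribute on it, so the sketch uses that form. The DTD path is the Slackware location from the episode and will differ on other systems; the helper name make_header is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the Slackware DTD path mentioned in the episode. It differs
# between distributions, so adjust it (or drop the DOCTYPE line).
DTD_PATH = "/usr/share/xml/docbook5/schema/dtd/5.0/docbook.dtd"

def make_header(root="book", lang="en", dtd_path=DTD_PATH):
    """Return the opening boilerplate for a DocBook 5 file (hypothetical helper).

    The root element is deliberately left open, so chapter files and a
    footer that closes it can be concatenated after it.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root} SYSTEM "{dtd_path}">\n'
        f'<{root} xmlns="http://docbook.org/ns/docbook" '
        f'version="5.0" xml:lang="{lang}">\n'
    )

header = make_header()
print(header)

# Closing the root by hand yields a well-formed (though empty) document.
# ElementTree only parses here; it neither fetches nor validates the DTD.
doc = ET.fromstring((header + "</book>\n").encode("utf-8"))
print(doc.tag)             # {http://docbook.org/ns/docbook}book
print(doc.get("version"))  # 5.0
```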
So, that's a really big and ugly and scary looking tag, but I promise you that's pretty much the worst of them all.
That's the one with the most little arguments in it that you may not always remember.
Although, honestly, after a while you do start to remember it; it's not actually as scary as it looks once you really start to use it.
Yeah, it's less frightening.
That said, copy and paste: when in doubt, just type it once, and then copy and paste that line, or the whole thing, really.
To be honest, all of this front matter that I'm talking about right now for DocBook, I don't even bother copying and pasting it.
I just have it in a file called header.xml, and then I cat the contents of that header into whatever new document I'm doing.
And then I adjust whatever I need to adjust, and that's how I do it. And that just makes it really easy, and it also gives me a consistency to all of my documents.
Because the next important bit is, I guess, the metadata about your document.
So the information about, well, for instance, the title of the document, the summary, like an abstract, the author, the date of authorship, the license, the revision history, all that sort of thing.
And that's a whole other, I mean, that's a whole chunk of DocBook stuff that you can insert into your document or not, but it's something that is available to you in DocBook.
And of all the killer features of DocBook, I would say that the explicitness of DocBook is, for me, one of the real killer features.
Because this isn't one of those documents where you just kind of throw it together, and then rely on whatever you happen to remember about the circumstances in which the document was created.
This is one of those formats where you know, because you have it all here.
So you might open your info section with the info tag, and then give the document a title, or the book a title rather, and that's the title tag.
And in that you could write some text as to what the title is going to be, and then you'd close the title tag, and then you might give it an abstract.
So that would be the abstract tag, and in that you could put some paragraphs, para, and write a paragraph about what the document is about, and then you would close the para, and then you would close the abstract.
And then you could do authors, you could do several authors, and an author is a block of specific information: first you'd open it with author, and then the next tag would be personname, and within that there might be firstname, and then surname, and then you close the personname.
Close the author, and do that same thing as many times as you need to list more authors.
Same principle with copyright: copyright tag, year tag, holder tag, and then close the copyright.
You can do a volume number, an issue number, a publisher, a publisher name, a publication date (pubdate), and a revision history: revhistory, revision, revnumber, date, revremark, close revision, close revhistory, and then close the info block.
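Here is a sketch of that info block built with ElementTree, so you can see how the elements nest. The helper names (db, make_info) and all of the values (title, author, dates) are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://docbook.org/ns/docbook"
ET.register_namespace("", NS)  # serialize DocBook tags without a prefix

def db(parent, name, text=None):
    """Add a DocBook-namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{NS}}}{name}")
    if text is not None:
        el.text = text
    return el

def make_info(title, abstract, authors, year, holder, revisions):
    """Build an <info> block. All argument values here are placeholders."""
    info = ET.Element(f"{{{NS}}}info")
    db(info, "title", title)
    db(db(info, "abstract"), "para", abstract)
    for first, last in authors:
        person = db(db(info, "author"), "personname")
        db(person, "firstname", first)
        db(person, "surname", last)
    copyright_el = db(info, "copyright")
    db(copyright_el, "year", year)
    db(copyright_el, "holder", holder)
    history = db(info, "revhistory")
    for number, date, remark in revisions:
        rev = db(history, "revision")
        db(rev, "revnumber", number)
        db(rev, "date", date)
        db(rev, "revremark", remark)
    return info

info = make_info(
    title="How to DocBook",
    abstract="A gentle introduction to writing DocBook by hand.",
    authors=[("Jane", "Example")],                     # hypothetical author
    year="2017",
    holder="Jane Example",
    revisions=[("1.0", "2017-01-01", "First draft")],  # hypothetical history
)
print(ET.tostring(info, encoding="unicode"))
```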
XML likes to be modular, so if you do this once, and save it to a file called, for instance, header.xml, then pretty much that's all you have to remember, because you will just keep using that same header file for as long as you ever use DocBook, and it'll be beautiful and sublime, and you won't even notice it.
Now, that file will, correctly, believe that it is an incomplete XML file, because look, you didn't close the book tag.
But that's okay, it doesn't have to be complete, you can concatenate it together later and process it as a full document.
I just like to keep it modular, and I don't close the book tag, obviously, because I want there to be a book there, or an article there.
So, I leave the header file as a self-standing thing, which I then utilize with the rest of my files.
So, this is where DocBook becomes really, really, I don't want to say intuitive, because it's not necessarily intuitive. If I just sat someone down and said, okay, write me a DocBook document, that's not going to be intuitive.
Whereas with something like Markdown, if you ask someone, okay, write me a Markdown document, there's actually a 20% chance that they'd accidentally do something that Markdown would approve of.
They might actually, well, they will certainly make paragraph breaks in a Markdown-approved way, probably, unless they're super old school and are used to indenting paragraphs.
But, generally speaking, they're going to write paragraphs in a Markdown-valid way. Whereas with DocBook, there's basically a 0%, well, 1% chance that they would hit upon the way to do that.
So, it's not intuitive, as I'm saying. However, once you see it, it is quite obvious, retrospectively obvious, what it is.
So, for instance, if you wanted to start a new chapter, I might ask you, the listener, what kind of tag do you think that would be?
I mean, you've only been using DocBook for, like, 10 minutes here, and that was for a header file that you couldn't even see. But just guess. And say it out loud.
Even if you're in an office with other people, just go ahead and say it. You don't have to shout it. Just say it.
Yeah, if you said chapter, then yes, you are correct. So, the chapter tag is the way that you open a chapter. So, you would then do chapter.
Now, I like to make an ID for my chapter. I like to explicitly, within the DocBook XML, give a unique identifier to that chapter.
The way I do that is not with a number, because you never know. You might reorder certain things. So, I don't use the numbers. I just use strings of things that make sense.
So, for this work, we might do chapter, and then space, ID equals, quote, howto-docbook, close quote, close angle bracket.
So, now I've opened the chapter and I've given it a unique identifier. And that does have to be a unique identifier. If I tried to make a chapter later on and called it howto-docbook, within the same book, it would fail.
It would say, hey, you've got two chapters that are uniquely identified as howto-docbook, but the definition of unique is that there is one of them. So, don't do that.
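A duplicate-ID check like the one described can be sketched in a few lines. One detail worth knowing: in DocBook 5 the identifier attribute is written xml:id rather than plain id, so the sketch looks for xml:id. The function name duplicate_ids is hypothetical; real DocBook processors and validators report this error themselves.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree reports xml:id under the reserved XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def duplicate_ids(root):
    """Return the xml:id values that occur more than once, sorted."""
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID))
    return sorted(value for value, n in counts.items() if n > 1)

book = ET.fromstring(
    '<book xmlns="http://docbook.org/ns/docbook" version="5.0">'
    '<chapter xml:id="howto-docbook"><title>How to DocBook</title></chapter>'
    '<chapter xml:id="why-docbook"><title>Why DocBook</title></chapter>'
    '<chapter xml:id="howto-docbook"><title>Oops</title></chapter>'
    "</book>"
)
print(duplicate_ids(book))  # ['howto-docbook']
```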
The next part of the chapter would be the chapter title. And once again, if I had to ask you, the listener, and this time you don't have to say it out loud, what the tag might be for a title, you might say back to me, title, and I would say, yeah, that's it. So, title tag. So, it's angle bracket, T-I-T-L-E,
close angle bracket, that's the title tag, and then you would type How to DocBook, and then close the title tag. And then if you wanted to write a paragraph, and this is actually a trick question, what would you think the paragraph tag would be? And you'll say, para, and then I'd cut you off really quick. And I'd be like, yes, para. And then you'd say, graph, and I would ignore that part. So, it's para, yes. For whatever reason, DocBook decided to be nice to people who don't like to type, on the paragraph tag.
And it's just para: angle bracket, P-A-R-A, close angle bracket, that's a paragraph tag. And so, a word on the length of tags. So, you might think, well, you know, HTML has this really great thing where most of the tags are maybe two letters. And that is super nice. I admit, I really like HTML's tagging for that reason. Well, I should say I love it and hate it, right? Because
HTML uses really brief tags for some reason. And, you know, their paragraph tag is just the letter P: angle bracket, P, angle bracket. That's super easy to type. And if you look at it, I guess, again from context, you might say, oh, P for paragraph, got it. But really, angle bracket, H-1, angle bracket, I don't know how intuitive that is.
You might figure it out. But, you know, I mean, out of context, I think that might be hard to figure out. Div, I don't even know if that many people really do understand what Div means. Like, it's just a div. Like, that's what it is. It's a thing. You know, so there are some tags in there that are a little bit harder to kind of understand. But not really. Basically, it's great. It's short.
DocBook doesn't do that. It is very explicit. And as I've said, I think that is one of its killer features, actually, because a DocBook document might not look like plain text the way that Markdown does. You know, with Markdown, you look at that source document, and you think, wait, is this the final delivery? Because it looks perfect. It looks like beautiful, easy-to-read plain text. And then you look at DocBook, and you're like, yeah, that's ugly. I can't even tell the tags from the text.
But I honestly think that that's one of its strengths is that there is information about your information. There is data about your data. It is, it is meta-heavy. It has lots of information in it.
And so the fact that we have to type out chapter and title and emphasis, the word emphasis for emphasis, note and important and tip, all these sections, all of these: an ordered list, it's not OL like in HTML.
It is literally orderedlist. And it's not UL, unordered list, like in HTML; it's itemizedlist. And you have to type that out, and it's a lot of typing.
But I think that's a feature, not a bug. I think it is something that is indicative of the way that Docbook thinks about itself. It's kind of, well, Docbook doesn't think.
But that's how the Docbook community, and certainly Norman Walsh, the creator of Docbook, that's how he thinks of this information. It's information about the information.
So let's not abbreviate into c-tags and p-tags and t-tags. It's chapter and para and title. It is self-describing, and that's a good thing. It adds a lot of weight to what we're typing.
Now, in Emacs and probably a lot of other modern text editors, there are all kinds of options for autocompletion. So I don't actually type out a lot of tags. I type angle bracket c, and then it says, I think you're going to do chapter, and I hit return, and we're done.
And then when I want to close that tag, I type angle bracket, forward slash, and it closes whatever the tag is. It looks for the unclosed tag closest to it, and then it closes that tag.
So the length of the tags while maybe it's a little bit disturbing if you're totally only used to HTML, but in real life, it doesn't really affect you, and frankly, to me, it really helps describe the document, and that's something that we'll talk about more as we go on.
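The close-the-nearest-open-tag behavior described above boils down to a stack: push each start tag, pop on its matching end tag, and whatever is left on top is the tag to close. This is a simplified sketch of the idea, not how Emacs implements it; the regular expression skips declarations and processing instructions but ignores comments and CDATA, and assumes no literal "/>" inside attribute values.

```python
import re

# Start tags, end tags, and self-closing tags. "<?" and "<!" never match,
# because a tag name must start with a letter or underscore.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^>]*?(/?)>")

def innermost_open_tag(text):
    """Return the name of the innermost unclosed element in text, or None."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                # <foo/> opens and closes at once
        if closing:
            if stack and stack[-1] == name:
                stack.pop()         # properly nested end tag
        else:
            stack.append(name)      # start tag still waiting to be closed
    return stack[-1] if stack else None

def close_tag(text):
    """Return the end tag an editor would insert at the end of text."""
    name = innermost_open_tag(text)
    return f"</{name}>" if name else ""

draft = '<chapter xml:id="howto-docbook"><title>How to DocBook</title><para>Hello'
print(close_tag(draft))              # </para>
print(close_tag(draft + "</para>"))  # </chapter>
```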
So we've got a chapter, we've got a title, we've got a paragraph, I think, right? So we do para, and we type in text like, this is how to do a DocBook document, period, close para, close chapter, and then we could close the book here. I'm not going to, and I'll tell you why: if we want to add another chapter later, we don't want to have to go back in and erase a book tag; it's silly.
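The chapter file just dictated can be generated with a short sketch like this one. make_chapter is a hypothetical helper. Note that it writes no xmlns: once the file is concatenated after header.xml, the chapter sits inside the book element and inherits its DocBook default namespace.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

def make_chapter(chapter_id, title, paragraphs):
    """Return the XML text for one chapter file (hypothetical helper).

    No xmlns is written on purpose: once this file is concatenated after
    header.xml, the chapter inherits the DocBook default namespace that
    the <book> element declares.
    """
    chapter = ET.Element("chapter", {XML_ID: chapter_id})
    ET.SubElement(chapter, "title").text = title
    for text in paragraphs:
        ET.SubElement(chapter, "para").text = text
    return ET.tostring(chapter, encoding="unicode") + "\n"

chapter_xml = make_chapter(
    "howto-docbook", "How to DocBook", ["This is how to do a DocBook document."]
)
print(chapter_xml)
```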
So what we'll do is we'll make a footer.xml, and in the footer.xml, we'll have one tag, and that is angle bracket slash book, angle bracket, or article, or whatever you're writing.
Well, an article you might actually do in one document, but assuming it's a book with lots of chapters and every file is a chapter, you've got header, you've got chapter, chapter, chapter, and then you've got footer.
And the footer is just the closing of the book. And the advantage to that is that when you are deciding how the book is going to actually get assembled, you can just do a cat of header, chapter one, chapter three, chapter two, and then chapter four, and then footer, and redirect that into a temp.xml, and now you've got your whole book in the order that you want.
And it's completely valid XML, no one's going to complain about it.
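The header, chapters, footer workflow can be sketched in Python as the equivalent of cat header.xml chapter1.xml chapter3.xml chapter2.xml footer.xml > temp.xml. It writes throwaway files to a temporary directory, shows that header.xml on its own does not parse, and then parses the assembled book. The file contents and the assemble helper are made up for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

DB = "{http://docbook.org/ns/docbook}"

# Placeholder file contents invented for this example.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<book xmlns="http://docbook.org/ns/docbook" version="5.0" xml:lang="en">\n'
    "<info><title>How to DocBook</title></info>\n"
)
FOOTER = "</book>\n"

def assemble(parts, output):
    """Concatenate files in order, like `cat part1 part2 ... > output`."""
    text = "".join(part.read_text(encoding="utf-8") for part in parts)
    output.write_text(text, encoding="utf-8")
    return output

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "header.xml").write_text(HEADER, encoding="utf-8")
    (src / "footer.xml").write_text(FOOTER, encoding="utf-8")
    for n in (1, 2, 3):
        (src / f"chapter{n}.xml").write_text(
            f'<chapter xml:id="ch{n}"><title>Chapter {n}</title>'
            f"<para>Text for chapter {n}.</para></chapter>\n",
            encoding="utf-8",
        )

    # On its own, header.xml is not well-formed: <book> is never closed.
    header_error = None
    try:
        ET.parse(src / "header.xml")
    except ET.ParseError as err:
        header_error = err
    print("header.xml alone:", header_error)

    # Header first, footer last, chapters in whatever order you choose.
    order = ["header.xml", "chapter1.xml", "chapter3.xml", "chapter2.xml", "footer.xml"]
    book = ET.parse(assemble([src / name for name in order], src / "temp.xml")).getroot()
    titles = [el.text for el in book.iter(f"{DB}title")]
    print(titles)  # ['How to DocBook', 'Chapter 1', 'Chapter 3', 'Chapter 2']
```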
So that's a DocBook document, a DocBook book specifically. And again, an article is basically the same, except instead of book, you'd have used article, and instead of chapter, you'd have used section. And you can use sections in chapters as well, so if you wanted sections in your book, you can do that.
It's pretty easy, and then you close everything, and then you've got a file, and it's a thing. It's a DocBook document.
Now you might be thinking, well, you make it sound easy, Klaatu, because you know all the tags, and it's not a big deal to you, but I mean, how would I make something bold?
Oh, I guess I would make it bold, right? B-O-L-D. Well, no, unfortunately, you can't just make up your own tags in DocBook, even though that's kind of how I've been presenting it.
It's been as if, oh, look, you just know the tags naturally, because you just think of them, and that's really super easy.
No, it's not that easy. There are a lot of tags in DocBook. There are many more tags, I would venture to say, than in HTML.
And again, that's probably a critique of DocBook; people might say, oh, there are too many tags, it's overly verbose, it describes too much.
I don't think that's a bad thing. I think it's a great thing, and the super great thing about it is that you can find all the tags either online, or, if you're using Emacs, in an Emacs menu in nXML mode.
And so if you're in nXML mode, I'm not stuttering, by the way, it's called nXML mode, so the letter n, and then xml-mode.
That brings up, at least on my system, I can't remember, I might have installed it separately, but there's this thing called DocBook menu for Emacs; it's not hard to find.
And then it brings up, in certain modes, a DocBook menu, which lists all of the tags for you in either alphabetical or logical order.
Alphabetical being strictly alphabetical: A, B, C, D. Logical being more like, what are you looking for? Are you looking for a block element, and if you are, is it a graphic, or a callout, or a Q&A section, or whatever? And then inline elements, for computers and software, mathematics, things like that.
So you kind of scan through the menus, figure out what you're looking for, and then find the tags that apply to you.
And then if you select that tag, it'll even open up the DocBook guide online and explain how to use the tag.
So the DocBook website is docbook.org, but I always just bookmark the latest DocBook: The Definitive Guide, as they call it.
So that's TDG, the definitive guide: tdg.docbook.org/tdg/ and then the version number.
So for me, I've been using the 5 series a lot, so right now I have 5.2 bookmarked, for whatever reason.
So that's tdg.docbook.org/tdg/5.2. That really didn't work well.
Anyway, that's where it is, I'll put the link in the show notes.
So that lists, I mean that's the whole book, so it's a book that you could buy in the store as well.
But it's by O'Reilly, so they do this really cool thing where they just have it online for free as well.
And I coasted on the free version for a long time, and then finally one day I woke up, and I just thought, oh my gosh, how can I not have DocBook: The Definitive Guide on my bookshelf yet?
And that's not like me at all, honestly, I don't usually do that, I'm very not about physical books.
And so what I did in the end was I purchased the book as an e-pub, so I got to have my cake and I was able to eat it as well.
And it's been great, I've absolutely loved it.
So it is online though, I've got it on my e-reader now, but it's also online, and I have a bookmark because I'm always going there.
And it lists all the tags that you could ever want.
And what I have done in the past is I will literally just do a find on that page to find something that I want.
So let's say I did want to make something, if I wanted to define a variable, maybe I'm writing about something, and there's an important environment variable that you could use when you're launching a certain application.
So I want to somehow highlight the fact that this is a variable, this is the thing that you're going to type in, and it's a variable.
So in HTML I would probably do something like, well, give it an em tag or a strong tag, or maybe I would give it a span, and then set a style on that span, something like font-family: monospace, and that's how I would do it, right?
So in DocBook you don't have to do that; for a lot of things, you actually have a tag for it.
So if I do a search, or if I do a find on this web page for var, for instance, I get envar, E-N-V-A-R: a software environment variable. What do you know?
So what if it's a variable in a programming guide, like I'm telling you how to program in a new language, and I want to list a specific variable name.
Well, if I look further on the page, there is a varname tag, which is the name of a variable; it says so right there. And there are a lot of tags like that. There's something for everyone, but there's not everything.
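Semantic inline tags such as envar and varname pay off when a program reads the document. This sketch (the sample sentence is invented for illustration) parses a para containing both tags and pulls out every environment variable and variable name it mentions, something a span with a monospace style could not tell you.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"

# Sample sentence invented for this example.
para = ET.fromstring(
    '<para xmlns="http://docbook.org/ns/docbook">'
    "Set <envar>LANG</envar> before launching the program, and note that "
    "the loop counter is called <varname>index</varname>."
    "</para>"
)

# The markup records what each word is, not just how it should look,
# so a script can list every environment variable the text mentions.
env_vars = [el.text for el in para.iter(f"{DB}envar")]
var_names = [el.text for el in para.iter(f"{DB}varname")]
print(env_vars)   # ['LANG']
print(var_names)  # ['index']

# itertext() gives back the plain reading text with the tags removed.
print("".join(para.itertext()))
```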
So sometimes you won't find a tag that you really, really want to use, and you'll have to do some investigation, but I mean, the same goes for anything else, right?
I mean, any other format that you're ever writing, there's some trick that you don't know yet, and you have to learn it, so I don't think that's such a big deal.
Now, because this is XML, it is not just very flexible, but it is also almost paradoxically complex and strict, I should say strict.
So sometimes you might find something that you think, well, yeah, that sounds like what I want, and then you go to use it, and it gives you an error later on.
It'll say, that's not how you use this tag, what are you thinking? And then you'll look further into the tag and realize, oh, that's not just an everyday tag, that's some kind of highly specialized thing that DocBook wants to make into something special.
So for that reason, every tag listed on this website, you can click on it, and it tells you exactly how to use it, and very frequently it even shows you.
It's got an examples section down at the bottom of every page, and it'll tell you the correct way to use that tag, and then it will tell you the legal parents that a tag can have.
So it tells you in what other element blocks it can appear, and it tells you the valid children that it may have.
So for tags like para, for instance, which is a pretty broad tag, if you click on that one, you'll see that its valid parents take up at least, I would say, a fourth of this page, and then the valid children take up about three-fourths of the page.
I mean, it's huge, it can have so much in it, and it can appear in so many other things.
But there are other tags, like Keyword, which is actually quite limited in how you can use it, and probably not what you think it is.
Keyword, when I first found that tag, I thought, well, this is perfect, I wanted a way to highlight this specialized term, I will use the Keyword tag, and it turns out that that's not what it is at all.
It's a specialized metadata thing for the info block. It's a key word about your book, for searching and stuff.
So yeah, there are so many tags that sometimes they can be complex to understand, but it's also XML, so you might actually find that there are ways to make it something even more exciting than what you thought it was going to be.
Because, for instance, for emphasis, that's the emphasis tag, you spell it out, e-m-p-h-a-s-i-s, you think, okay, emphasis is cool.
I'll use emphasis, and then you look at your render later on of the actual output of Docbook, and you realize that emphasis actually basically meant italics.
That's not what you wanted, you wanted bold, you wanted something big and bold, so you go back and you're looking for a bold tag, and you cannot find it.
Well, it turns out that if you go into the emphasis tag and read up on it, there's a part in the description called Processing expectations, and it says, okay, this is formatted inline.
Emphasized text is traditionally presented in italics or boldface. A role, R-O-L-E, attribute of bold or strong is often used to generate boldface, if italics is the default presentation.
So, that means that if you go back through your document and change all your emphasis tags to emphasis, space, role equals bold, now you get rendered output in bold instead of italics.
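The processing expectation quoted above can be illustrated with a toy renderer: italics by default, bold when role is bold or strong. This is a hypothetical stand-in written for this example, not the actual DocBook stylesheets, which handle this (and much more) for you.

```python
import xml.etree.ElementTree as ET
from html import escape

DB = "{http://docbook.org/ns/docbook}"
BOLD_ROLES = {"bold", "strong"}  # the role values the guide mentions

def emphasis_to_html(el):
    """Render one DocBook <emphasis> element as an HTML inline element.

    Toy stand-in for a real stylesheet: italics by default, bold when
    role="bold" or role="strong". Only the element's own text is used.
    """
    tag = "strong" if el.get("role") in BOLD_ROLES else "em"
    return f"<{tag}>{escape(el.text or '')}</{tag}>"

para = ET.fromstring(
    '<para xmlns="http://docbook.org/ns/docbook">'
    "This is <emphasis>important</emphasis> and this is "
    '<emphasis role="bold">very important</emphasis>.'
    "</para>"
)
rendered = [emphasis_to_html(el) for el in para.iter(f"{DB}emphasis")]
print(rendered)  # ['<em>important</em>', '<strong>very important</strong>']
```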
So, in other words, I tried to make it super easy sounding for the basics, and in real life, when you get down in the thick of it and start actually writing Docbook, you're going to find that yes, you have more tags than you know what to do with,
until you need something and then you can't find it. And that's aggravating, but again, I think it's par for the course for anything that you are working on ever.
I've never, ever, ever started using some kind of system and never had to refer to the documentation; that just doesn't happen, even with Markdown.
And I hate to say that because I know that markdown is precious and special and perfect, but it's true. You don't know how to do a code block at first until you learn that for whatever reason markdown decided that four spaces from the left margin equals code.
Okay, so not that there's anything wrong with markdown, I'm just using it to kind of give context.
Okay, so we've got our DocBook files, right? We've got three right now in our example. We've got header.xml, we've got chapter1.xml or chapter.xml, whatever, and then we've got footer.xml.
They're three parts of a thing that need to be one. I've already said that in order to get those into one, it's a simple cat command: cat header.xml chapter.xml footer.xml, and then redirect that out into something like temp.xml, and there you've got your full book.
And that's just the easiest way I've found to manage my DocBook files. Again, you don't have to do it that way; you can write one big, monolithic file.
And I have done that once or twice with an article, where it just seemed silly to break it up into different files. But generally I like to keep things modular.
So let's say you do that. You concatenate the files together, and now you need to process them. As I've said, HTML out there on the web gets processed by a web browser.
You don't show people your raw HTML and say, look at my web page. You don't print out the source and say, look, here's my web page, isn't it beautiful? You don't do that. You show it to them in the web browser.
And the web browser reads all those tags, makes them invisible, and renders the output for your readers in some certain way.
XML also has processors, but they don't work in real time the way an HTML browser does. There's no XML browser where you tell people, hey, open up your XML browser and look at my DocBook document.
That would be silly. You take your DocBook source and process it in some way into some other format that you think your readers, your audience, would prefer.
So there are, I would say, three processors out there that you should look at. The first one is definitely the easiest and can be the most hands-off. It's called Pandoc: P-A-N, as in all, and D-O-C, as in document.
Pandoc is written in Haskell, so if you're compiling it from source code, there's a whole heck of a lot of Haskell dependencies that you'll have to install.
I think Fedora has a static build of it, which is pretty handy, actually. Otherwise, you're probably just using a package manager anyway, so who cares? It'll pull everything in and install it.
So Pandoc can be a very dumb program, in the best way. We said our document was temp.xml, right?
So we'll say pandoc temp.xml -o, for output, mybook-dot-what do we want? EPUB? Yeah, let's go EPUB: mybook.epub. With that much information, Pandoc will take your XML file, valid or not, try its darnedest to process it, and turn it into an EPUB that you can then open and read on your computer using FBReader,
or the EPUBReader add-on for Firefox, or on your e-reader, or whatever. Simple as that, super easy. The same goes for something like PDF, or if you want to go out to HTML: same type of command.
Now, you can be more explicit: you can do --from docbook and --to whatever format you want. Maybe you specifically want EPUB 3.
Then you'd do -o for output and mybook.epub again, or maybe HTML, unchunked, or maybe you want it chunked, I don't know.
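As a hedged sketch, the explicit form of the command can be assembled in Python as an argument list. --from, --to and -o are standard Pandoc options, and "docbook" and "epub3" are the names Pandoc uses for its DocBook reader and EPUB 3 writer; the helper itself is hypothetical, and it only builds the command. Running it needs Pandoc installed.

```python
from typing import List, Optional


def pandoc_cmd(src: str, out: str, to: Optional[str] = None) -> List[str]:
    """Build a pandoc command that reads DocBook explicitly."""
    cmd = ["pandoc", "--from", "docbook"]
    if to is not None:
        # e.g. "epub3"; without --to, pandoc infers the format from out's extension
        cmd += ["--to", to]
    cmd += ["-o", out, src]
    return cmd
```

subprocess.run(pandoc_cmd("temp.xml", "mybook.epub", to="epub3"), check=True) would then run pandoc --from docbook --to epub3 -o mybook.epub temp.xml. Passing a list rather than one string sidesteps shell quoting problems with file names that contain spaces.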
So there are a bunch of different options, and if you run pandoc --help, it'll tell you all the options you have available.
And it's a big list. But it's great, it's fantastic. It goes from practically anything to practically anything else.
I mean, it can generate JSON, or MediaWiki, or ODT, or Org mode, RST, LaTeX, DOCX, everything. And on the input side, yeah, I knew it would take DOCX too.
So, lots and lots of different options. And it's super, super simple: you can use it as a big hammer to convert documents, and it just works. You don't really have to think about it.
Now, with that ease of use comes a little bit of loss of independence, maybe. Not really, but let's say that for now. What I mean is that, certainly the way I use Pandoc, I basically just fall back on its own very clever styling options.
I just don't get involved. I tell it, hey, Pandoc, convert that document into a nice pretty PDF for me, so I can post it on this site over here where people can come download it.
My hands are completely off the styling process. And believe me, you can do that comfortably, because Pandoc uses super nice-looking style sheets. It really does. It looks super slick, really properly nice.
I'm just saying that if you do that, you're not really involved with the styling. And maybe that's fine; it just depends on your use case.
And if you want to be more involved, you certainly can be. You can get in there and muck around with all the styling, point it to different style sheets, and stuff like that.
So what I'm trying to say is that you don't have to stay out of the styling, but generally speaking, and especially since I'm pitching Pandoc to you as the easy answer: just use Pandoc. Don't worry about the style.
Like, what are you, an elitist? Just get the thing out of DocBook, dump it into an EPUB, or HTML, whatever, and distribute it. It's not that hard.
So that's great. And the cool thing about going from DocBook to HTML or whatever is that if you've got a chapter in one file and another chapter in another file, or even in the same file, you've got chapter elements.
Well, DocBook is smart enough, and last time I checked I think Pandoc is smart enough, to make those into separate pages, so you'll be reading a page and then you'll get a Next button. And you'll think, wow, where did that Next button come from?
My friend, it's DocBook. Of course it's going to give you a Next button. And a Previous button, too. All that stuff comes for free. Having said that, I actually don't know that I've used Pandoc all that much for conversion to HTML, so I might be lying, but other processors do that for you. So on some level I am correct, and I might even be correct about Pandoc; I just haven't used it for that in a while.
So the other processors, since we're talking about features now, are xmlto and xsltproc. xmlto is X-M-L-T-O, "to" as in the word, not the number. It's a shell script, actually, by some folks at Red Hat. And it's a pretty nifty, useful little shell script that, again, takes almost anything and processes it into almost anything else,
where "anything" is defined more along the lines of XML land. Not exclusively, because there are some non-XML output formats, but mostly it's very much XML, as the name implies. It's not Pandoc; it's xmlto.
So with xmlto, you type xmlto and then tell it what you want to go to. For instance, let's use EPUB again, so you'd say epub. You can also give it an output directory with -o, which goes before the format, but it defaults to the current directory, so we won't do anything there. And then you give it the file you want it to process, which in this case we said was temp.xml, I think.
So the command is xmlto epub temp.xml. That's actually pretty intuitive. It's almost like a sentence: you want to go from XML to EPUB, with temp.xml. And that's it. It takes that XML file, processes it, does whatever it needs to do, and spits it out as an EPUB in your current working directory.
It's a great little command, and it still amazes me to this day that it's just a shell script. It's not even a Perl script; it's Bash. It's really, really cool. It's about 622 lines, so it's a fun read.
It can take DocBook, certainly, and from there it can go to EPUB. It can go to FO, which stands for formatting objects. It's kind of a PostScript-like thing, in that it describes how to print a document, essentially,
and it can get turned into PDF. It can go out to HTML, or to html-nochunks, which is all the HTML in a single page. It can go to a man page. It can go to PDF, if you have the right PDF subsystem installed, and to SVG, TXT, XHTML, xhtml-nochunks, and probably a couple more that are less common. So yeah, it can process your DocBook stuff.
Now, there's another nice thing about xmlto, and I think this goes for Pandoc too, although there it works a bit differently: if you want xmlto to be more tolerant of errors in your XML, you can pass --skip-validation. Then it won't validate your XML, and it will process things as well as it can.
There's also the string parameter argument, which is --stringparam, then a space, then some parameter name equals some parameter value. And you might think, okay, great, cool, I have no idea what that is. Well, it's an on-the-fly styling option.
This gets into some really juicy DocBook XSL kind of stuff, but believe me, it's super powerful. There may be something equivalent in Pandoc; I've just never used Pandoc for that. Pandoc is my really quick, instant, don't-want-to-think-about-style option. So when I'm using Pandoc, by definition I'm not thinking about the styling stuff.
So I don't know a whole lot about Pandoc's styling options, but with xmlto, and certainly with xsltproc, it's super easy to pass on-the-fly formatting choices through to your processor. And that is the --stringparam option.
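The xmlto options described above can be sketched the same way. This hypothetical helper only builds the argument list; it assumes xmlto's usual form, where options such as -o, --skip-validation and --stringparam name=value come before the format and the file name.

```python
from typing import Dict, List, Optional


def xmlto_cmd(fmt: str, src: str, outdir: Optional[str] = None,
              skip_validation: bool = False,
              params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an xmlto command line: options first, then format, then file."""
    cmd = ["xmlto"]
    if outdir is not None:
        cmd += ["-o", outdir]  # otherwise output lands in the current directory
    if skip_validation:
        cmd.append("--skip-validation")  # process even if the DocBook isn't valid
    for name, value in (params or {}).items():
        cmd += ["--stringparam", f"{name}={value}"]  # one argument: name=value
    cmd += [fmt, src]
    return cmd
```

With no options, xmlto_cmd("epub", "temp.xml") gives back the xmlto epub temp.xml command from the episode.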
So for instance, you might do --stringparam page.width=8in, and that would set the page width in the XSL, the styling of the XML. XSL, I should say, is a bit like CSS for HTML: it's the presentation side of your data.
So you can define, for instance, page.width, or page.height, or body.font.family, or title.font.family, all kinds of things like that. And once again, you're probably thinking, well, okay, that's great, but how do I know what those parameters are? Well, yeah, I admit that's harder to find out. It takes learning, and you have to look it up. It's not super easy, it doesn't always work the way you want it to, and it's not as easy as CSS.
I'm assuming, of course, that you know CSS really well, because otherwise you'll think, as easy as CSS? CSS is super hard, I could never get that stuff down. So, yeah, there are levels of complexity here.
But again, if you don't want to get into this stuff, then don't write... no, I'm just kidding. Then don't use xmlto; just use Pandoc. Or use xmlto, but don't go into the styling side of things.
And you have to admit... well, I won't go on with that thought. I'll just say that there are string parameter options here, so that you can bypass the style settings of whatever DocBook style sheet you're using by default and override them according to what your output destination actually is.
So that's xmlto. The third one is xsltproc, and for all intents and purposes you could say it's basically the same thing as xmlto. I mean, it's completely different,
but it works on the same principle. With xsltproc, you're applying a style to your XML document; that's why it's called xsltproc. The T, come to think of it, stands for Transformations: XSLT is XSL Transformations.
But anyway, xsltproc gives style to your XML, and for that you need a style sheet.
Now, DocBook ships with a bunch of style sheets, and they are not known for their beauty. They're known for their functionality, but not for their beauty.
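One common way to make your own style sheet, sketched here as an assumption rather than anything from the episode, is a thin DocBook XSL customization layer: an XSL file that imports the stock FO style sheet and overrides a few parameters. The Python below just generates such a file. The import URI is the usual catalog identifier for the namespaced DocBook XSL FO style sheet, but whether it resolves to a local copy depends on how your distribution installed DocBook XSL, so treat it as a placeholder; the parameter values are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)

# Assumption: catalog URI for the DocBook 5 (namespaced) FO style sheet.
# Your XML catalog must map it to a local file, or substitute a real path.
DEFAULT_BASE = "http://docbook.sourceforge.net/release/xsl-ns/current/fo/docbook.xsl"


def customization_layer(params: Dict[str, str], base: str = DEFAULT_BASE) -> str:
    """Return an XSL style sheet that imports `base` and overrides `params`."""
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    # xsl:import has to come before anything else in the style sheet.
    ET.SubElement(sheet, f"{{{XSL_NS}}}import", {"href": base})
    for name, value in params.items():
        param = ET.SubElement(sheet, f"{{{XSL_NS}}}param", {"name": name})
        param.text = value
    return ET.tostring(sheet, encoding="unicode")
```

Saving customization_layer({"page.width": "8in", "body.font.family": "serif"}) to a file such as mystyle.xsl would give you a style sheet to hand to xsltproc, with the overrides baked in instead of passed as string parameters every time.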
And so the command for xsltproc, the most basic command, would be xsltproc --output build/tmp.fo, for instance, because we're going to send it out to an FO file, and I'll tell you why in a moment.
Then comes the style sheet, and then the name of the file, temp.xml. That's the basic command. xsltproc expects the style sheet before your actual file name; that can be one of the FO style sheets that come with DocBook, or your own.
So it would be mystyle.xsl, maybe, and then temp.xml. And furthermore, if you want to insert string parameters, they go even before mystyle.xsl: you do --stringparam,
then some key, a space, and some value. Note that it's a space here, not an equals sign like with xmlto. And that's how you use xsltproc to generate what's called an .fo file.
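Here's the same kind of hypothetical helper for xsltproc, again only building the argument list. The thing to notice is the ordering described above: --output and any --stringparam pairs first, then the style sheet, then the DocBook file, with xsltproc's --stringparam taking the name and the value as two separate arguments. The file names are the ones from the example; mystyle.xsl stands in for whatever style sheet you use.

```python
from typing import Dict, List, Optional


def xsltproc_cmd(stylesheet: str, src: str, out: str,
                 params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an xsltproc command: options, then style sheet, then input file."""
    cmd = ["xsltproc", "--output", out]
    for name, value in (params or {}).items():
        cmd += ["--stringparam", name, value]  # two arguments: name, value
    cmd += [stylesheet, src]
    return cmd
```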
And again, I don't remember exactly what FO stands for. Honestly, I swear it's something like formatting output, or formatting... formatting objects. That's what it is.
So that's what an FO file is: formatting objects. And you can feed an FO file into yet another processor called FOP, the Formatting Objects Processor, which is a Java application from Apache.
So you run fop and give it the input, which in this case is build/tmp.fo, and then the output, which would be, let's say, distribution/mybook.pdf.
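Putting the last two steps together, a rough Python sketch might run xsltproc and then FOP in order and stop at the first failure. -fo and -pdf are FOP's options for naming the input FO file and the output PDF; the helper names are hypothetical, and nothing runs unless you call run_steps yourself.

```python
import shutil
import subprocess
from typing import List, Sequence


def fop_cmd(fo: str, pdf: str) -> List[str]:
    """Build a FOP command that renders an FO file to PDF."""
    return ["fop", "-fo", fo, "-pdf", pdf]


def run_steps(steps: Sequence[List[str]]) -> None:
    """Run each command in order; raise on a missing tool or a failed step."""
    for cmd in steps:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)  # CalledProcessError stops the pipeline
```

For example, run_steps([["xsltproc", "--output", "build/tmp.fo", "mystyle.xsl", "temp.xml"], fop_cmd("build/tmp.fo", "distribution/mybook.pdf")]), assuming the build and distribution directories already exist.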
And that would generate your PDF file. So obviously I started with the simplest one, which is Pandoc: basically one command and you're done.
Then xmlto, which could be one command, could be two, could have some arguments thrown in there. And then xsltproc, which is a lot more complex.
But those are your options. And that's great. It's great to have options, because now you can choose your level of involvement in what you see in the end.
Like I say, Pandoc gives you some really pretty output. It just happens to be someone else's idea of pretty: someone else is defining what it's going to look like, and that's that. Now, you can probably get in there and try to modify some of the style,
but for me, if I'm going to do that, I may as well just use some fairly raw tools like xmlto or xsltproc. And that's pretty much it. That's your DocBook workflow. And it only took me an hour to convey it to you.
See how simple it is? No, but I'm serious. I've talked a lot here, but the real basics of it are: go get DocBook and install it, whether through your package manager or by downloading it from the DocBook site and putting it in your documents folder or whatever.
And then start writing DocBook. Just open up a file and do your little header thing. And again, the DocBook site gives you very, very good examples of valid DocBook snippets, so for every tag you look up, you'll see the things you need to know to make it work.
So if you're doing an article or a book, just go to the article tag, and down at the bottom there's a full article. I mean, it's short; the actual contents of the article are like one sentence. But that'll still show you the structure.
And then maybe you think, well, I really want to put the authors and some more information in there. So you look at the info block, and again, it gives you a full little snippet of how you'd implement info in your document.
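To make that concrete, here's a small Python sketch that generates a minimal DocBook 5 article with an info block holding a title and an author. It's modeled on the kind of snippet the reference pages show, not copied from them; the element names are standard DocBook 5, while the function and its sample values are made up. Redirect the printed output to a file and it should be ready for any of the processors discussed in this episode.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
ET.register_namespace("", DB_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the DocBook 5 namespace."""
    return f"{{{DB_NS}}}{tag}"


def minimal_article(title: str, first: str, last: str, text: str) -> str:
    """Return a DocBook 5 <article> with an <info> block and one paragraph."""
    article = ET.Element(q("article"), {"version": "5.0"})
    info = ET.SubElement(article, q("info"))
    ET.SubElement(info, q("title")).text = title
    author = ET.SubElement(info, q("author"))
    name = ET.SubElement(author, q("personname"))
    ET.SubElement(name, q("firstname")).text = first
    ET.SubElement(name, q("surname")).text = last
    ET.SubElement(article, q("para")).text = text
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(minimal_article("Hello DocBook", "Jane", "Doe", "One sentence of content."))
```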
You do that. Then you get it into one file somehow, whether it's natively one file because you're a simple person, or you keep it modular and make it complex, and then you cat it into one. And then you throw it through a processor.
So the first time, I vote you use Pandoc. Super simple: pandoc --from docbook --to, let's say, epub (for PDF, you'd just give the output file a .pdf extension), then -o for output, mydocbookfile.epub or .pdf or .html, whatever, and then the input file itself, which is temp.xml or mydocbookfile.xml, whatever you've named it. And you're done. That's it. You have now completed the DocBook training.
Super, super simple, not scary. Try it out. Try it today. And hey, I'll hit you again with another episode about DocBook and why I love it.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the Binary Revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.