Episode: 3367
Title: HPR3367: Making books with linux - part 1
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3367/hpr3367.mp3
Transcribed: 2025-10-24 22:00:44
---
This is Hacker Public Radio Episode 3367 for Tuesday, the 29th of June 2021.
Today's show is entitled, Making Books with Linux, Part 1.
It is hosted by Andrew Conway and is about 56 minutes long and carries a clean flag.
The summary is a discussion about assembling books, using simple tools commonly found in most Linux distros.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello everybody, welcome to Hacker Public Radio.
This is Dave Morris and today mcnalu and I, that's Andrew, are having a bit of a chat about
a particular subject. So I think you're going to kick off today, Andrew.
And do you like to talk about where we're coming from with this?
Yes, well thanks Dave, yes and hello to all the HPR folks out there.
Yes, well this is, I think, classic HPR material in that it turned out that Dave had an itch
and I had an itch and we were both scratching our respective itches,
and discovered that they had something in common in how we were doing the scratching.
I am talking purely metaphorically here, of course.
That's a relief.
Yep, with coronavirus restrictions, I can't scratch Dave and I don't think he'd want me to.
But anyway, what I'm talking about to be less generic is that we're both generating documents
to be published, made public and we want to do it with simple,
maybe sort of Unix/Linux-like text processing tools. So we both have ended up starting from
Markdown and we want to do a lot to transform it into something we can put on the web, for example,
or republished in some way, but we also are interested in doing processing, for example,
for my case, for references and also to make an index. I think it was generating an index
of, for some material, that's the question that you asked that I latched on to,
some questions as well. That's right. Yes, I was searching for a generally available way of
making an index out of a Markdown thing, without really thinking it through and you said that
you'd done this and pointed me at your methodology, which approaches it from a route that I hadn't
quite thought about. So yeah, there's lots of mileage there for talking about how it's done and
you know what we wanted to do and ways and means of achieving it. Indeed, and I should also
mention that the route that I started down in generating a book, because that's what I'm doing,
I'm creating a book to be published by actually a regular publisher, ultimately. But the reason
that I started down this route years ago, I think, was an HPR episode by John Culp, where he was,
if I remember correctly, I think he might have been taking an out of print music book and
republishing it under Creative Commons, or public domain, I don't recall details, but I really like
the way he just kept it simple with a bit of, I don't know if he started from Markdown or HTML,
but one of the two, a little bit of CSS, it was such, there was so light touch, and I thought
that it was just so simple. That's what I'm going to do, and I really have never regretted it.
It's been, I'm able to automate everything about the whole process, and yeah, well,
rather than talk, talk around it, maybe we should just get stuck in.
Yes, yes. Well, we decided in our sort of pre-chat that there was probably enough material here
for a couple of shows, which I'm sure Ken would be applauding us for from a distance, cheering.
So we would sort of give a summary of our two different positions, and our needs, and how we've
had to solve them and maybe have a chat at the end. So today, you are going to kick
off with your situation, I think Andrew, yeah. That's right, and you can quiz me and pull me up
if I'm not being clear, and then we'll switch roles for the next one where you'll discuss what you're
doing, and I'll quiz you as we go. Okay, so yeah, I'm generating a book, which is composed of
chapters, and I have figures, like graphs and charts, that kind of thing, and I also have
tables in the book. But my starting point is essentially a text file, so each chapter
is a text file, and I write it in Markdown, and I can also throw, I actually do
throw occasionally some HTML in there for something that's either not supported in Markdown, or
ambiguously supported, depending on flavor of Markdown. So it's mainly Markdown with a little bit
of HTML, and one of the actual bits of HTML comes in with the figures. Now, when I write, and I want
a figure in there, I'd actually just write in the text, I think I write a greater than sign,
which I think means an indentation or something, I forget what the greater than sign means in Markdown,
so, but, and then I write figure, and then ampersand, nbsp, semicolon, for non-breaking space,
and then I will write a tag, like age underscore distribution, that shows the distribution of ages
in the population, you know, and then when I want to discuss this figure somewhere in the text,
I will just use that same tag, so when that tag appears in the text, the idea is that in some
post-processing, these tags will be found and numbered in order within the chapters, so the first figure
in chapter three will be 3.1, the second will be 3.2, and I don't need to worry about their
references. Now, I mean, that's, I'm sure there's other tools out there to do that, and I know I've
used LaTeX in the past to do this, but LaTeX was just too big and versatile a hammer for this job,
I want to keep it much more simple than that, as I said before. So that's the first job,
is that I have references for the figures and tables. Now, not only that, but the, the other thing
that my post-processing will do is it will, where it finds one of these figures, it then knows from
that tag, like I forget what I said before, but like ages underscore population, say, it will then go
off to a directory, and it'll look for ages underscore population dot CSV, and if it finds that,
it'll then fire up another script, which will turn that CSV into my figure, and the CSV file will
contain not only the data, but some meta information about what the graph should look like, whether
it should be a bar chart, where the legend should be, if there should be a legend, all this kind of
stuff and the scale of the graphs. So, so the principle was to everything, every bit of material for
the book, it starts life as a text file in the soul processing, and so the workflow is to write
the source material, which I do entirely separately, and then when I'm ready, I then run a script,
and the script then it literally just takes the chapter, some chapters don't have any figures,
like introductory chapters, no figures, so I can just literally, what I actually do is I take
that chapter, and I cat it to a file called all.markdown, and then I use
Python, some very short Python programs to go through and put in the references that I just
described, and it also will generate the image file, and the IMG tag, which is a bit of HTML that
will go with the figure caption in the chapter, and then it spits all of that out, and it
appends it to the all.markdown file, so I'll call that for chapters one, two, three, and it'll
just go through and do the whole lot, and then tack on at the very end, it will spit out using cat,
again, or echo commands, just the end material that goes right at the end of the HTML file,
to close off the tags and stuff like that, and then I run it through markdown_py,
that takes the all.markdown file, and generates the HTML file, and then the script that pulls
all this together, well then I can tell it to create a draft, which will then open up in the web
browser automatically for me, so I can even, I have got the set up so that as soon as I save one of
the chapter files, I use a command that will, will monitor the text file, see if they've changed,
and if they have changed, run the script, and it will build everything, and then immediately refresh
the web browser, so that I've, I've got almost a live rendering of the book, or a part of the book,
as I'm writing it, because sometimes that's useful if you're checking layout, and proofreading when
that's brief. I do something very similar, and it is immensely useful, because you can, you can
prang the look of your doc, or I can anyway, I shouldn't say you can, I can make a horrible
mess of my document without realizing, it looks fine in the markdown, but it's awful when it gets
converted to HTML, so seeing that in as close to real time as possible helps me a huge amount.
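
The figure and table numbering Andrew describes earlier can be sketched in a few lines of Python. This is a hedged illustration and not his script: it assumes a caption line looks like `> Figure&nbsp;age_distribution`, that tags are made of letters, digits and underscores, and that figures and tables are counted separately within a chapter. The function name `number_references` is hypothetical.

```python
import re

# Assumed caption format: "> Figure&nbsp;some_tag" or "> Table&nbsp;some_tag".
CAPTION = re.compile(r"^>\s*(Figure|Table)&nbsp;(\w+)", re.MULTILINE)


def number_references(text, chapter):
    """Replace every figure/table tag with a chapter-based number such as 3.1."""
    counters = {"Figure": 0, "Table": 0}
    numbers = {}
    # Number the tags in the order their captions appear in the chapter.
    for kind, tag in CAPTION.findall(text):
        if tag not in numbers:
            counters[kind] += 1
            numbers[tag] = f"{chapter}.{counters[kind]}"
    if not numbers:
        return text
    # \b keeps a short tag from matching inside a longer word or tag.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in numbers) + r")\b")
    return pattern.sub(lambda match: numbers[match.group(1)], text)
```

Because the same substitution runs over captions and body text, a reference written before its figure still receives the right number.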
Yes, I mean I think, and it's not WYSIWYG, but it's as close as you can get to it with this method,
I mean that is the downside of this method, it's not WYSIWYG, but then if you're used to,
you know, writing HTML, or LaTeX, or Markdown, you're used to not seeing
exactly what you're going to get until later in the process. I should mention that I just looked,
the way I did that automatically, it's just a one line thing on the command line,
if I'm working in chapter five, the command I would issue would be ls, chapter five, dot markdown,
pipe, and then E-N-T-R, entr, space, and then the name of the script, which in my case is called
markdown to HTML, so that entr command is the one that effectively monitors, in this case, chapter five,
and it'll notice if I've changed anything to do with chapter five, and then regenerate
chapter five accordingly, should I change it. So, so the other things that I,
other options I have is I've got, I've got basically four options, one is draft, in fact there's five,
one is draft, one is just to check references, that's my references, which are of course web links,
so the check references, all it does is goes through and checks all the links are valid and tells me
if it gets any 404s or 403s or something, what's the moved one, whatever the moved
one is, or some sort of server error, that comes occasionally, so the check refs is one, draft is the working one
I mentioned, there's print, which is basically generating the final thing for the publisher,
and then there's web, which I don't really use, but that I could use to put a version on a website,
I haven't used that in a while, but the last one is, will generate an ebook, so it'll generate,
I think, looking at it, it generates an EPUB, I think, although maybe it's actually more flexible than
that, I wrote this so long ago, I don't actually quite remember what it does, but it definitely does
generate an ebook of some kind, I can see that, I see lots of OPF and NCX files.
I should also mention that a lot of the formatting takes place in the CSS file, that's a very
important part of the process, in that I say nothing about how it should look beyond tags,
formatting tags, the formatting, what h1 means and what the p tag means, all of that stuff
is kept strictly in the CSS file, and that actually greatly helps with keeping the draft,
the print, the web, the ebook, version all quite distinct, because that difference all takes place
in the CSS file, really. So yeah, that you have covered areas that I've been tangling with,
it's not just the book in my case, I actually came up with something similar for making my
HPR show notes for any show of any complexity that I do, so I actually do two types of show,
as far as the computation is concerned, one of which is fairly simple, it's just like
the notes are in the database on the HPR site, which is when you post stuff through the form,
it gets dropped into the, eventually gets dropped into the database, and that's what served up
when you go and look at an HPR page, but I also do a thing where I write longer, more complicated
notes with images and whatever, and examples as a separate file that gets put up alongside the
the audio and stuff on the HPR website, so I wrote a thing which manages all of that, and I used
make to build it, and I've got a thing that creates a make file, depending on what type of thing it is,
and whether it's got pictures and stuff, so yeah, I ended up looking at my book requirements
with those eyes thinking, oh yes, I could make a, in fact I have written a make file to manage it
all, so you can do make PDF and bang out comes a PDF and that type of thing, but yeah, different
approaches to the same sort of idea, fascinating, that we've come at it, come at the same similar
problem in such different ways. Yes, yeah, now it's interesting you mention make, I did look at
going down that route, just, you know, it does feel like you're compiling the book,
aren't you, I mean it's like you compile computer code, it does feel very much like that,
that I'm compiling the book, now why did I not go down the make route, I did look at it and then
decided against, I think, I think it was just, I think it was just another layer of complexity I
didn't want to tangle with, I felt that the bash script, I mean I'm looking at everything I've
just described, it takes place in the bash script, and the bash script is only 57 lines long,
and about a quarter of those lines are comments, you know, just, so, even the majority of it,
yeah, it's very simple actually, and the bash script really only calls Python, I run refs.py
through Python, and it uses, as I mentioned, markdown_py, so that's it, you know,
I don't think there's anything else other than that, it's all text-based, you know, cats and
echoes and pipes redirects to a file, that's all there is. Yeah, so I think that was part of what I
was, part of the simplicity that I was going for is that really I wanted to use as few tools as
possible, and Python is the only what you might call dependency that I've got here.
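
The "check references" mode mentioned earlier could look roughly like the sketch below. It is an assumption-laden illustration, not the author's code: the function names `extract_links`, `fetch_status` and `check_links` are hypothetical, links are found with a simple regular expression, and only the Python standard library is used, in keeping with the stated wish for few dependencies.

```python
import re
import urllib.error
import urllib.request

# Simple pattern for http(s) links; it stops at whitespace, quotes, ')' and '>'.
URL = re.compile(r"https?://[^\s)>\"']+")


def extract_links(markdown_text):
    """Return the distinct URLs in the text, in order of first appearance."""
    seen = []
    for url in URL.findall(markdown_text):
        url = url.rstrip(".,;")  # drop sentence punctuation stuck to the link
        if url not in seen:
            seen.append(url)
    return seen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it could not be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 403, 404, 500 and similar
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, timeout, refused
        return None


def check_links(markdown_text, fetch=fetch_status):
    """Return (url, status) pairs for every link that did not answer with 200."""
    problems = []
    for url in extract_links(markdown_text):
        status = fetch(url)
        if status != 200:
            problems.append((url, status))
    return problems
```

Note that `urllib.request.urlopen` follows redirects by default, so a moved page is reported with the status of its final destination rather than as a redirect. The `fetch` parameter exists so the checker can be tried out without network access.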
Yes, yes, I came at this, as I said, I don't want to digress too much, but I came at this
originally thinking, wouldn't it be nice if there was a way in which you could make your HPR show
relatively easily, I started writing a bash script to hand out to the to the world,
that would allow you to do things like bring together notes and maybe turn them into HTML
through some route or whatever, and would even submit the show for you to HPR with all of your
credentials in the days when we use FTP for it, but the thing grew and grew and grew, I'm terrible
at coming up with an idea, and then, you know, like Wallace and Gromit, attaching a few planks
on the end and the big nail and stuff, and it grows, and you know, you know, that's stable,
so I'll just put another chicken on the end of that, and you know, so yeah, you've come at this
from a much cleaner position, I think, much simpler and more maintainable, I would imagine
go. Well, I don't know. Yeah, I don't know, I mean, you were thinking of, you mentioned there,
you were thinking of other people using your script, weren't you, and that was a consideration
what you were doing. It was, yeah, originally. Yeah, and I think that's a difference, because this
was, this is just to scratch my own itch, this whole file, I never, I mean, I'm perfectly happy,
this particular file that the script I'm talking about, it isn't, I haven't put that
out online anywhere for people to hack around with, but the other components of this whole thing
are the bit that does the figures, for example, and the index, they're all online, we can put the
I can put the GitHub link in the show notes for those, but it's actually the script itself that
isn't, you know, I'm still, it's only for my own personal consumption, so I'm slightly embarrassed
about bits of it, and of course, there's bits of it that still have bespoke links to paths
that only make sense in my file system, so I have to, you know, I have to do a bit of work
before I can share it with the world. I feel, a lot of the people might say no, I just publish it,
but, you know, there's also the thing is, did I accidentally write my password in this window,
I don't think I'll have it. I'm going to say the same sort of mental process is, oh,
looks interesting, I haven't got time to check it now, I'll do it later.
Yeah, well you know that thing where you're, you've got a command line window and you're,
I mean, yeah, I use SSH keys, but still sometimes you have type in the password,
and you type in like, you know, for example, if you're doing a sudo or something like that,
or su to root, and you know that thing where you're typing, and you don't, you've forgotten
which window you're in, and your password goes in plain text into another file, and you don't notice,
yes, I've done that so many times, well, maybe two or three times, but you don't notice at the
time, it's like, why is my password not working, I'm pressing return, and then later on you're
looking at text file, thinking why is there, why are there like ten new lines and my password in
plain text in the text file? Oh yes, oh yes, yes, I have paranoia about this, I have to build
systems that prevent me being an idiot in order to avoid, just on that subject actually,
I'm using a thing called Keychain that comes from Funtoo, which lets you manage, through an
SSH agent, you can set, you give it a passphrase, I've got a SSH passphrase solution I've
had before, give it a passphrase at the start of the day, and then it runs all day long as long as
your machine's up, and it feeds keys to whoever needs it, and that sort of stuff, so that's made
life a lot easier for me, I hardly ever type my passwords in. All right, okay, and maybe
that's something I should look into, that's not as useful. Anyway, we digress a little bit,
which is fine, you know, but the other bit that I wanted to talk about, unless there's anything
else that you wanted to go over first, I'll talk a bit more about my stuff in the next show,
I think, so rather than keep interrupting you. Okay, no problem, the next bit that I wanted to talk
about is how I create the index, and this is where we cross paths initially, and this was born
of a conversation, I was sat with the publisher of the book over in Edinburgh, and I live in Glasgow,
Edinburgh seems like a long way away, which is where you are today, of course.
Yeah, so I'd gone over especially, and we're having this conversation, and he said, you know,
my book's full of facts and figures about Scotland, that's what it's about, and he was saying,
oh, yes, well, we don't really need an index in this book, and I think the look on my face must have
been of utter horror, like, as a former academic and nerd, a book without an index, especially one
that's factual, it's not like a novel or something, which doesn't need an index, I was like,
I wish somebody had taken a picture of me because I was horrified at the suggestion, and I pretty
much said to him, I'm horrified, is probably, you know, I tried to modulate, you know, my reactions,
but I was really genuinely horrified that he would suggest my book would not have an index,
and then he went on to explain the technical difficulties, and I went, and then, in
a bit of bravado, I said I could generate an index like, like that, and
snapped my fingers, so, and he went, well, if, if Andrew, if you think you can generate an
index that quickly, then yeah, let's do it, so the deal was that we'd get through the
proofs, I'd make all my corrections, the very final version of the book, just before it went to
the printers, they would send me the PDF, and my job was to then create an index. Now, I have talked
to other authors, and they sit and they read through the book, the final copy of the book, and they
write down a word, and then they write down the page number, I'm not having any of that nonsense,
I'm far too lazy for that, so I don't blame you, yeah. Now, so the first thing I checked, as I said
from, like the PDF is a text-based PDF, it's not like an image, or that's going to the printers,
and it wasn't, it was genuinely a text-based PDF, which is important, of course, because you can't
parse an image, well, you have to use optical character recognition, on which actually
Ken has just released an episode about using some kind of character recognition, hasn't he?
Yes, I saw that in the last couple of days, I think I did toy with that once, but it's very
difficult to get right, no, much better if it's a text-based PDF, so that was the first win,
that it was a text-based PDF, not the figures, but as in the graphs, but all the text-based elements
were in fact text, so my first job was, well, how can I turn that into a text file that I can
then search, because I want it as text, I want to get rid of anything that's not text, and just
keep the words, because then, and the page numbers, you know, I need to know what the words are,
and what page is there on, so I actually did quite a lot of hunting of different tools, and eventually,
in Slackware, it comes with some PDF tools, I can't remember the package that they're in,
but the command that I found on Slackware, pre-installed, part of the Slackware install,
was pdftotext, P-D-F-T-O-text, and it did everything I wanted, and I had to
fettle with the command line switches and read the man page a bit, but essentially what it can do
is you can give it a page range of the PDF file, and it will suck in the PDF file and spit out that
page as text, and so that's the ideal thing, because if I can just produce one page of the PDF
at a time as text, I know which page this is on, I've got the words, I can then do a script where I
search through for a search term, and then I know that search term, let's say the search term is
GDP, for example, gross domestic product, GDP, I can then search for GDP in caps,
and if I find it in that page, I then have an entry for the index, so I essentially just wrote a
bash script that, working on that principle, reads in a text file, I think it's
called terms.text, and these are words that should appear in the index like GDP, or economy,
or population, that kind of thing, and of course there are times when you might want the word,
you might want to find the plural, you might want an acronym, so the way I set it up is that each
line of the text file had GDP, maybe I think I used a pipe character, then gross domestic product,
and then I think I had pipe, then I had keywords for plural that it would identify a plural,
and the code is actually quite simple, but it can distinguish words that should have a plural
where it's S, ES, or IES, that kind of stuff, so I just literally take a text file, and I
in this very simple syntax, write down every search term that I want, and then this bash script
uses pdftotext, and then a bit of some kind of regular expression searching to check
which terms are on each page, and then at the end of course you then have to sort the terms into
alphabetical order, and then put the list of page numbers, or page ranges, because if you've got
a hit for GDP on page 101, 102, 103, you don't want to list all of these individually,
you want it to be 101 to 103, or whatever is the style that's used in the book,
and a comma to separate the list of pages, and actually there was a few gotchas,
there was a few weird characters that upset it, invisible characters, but I was able to catch all
of them, you know, just a bit of grepping, search and replace, regexes crafted for the job,
and it worked extremely well with a very small amount of fettling at the end, there was a few
times where it really went to town on certain terms, like EU was a particular problem, I don't
seem to remember, because though you wouldn't think that EU kept appearing inside other words, and
I can't remember, there was a problem, I had a problem with EU, and it wasn't reg X's,
not the EU's party, I don't mean that, but I remember some reason that generated a huge number
of hits to EU, more than made sense, and I couldn't quite get to the bottom of why,
so I think it was a very short acronym, and that was basically the problem,
so I had to go in and do a bit of fettling and improve the script a little bit,
but then I sent within a day, this was all turned around within a day, went back to the publisher,
and they were astounded, they never seen an index turned around that fast before, and said,
oh, could you share that script with us please? And I'm thinking, really, don't publishers have
a standard tool for this job? I mean, I know they're a quite small publisher, these guys, but
you know, it was like, as if it did feel like I'd discovered some kind of gold to them,
unfortunately though, I couldn't actually get it working, the script wouldn't work reliably
on Windows at the time, there was a few problems, I don't know what they were, never got it working,
and there was also problems with it working on the Mac, which surprised me, because I thought
that would be closer to Unix, and so those problems I never was quite able to solve, I should
go back, but I think the main problem I couldn't solve on the Mac, which they were using, is that
I didn't have a Mac to test it out on, so that would be an interesting project for someone else with a
Mac. Yes, yes, so you pointed me at the GitHub repository that contained the tool to do this,
which is a Python script, so is that the sort of later development? Yes, sorry, it's a Python
script, I said bash script, there is a bash script as well, that was an earlier version of it,
and then I found the better way to do it, and as you do, I'm doing it in bash and then going,
oh no, I can't get this to work, I need something with a bit more oomph to it, so yeah, quite a reasonable
thing to do, so yes, I've had a look at it, I've actually tried running your Python script,
and it does a great job, really good, nice idea, and as you said, it's fairly simple in concept,
but there's a whole bunch of things you need to cater for, but the principle of taking the page,
looking for particular keywords, and then keeping a record of what you found, and then consolidating
it all and printing it out at the end is great, it's perfect. Yeah, no, it seems to, you know,
actually, it's one of these things where when you start it, you think, oh god, I think we've
got a bit carried away when I said I could do this, and when you finally get it working, you think,
oh, that wasn't so bad, but there wasn't, along the way, there was quite a lot of gotchas,
which don't, you know, like you don't see them, and when I look at the script, it doesn't look that
complicated, but I have to remind myself, I think a lot of the gotchas I got around by selecting
command line options to pdftotext, and I can see there that I've got the EOL, end of line, option,
the unix option, in particular, are the two magic ones that solved a lot of my problems.
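
The indexing principle discussed above (extract one page at a time, look for each term, then consolidate page numbers into ranges) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script from the GitHub repository: the `TERM|long form|plural suffix` line format, the hyphenated range style, and all function names are hypothetical. The `pdftotext` flags used (`-f`, `-l`, `-eol unix`) are real options, but whether they match the author's exact invocation is an assumption.

```python
import re
import subprocess


def page_text(pdf_path, page):
    """Extract a single page as text; requires the pdftotext command."""
    result = subprocess.run(
        ["pdftotext", "-f", str(page), "-l", str(page),
         "-eol", "unix", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True)
    return result.stdout


def parse_terms(lines):
    """Parse 'TERM|long form|plural suffix' lines into (heading, regex) pairs."""
    terms = []
    for line in lines:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        term = fields[0]
        long_form = fields[1] if len(fields) > 1 else ""
        plural = fields[2] if len(fields) > 2 else ""
        # Word boundaries stop a short term matching inside a longer word.
        pattern = r"\b" + re.escape(term)
        if plural:
            pattern += "(?:" + re.escape(plural) + ")?"
        pattern += r"\b"
        heading = f"{term} ({long_form})" if long_form else term
        terms.append((heading, re.compile(pattern)))
    return terms


def collapse(pages):
    """Turn [101, 102, 103, 110] into '101-103, 110'."""
    runs = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page  # extend the current run
        else:
            runs.append([page, page])  # start a new run
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def build_index(pages, terms):
    """pages maps page number to page text; returns sorted index lines."""
    hits = {heading: [] for heading, _ in terms}
    for number, text in pages.items():
        for heading, regex in terms:
            if regex.search(text):
                hits[heading].append(number)
    ordered = sorted(hits.items(), key=lambda item: item[0].lower())
    return [f"{heading}: {collapse(found)}" for heading, found in ordered if found]
```

In use, `pages` would be filled with something like `{n: page_text("book.pdf", n) for n in range(1, last_page + 1)}`, where `book.pdf` and `last_page` are placeholders. The word-boundary matching is one plausible guard against a short acronym turning up inside longer words; the conversation does not establish that this was the cause of the EU problem.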
Yeah, pdftotext is a bit of an odd thing, isn't it? I have used it myself, and not fully understood
all the options, a bit of trial and error was needed. Yes, I think the thing that, I mean,
I mean, there was a lot of trial and error, but the trial and error was I tried different tools,
and pdftotext was just the one that threw up the least number of problems, they all had problems,
but quite a few of them really couldn't handle boxes, you know, I don't know how, I don't
really understand how PDFs work, but some of them really just couldn't handle figures and tables
that like broke up the text, and they would just go a bit mental, and throw a wobbly at that point,
and the rest of the text on that page was garbage, but pdftotext just ignored them,
completely, which I quite liked, you know, it said like, this isn't text, I'm not interested in this.
On we go. Yeah, yeah, PDF is a strange, strange beast, isn't it? I think, in some forms,
it's effectively PostScript embedded in a sort of container thing, I think, and that can be
pretty hairy. Yeah, PostScript, no, that's something I've not tangled with. I mean, PostScript is almost
like a language, isn't it? Yes, it's a Turing-complete language. Yeah, I mean, I remember being able
to, in my LaTeX days, in the 90s, being able to read PostScript and troubleshoot it, I mean,
not like I could really understand it, but it always looked really arcane and strange.
Yeah, yeah, yeah, it is. It needs a completely different mindset, it's quite fun in its way,
if you've got nothing better to do. But yeah, yeah, it's, I think it's a great solution.
My problem was that I had looked at doing this with EPUB because EPUB is a whole different issue
because there aren't any pages as such in, we're not sort of locked down pages, is that right?
I mean, it basically is an HTML document in a container, isn't it? Yes, it's just, there's no
concept of pages as far as I'm aware by default, and it just reflows the text depending on what
size your screen reader wants to display. Now, now, having said that, I've got a feeling that I
have read books that somehow have some notion of page numbers inside them, but I don't really
understood, it seems to depend on how the EPUB or whatever format was created. So I don't, I
wouldn't swear to EPUBs being unable to mark real page numbers. I can see it being a useful thing
in a textbook, you might want to refer to an actual physical page in the print book, but might
have only access to the EPUB. So it would seem to me that that would be a useful feature for EPUBs
to support. It's, having said that, it's probably worth going and unpacking a, you know, a textbook
of some sort, I'm sure I have some EPUB textbooks knocking around, and if you, it's a zip,
gzip thing, you can just explode it and then look at all the bits. I know, this is another
John Culp-ism, by the way, he was the first person I ever heard who explained what was inside
an EPUB, I never knew. So he's done a lot of hacking of EPUBs over the years, I think.
Yes, I think you're right. I think the, let me just look it back at my script. I think all this
HTML, CSS, OPS, NCX, those are the files that you will find inside an EPUB. And if so, then there
will be no way of tracking, at least in the, in the EPUBs I created of my book, you're correct,
there will be no way of tracking the pages that in the print version. That would need to be,
if that is possible, I don't know if it is, that would be, need some other clever tools to come
along and compare the PDF or whatever was generated for print with the EPUB. You know, I don't
even know if that's possible. I mean, I could see how to do it, actually, in principle, but I don't
know if EPUB in any way supports that. No, no. I think the Pandoc processor for Markdown has got some
sort of a, you know, it's got a rough, a cross-referancy type feature to it, I think.
And I'm a bit vague about this because I haven't dug into this in detail. But I think the principle
is that you put an anchor against a word, which is something you want to index. And then you make
an index table effectively that refers to those instances through their anchors. Does that make sense?
No, it doesn't really. Well, you've got GDP everywhere. Could you anchor it multiple times?
I'll get. I don't know. Actually, it's a good point you bring up. I did look at, when I was
researching how to do this at the beginning, I did look at Pandoc. I mean, Pandoc is fantastic.
You know, is it the one that calls itself the Swiss Army knife of something or other?
You don't know. It's brilliant though. It's very clever. It is brilliant. What's it written in?
Is it? It's Haskell. Haskell, that's right. Yes, I had two problems with Pandoc. The first one
is it's Haskell. Nothing against Haskell, but it's just the way Haskell comes in about a bazillion
different packages. Yes, install Pandoc from scratch, and you wait a long time for all of the
Haskell stuff. And that is one of the really rare times that I've found the package management
on Slackware to be a problem, with a package like that, which has so many dependencies
just because of the way it's packaged. Anyway, that's one thing. But the other more fundamental thing
that I had with that is, you know, you don't want to eat your dinner with a Swiss Army knife.
I mean, you could, in principle, with two Swiss Army knives, turn them into a knife and fork,
but you'd rather just use a knife and fork. So yes, this is my problem with Pandoc here:
it did a lot of the things I wanted to do, perhaps all of them. But I just felt it was
cumbersome. It got away from the simplicity of, you know, using a knife and fork. That would do me. My
project was not complicated enough or tricky enough to deserve Pandoc, I don't think.
Had I already been acquainted with Pandoc, I might have used it actually, but I wasn't well enough
acquainted with it. I would have had to install everything from scratch on this occasion because
I haven't used it for a while. So Pandoc is great, but I just felt, you know, it was too
complicated. Yes, not a sledgehammer to crack a nut, but using a Swiss Army knife to eat your dinner
would be another way of putting it. Yeah, you might leave one of those blades out and cut
yourself on the nose. Yes, that's it. Yeah, yeah, the bottle opener would probably spring open and poke
you in the eye or something. Or you'd be trimming your nails with the scissors, or
something like that. Yeah, yeah. No, I take your point. I have been playing with Pandoc for a
long time now, so I don't feel too uncomfortable using it, but I can see it is, and it changes quite
often, and you think, oh, that doesn't seem to work, I wonder why, and you go and look at it: oh, it's been
improved. Yeah, okay. The simpler approach is less full of surprises. Indeed, yeah.
It's just personal preference. It's just how I wanted to do it, you know. I mean,
I quite like understanding what I'm doing, and I really just like this
Unix philosophy of lots of simple tools that are focused on one particular job.
So yeah, I find that works best for me generally. You know, I'm losing my hair naturally
as I age anyway, but it means I don't have to pull any more out when I'm frustrated.
No, fair enough, fair enough. Just a couple of points on the subject of tools.
I've had two approaches made apparent to me. One is from Jeroen, who's a contributor to
HPR, who has written quite a number of books, and he has become very, very much enamored
of AsciiDoc, or Asciidoctor, which is a rewrite of the original AsciiDoc, and he
reckons that that is better than Markdown, et cetera. I do use it a little bit, but I couldn't
say what my opinion was on, you know, bookmaking with it. The second one is my son, who did an
Open University maths course, where you got extra points for submitting your stuff in
LaTeX, and he's currently doing an MSc in computer science, where you do get a few brownie points
if you send stuff in in LaTeX. So he is really quite knowledgeable about it, and he said
it's easy to make an index with LaTeX, which it wasn't in the days when I used it, back in the 1980s
or something, the late 80s, but apparently it is. And, you know, it doesn't look like it used to
back in the day anymore, because there are lots of, whatever they are, extensions that let you
produce really nice looking documents. So they both have quite long learning curves, I think. So
just for the record, it's worth knowing that these two possibilities exist.
Yeah, absolutely. I did look at AsciiDoc. I think I could have
happily used that rather than Markdown. Why did I go with Markdown? I just already knew it,
and, I mean, there's hardly any formatting that I use from Markdown. It's very little,
it's mainly just text that I'm using, so it didn't really matter. I didn't want to use HTML,
because, you know, that is cumbersome to write compared to Markdown. When I'm writing,
I just like to write in plain text. When I'm writing the book, that is, the prose, writing in
plain text in a simple text editor is my preference. As for LaTeX, well, yeah, I used it.
I mean, I wrote my PhD thesis in LaTeX, and lots of papers when I was an academic. If you want to
do maths, to this day, really, LaTeX is the way to go. It just produces the most beautifully
typeset maths. There's no maths in my current textbook, well, in the book I'm publishing here.
If there had been, I would have gone down the LaTeX route just like that. It would have been
the obvious choice, but because there wasn't, I don't know, I just, I felt, yeah, I could have
used LaTeX, I really could have done. I haven't really even thought about why I didn't,
other than there was no maths in it. It was the case that whenever you produced something with
LaTeX, it just looked the same. It had enormous margins, and the font used was,
well, I'm talking about my day, which was the early days of LaTeX, perhaps from about, yeah, 87 onwards.
But some people got a little bit prejudiced against it because everything they
produced looked the same, whether it was a paper or a, you know, shopping list or something.
So, but I think that any of those feelings should be reviewed.
Yes, yes. I mean, yeah, I know exactly what you mean about LaTeX all having the same look.
But, yeah, I'm trying to think why that would have been back then. Why you didn't see,
yeah, you're right. I don't remember ever seeing anyone who used anything other than the standard font
that it gave you. It looked really nice, but, you know, everything was the same. You couldn't
make a fancy looking book where, you know, you had margin notes or, you know, interesting chapter
headings and stuff. You could do it, but it was all in the same font and looked like,
oh, that's LaTeX. Yeah, obviously. You could, I mean, you could do it, because I submitted
papers to journals and, even at that time, this is in the 90s,
they preferred LaTeX. It was such a smooth process for them. It cut down on all their
typesetting overhead. They preferred to use it, but the final product did not look like LaTeX,
it looked like the house style for that journal. A few journals ended up looking like LaTeX, but those
longstanding professional academic journals, no, even then they were able to put it
in two-column format with custom fonts and heading styles. So I think it was probably always
there, it's just that you needed to, you know, open the door to the TeX underneath it and then
be a bit shocked by how complex it is. Yeah, I think that must be the point. Yeah, that's
I mean, back then, I would have been very up for doing that, but the only person I ever remember
going and getting involved in that was the same person who could pretty much speak
PostScript. Well, you need PostScript for that? I'm just saying that's the level of intellectual
geekery that he had ascended to, where, you know, I always joked
with him that he could go up to a PostScript printer and speak to it and it would end up printing
a perfectly formatted graph or something. Yes, and how are you, I'm trying to thank you.
Very good, very good. Yes, so that was, yeah, so I think that's covered all the stuff
I've done, so we'll look forward to talking to you next time, Dave. Yeah, yeah,
what you've been up to for each part. I'll give you a summary of a similar nature, and
you can maybe sort of kick around the ideas and come up with some different views, etc, etc.
That'll be fun, looking forward to it. Okay. Okay. So, well, we'll say goodbye to everybody and
see you next time. Great. Bye-bye. Bye-bye.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out
how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the
Infonomicon Computer Club, and it's part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the
Creative Commons Attribution-ShareAlike 3.0 license.