Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Lee Hanken
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions

hpr_transcripts/hpr3394.txt
Episode: 3394
Title: HPR3394: Be an XML star with xmlstarlet
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3394/hpr3394.mp3
Transcribed: 2025-10-24 22:38:32
---
This is Hacker Public Radio Episode 3394 for Thursday,
the 5th of August 2021.
Today's show is entitled,
Be an XML star with xmlstarlet.
It is hosted by Klaatu and is about 27 minutes long
and carries a clean flag.
The summary is,
Parse XML from the terminal.
This episode of HPR is brought to you by
AnHonestHost.com.
Get 15% discount on all shared hosting
with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair
at AnHonestHost.com.
Hey everybody, thanks for listening to
Hacker Public Radio.
My name is Klaatu and in this episode,
I want to talk about XMLStarlet.
This is a tool for parsing XML, and admittedly,
it's a little bit limited in scope.
It is exclusively a terminal command.
It is not something that you are going to use
in your Python code or your Java code or any other code,
because it is a program.
It is an application.
It's a command that you run.
But what I have found about XMLStarlet is,
well, number one, it is handy.
If you're just looking to pull some subset of information
from an XML document quickly, normally, well, good luck.
But with XMLStarlet,
you can do it right there in your terminal.
You don't have to write a Beautiful Soup script
to ingest a whole XML document.
You don't have to do all that stuff.
Type some stuff.
Get the data that you need and then move on.
Forget the XML document ever existed.
So it's handy for that.
But I think almost as important as that
is that XMLStarlet uses some really basic XML principles
and really reinforces them for you.
So as you use XMLStarlet,
you start to get the hang of XML in general,
and I think that'll help you eventually,
when you do start doing bigger things.
Like when you have a Python script
that you need to develop with Beautiful Soup
or whatever you're going to use.
And I mean, while Beautiful Soup and other frameworks
for parsing XML can make things really easy,
they can be just as frustrating as any other tool
when you don't know what to look at.
Are you looking at the name tag in general?
Well, how do you get to the name tag from way up here
in your document tree?
And maybe you're not even looking at the name tag.
You're actually looking at this other tag
that contains the name element,
and you want to extract the content of the name element anyway.
So you're not really even looking at name.
You're looking at the content of name.
So it can be confusing.
But if you start getting those basics down
with a tool like XMLStarlet,
then when you go to use the tools that make it quote unquote easy,
you understand the options.
And so you're inputting sensible requests
to your quote unquote easy tool,
and thereby actually making the process easy,
rather than stumbling around in the dark
with a tool that's supposed to make it really easy,
wondering why you aren't getting the data that you want.
Believe me, I've been there.
Don't do that.
Start out with a basic tool
that maybe doesn't make it super easy
but reinforces the basics for you,
so that you start to understand the process.
Or at least that's what I do,
I guess, is what I'm saying.
I mean, you know, everyone's different.
You do what you need to do.
But in this episode, either way,
whatever path you end up taking,
I'm going to talk about XMLStarlet.
And I've found XMLStarlet to be really useful
in real life, either for, like I say, those quick jobs,
like when I need a subset of the data
that I know exists in this document,
and I'll pull that out.
Or kind of almost as an interactive mode for XPath.
You know, kind of like, I know what I need,
I just don't quite know how to get there.
So let me map it out with XMLStarlet,
and then I can translate it into XPath.
Okay, so XMLStarlet is actually a really big tool,
and I'm not going to be able to cover everything,
but I think that what I can cover are some basics
that'll get you started.
xmlstarlet --help.
That's a great place to start, honestly.
Sometimes help output isn't,
so I'm just saying that in this case,
xmlstarlet --help does give you a nice overview
of the commands, as they call them.
I mean, the command is xmlstarlet,
so to me, these are subcommands,
but XMLStarlet calls them commands.
So XMLStarlet has a couple of different subcommands,
such as ed for edit, sel for select,
or you can just type select,
tr for transform,
which I've never used in XMLStarlet,
val for validate, and so on.
I'm probably going to focus mostly on select, sel for short,
because in my experience, that tends to be the one
that's really the most useful, I think.
I mean, I say most useful for the use case
that I'm describing, in other words,
pulling data out of a document when you need it.
So the first thing to do, I guess,
is to figure out what kind of data your XML document
actually contains.
This sub-command is worth the price of admission, honestly.
This one weird trick is justification
for this entire episode.
I kid you not, get an XML document.
I have one in the show notes.
Get that XML document.
It's called planets.xml, or call it planets.xml,
and then run this command, xmlstarlet...
Oh, I should say XMLStarlet is a command
that you'll probably have to install.
It does not come by default with your distribution
or your OS of choice.
So if you're on Linux,
then just get it from your software repository:
sudo dnf install xmlstarlet, or sudo apt install xmlstarlet,
or whatever your package manager is.
And if you're not on Linux, well, there
are package managers for everything these days,
so you can probably find it in whatever package manager
is available.
But in the worst case scenario, go to xmlstar.sourceforge.net,
and that is where you will find the tool.
OK, so assuming you have XMLStarlet now,
you could type xmlstarlet elements,
and that's element with an s at the end, s for Sierra.
So plural, elements: xmlstarlet elements
planets.xml, in this case, because that's
the document that I have.
That command strips out all of the sort
of confusing mélange of XML tags and data.
It just shows you sort of a map of all of the elements
in that XML document.
So for this document, for planets,
and if you'll recall, we had this document that we did
in the previous episode about XML here on Hacker Public Radio.
You can go listen to that episode before listening to this one.
It'll be amazing.
You'll fall in love with XML.
And we had a document, and we started with the XML tag.
And the next one under that was
Sol, for the Solar System.
And upon further reflection, I realized it probably
would have been better to call it system.
But I'm not changing the schema now.
We're too invested in this one, so we're stuck with it.
XML, and then Sol, and then Planet.
And then within the planet tags, we
had a name element and an albedo element.
So we could name a planet Terra, and then the albedo, 0.39.
And then we could do another planet, name, albedo.
And then another planet.
We could do that seven times, or six times,
or three times, as the case may be,
because I got tired of listing planets.
So I got from Mercury to Venus to Earth, or Terra,
and then quit.
But we see that after xmlstarlet elements planets.xml,
it maps that out for you.
And that's important, because that gives you
a visual map of how an XML document is laid out.
Now, when it's a small document admittedly,
it's a lot easier to look at.
Now, if this was a complex user manual written in DocBook,
or something like that, or an SVG,
it can get a lot messier.
In fact, let me just randomly run xmlstarlet elements on something.
I'm going to go into my Graphics folder,
which I guess I must have, yeah, there we go, Graphic.
And here's the logo to getportal.svg.
So I'm going to do an xmlstarlet elements on that.
And that's not as bad as I thought.
I thought I was going to be scrolling.
So I'm going to pipe that to wc -l: 156 lines, really.
That's not bad.
And I mean, a lot of those admittedly
are basically nonsense.
So it's svg.
Well, actually, it's not nonsense.
There's very clear stuff.
And then svg/g, svg/g/g, svg/g/g/g,
I mean, admittedly, that gets a little bit...
That's not very descriptive.
But at least you can see the paths that exist.
And then if you are looking for data within one of those paths,
now you know what to target.
So that's huge.
Trust me, that's a big deal.
If you're not sold on XMLStarlet from literally that one subcommand,
then you might not be looking for XMLStarlet,
because that's a really neat feature.
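For readers who want the same kind of map from inside a script, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It only approximates what `xmlstarlet el` prints: one slash-separated path per element, in document order, repeats included (XMLStarlet's `el -u` option prints sorted unique lines instead). `PLANETS_XML` is a hypothetical reconstruction of planets.xml based on the episode's description (an `xml` root, then `sol`, then `planet` elements holding `name` and `albedo`); the real show-notes file may differ in letter case, values, or layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml from the episode's description
# (assumption: the show-notes file may differ in case, values, or layout).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""


def element_paths(elem, parent_path=""):
    """Return one slash-separated path per element, in document order.

    A rough imitation of `xmlstarlet el`. Namespaced tags would show up
    in ElementTree's {uri}tag form.
    """
    path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
    paths = [path]
    for child in elem:  # children are visited in document order
        paths.extend(element_paths(child, path))
    return paths


root = ET.fromstring(PLANETS_XML)
all_paths = element_paths(root)
for p in all_paths:
    print(p)

# Sorted with duplicates removed, in the spirit of `xmlstarlet el -u`.
unique_paths = sorted(set(all_paths))
```

Here `len(all_paths)` plays the role of piping the output to `wc -l`.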
OK, so now I'm going to look at something a little bit
more useful probably.
But the elements, I mean, that's the starting place,
to just kind of get a sense for what you're dealing with,
whether you generated the document yourself or not,
just being able to kind of get a broad overview of the document
can be very helpful.
You can also do something like xmlstarlet validate,
or just val for short, on planets.xml, for instance.
And it returns whether it is valid or not valid.
So if you're in doubt that you're dealing
with a valid XML document, validate first.
I would probably say as a general statement,
and I can't imagine anyone arguing with me on this,
I don't think you can validate XML enough.
And I say that because if it's not built into your workflow
or even into your broader pipeline,
then it's easy to forget.
So if you think, gee, I wonder if I should validate
this XML document, then the answer is yes.
Either use xmllint or use xmlstarlet val,
because those two things will expose problems in your XML.
And you want to know about problems in your XML.
You want to know about problems in anything
that you're dealing with very early on,
so that you don't spend your whole afternoon wondering
why this xmlstarlet command isn't returning
the name of the third planet or whatever,
and it turns out that there was a name tag
that was misplaced.
Oops, you'd know that if you validated it.
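A rough Python counterpart to that check, offered as a sketch: `xml.etree.ElementTree` refuses to parse a document that isn't well-formed, so catching `ET.ParseError` reports the same class of problem that `xmlstarlet val` checks by default (well-formedness). It does not validate against a DTD or schema, which XMLStarlet and xmllint can also do when you hand them one. The helper name `is_well_formed` and both sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise.

    Checks well-formedness only; no DTD or schema validation.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print(f"invalid: {err}")  # the parser reports line and column
        return False
    return True


good = "<xml><sol><planet><name>Terra</name></planet></sol></xml>"
# A misplaced closing tag, like the misplaced name tag described above.
bad = "<xml><sol><planet><name>Terra</planet></name></sol></xml>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False, after printing the parser's message
```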
Okay, so those are two really quick easy tips.
xmlstarlet elements, xmlstarlet validate.
Now the thing that you're actually going to do,
you know, sort of the loop that you're going to enter,
and that is xmlstarlet select,
or sel for short.
I actually usually spell it out
because it feels clearer to me.
And let's try to get the names of the planets
that are in our document.
That seems like an approachable task.
So actually, I'm going to redo my elements command real quick,
just so I have that on my screen,
because again, that kind of gives you the map,
that gives you your road map.
So I'm going to do xmlstarlet select --template,
and we'll do --value-of.
So this is where we give it sort of the path
to the thing that we want the value of.
And in this case, I'm going to say
that we want /xml,
and I'm just following along with my little visual map here
that I got from xmlstarlet elements.
So /xml/sol/planet/name.
We want the value of all the names in our document.
And I'm also going to do --nl for newline.
You don't have to do that.
It just usually makes the output a little bit easier to read
if you kind of give it permission
to insert some extra newline characters
where convenient.
And then I'm going to point it at the document
that I want it to look at, which is planets.xml,
and then I'm going to hit return.
And I get, as my output, pretty much what I expect:
Mercury, Venus, and Terra.
So simply by doing --value-of,
and then giving it the path to the elements
that I want to look at,
I got exactly what I was hoping for,
which was the contents of that element.
It is as easy as that.
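Typed out, the command just described is presumably `xmlstarlet select --template --value-of '/xml/sol/planet/name' --nl planets.xml`, or `xmlstarlet sel -t -v '/xml/sol/planet/name' -n planets.xml` with the short options. If you later move the same query into a Python script, the standard library's ElementTree understands a subset of XPath. This sketch runs against a hypothetical reconstruction of planets.xml (an assumption, not the show-notes file); because the parsed root already is the `xml` element, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""

root = ET.fromstring(PLANETS_XML)  # root is the <xml> element itself

# /xml/sol/planet/name, written relative to the root element,
# since ElementTree paths start from the element you call them on.
names = [el.text for el in root.findall("./sol/planet/name")]
for name in names:
    print(name)  # one value per line, much like adding --nl
```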
Now, it's not always as easy as that,
and there are different ways that you can look at this information
so as to filter it a little bit differently.
But that's definitely a good start, I think.
We could broaden our search.
Like, maybe we don't know that we want just the name.
I mean, quite possibly, because you've done xmlstarlet elements,
maybe you do know that you need the name,
but let's say for whatever reason you need a report
on the planets themselves,
not just the names.
In that case, you could do xmlstarlet select,
and you can anticipate this, I'm sure,
because we've just done it;
I'm just doing it one level higher.
So xmlstarlet select --template.
The --template option, or -t for short,
is required for select,
because, from what I understand,
otherwise it doesn't know that you are providing it
inline instructions on what to look at.
That's what the template is.
So --template sort of activates,
I think, the inline options.
The man page doesn't mention template,
and the help barely mentions it,
so yeah, I haven't really looked that deeply into it,
but it's something that's required.
So xmlstarlet, oh, I forgot select:
xmlstarlet select --template --value-of,
and then we'll do /xml,
and again, I'm just following the output of xmlstarlet elements,
/xml/sol/planet,
and then point it at planets.xml,
and there I get the values of whatever the contents
of the planets are.
So Mercury 0.11,
Venus 0.7,
Terra 0.39.
Now, I'm missing the context of the XML,
but I mean, that's kind of part of the point
of XMLStarlet:
it's designed to return the data from XML.
When we're getting the value of something,
it's not going to give you the tags around the values,
because you're getting the value of the thing.
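Asking for the value of `planet` gives you its XPath string value: all the text inside the element, tags dropped. A small Python approximation, again against a hypothetical reconstruction of planets.xml, collects each planet's text nodes with `itertext()`. Joining the non-blank pieces with a single space is my own formatting choice for readability; the XPath string value simply concatenates every text node as-is, whitespace included. The helper name `planet_summaries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""


def planet_summaries(root):
    """Each planet's text content, tags dropped, pieces joined by a space."""
    summaries = []
    for planet in root.findall("./sol/planet"):
        # itertext() yields every text node under <planet>; the XPath string
        # value would concatenate them unchanged, whitespace included.
        pieces = [t.strip() for t in planet.itertext() if t.strip()]
        summaries.append(" ".join(pieces))
    return summaries


root = ET.fromstring(PLANETS_XML)
for line in planet_summaries(root):
    print(line)
```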
But of course, you have a little bit more flexibility
than that, because this is XMLStarlet and XPath,
I mean, the power of XPath, I guess.
So xmlstarlet, let's say, select -t
for template, --value-of,
single quote, /xml/sol/planet,
and then here's where we're going to get a little bit crazy.
Rather than just making sort of a blanket statement
like, give me the planet node,
we're going to qualify
which instance of the planet node we want.
There are different ways to qualify that.
Maybe the first one I should do
is sort of the simplest one of all,
which would be planet, square bracket,
one, close square bracket: planet[1].
So if you've ever programmed at all with arrays
or lists or anything like that,
then you might already know what we're doing here,
although XPath counts from one, not zero.
So I'm saying planet[1],
and then, to keep it simple, let's leave it at that.
Then --nl to get newlines
where necessary, and then planets.xml.
Hit return, and instead of getting all of the planets,
I just get the first planet node,
as listed in the XML file.
So that's Mercury 0.11;
that's what it gives me.
Obviously, if I go up to planet and change that
to square bracket two, then I get Venus at 0.7,
and if I finally change that to square bracket three,
then I get Terra at 0.39.
So I'm selecting, I'm qualifying what I want it to give me.
And similarly, let's say I just wanted the albedo of the third planet.
So planet[3]/name, no.
Well, I could do that, but I'm going to do
planet[3]/albedo, and that gives me the albedo.
So it navigates to the planet nodes,
it selects the third one,
because that's what's in the square brackets,
and then it continues into that node
and gives me the contents of the albedo element.
So we're really kind of querying this thing
more or less as you might query a database,
which I think is pretty powerful stuff.
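ElementTree's XPath subset happens to support the same 1-based positional predicates, so the `planet[1]` and `planet[3]/albedo` queries carry over to Python almost unchanged. A sketch against a hypothetical reconstruction of planets.xml (assumed, not the show-notes file):

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""

root = ET.fromstring(PLANETS_XML)

# planet[1]: positions start at 1, as in XPath.
first = root.find("./sol/planet[1]")
first_name = first.findtext("name")  # "Mercury"

# planet[3]/albedo: pick the third planet, then step into its albedo child.
third_albedo = root.findtext("./sol/planet[3]/albedo")  # "0.39"

print(first_name, third_albedo)
```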
And yet again, we can get more powerful.
So we could do xmlstarlet select, -t for template,
--value-of, quote, /xml/sol/planet, square bracket.
What do we know about planets?
Well, we know from xmlstarlet
elements planets.xml
that the planet nodes contain two elements,
and that's name and albedo, name and albedo,
name and albedo.
So let's do square bracket, albedo,
greater than, so the angle bracket pointing to the right,
greater than 0.1, well, 0.2,
let's do that, 0.2,
close square bracket, and /name,
close quote, --nl, planets.xml.
So the whole path is /xml/sol/planet[albedo > 0.2]/name.
In this case, we're saying,
give me a planet node as long as the albedo,
or I should say,
the contents of the albedo element, is greater than 0.2.
So when I hit return,
I should see results for Terra, certainly,
and Venus, but not Mercury,
because I happen to know that Mercury is 0.11,
and I'm saying it's got to be greater than 0.2.
And of course, that's exactly what I get:
Venus and Terra.
And I wouldn't have to filter down on the name;
I could just leave it as planet, square bracket, albedo
greater than 0.2, close square bracket,
close quote, --nl, planets.xml.
And then I get Venus 0.7,
Terra 0.39.
So now it's really feeling like a database,
because my queries can even be conditional.
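ElementTree's XPath subset stops short of numeric comparisons: it accepts equality tests such as `[name='Terra']`, but not `[albedo > 0.2]`. So a standard-library Python sketch selects every planet and does the comparison in Python instead. (A library with full XPath 1.0 support, such as lxml, could take the original expression as-is; it is left out here so the example needs nothing beyond the standard library.) The helper name `bright_planets` and the reconstructed planets.xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""


def bright_planets(root, threshold):
    """Names of planets whose albedo exceeds `threshold`.

    Equivalent in spirit to /xml/sol/planet[albedo > 0.2]/name; the numeric
    comparison happens in Python because ElementTree has no '>' predicate.
    """
    return [
        planet.findtext("name")
        for planet in root.findall("./sol/planet")
        if float(planet.findtext("albedo")) > threshold
    ]


root = ET.fromstring(PLANETS_XML)
print(bright_planets(root, 0.2))
```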
Okay, so let's see, what next?
Well, there are functions in XPath.
XPath is a W3C specification,
so you can find a list of all the functions
in the specification, or you can go to Mozilla Developer Network,
MDN, and they've got a pretty good list
of all of the different XPath functions.
I mean, we're kind of using something like one when we say
albedo is greater than 0.2,
but that's really an operator, more syntax than a function.
There are real functions like last() and position(),
and there's also text(), which is technically a node test.
So for instance,
I just hit Control-P to go up in my terminal,
and I realized that I wasn't in my terminal,
so I almost printed the Mozilla Developer Network page.
So let's see if we can just do xmlstarlet select -t
--value-of, quote,
/xml/sol/planet, square bracket,
last, parentheses,
close square bracket, close quote, --nl,
planets.xml.
And as you might expect, as you might guess,
the square brackets,
that's sort of like a breakout box
for XPath expressions, or functions, I guess;
in XPath terms, it's called a predicate.
And last() is a function that's built into XPath.
It does whatever kind of processing it needs to do
to determine which node is the last
of a group of like nodes,
selects the last one, and returns that.
So that instruction,
/xml/sol/planet[last()],
gives me Terra, 0.39.
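`last()` is one of the few XPath functions that ElementTree's subset accepts inside a positional predicate, along with offsets such as `last()-1`, so this query can be reproduced with the standard library too. A sketch against a hypothetical reconstruction of planets.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""

root = ET.fromstring(PLANETS_XML)

# /xml/sol/planet[last()]: the final planet among its siblings.
last_planet = root.find("./sol/planet[last()]")
print(last_planet.findtext("name"), last_planet.findtext("albedo"))

# ElementTree also accepts an offset from the end, such as last()-1.
second_to_last = root.findtext("./sol/planet[last()-1]/name")
print(second_to_last)
```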
I was trying to think of something I could do
on the name element,
but there's really not a whole lot that I can think of
that makes as much sense in terms of selecting.
But yeah, you get the idea.
I mean, position() gives you a node's position.
So you can tell it to select the planet
in position number one, two, or three,
or with a position greater than two, for instance,
planet[position() > 2].
That would give you just Terra;
greater than or equal to two
would give you Venus and Terra, and so on.
So that's kind of handy if you want to get a range,
or any time you have a condition where just square
bracket one or square bracket two
won't give you everything that you want.
You have functions that give you a little bit more flexibility.
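ElementTree does not accept `position()` comparisons, so a Python sketch of `planet[position() > 2]` and `planet[position() >= 2]` has to count positions itself; `enumerate(..., start=1)` keeps XPath's 1-based numbering. The helper `planets_by_position`, its `keep` argument, and the reconstructed planets.xml are all hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of planets.xml (assumed, not the show-notes file).
PLANETS_XML = """<xml><sol>
  <planet><name>Mercury</name><albedo>0.11</albedo></planet>
  <planet><name>Venus</name><albedo>0.7</albedo></planet>
  <planet><name>Terra</name><albedo>0.39</albedo></planet>
</sol></xml>"""


def planets_by_position(root, keep):
    """Return names of planets whose 1-based position satisfies keep(pos)."""
    planets = root.findall("./sol/planet")
    return [
        planet.findtext("name")
        for pos, planet in enumerate(planets, start=1)  # XPath counts from 1
        if keep(pos)
    ]


root = ET.fromstring(PLANETS_XML)
print(planets_by_position(root, lambda pos: pos > 2))   # like [position() > 2]
print(planets_by_position(root, lambda pos: pos >= 2))  # like [position() >= 2]
```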
XMLStarlet is a really, really handy utility.
And it is a utility.
I mean, it is a command,
but what I'm trying to say is that
it is a build-it-yourself type of tool.
It's pretty raw in the input that it takes.
It will take XPath.
It'll help you transform XML
according to an XSL stylesheet.
It'll validate stuff, it lets you edit stuff.
It can get very complex,
and you have to kind of learn the syntax
of what you're trying to do in order for XMLStarlet
to be able to help you.
And that can be tough sometimes.
XPath itself can be a little bit like regex sometimes.
But XMLStarlet is a really handy way of leveraging that
in shell scripts, or just for really quick tasks
where you don't have any better way to do it.
XMLStarlet is a great way to look at XML,
to edit XML programmatically, to query XML, and so on.
Hopefully this has given you an idea of what's possible
and has inspired you to look into XMLStarlet.
Like I say, the things that you learn from XMLStarlet
are probably transferable over to a bunch of other stuff
that deals with XML,
because if you're using XMLStarlet, you're also
thinking about the structure of XML,
you're thinking about the way to traverse that
XML and that document object model, and so on.
So XMLStarlet can be very, very useful,
just to kind of get you comfortable with XML
and comfortable with how horrible it feels to be lost in XML.
And eventually, you know, you'll come to terms with that
and you'll start to love it like I have.
That's it, that's all I've got to say,
I think, for XMLStarlet. So thanks for listening
to this episode of Hacker Public Radio.
Hope to hear you on this show soon.
Talk to you next time.
["Hacker Public Radio"]
You've been listening to Hacker Public Radio
at HackerPublicRadio.org.
We are a community podcast network
that releases shows every weekday, Monday through Friday.
Today's show, like all our shows,
was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast,
then click on our contribute link
to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound
and the Infonomicon Computer Club,
and it's part of the Binary Revolution at binrev.com.
If you have comments on today's show,
please email the host directly,
leave a comment on the website,
or record a follow-up episode yourself.
Unless otherwise stated,
today's show is released under the Creative Commons
Attribution-ShareAlike 3.0 license.
["Hacker Public Radio"]