Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Lee Hanken
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions

hpr_transcripts/hpr3985.txt Normal file

@@ -0,0 +1,286 @@
Episode: 3985
Title: HPR3985: Bash snippet - be careful when feeding data to loops
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3985/hpr3985.mp3
Transcribed: 2025-10-25 18:19:37
---
This is Hacker Public Radio Episode 3,985 for Friday the 10th of November 2023.
Today's show is entitled, Bash snippet - be careful when feeding data to loops.
It is part of the series, Bash Scripting.
It is hosted by Dave Morris, and is about 27 minutes long.
It carries an explicit flag.
The summary is, a loop in a pipeline runs in a subshell.
Hello, welcome to Hacker Public Radio.
My name is Dave Morris.
I'm going to talk today about a Bash subject.
I haven't done many Bash shows of late, but this one suddenly struck me as worth doing.
It's about when you're using loops connected to pipelines in Bash.
And what prompted it was a show that Ken Fallon did, number 3962,
in which he used a Bash pipeline of some quite funky commands to feed their output into a while loop.
In the loop, he processed the lines produced by the pipeline
and used what he found to download audio files belonging to a series using wget.
This was in response to a previous show where a technique for doing this was discussed.
It was a really great show and had some good advice, excellent advice.
But one of the things in it reminded me of the gotcha I warned about
in my own show 2699 some years ago.
This was where you create a pipeline of commands which feed into a loop.
I thought it was worth revisiting this.
So the actual issue is to do with pipelines and how they work.
So I've done various shows on Bash, but I haven't covered pipelines.
The intention was I would be doing sort of fairly regular shows, but somehow my energy has dwindled over the years.
Anyway, what's a pipeline?
Pipelines, they're pretty amazing things which come from the Unix world
and are available in all manner of shells, such as Bash.
And that's what I'm talking about here, obviously.
The general format is that you, on the command line, you, or indeed in a script,
you type a command.
That command is expected to generate some output and use a pipe symbol, which is the vertical bar.
And then you have a second command, which gets the output from the first command.
So it's important to know though that each command runs in a sub-shell.
That is, it creates another sub-process, which runs the bash shell and runs in that.
But the input and output are connected together with other things in the pipeline.
So the output of command one goes to the input of command two.
That's a fantastic way of grabbing some data from something, maybe querying a database or that type of thing.
And maybe grabbing a subset of that data, reformatting it, doing all sorts of things of that nature.
In Ken's example, he is feeding XML data into tools that can manage the XML,
which is not an easy thing to do.
And he has a lot of expertise in that area.
So I've got a simple example of a pipeline in the notes here.
It's just a printf, the printf command will print a string.
But you can include all manner of things in the strings; it's a bit more complicated than echo.
So it's printing, in single quotes, world, backslash n, hello, backslash n.
So it prints two lines, world and hello.
Then the pipe symbol, then the command sort.
So the two lines sent by printf are received by sort, which sorts them into alphabetical order.
So it comes out as hello, world, on separate lines.
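A minimal sketch of the pipeline just described, reconstructed from the spoken description rather than copied from the show notes:

```shell
#!/usr/bin/env bash
# printf writes two lines; the pipe connects its output to sort's input,
# and sort writes the lines back out in alphabetical order.
printf 'world\nhello\n' | sort
# prints:
# hello
# world
```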
So the commands in a pipeline can be more complex than this.
We'll look at some of them, some more of the complexities.
And obviously the commands that you run can be extremely complex with lots and lots of options and things to twiddle.
But one of the things that you often see in pipelines is a loop, such as a while loop.
So my next example has the same printf of world and hello, separated by newlines,
fed into a sort, and then the output of sort is fed into a while loop.
Now, a while loop requires that you use the word while followed by some test action,
something that returns a true or false result.
What we've got here is read line; the read is a Bash builtin.
Line is the name of a variable to receive whatever's read.
So what we're getting is individual lines of text being sent to the while loop.
And the while loop is using read to read those lines, one at a time, each iteration of the loop.
It then outputs it using echo, but it encloses the variable line in parentheses, and that's the end of it.
So you get out the other end, hello, world, each in parentheses, just to demonstrate what's going on.
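The loop described above might look something like this. It is a sketch built from the description; the `-r` option on `read` is an addition here (it stops backslashes in the input being interpreted) and is not mentioned in the show.

```shell
#!/usr/bin/env bash
# read stores each incoming line in the variable "line" and returns
# false at end of input, which is what ends the while loop.
printf 'world\nhello\n' | sort | while read -r line; do echo "($line)"; done
# prints:
# (hello)
# (world)
```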
I tend to write these types of pipelines on one line, but they don't have to be.
If you do need to break them,
you can't just put a newline in an arbitrary place in between the pipe symbols or anything,
but you can put backslash newline and wrap it that way.
And when we come on to a further script, there will be an example of that.
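As a sketch of the wrapping just mentioned, here is the same loop spread over several lines with backslash-newline. The layout and indentation are one possible choice, not necessarily the one used in the show notes.

```shell
#!/usr/bin/env bash
# A backslash immediately before the newline continues the command
# on the next line, so this is still one single pipeline.
printf 'world\nhello\n' \
    | sort \
    | while read -r line; do
        echo "($line)"
    done
```

Bash also accepts a newline directly after a pipe symbol without a backslash, so ending each line with `|` is another way to wrap a long pipeline.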
So that's all fine, that's great.
It works.
Pipes going into a while loop while it's doing stuff with the data.
Now what about if you want to do something more sophisticated?
Say you want to number each of these lines.
So here we've got the next example.
It's all written on one line, which you can do.
I don't know if I've ever said this, but it's possible to put a bunch of commands together on the same line,
separating them by semicolons.
Obviously you don't want that to be too long,
but wherever Bash normally expects a newline, you can replace that by a semicolon.
I do this quite a lot.
I write one-liners to do various crunches on files, to pick out specific bits and that type of thing.
I think you may well know this or if you don't, then it's worth maybe looking into it
because it's a great way of getting things done, a bit of text manipulation type of thing in bash.
Anyway, this line starts with i equals zero.
So it's variable i, which is being set to zero, semicolon.
Then the printf that we've dealt with before, which is piped into sort, which is piped into while.
And the while is doing the same thing, reading a line, and then after the semicolon do,
there is a bracketed expression, a parenthesised expression with double parentheses,
and it contains i plus plus.
Now you don't have to put a dollar on the i in these cases.
I have covered this subject in one of my earlier shows on bash.
What it's actually doing is incrementing the variable i.
Then the echo consists of the contents of the i, dollar i, an open parenthesis, and then dollar line.
So what you get is the two lines printed out with one and two stuck on the front of the parenthesis.
Now I've set variable i to zero before the pipeline.
It didn't have to be on the same line.
It could have been anywhere.
I don't want to dwell on that subject too much, I'm sure you all got that.
You might expect that at the end of that loop, i's obviously been incremented to one from zero.
Then it's been incremented to two, and then the loop stops because it has no more lines to read.
So you'd expect it to be two.
If you print it out, then you'll find that it's zero.
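A sketch of this one-liner, reconstructed from the description, with a final `echo` added to show the surprise. It assumes Bash's default settings (in particular, the `lastpipe` shell option is not enabled).

```shell
#!/usr/bin/env bash
i=0; printf 'world\nhello\n' | sort | while read -r line; do ((i++)); echo "$i ($line)"; done
# prints:
# 1 (hello)
# 2 (world)

# The loop incremented a copy of i inside a subshell,
# so the i in this shell has not changed.
echo "after the loop: i=$i"
# prints: after the loop: i=0
```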
The reason for that is that the i that you set up at the beginning of the multi-command,
a multiple element command or list, is set to zero.
But that's the i in the shell that you start off in.
Each of these elements of the pipeline runs in its own shell,
and when a subshell starts up, it copies variables from the parent shell from which it's
invoked, and runs with this copy throughout the process, the subshell.
What we're incrementing is another i in the subshell.
So, as I said, when it's cloned, its value is also copied.
So when the subshell that has been incrementing i ends,
the version of i, the variable called i in that context, is deleted.
So when you ask for the value of i, you're asking for the value of the one in the shell above,
and it's zero, because it's not changed.
That's because subshells or processes cannot pass back information to variables in the calling shell or process.
I should say actually that subshell and process are the same thing if what you're doing is
creating another bash instance under the current one.
So you can actually type the command bash at your command line, and you will get another subprocess
which is running Bash, and then if you do Control-D or exit, you exit out of that one and you're back
to the one before. So that's the sort of thing that's happening in this case.
But changes in the subshell or the process get lost when it closes.
So how do you prevent this loss of changes in a loop?
Well, the loop needs to be run in the original shell, not a subshell.
So the pipeline, the bunch of commands, which presumably is a pipeline, need to be attached to the
loop in a different way. So the example that I've got for you is very, very similar to what we've
seen already. We have a variable i set to zero, but this time we're starting the while loop
straight away. We're doing a while read line, etc, etc. We're incrementing i in there.
We are echoing the line with the number on the front of it.
But at the end of the loop, after the done part of it, we have a less than sign, which is the way
in which a loop can read from a file. It's part of the subject of redirection, which again I haven't
covered yet, but when I get back, I shouldn't say if, when I get back into gear with
the Bash series, I should talk about that one; it's quite high in the priority list.
So after the less than sign, we have this strange construct, which I've talked about before in
previous bash scripting series, which is a bunch of commands in parentheses, just a list,
in this case it's a pipeline. The first open parenthesis is preceded by a less than sign,
with no intervening spaces. What that thing is called is a process substitution. So what happens
here is a process is started up. It runs what's in the parentheses and then it returns the result.
It's to be treated as if it's a sort of a file, even though it's a fairly dynamic entity.
And inside these parentheses is the printf we've been seeing before and the sort. So this
process substitution actually returns the words world and hello, sorted, which are then
received by the while loop and displayed. It's a temporary thing; the process
substitution is temporary, it's not stored anywhere, well, it's stored somewhere temporarily
in memory I guess you'd say. But there's no record of it afterwards, which is its power really.
So redirection, I just mention here in the notes: the redirection feature lets you read from a file in the loop.
The example is a generic thing: while read variable, do, then do something with the variable, then done to close the loop, and then a less than sign and the name of a file.
So it will actually read line by line from a file if you wish it to.
And this process substitution is doing the same thing, presenting lines to the read as if it was a file.
So the thing about this, though, is if you run this particular loop in the example and then look at the value of i at the end of it, it will in fact be two.
That's because the i that's created with i equals zero is also the i that's being incremented in the loop, and therefore when the loop has finished the value has been incremented to two.
So it's a slightly odd way of doing things.
I mean, I am of the opinion that Bash is not a conventional programming language; you would never have a structure like this in a programming language.
It's more of a command line environment, which has different constraints and different needs.
The thing that feeds data to the loop occurring after the loop as you type it seems like a counterintuitive way of doing things, but that's the way that Bash gets around that particular problem.
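A sketch of the process substitution form described above, reconstructed from the description. Note the space between the redirection `<` and the `<(` that starts the process substitution; the second `<` and the parenthesis must be written with no space between them.

```shell
#!/usr/bin/env bash
i=0
# The loop runs in the current shell; only the printf | sort pipeline
# inside <( ... ) runs in a separate process.
while read -r line; do ((i++)); echo "$i ($line)"; done < <(printf 'world\nhello\n' | sort)
# prints:
# 1 (hello)
# 2 (world)

echo "after the loop: i=$i"
# prints: after the loop: i=2
```

The same loop reading from an ordinary file would end with `done < filename` in place of the process substitution.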
There is another construct using pipelines and stuff which looks similar to this, that I happened to stumble upon as I was preparing this show, and that's the example, again setting i to zero.
So we carry on with while read line, do, echo line, increment i, and finish the loop, and then a less than sign.
In this case it's using the password file that's available on all Unix and Linux systems; it's readable but it doesn't contain anything particularly interesting.
But anyway, it's reading from that file; then that /etc/passwd file name is followed by a pipe symbol and there's a head command, head -n 5, and then an echo for the i variable.
Now what that does is it reads all the lines from the file and prints them out while incrementing the variable.
It's not using the variable but it's incrementing it anyway; it's partly just to prove what it will be once it comes out the other end.
But all of those lines, however many there might be, it might be several hundred, will then be passed through the pipeline to the head command, where head has been told, just show the first five lines and ignore the rest.
So it looks as if it might be similar to the one with the process substitution, but it isn't; it's a pipeline.
And when you examine i at the end of it, although within the while loop it has been incremented, the one available once the pipe is closed is the one containing zero.
So the while loop is running in a subshell, in other words.
I would never think to do that, to be honest; it took me a few looks to realize what it was doing.
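A sketch of this look-alike construct, reconstructed from the description and assuming a system with a readable `/etc/passwd`. The redirection from the file belongs to the loop, but the loop as a whole is the first element of a pipeline into `head`, so it still runs in a subshell.

```shell
#!/usr/bin/env bash
i=0
# "done < /etc/passwd" feeds the loop from the file, then the loop's
# output is piped to head, which keeps only the first five lines.
while read -r line; do echo "$line"; ((i++)); done < /etc/passwd | head -n 5; echo "$i"
# The final echo prints 0: the increments happened in the subshell
# that ran the loop, not in this shell.
```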
So, a piece of advice then, if I might summarise; it's a TL;DR really.
Use the pipe-connected-to-loop layout if you're aware of the pitfalls and won't be affected by them; if you don't care about variables changed within the loop, okay.
Use the one which reads from a process substitution if you want your loop to be complex and to read and write variables in the script.
I certainly do write scripts that do that, so I tend to always use the second form in my scripts, even if I don't need to.
I do it because later on I might come in and think, oh, I know, I'll set a flag when I encounter such and such a thing, and then I can use it later on in the script.
If we don't set it up in that way, the flag will be set and then it'll be deleted.
So I sort of save myself from falling down the hole repeatedly.
Okay, that's really all there is to say, but why should I stop now?
As I was thinking about this, looking at Ken's script, I was thinking, there's loads of pipes, there's one pipeline and there's lots of subprocesses or subshells in it, but how do you ever get to know about this?
How does this become visible to you as a user?
Well, you may not care, of course, in which case it's probably best just to skip to the end, but you might just want to know what's going on.
So I'll try to explain a sort of top-down overview of this.
The process that you log into when you log into your Unix or Linux box is a thing called a shell.
It's quite confusing, I think; the use of the word shell is not always as consistent as I think it should be.
Anyway, a shell runs a command language interpreter.
The command language interpreter is Bash in this case, and they call Bash a shell as well, which is what confuses me.
Anyway, this executes commands that are read from standard input or from a file, so you type things to Bash and it does stuff.
Now, processes in Unix and Linux are quite lightweight; you can create one and destroy one quite quickly and there's not much overhead.
Before getting into Unix and Linux, way back in the 1980s, I used to work with the Digital Equipment Corporation's OpenVMS operating system, which also has a command interface, obviously, and uses processes.
But you're discouraged from using them because they are expensive to create and destroy; they're slow and they're nowhere near as useful as the Unix and Linux version.
In Bash, pipelines, as discussed, use subshells, and the description on the man page for Bash says: each command in a multi-command pipeline, where pipes are created, is executed in a subshell, which is a separate process.
So a pipeline is a bunch of commands with vertical bars between them.
Pipes are the sort of data structures that connect one command to the next, and the symbol says, make a pipe between these two commands.
So a subshell in this context, as I've alluded to before, is a child process of the main login process, or some other parent process which is running Bash; the fact that it's running Bash really makes it a subshell.
You can create processes or subshells in other ways, and one way is to place a collection of commands in parentheses; they can just be simple Bash commands separated by semicolons or they can be pipelines if you wish.
So I give an example here, the dreaded hello world thing: in parentheses, the two commands echo world, semicolon, echo hello, and then close parentheses.
So what that process will do is generate the two lines world and hello; then that's fed through a pipe into sort, and of course you get hello, world in the right order coming out the other end.
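A sketch of the parenthesised group feeding `sort`, reconstructed from the description. The second half is an illustrative addition, not from the show: the variable `greeting` is a made-up name, used to show that a command list in parentheses runs in a subshell and cannot change the parent's variables.

```shell
#!/usr/bin/env bash
# Both echo commands run in one subshell; their combined output
# goes down the pipe to sort.
(echo world; echo hello) | sort
# prints:
# hello
# world

# Hypothetical illustration: an assignment inside ( ... ) is lost
# when the subshell exits.
greeting='set in the parent'
(greeting='changed in the subshell')
echo "$greeting"
# prints: set in the parent
```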
So that is just completely similar to the original thing where I used printf.
It's using a process, but it's not using more processes than the other one.
It can be quite useful to have a bunch of things sending output.
Say you've got a command which just squirts out a bunch of numbers; you might want to put a header and a footer on that.
You can do that by putting an echo for the header, then calling the program itself, and then putting an echo for the footer, and put all of that into parentheses.
Then the output of that, if you redirect it to a file or put it into a pipeline to do other work on it, is an entity in itself.
So that's quite a useful technique to be able to do.
If you wrote it without the parentheses, echo world, semicolon, echo hello, and then sort, then what you'd get would be world, hello.
That's because the world text is written by the first command, and then you start a pipeline where the first command is echo hello into sort.
Sort gets one line, you can't sort one line, so it just returns it.
So you get world produced by the first command, and then the waste-of-time pipeline generates hello.
That's my quick summary of pipes and processes and stuff.
Each process has a unique numeric ID; it's called the process ID, or usually PID, often written in capitals.
You can see these if you use tools like ps or htop or various other things that let you check what's running on your system.
Interestingly, each process has a variable, a Bash variable called BASHPID, all in capitals, a simple variable which holds the PID, the process ID number.
So knowing all of this stuff, I decided to modify Ken's script from show 3962, just to see if I could make it show the processes being created and so forth.
I've included it here in its hacked-about form in case it's of interest.
So what Ken's script does is
it defines a URL from which to download the XML of an RSS feed, and it also defines a download directory.
I've added to this a file which is to contain PID information, and it's to be slash tmp slash hpr3962.sh.ag; call that the PID file.
I set a variable count to zero.
Then the first thing I do is to echo the message, starting PID is, then the contents of BASHPID, so that would be the PID of the process running the script.
You will be aware, I imagine, that when you run a script it gets run in a separate process in most cases, so it won't be the same as the PID of the parent process you invoke this from.
Other than that, each of the steps that Ken put together is there: he starts with a wget to get the RSS feed, then he puts that through xmlstarlet, which is a very powerful tool for parsing XML, then he passes that to sort, and then on to a while loop.
Now for each of these commands that Ken originally used, I've enclosed them in parentheses and put an echo command in front of them.
The echo echoes a number in square brackets; I've just numbered each of the echoes by that means, so there's four of them in total, so you can actually see which echo has produced what.
Then after the square-bracketed number there's the contents of BASHPID, on the assumption that it will be different.
Other than that, in the loop I've added stuff to echo the BASHPID within the loop, and I've used a counter to determine how many to do.
There could be many, many iterations in this loop, so I've limited it to two lines being written to the PID file.
I didn't say that the echoes all go to the PID file, but they do.
Other than that the rest of the script is the same, except I've commented out the wget that goes and gets the audio file, because I didn't really want a big pile of audio files to demonstrate this.
The final additions to the script were to say what the final value of the count variable is, and then to report where the PID file is.
So when the script is run, all you see on the terminal is, final value of count equals, and it's zero of course.
Count is being incremented in the loop, so that just makes that point again.
Then it tells you where the file is.
So if I list the file, it's not saying much, but it contains the message starting PID, and that's 80255 in this case.
Obviously it will be different for you, and different next time I run it.
Then we get echo 1, where the wget is triggered, is 80256, 2 is 57, 3 is 58 and 4 is 59.
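The experiment can be tried without downloading anything. What follows is a cut-down sketch of the idea and not Ken's script or the modified version from the show: the `wget` and XML steps are replaced by the `printf` and `sort` used earlier, there are three numbered echoes rather than four, and the PID file is a temporary file created with `mktemp` (all of these are assumptions made for the illustration).

```shell
#!/usr/bin/env bash
pidfile=$(mktemp)   # stand-in for the PID file in /tmp
count=0

echo "Starting PID is $BASHPID" > "$pidfile"

# Each stage appends its own PID to the file, not to standard output,
# so the extra echoes never contaminate the data in the pipe.
(echo "[1] $BASHPID" >> "$pidfile"; printf 'world\nhello\n') |
(echo "[2] $BASHPID" >> "$pidfile"; sort) |
while read -r line; do
    ((count++))
    echo "[3] $BASHPID" >> "$pidfile"
done

# count was only incremented in the loop's subshell, so this reports 0.
echo "Final value of count = $count"
echo "PIDs recorded in $pidfile:"
cat "$pidfile"
```

The actual PID values will differ from run to run, and stages 1 and 2 may be recorded in either order, but both lines tagged `[3]` should show the same PID, since every iteration of the loop runs in the same subshell.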
So you can see each one of these gets a consecutive process ID number.
That's just because the machine I'm running on is not doing much, so processes are not being created all the time.
If there were some other thing that was generating processes, you wouldn't get those contiguous numbers.
The other interesting thing is that the process the loop runs in is the same each time; you can see that from the two instances of echo 4 producing the same process ID number.
So it makes the point that pipelines consist of lots of processes, depending on how many steps there are, and processes are independent.
They can't talk to one another, but the pipe allows the output of one to go to the input of the other command.
And actually, the fact that we've got echoes in there is interesting, because you'd think, couldn't that mess up the data that's coming out of the command in that particular process?
And yeah, it could, but what I'm doing there is redirecting that output to a file, so it doesn't interfere.
If you ever want to play around with this sort of stuff you'd need to bear that in mind.
You could write it on a different channel which would not get in the way, but that's far too complicated for this, which is meant to be a snippet.
It was about half an hour long, so apologies for that.
So I just found this was interesting and quite revealing, and it just confirmed what I sort of knew, but I wanted to see it acted out in front of my eyes.
You may think, well, it says so in the book, so I believe it, and that's the end of the story, in which case, good for you.
Okay, that's the end.
There's some links to various bits and pieces you might find useful, but otherwise that's it.
Okay, bye.
You have been listening to Hacker Public Radio at hackerpublicradio.org.
Today's show was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and rsync.net.
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.