Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
158
hpr_transcripts/hpr2113.txt
Normal file
@@ -0,0 +1,158 @@
Episode: 2113
Title: HPR2113: sqlite and bash
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2113/hpr2113.mp3
Transcribed: 2025-10-18 14:28:50

---
This is HPR Episode 2113 entitled, SQLite and Bash. It is hosted by first-time host Norrist and is about 15 minutes long. The summary is: using cron, SQLite, and Bash to find directory growth.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Get your web hosting that's honest and fair at AnHonestHost.com.
So I'm going to talk real quick about a problem I needed to solve and some tech that I used to solve it. I work for a company where we let users add content or upload files, and we also have some processes that create files. The file system was getting a little out of hand, and I was having to work kind of hard to keep it trimmed. I wanted a way to track down directories or files that were growing, and be able to track the growth of some files and directories.

So I started working on the problem and went through several approaches. First I thought I could just have a list of directories I wanted to monitor and come up with some way to loop through that list and figure out the file sizes. I figured out how to do this using Python, and it was terribly inefficient; it's not Python's fault. It's just that I didn't do a good job, and it's not a good method to start with a list of files and loop through the list, individually calculating the size of each one. So I had written this Python script, and it didn't work real well, it didn't do what I wanted, and I didn't run it often enough, so it was just kind of bad.
So the next thing I wanted to try was doing everything in Bash instead of Python. I knew the du command, so my thinking was I could do some similar loop through a list of files, run du on every one of those files or directories, and make some output. But I was having trouble coming up with a list of files, and I knew that if it was a static list, then if someone added something new it wouldn't get picked up. At some point I was in the directory that I wanted to monitor and I just typed du without any real arguments, and I realized that whenever you just run du, it lists out every directory in that directory, every subdirectory. It basically walks down the directory tree, listing every directory and its size. It was sort of a little aha moment: I didn't need to loop through anything, I didn't need to give it a list, I could just run the du command and it would print out every directory that I wanted to monitor and its size.
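As a rough illustration (the directory names and sizes here are made up), running du with no arguments from the directory you care about walks the whole tree and prints every subdirectory with its size, in 1K blocks by default:

    $ cd /srv/uploads      # hypothetical top-level directory
    $ du
    12      ./images/2016
    2048    ./images
    96      ./reports/weekly
    104     ./reports
    2164    .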
So when I finally figured that out, I didn't know yet how I would compare things week to week, but I knew that I needed to at least start recording these file sizes, and I would figure the rest out later. The obvious way to run this was from a cron job, so I set up a cron job to run weekly and basically just run the du command. I would give it the dash m flag, so du -m, and then the path to the sort of overarching mount point where all the files I wanted to monitor were located, and then I would send that output to a file and name the file based on the date.

Just a quick diversion: I feel like there have even been episodes on brace expansion and command substitution, and I use command substitution to get the date and to write the output to a file name that includes that date. I'll try to put some example commands in the notes; if I'm able to do that, you'll see where I have the date command.
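A minimal sketch of that weekly job; the mount point, output directory, and file-name pattern are hypothetical, while du -m, the output redirection, and the date command substitution follow the description above:

    # crontab entry: 3 a.m. once a week (the day of week is an assumption)
    0 3 * * 1 /usr/local/bin/record_sizes.sh

    #!/bin/bash
    # record_sizes.sh: one dated snapshot of directory sizes, in megabytes,
    # written to a file like /var/reports/sizes-2016-09-05.txt
    du -m /srv/uploads > "/var/reports/sizes-$(date +%F).txt"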
Okay, so now what I've got is, once a week, a job that runs and gives me a list of all the directories I want to monitor and their sizes. So I need to figure out how to put a couple of those lists together and work out what the difference is: I've got one file path, one week it's this size and the next week it's that size; okay, what's the difference, is it growing or shrinking, and which ones are growing the most? I went through a few different things to try. I tried the diff command, and I mean, that would find differences, but it doesn't really do anything other than show the differences; that's really all it's for.
So I thought about how I could loop through these files: read the file one line at a time, get the file path and the directory size, and then try to match that up with the file path and directory size in the other file. This is sort of the looping process. I started out writing it, and it became a loop within a loop, and it became hard to keep up with what was where. Then I had to figure out how to do the math to work out the difference between the first file and the second file, how much the growth was, and then figure out how to sort it. It just got complicated fast.

I was thinking to myself, if I could just get this information into a database, I know enough SQL to run a query and get basically the output I'm looking for, but it seemed kind of silly to set up a database just for this. I really considered setting up a MySQL database and basically making either a new database or a new table every week, manually running the SQL, and getting a lot of the information I was looking for. But I knew that if I had to do it manually, at some point I'd quit doing it and forget about it.
So I wanted a way to automate it, and I remembered at some point picking up that SQLite has an in-memory database. You can use SQLite with a file, obviously; you can come back to the file and run commands against it. But it's also really light and really fast, and that was a plus: it's really light and really fast if you can do everything in memory.
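For reference, a minimal illustration of the difference (the file and table names here are arbitrary): with a database file the data persists between runs, while the in-memory database only exists for that one invocation.

    # On-disk database: the table is still there the next time you open sizes.db
    sqlite3 sizes.db "CREATE TABLE old (oldsize INTEGER, path TEXT);"
    sqlite3 sizes.db "SELECT name FROM sqlite_master;"   # prints: old

    # In-memory database: nothing is written to disk and nothing survives the command
    sqlite3 :memory: "CREATE TABLE old (oldsize INTEGER, path TEXT); SELECT name FROM sqlite_master;"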
So what I started experimenting with was taking the output of a couple of these du files, loading them into SQLite tables, and then figuring out the query to give me what I was looking for: which directories grew the most over the course of a week. I came up with some SQL that would do it. It would load the two files into their own tables, so I created a table called old and a table called new; I would load the older file into the old table and the newer file into the new table, and then I could execute the SQL.

You can do math in the SQL, so if you have the old size and the new size, you can make SQLite calculate the difference for you just by doing new size minus old size. The query that I ended up running is basically: select the file path and new size minus old size, and natural join the two tables. When you do a natural join in SQLite, it looks for two fields with the same name, and if it can find two fields with the same name, it will put those rows together. The old table has a size and a path, and the new table has a size and a path; in the old table the size is called old size, in the new table the size field is called new size, but the path is called path in both tables. So when you do the natural join, it puts those rows together, matching them on the path: if there's a row in the old table with a path and a row in the new table with the same path, it'll put those on the same line, and then if you do new size minus old size, it gives you the difference. Also in the query I had a WHERE old size is less than new size, so it only showed me the paths that grew; if a directory shrank, for whatever reason, I don't care. And then order by the size difference, descending. So the query would give me a path, its old size, its new size, and the difference, listed by difference, and it would only show me the ones that were growing.
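A sketch of what that SQL could look like, using the table and column names described above; the .import commands, the dated file names, and the tab separator are assumptions based on the du output format rather than the exact commands from the episode:

    -- du -m output is "size<TAB>path", so each file loads into a two-column table
    CREATE TABLE old (oldsize INTEGER, path TEXT);
    CREATE TABLE new (newsize INTEGER, path TEXT);
    .mode tabs
    .import /var/reports/sizes-2016-08-29.txt old
    .import /var/reports/sizes-2016-09-05.txt new

    -- only the paths that grew, biggest growth first
    .mode csv
    SELECT path, oldsize, newsize, newsize - oldsize AS growth
    FROM old NATURAL JOIN new
    WHERE oldsize < newsize
    ORDER BY growth DESC;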
But that process was still a little manual, because I would have to, well, manually execute that SQL every time, and each time, when I imported the files into the tables, I would have to substitute in the most recent file names. (I'm going to have to stop for a little bit, it's loud.) So I figured out that to run this thing manually, I had to know the names of the files, substitute them in, load those into the tables, and then execute the SQL. It worked great, but it definitely wasn't automatic. To automate it, I wanted to write a script that would create the database, load the files, execute the SQL, and then email me the results. That way it could just be automatic, I wouldn't have to think about it, and once a week the report I'm looking for would be in my inbox.
One problem I ran into that I wasn't expecting, though in hindsight it makes sense: at first I would just try to execute these SQLite commands, you know, to create the tables and to run the report, sort of a single line at a time in Bash. It didn't occur to me that whenever you invoke SQLite and you don't give it a file name, it automatically uses an in-memory database. So you run SQLite, give it the command to create the table, and then SQLite exits, and that in-memory database is gone. When you open SQLite you've got to do everything all at once if you're going to do it in memory, or it just goes away. So what I had the bash script do, instead of executing these commands one at a time, was echo the commands out to a file and then run SQLite and give it that file name. Then, with one SQLite command, it basically just opens the file, loops through all the commands in that one file, and outputs the result to a CSV.
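That gotcha looks something like this (a sketch; the table definition and the growth.sql / growth.csv file names are the hypothetical ones used above). Each separate sqlite3 invocation gets its own fresh in-memory database, so state created by the first command is gone by the time the second one runs; feeding one file of commands to a single sqlite3 invocation avoids that.

    # Two invocations: the table from the first command no longer exists in the second
    sqlite3 :memory: "CREATE TABLE old (oldsize INTEGER, path TEXT);"
    sqlite3 :memory: "SELECT * FROM old;"       # Error: no such table: old

    # One invocation reading a command file: everything happens in the same in-memory database
    sqlite3 < growth.sql > growth.csv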
The way that looks is, first there are a couple of variables that set the names of the files. Because I used the date command when I created the files back in the cron job, I can use the same date command to set the variables and the file names. I do that for today's file, and then in the script I call the other one yesterday's file, but it's actually last week's file: when you run the date command you can give it a --date= option, and in this case I give it, in quotes, "7 days ago", so it prints the date as of a week ago. That was convenient, because that's the last time I ran the command that generated the file list. Then, after I set the variables, there's just a bunch of echo commands that build the SQL file one line at a time. Once the SQL file is built, I just run the sqlite command, send it the file that I created, and it spits out a CSV.
The last thing in the bash script is to mail me the CSV file. I do that with the mailx command: with mailx -a you can give it an attachment, so I do mailx -a, then -s with a subject, and then my email address. That bash script runs from a cron job just a few hours after the first one: the job that creates the file runs at three o'clock in the morning, and the job that sends me the report runs at eleven o'clock at night.
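Putting the pieces together, here is a rough sketch of that reporting script, under the same assumptions as above (the paths, file-name pattern, and email address are made up; the date substitution, the echoed SQL file, the single sqlite3 invocation, and mailx -a / -s follow the description in the episode):

    #!/bin/bash
    # File names use the same date command substitution as the du cron job;
    # "yesterday's" file is really last week's, via --date="7 days ago".
    new_file="/var/reports/sizes-$(date +%F).txt"
    old_file="/var/reports/sizes-$(date --date='7 days ago' +%F).txt"
    sql_file=/tmp/growth.sql
    report=/tmp/growth.csv

    # Build the SQL file one echo at a time
    echo "CREATE TABLE old (oldsize INTEGER, path TEXT);"  > "$sql_file"
    echo "CREATE TABLE new (newsize INTEGER, path TEXT);" >> "$sql_file"
    echo ".mode tabs"                                      >> "$sql_file"
    echo ".import $old_file old"                           >> "$sql_file"
    echo ".import $new_file new"                           >> "$sql_file"
    echo ".mode csv"                                       >> "$sql_file"
    echo ".headers on"                                     >> "$sql_file"
    echo "SELECT path, oldsize, newsize, newsize - oldsize AS growth" >> "$sql_file"
    echo "FROM old NATURAL JOIN new"                       >> "$sql_file"
    echo "WHERE oldsize < newsize"                         >> "$sql_file"
    echo "ORDER BY growth DESC;"                           >> "$sql_file"

    # One sqlite3 run: the in-memory database lives only for this command
    sqlite3 < "$sql_file" > "$report"

    # Mail the CSV to myself as an attachment
    mailx -a "$report" -s "Weekly directory growth" me@example.com < /dev/null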
To summarize all those sort of rambly bits that you just heard: I have a cron job that runs the du command, and the output of the du command is ultimately a list of directories and their sizes, separated by a tab; I do that every week. The next thing I do is run a bash script. The bash script basically just builds a SQL file: the SQL creates two tables, one for the old and one for the new, imports the old file and the new file, sets a couple of variables so that it'll output a nice-looking CSV, and then runs the query to give me the list of directories that have grown, ordered by the amount they have grown. Then finally, in the bash script, after it builds the SQL and executes the SQL, it emails me the resulting CSV.
I feel like I just made the worst episode ever, and if you're hearing this, obviously I sent it in anyway. Hopefully there'll be some pretty decent notes to go along with it, so you can maybe read along and see what I've done. I've got a handful of other topics in mind, so if you didn't think the background noise was too annoying, or my rambly style of explaining things not very well was too bad, let me know and maybe I can cover a few other things. I know there's a list of requested topics, and there's probably a few on there I could do. You guys have a great day.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.