Episode: 2113
Title: HPR2113: sqlite and bash
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2113/hpr2113.mp3
Transcribed: 2025-10-18 14:28:50

---
This is HPR episode 2113, entitled "SQLite and Bash". It is hosted by first-time host Norris and is about 15 minutes long. The summary is: using cron, SQLite, and Bash to find directory growth.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15; that's HPR15. Get your web hosting that's honest and fair at AnHonestHost.com.
So I'm going to talk real quick about a problem I needed to solve and some tech that I used to solve it. I work for a company where we let users add content or upload files, and we also have some processes that create files. The file system was getting a little out of hand, and I was having to work kind of hard to keep it trimmed. I wanted a way to track down directories or files that were growing, and to be able to track the growth of some files and directories.
So I started working on the problem, and I went through several sorts of approaches. First I thought I could just have a list of directories I wanted to monitor, and that I could come up with some way to loop through that list and figure out the file sizes. I figured out how to do this using Python, and it was terribly inefficient. That's not Python's fault; it's just that I didn't do a good job. It's not a good method to start with a list of files and loop through the list, individually calculating the sizes. So I had written this Python script, and it didn't work real well, it didn't do what I wanted, and I didn't run it often enough, so it was just kind of bad.
So the next thing I wanted to try was doing everything in Bash instead of Python. I knew the du command, so my thinking was that I could do some similar loop through a list of files, run a du on every one of those files or directories, and make some output. But I was having trouble coming up with a list of files, and I knew that if it was a static list, it wouldn't pick up anything new that someone added. At some point I was in the directory that I wanted to monitor and I just typed du without any real arguments, and I realized that whenever you just run du, it lists out every directory in that directory; it lists out every subdirectory. It basically walks down the directory tree, listing every directory and its size. It was a little aha moment: I didn't need to loop through anything, I didn't need to give it a list, I could just run the du command and it would print out every directory that I wanted to monitor and its size.
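For example, running du with no arguments inside a directory walks the whole tree and prints a size and a path for every subdirectory. This is just a rough sketch; the directory names and sizes here are made up:

    $ cd /srv/uploads        # hypothetical directory being monitored
    $ du
    1520    ./images/2016
    8204    ./images
    312     ./docs
    8516    .

By default du reports sizes in kilobyte blocks on GNU systems; the -m flag mentioned a bit later makes it report megabytes instead.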
So when I finally figured that out, I didn't know yet how I would compare things week to week, but I knew that I needed to at least start recording these file sizes, and I would figure the rest out later. The obvious way to run this was from a cron job, so I set up a cron job to run weekly that basically just runs the du command. I give it the -m flag, so du -m, and then the path to the sort of overarching mount point where all the files I wanted to monitor were located, and then I send that output to a file, and I name the file based on the date.
Just a quick diversion: I feel like there have even been episodes on brace expansion and command substitution. I use command substitution to get the date and to send the output to a file name that includes that date. I'll try to put some example commands in the notes, and if I'm able to do that, you'll see where I have the date command.
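Here is roughly what that weekly command looks like; the mount point and output directory are just placeholders, but the date command substitution is the important part:

    # Record per-directory sizes (in MB) into a file named after today's date
    du -m /srv/uploads > /var/du-reports/du-$(date +%Y-%m-%d).txt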
Okay, so now what I've got is, once a week, a job that runs and gives me a list of all the directories I want to monitor and their sizes. So I needed to figure out how to put a couple of those lists together and work out the difference: I've got one file path, one week it's this size, the next week it's that size; what's the difference, is it growing or shrinking, and which ones are growing the most? I went through a few different things to try. I tried the diff command, and that would find differences, but it doesn't really do anything other than find differences; that's really all it's for. So I thought about how I could loop through these files: read a file one line at a time, get the file path and the directory size, and then try to match that up with the file path and directory size in another file. This is sort of the looping process again. I started writing it, and it became a loop within a loop, and it got hard to keep up with what was where. Then I had to figure out how to do the math to work out the difference between the first file and the second file and how much the growth was, and then figure out how to sort it. It just got complicated fast.
I was thinking to myself: if I could just get this information into a database, I know enough SQL to run a query and get basically the output I'm looking for, but it seemed kind of silly to set up a database just for this. I really did consider setting up a MySQL database and basically making either a new database or a new table every week, then manually running the SQL and getting the information I was looking for. But I knew that if I had to do it manually, at some point I would quit doing it and forget about it, so I wanted a way to automate it. Then I remembered picking up at some point that SQLite has an in-memory database. You can use SQLite with a file, obviously, and you can come back to that file and run commands against it, but it's also really light and really fast, and that was a bonus: it's really light and really fast if you can do everything in memory. So what I started experimenting with was taking the output of a couple of these du files, loading them into SQLite tables, and then figuring out the query to give me what I was looking for: which directories grew the most over the course of a week.
I came up with some SQL that would do it, that would load the two files into their own tables. I created a table called old and a table called new, and then I would load the older file into the old table and the newer file into the new table.
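A minimal sketch of that part, assuming du's tab-separated "size, path" output and made-up file names; the exact column names (old_size, new_size, path) are an assumption, but they are what the natural join below relies on:

    -- one table per week of data
    CREATE TABLE old (old_size INTEGER, path TEXT);
    CREATE TABLE new (new_size INTEGER, path TEXT);

    -- sqlite3 dot-commands to bulk-load the du output files
    .separator "\t"
    .import du-2016-09-25.txt old
    .import du-2016-10-02.txt new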
Then I could execute the SQL. You can do math in SQL, so if you have the old size and the new size, you can make SQLite calculate the difference for you, just by doing new size minus old size. So the query I ended up running is basically: select the file path and old size minus new size, and natural join the two tables. When you do a natural join in SQLite, it looks for two fields with the same name, and if it can find two fields with the same name, it will put those rows together. The old table has a size and a path, and the new table has a size and a path; in the old table the size is called old size, and in the new table the size field is called new size, but the path is called path in both tables. So when you do the natural join, it puts those rows together, matching them on the path. If there's a row in the old table with a path and a row in the new table with the same path, it'll put those all on the same line, and then if you do old size minus new size, it gives you the difference. Also in the query I had a where clause, where old size is less than new size, so it only showed me the paths that grew; if a directory shrank for whatever reason, I don't care. And then order by the size difference, descending. So the query would give me a path, its old size, its new size, and the difference, listed by difference, and it would only show me the ones that were growing.
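A sketch of that query; I've written the difference as new_size minus old_size so that growth comes out as a positive number:

    -- directories that grew, biggest growth first
    SELECT path,
           old_size,
           new_size,
           new_size - old_size AS difference
    FROM old NATURAL JOIN new
    WHERE old_size < new_size
    ORDER BY difference DESC;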
Okay, but that process was still a little manual, because I would have to manually execute that SQL every time, and each time, when I imported the files into the tables, I would have to substitute in the most recent file names. (I'm going to have to stop for a little bit, it's loud.) So I figured out that to run this thing manually, I had to know the names of the files, substitute them in, load those into the tables, and then execute the SQL. It worked great, but it definitely wasn't automatic. To automate it, I wanted to write a script that would create the database, load the files, execute the SQL, and then email me the results. That way it could just be automatic and I wouldn't have to think about it, and once a week the report I'm looking for would be in my inbox.
One problem I ran into that I wasn't expecting, though in hindsight it makes sense: at first I would just try to execute these SQLite commands, to create the tables and to run the report, sort of a single line at a time in Bash. It didn't occur to me that whenever you invoke SQLite and you don't give it a file name, it automatically uses an in-memory database. So if you run SQLite, give it the command to create the table, and then SQLite exits, well, that in-memory database is gone. When you open SQLite, you've got to do everything all at once if you're going to do it in memory, or it just goes away. So what I had the Bash script do, instead of executing these commands one at a time, was echo the commands out to a file and then run SQLite and give it that file name. Then there's one SQLite invocation that basically just opens the file, loops through all the commands in that one file, and outputs to a CSV.
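To illustrate the gotcha: each separate sqlite3 invocation gets its own throwaway in-memory database, so splitting the work across invocations loses everything. Running one sqlite3 session against a single SQL file avoids that (file names here are placeholders):

    # This table vanishes as soon as sqlite3 exits:
    sqlite3 :memory: "CREATE TABLE old (old_size INTEGER, path TEXT);"

    # Instead, put every command in one file and run it in a single session,
    # capturing the query output as the report:
    sqlite3 < report.sql > report.csv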
The way that looks is: first there are a couple of variables that set the names of the files. Because I used the date command when I created the files back in the cron job, I can use the same date command to set the variables and the file names. I do that for today's file, and then in the script I call the other one yesterday's file, but it's actually last week's file. When you run the date command you can give it --date=, and in this case I give it, in quotes, "7 days ago", so it prints the date as of a week ago, which was convenient, because that's the last time I ran the command that generated the file list.
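A sketch of those two variables, assuming the files were named with date +%Y-%m-%d as in the earlier example:

    TODAY=$(date +%Y-%m-%d)
    LASTWEEK=$(date --date="7 days ago" +%Y-%m-%d)
    NEW_FILE=/var/du-reports/du-$TODAY.txt
    OLD_FILE=/var/du-reports/du-$LASTWEEK.txt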
Then, after I set the variables, there's just a bunch of echo commands that build the SQL file one line at a time. Once the SQL file is built, I just run the sqlite3 command and send it the file that I created, and it spits out a CSV. The last thing in the Bash script is to mail me the CSV file. I do that with the mailx command; mailx -a lets you give it an attachment, so I do mailx -a, then -s with a subject, and then my email address.
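Something along these lines; note that the -a attachment flag is the heirloom/s-nail style of mailx (some other mail commands spell it -A), and the file name and address are placeholders:

    echo "Weekly directory growth report attached" | \
        mailx -a report.csv -s "Directory growth report" me@example.com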
That Bash script runs in a cron job a few hours after the first one, so I know the data is there: the job that creates the file runs at three o'clock in the morning, and the job that sends me the report runs at eleven o'clock at night.
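The two crontab entries might look something like this; the day of week and the script path are assumptions, and note that % has to be escaped inside a crontab:

    # 3:00 AM: record directory sizes
    0 3 * * 0   du -m /srv/uploads > /var/du-reports/du-$(date +\%Y-\%m-\%d).txt
    # 11:00 PM: build the SQL, run it, and mail the CSV
    0 23 * * 0  /usr/local/bin/du-report.sh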
To summarise all those rambly bits that you just heard: I have a cron job that runs the du command, and the output of the du command is ultimately a list of directories and their sizes, separated by a tab. I do that every week. The next thing I do is run a Bash script. The Bash script basically just builds a SQL file; the SQL creates two tables, one for the old and one for the new, imports the old file and the new file, sets a couple of options so that it'll output a nice-looking CSV, and then runs the query to give me the list of directories that have grown, ordered by the amount they have grown. Finally, in the Bash script, after it builds the SQL and executes the SQL, it emails me the resulting CSV.
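Putting the pieces together, the whole reporting script might look roughly like this. It is a sketch under the assumptions already made above (file locations, column names, and the mailx flavor), not the exact script from the show:

    #!/bin/bash
    # Compare last week's du output with this week's and mail the growth report.

    TODAY=$(date +%Y-%m-%d)
    LASTWEEK=$(date --date="7 days ago" +%Y-%m-%d)
    NEW_FILE=/var/du-reports/du-$TODAY.txt
    OLD_FILE=/var/du-reports/du-$LASTWEEK.txt
    SQL_FILE=/tmp/du-report.sql
    CSV_FILE=/tmp/du-report-$TODAY.csv

    # Build the SQL file one line at a time.
    echo 'CREATE TABLE old (old_size INTEGER, path TEXT);'  >  "$SQL_FILE"
    echo 'CREATE TABLE new (new_size INTEGER, path TEXT);'  >> "$SQL_FILE"
    echo '.separator "\t"'                                   >> "$SQL_FILE"
    echo ".import $OLD_FILE old"                             >> "$SQL_FILE"
    echo ".import $NEW_FILE new"                             >> "$SQL_FILE"
    echo '.headers on'                                        >> "$SQL_FILE"
    echo '.mode csv'                                          >> "$SQL_FILE"
    echo 'SELECT path, old_size, new_size,'                   >> "$SQL_FILE"
    echo '       new_size - old_size AS difference'           >> "$SQL_FILE"
    echo 'FROM old NATURAL JOIN new'                          >> "$SQL_FILE"
    echo 'WHERE old_size < new_size'                          >> "$SQL_FILE"
    echo 'ORDER BY difference DESC;'                          >> "$SQL_FILE"

    # One sqlite3 session: with no database argument it runs in memory.
    sqlite3 < "$SQL_FILE" > "$CSV_FILE"

    # Mail the resulting CSV (-a is the s-nail/heirloom attachment flag).
    echo "Directory growth report for $TODAY" | \
        mailx -a "$CSV_FILE" -s "Directory growth report" me@example.com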
I feel like I just made the worst episode ever, and if you're hearing this, I obviously submitted it anyway. Hopefully there'll be some pretty decent notes to go along with it, so you can maybe read along and see what I've done. I've got a handful of other topics in mind, so if you didn't think the background noise was too annoying, or my rambly style of explaining things not very well was too bad, let me know, and maybe I can cover a few other things. I know there's a list of requested topics, and there are probably a few on there I could do. You guys have a great day.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.