Episode: 2113 Title: HPR2113: sqlite and bash Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2113/hpr2113.mp3 Transcribed: 2025-10-18 14:28:50 ---

This is HPR Episode 2113 entitled "SQLite and Bash". It is hosted by first-time host Norrist and is about 15 minutes long. The summary is: using cron, SQLite and Bash to find directory growth. This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Get your web hosting that's honest and fair at AnHonestHost.com.

So I'm going to talk real quick about a problem I needed to solve and some of the tech I used to solve it. I work for a company where we let users add content or upload files, and we also have some processes that create files. The file system was getting a little out of hand and I was having to work kind of hard to keep it trimmed. I wanted a way to track down directories or files that were growing, and to be able to track the growth of some files and directories.

So I started working on the problem and went through several approaches. First I thought I could just have a list of directories I wanted to monitor, and come up with some way to loop through that list and figure out the sizes. I figured out how to do this using Python, and it was terribly inefficient. That's not Python's fault; I just didn't do a good job. It's not a good method to start with a list of files and loop through it, calculating each size individually. So I had written this Python script, and it didn't work very well, it didn't do what I wanted, and I didn't run it often enough, so it was just kind of bad.

The next thing I wanted to try was doing everything in Bash instead of Python. I knew the du command, so my thinking was that I could do a similar loop through a list of files, run du on each of those files or directories, and make some output. But I was having trouble coming up with the list of files, and I knew that with a static list, if someone added something new it wouldn't get picked up. At some point I was in the directory I wanted to monitor and I just typed du without any real arguments, and I realized that when you run du like that, it lists every directory under that directory, every subdirectory. It basically walks down the directory tree listing every directory and its size. That was a little aha moment: I didn't need to loop through anything, I didn't need to give it a list. I could just run the du command and it would print out every directory I wanted to monitor and its size.

When I finally figured that out, I didn't know yet how I would compare things week to week, but I knew I needed to at least start recording these sizes, and I would figure the rest out later. The obvious way to run this was from a cron job, so I set up a cron job to run weekly and basically just run the du command. I give it the -m flag, so du -m, and then the path to the overarching mount point where all the files I want to monitor are located, and then I send that output to a file, and I name the file based on the date.
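Here's a rough sketch of what that weekly job could look like. The script name, mount point, output directory and crontab schedule are placeholders, not values from the episode:

#!/bin/bash
# record_du.sh - sketch of the weekly du job (paths are placeholders)
# du -m prints the size of every subdirectory in megabytes, one per line,
# as SIZE<TAB>PATH, so there is no list to maintain and nothing to loop over
du -m /path/to/mountpoint > "/path/to/reports/du_$(date +%F).txt"

# example crontab entry: run it weekly, here Mondays at 3am
# 0 3 * * 1 /usr/local/bin/record_du.sh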
So just a quick diversion: I feel like there have been episodes on brace expansion and command substitution, and I use command substitution to get the date and to write the output to a file name that includes that date. I'll try to put some example commands in the notes, and if I'm able to do that you'll see where I have the date command.

Okay, so now what I've got is a job that runs once a week and gives me a list of all the directories I want to monitor and their sizes. Next I needed to figure out how to put a couple of those lists together and work out the differences: I've got one file path at one size one week and at another size the next week, so what's the difference, is it growing or shrinking, and which ones are growing the most?

I went through a few different things. I tried the diff command, and it would find differences, but it doesn't really do anything other than show the differences, which is all it's really for. Then I thought about how I could loop through these files: read one file a line at a time, get the file path and the directory size, and try to match that up with the file path and directory size in the other file. I started writing that, and it became a loop within a loop; it got hard to keep track of what was where, then I had to figure out how to do the math to work out the difference between the first file and the second file and how much the growth was, and then figure out how to sort it. It got complicated fast.

I was thinking to myself that if I could just get this information into a database, I know enough SQL to run a query and get basically the output I'm looking for, but it seemed kind of silly to set up a database just for this. I really did consider setting up a MySQL database and making either a new database or a new table every week, then manually running the SQL to get the information I was looking for. But I knew that if I had to do it manually, at some point I'd quit doing it and forget about it, so I wanted a way to automate it.

I remembered picking up at some point that SQLite has an in-memory database. You can obviously use SQLite with a file and come back to that file and run commands against it, but it's also really light and really fast if you can do everything in memory. So what I started experimenting with was taking the output of a couple of these du runs, loading them into SQLite tables, and then figuring out the query that would give me what I was looking for: which directories grew the most over the course of a week.

I came up with some SQL that would do it. It loads the two files into their own tables: I create a table called old and a table called new, load the older file into the old table and the newer file into the new table, and then I can execute the query. You can do math in SQL, so if you have the old size and the new size you can make SQLite calculate the difference for you just by doing new size minus old size. The query I ended up running basically selects the file path and that size difference.
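Roughly, the sqlite3 script looks like this. The table layout follows the description above (a path column in both tables, old_size in one and new_size in the other), and the file names are just examples:

-- growth.sql - sketch of the SQLite commands; file names are examples
CREATE TABLE old (old_size INTEGER, path TEXT);
CREATE TABLE new (new_size INTEGER, path TEXT);

-- du output is SIZE<TAB>PATH, so import with a tab separator
.separator "\t"
.import du_2016-09-19.txt old
.import du_2016-09-26.txt new

-- make the output a nice-looking CSV
.headers on
.mode csv

-- NATURAL JOIN matches the two tables on their shared column name, path
SELECT path, old_size, new_size, new_size - old_size AS growth
  FROM old NATURAL JOIN new
 WHERE old_size < new_size
 ORDER BY growth DESC;

You can feed that to sqlite3 with something like "sqlite3 < growth.sql > growth.csv"; with no database file on the command line, sqlite3 does all of this in memory.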
In the query I natural join the two tables. When you do a natural join in SQLite, it looks for columns with the same name in both tables, and if it finds them it matches the rows on those columns. The old table has a size and a path, and the new table has a size and a path; the size column in the old table is called old_size, the size column in the new table is called new_size, but the path column is called path in both tables. So when you do the natural join, it matches the rows on the path: if there's a row in the old table with a path and a row in the new table with the same path, those end up on the same line, and then new_size minus old_size gives you the difference. I also added a WHERE clause, old size less than new size, so it only shows me the paths that grew; if a directory shrank, for whatever reason, I don't care. And then I order by the size difference, descending. So the query gives me a path, its old size, its new size, and the difference, listed by the difference, and it only shows me the ones that are growing.

But that process was still a little manual, because I would have to execute that SQL by hand every time, and each time I imported the files into the tables I would have to substitute in the most recent file names. (I'm going to have to stop for a little bit, it's loud.) So to run this thing manually I had to know the names of the files, substitute them in, load them into the tables, and then execute the SQL. It worked great, but it was definitely not automatic.

To automate it, I wanted to write a script that would create the database, load the files, execute the SQL, and then email me the results. That way it would just be automatic, I wouldn't have to think about it, and once a week the report I'm looking for would be in my inbox.

One problem I ran into that I wasn't expecting, although in hindsight it makes sense: at first I tried to execute these SQLite commands, the ones to create the tables and run the report, a single line at a time from Bash. It didn't occur to me that when you invoke sqlite3 and don't give it a file name, it automatically uses an in-memory database. So if you run sqlite3, give it the command to create a table, and then sqlite3 exits, that in-memory database is gone. When you open SQLite you have to do everything at once if you're going to do it in memory, or it just goes away. So instead of having the Bash script execute these commands one at a time, I have it echo the commands out to a file and then run sqlite3 against that file. Then it's one sqlite3 invocation that basically opens the file, runs through all the commands in it, and outputs a CSV.

The way that looks is: first there are a couple of variables that set the names of the files. Because I used the date command to name the files back in the cron job, I can use the same date command to build the file names in these variables. I do that for today's file, and in the script I call the other one yesterday's file, but it's actually last week's file. When you run the date command you can give it --date= and, in this case, "7 days ago" in quotes, so it prints the date as of a week ago, which is convenient because that's the last time I ran the command that generates the file list. After the variables are set, there's just a bunch of echo commands that build the SQL file one line at a time, and once the SQL file is built, I run the sqlite3 command, send it the file I just created, and it spits out a CSV. The last thing in the Bash script is to mail me the CSV file. I do that with the mailx command: mailx -a lets you give it an attachment, so it's mailx -a, then -s with a subject, and then my email address.
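Putting those pieces together, a sketch of the reporting script might look like this. The paths, report directory and email address are placeholders, and a heredoc stands in for the episode's series of echo commands:

#!/bin/bash
# growth_report.sh - sketch of the weekly reporting script (paths are placeholders)
REPORT_DIR="/path/to/reports"
NEW_FILE="$REPORT_DIR/du_$(date +%F).txt"
OLD_FILE="$REPORT_DIR/du_$(date --date='7 days ago' +%F).txt"
SQL_FILE="/tmp/growth.sql"
CSV_FILE="/tmp/growth.csv"

# write the SQLite commands, substituting in this week's and last week's files
cat > "$SQL_FILE" <<EOF
CREATE TABLE old (old_size INTEGER, path TEXT);
CREATE TABLE new (new_size INTEGER, path TEXT);
.separator "\t"
.import $OLD_FILE old
.import $NEW_FILE new
.headers on
.mode csv
SELECT path, old_size, new_size, new_size - old_size AS growth
  FROM old NATURAL JOIN new
 WHERE old_size < new_size
 ORDER BY growth DESC;
EOF

# one sqlite3 run: with no database file it works in memory, so everything
# has to happen in this single invocation
sqlite3 < "$SQL_FILE" > "$CSV_FILE"

# mail the CSV as an attachment (-a) with a subject (-s)
mailx -a "$CSV_FILE" -s "Weekly directory growth" user@example.com < /dev/null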
That Bash script runs from a cron job later the same day, so I know the du job has already run: the job that creates the file list runs at three o'clock in the morning, and the job that sends me the report runs at eleven o'clock at night.

To summarize all the rambly bits you just heard: I have a cron job that runs the du command, and the output of the du command is ultimately a list of directories and their sizes, separated by a tab. I do that every week. The next thing I do is run a Bash script. The Bash script basically just builds a SQL file; the SQL creates two tables, one for the old list and one for the new, imports the old file and the new file, sets a couple of options so it outputs a nice-looking CSV, and then runs the query to give me the list of directories that have grown, ordered by how much they have grown. Finally, after the Bash script builds and executes the SQL, it emails me the resulting CSV.

I feel like I just made the worst episode ever, and if you're hearing this, obviously I submitted it anyway. Hopefully there will be some pretty decent notes to go along with it, so you can read along and see what I've done. I've got a handful of other topics in mind, so if you didn't think the background noise was too annoying, or my rambly style of explaining things not very well was too bad, let me know and maybe I can cover a few other things. I know there's a list of requested topics, and there are probably a few on there I could do. You guys have a great day.

You've been listening to Hacker Public Radio at hackerpublicradio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.