Episode: 3205
Title: HPR3205: Backups of your Backups of Backups
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3205/hpr3205.mp3
Transcribed: 2025-10-24 18:47:00

---

This is Hacker Public Radio Episode 3205 for Friday, 13 November 2020. Today's show is entitled Backups of Your Backups of Backups. It is hosted by Operator, is about 53 minutes long, and carries an explicit flag. The summary is: do you have backups of your backups? Well, you'd better. Listen to this rant. This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hello everyone, and welcome to this episode of Hacker Public Radio with your host, Operator. This one's going to be about backups and cloud storage. It's not going to be super technical; it's just an overall thought process of where I'm at today with backups and cloud backups. So if you're not interested in backups and having a solid backup plan, this probably isn't for you.

This goes back to as early as Dropbox. My website has been backed up in various locations locally, and within the last couple of years I've tried to put stuff in the cloud, because I've done some improper local backups and wanted cloud backups with retention and things like that in the pipeline. I've had numerous successes and failures.

So I'm going to start from the beginning, probably eight years ago. I started using Dropbox, backing up just my website stuff and personal files off of a USB drive, mainly security applications and proof-of-concept stuff. The problem with Dropbox is they started marking my stuff as bad every time I tried to share something. That includes episodes of Hacker Public Radio that were being blocked as pirated music or something, so they must have some kind of algorithm that looks at music and decides whether it's bad, good, or indifferent. Then it started marking my binaries as bad: I'd try to share a binary, no one would be able to see it except me, and it would show something silly like "sharing has been disabled for this file." No one would actually tell you that the account had been blocked globally from sharing files. So I'd go back and forth with Dropbox to get my account unblocked, and then it would get messed up again. It was a mess, so I finally gave up on those guys; and again, that was only partial data.

Then I had about a terabyte of music, and at some point I had done some improper restores, or something happened locally, and I ended up losing all of my music. I don't even remember how it happened, but I have a tendency to somehow do local backups improperly and clobber myself. What I wanted was a local backup of my terabyte of music plus cloud backups, but I started to run out of space. So I said, okay, let me put some of this in the cloud; I ended up with just the local music and the music backed up to the cloud, and that's it, no local backup. That proved to be not worth the time and effort, based on this current restore.

Going back a little further, I tried to use Amazon, mainly because I already had an Amazon account. They blocked things like rclone as a way to do backups yourself without their backup client, and their Windows backup client was awful. Amazon's cloud backup thing is awful.
Their Photos product, I think, might be better. But the Amazon cloud backup thingy ran inside Chrome, and if you had any issues with your Chrome browser, the two would interfere with each other, freezing up plugins and causing issues across the board while that thing was running. My wife runs it now on her own account that I don't even have access to, so there's no risk there.

Besides the Dropbox stuff I mentioned, I used Google Drive for a little while, and they did the same thing: I backed up a terabyte's worth of stuff and started having issues with sharing and things like that.

A lot of these backup clients don't really give me the warm and fuzzies about the data that's being backed up and whether it's all there. What I want is: here's the local, here's the backup, here's the difference. Here's the local, here's the backup, here's the difference. None of these backup clients seem to tell you whether everything has been backed up successfully (I'll sketch the kind of check I mean below). I'll say Dropbox is one of the most intuitive ones: you can search for files that have been deleted, and it's very user-friendly. But once you get into bulk restores, or bulk checking of backups and their validity, things just get weird.

Especially with the SpiderOak client here. I had no idea that my website wasn't being backed up properly, and it's not an error on my end, because some of it's there and some of it's not. It would be fine if my scripts folder were completely empty; I could understand that maybe I screwed something up. Or if my stuff folder were completely empty, maybe I deleted something, or moved something around and it deleted it, or I lost the archive. But it was partial. The scripts folder had some stuff in it, but there should be maybe five gigs' worth of files in the base of that folder, and it was missing everything but about four files. And they weren't even old files; they were new ones. So somehow things got out of sync at some point. I think a snapshot was taken at some point that said, okay, this is the stuff I want to back up, and that's it, and after that it was not actually backing up the stuff I wanted it to back up. Maybe something happened with a config; all I know is that there's no easy way to look at it and say: okay, here are your backups, here's what you're backing up, this folder has this many files in it, that folder has that many, or even just go by size. This folder is eight gigs and that folder is 23 gigs; something's obviously wrong here, so throw an error saying these files aren't backed up, are you sure you don't want to back them up? Because these clients keep local databases, Postgres or SQLite or something like that, and they get corrupted, or they don't get updated right, or something gets flagged in them, and then you don't have what you think you have backed up.

So I've tried both approaches. I've tried the rsync method. I've tried rdiff-backup, which is something similar. I've tried Borg backup, which is also similar, kind of like an rsync that does transactional stuff. And I just feel like it's me. It's obviously me. I'm trusting the tool too much.
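To make that "here's the local, here's the backup, here's the difference" idea concrete, here's a minimal sketch of the kind of check I'm talking about, assuming both copies are mounted as plain directories. The paths are made-up placeholders, not anything from my actual setup:

    #!/usr/bin/env bash
    # Hypothetical paths -- point these at a real source and its backup copy.
    SRC="$HOME/scripts"
    DST="/mnt/backup/scripts"

    # Compare the file lists: any output means a file exists on one side only.
    diff <(cd "$SRC" && find . -type f | sort) \
         <(cd "$DST" && find . -type f | sort)

    # Eyeball the totals: wildly different sizes mean something is wrong.
    du -sh "$SRC" "$DST"

If the backup lives on a remote host instead of a local mount, the same idea works by running the find over ssh.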
And then when I go to restore something, it's either not there, or the custom database file is corrupt, or the restore itself is corrupted somehow; it didn't finish or something. Then I end up trying to carve data out, and it ends up being easier just to copy something over from an older backup. These non-flat-file-based backup and restore systems, I just keep having issues with them, and I don't know why. It's because there's no easy way for me to check my work with these platforms. Whenever you look at something, there's no easy way to just check that you're backing up what you think you're backing up, and I feel like that's been my problem. You look at the logs inside these backup clients and it's just errors all over the place, and they don't make any sense.

It's like reading Linux logs when you're having issues. When you haven't had issues before and then an application starts misbehaving, Linux tends to have lots of error messages in the logs that aren't actually errors or failures related to the problem you're having. So what I would suggest is: if you're ever running a piece of Linux software, look at the logs. Keep a day or two of logs from normal operations so you get a feel for them; call that the "good log." Then you can compare the good log against the current log and ask: this application is throwing all these errors; are they in the old log and not the new one, or vice versa? If they're in the new log and not in the old one, then they're possibly legitimate errors with your software. Linux tends to just fail, fail, fail: libraries fail, this didn't happen, that's not configured, and you can't tell when the software is genuinely not working. You don't know what normal looks like, so you've got to capture that normal by capturing logs (there's a rough sketch of this comparison below). That's the only way I can think of to evaluate these client-side backup solutions.

So I don't know. I just feel like half of it is me fat-fingering something, or setting a delete flag, or not having the disk mounted, and then having a big issue where I'm trampling over my backups.

What I've ended up with now is just a cheap three-terabyte web host that's like 10 bucks a month. I can put my website on there, I can put my backups on there encrypted, and keep flat files on there. There's not a ton of stuff: I've probably got 50 gigs' worth of encrypted stuff and images, a terabyte's worth of music, and maybe 500 gigs of miscellaneous personal files that aren't necessarily sensitive in nature, just old stuff like phone app data and old backups. So I'm changing my approach: my website is going to live in the same place I put my backups, which is pretty risky. It's a VPS, and if you don't have the local copies too, you're kind of screwed; I would suggest never relying on non-local backups alone. These drives keep coming down in price; come Christmas, there will be 12-terabyte drives going relatively cheap.
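Here's a minimal sketch of that "good log" comparison, assuming a plain-text log file; the path and filename are placeholders I made up for the example:

    #!/usr/bin/env bash
    # While everything is working, capture a baseline of normal log noise.
    cp /var/log/myapp.log ~/good.log    # hypothetical application log

    # Later, when something is broken, show only the lines that never
    # appeared in the baseline -- those are the candidates for real errors.
    grep -Fxv -f ~/good.log /var/log/myapp.log

If your log lines carry timestamps, strip them first (with cut or sed, for example) so the comparison runs on the message text rather than on timestamps that make every line unique.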
You can pop them out of their enclosures and shim them to fit inside a normal server. That's what we did with the 10-terabyte one: I got one of those on sale, shimmed it, put it in the server, and used it for media storage. For backups, I'm probably going to buy two more 12-terabyte drives, one for more media and the other for my backups, and do my local backups plus cloud backups. That way I have my data in three places again, just not in the cloud in three places.

So I had my website in the cloud in two places, and then I had it locally. And my local backup drive, an external drive, had a hardware failure, and I tried to use fsck to repair it. Now, pro tip: do not use fsck to repair a drive that is in a hardware-failed state. If you're having issues with a drive, especially an external drive, do not run fsck on it. fsck will sit there trying to fix a disk whose failure is hardware-based, and it will give you grief: you'll have all these inode issues, and the whole thing will wreck itself and get worse. I ran fsck on it so many times that I couldn't mount the disk anymore. The file metadata, the MFT or whatever the exFAT equivalent is called, was all garbled up and messed up, even after I pulled the drive out of the enclosure.

And that was the other thing that had been warning me: the external enclosure was the hardware issue, bottlenecking, freezing up, and dropping out. And I said, well, it's fine, it kind of works; if I dismount it and remount it, it seems to work okay for a couple of months before giving me issues again. So the backup drive had issues with its external enclosure for some time, and I kind of ignored them, thinking that worst case I'd just restore from the cloud. But my cloud backups were broken too. So I had local backups that were broken, cloud backups that were broken, and then a VPS provider that decided to break my internet access to my server, so I had no way to get to my website data. I still have my music, but I'm sure I've lost some files in there somewhere, because I have no idea what got backed up in SpiderOak and what didn't; it's completely random.

Anyway, my new approach is going to be a standard rsync to my web host, used very cautiously, plus backing up configurations and things like that, and making custom backups for the iteration-type stuff. So my website might have two or three gzipped files, each one a big giant tarball of my website backup, and I might keep two or three of those archived locally and in the cloud. That way, if my website dies, I've got three different backups in three different locations, and at least one of them should be good (a sketch of that snapshot-and-rotate idea follows below). I don't know; it seems like if I do one-to-one backups I have issues, and if I do incremental backups I have issues. Either way, I seem to have issues with backups.

What I will say is: keep it simple, stupid. And that's not the case here. I have selective things I want to back up, some stuff here and some stuff there, and drives and symbolic links all over the place.
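As an illustration of the two-or-three-tarball idea, here's a minimal sketch of a snapshot-and-rotate script, under the assumption that the site lives in ~/www and snapshots go to a local backups directory; every name here is a placeholder:

    #!/usr/bin/env bash
    set -euo pipefail

    SITE="$HOME/www"               # hypothetical website directory
    DEST="$HOME/backups/website"   # hypothetical archive location
    KEEP=3                         # how many snapshots to retain

    mkdir -p "$DEST"

    # One big gzipped tarball per run, stamped with the date.
    tar -czf "$DEST/website-$(date +%Y%m%d).tar.gz" -C "$SITE" .

    # Delete everything beyond the $KEEP newest snapshots.
    # (Safe here because the generated names contain no spaces.)
    ls -1t "$DEST"/website-*.tar.gz | tail -n +$((KEEP + 1)) | xargs -r rm --

Run the same script on the web host too (or rsync the resulting tarballs up) and you end up with the three-locations arrangement described above.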
So really managing what you need to back up becomes complicated. That's why sometimes I feel like I just want to get a RAID box, spend six grand on a RAID, and be done with it. Then I'd have my local storage sorted, I wouldn't have to worry about space or any of that crap, and I'd just set up alerts and manually check it every month or whatever the case may be. But then you get down to it: you're paying a bunch of money for a RAID, and I don't need that much space. I only need a terabyte, two terabytes max, of backup storage.

And a RAID is not a backup. Just because I have a RAID, that's not a backup; a backup has to be an actual physical copy of the data somewhere else. When drives fail, or the backplane fails, you're totally screwed if you don't have your data anywhere else. Just because you have a RAID doesn't mean you're not going to have storage issues or data loss. So the RAID is out of the question, because it's not cost-effective: if the backplane goes out on one of these Drobo or Synology things, I might have to buy a new one and rebuild the headers, and God knows what kind of mess that would be to get my data back. I just don't have that much data that I care about. Again, I've got four or five disk images that I care about, and maybe 60, 70, 100 gigs' worth of personal data. It's not a ton. So I'm going to go back to the rsync method and use that, with some caution, along with some manual iteration or incremental backup stuff.

I hope this helps somebody, while it's fresh in my head and I'm furious at SpiderOak. I'm kind of furious at myself, too, for not really checking my backups, but there isn't really a way to do that anyway, short of doing a full restore and diffing the data, and I don't have enough hard drive space for that. I'm sitting on maybe a terabyte and a half or two terabytes free, and I can't restore and test against that. I blame the VPS provider too, though at 36 bucks a year they're not really on the hook for it; there's no validity there. But I really think SpiderOak dropped the ball on this one, because I did the best I could. I've been in IT forever, and I never really had the warm and fuzzies with their backup client to begin with; I've never had the warm and fuzzies with any of these backup clients that aren't Windows-based. All these Linux-based backup clients just churn and churn, and you're left thinking, I guess it's done? I see errors in the logs, the files seem to be there, and I can navigate around their stupid UI or their web UI and see most of my stuff, but there's no real way to run something like a CRC check of this against that. Even with the command-line references in SpiderOak, I don't feel like there's a way to do that effectively without doing some kind of manual restore, or some exercise where you're sending metadata back and forth to check.
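For what it's worth, a flat-file version of that CRC-style check can be rolled by hand with sha256sum, assuming you can reach both copies as ordinary directories; the paths are placeholders:

    #!/usr/bin/env bash
    set -euo pipefail

    # Build a checksum manifest over the live data...
    cd "$HOME/scripts"
    find . -type f -exec sha256sum {} + > /tmp/manifest.txt

    # ...then verify a restored or mirrored copy against it.
    # Only failures are printed; silence means every file matched.
    cd /mnt/restore/scripts
    sha256sum --quiet -c /tmp/manifest.txt

Note this catches missing and corrupted files on the restore side, but not extra files; combine it with a file-list diff for that.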
But anyway, to wrap all this up: if you're using a cloud provider for your backups, ensure there's a way to look at those backups at a high level and say: yep, this folder is backed up, this folder has this many iterations, I can restore some of this. And pick two or three folders of your most important stuff, do a manual restore of those, and ensure everything is one-to-one. Because if you let these backup clients run long enough, they get garbled up, and by the time you're doing a restore or setting up a new device, they're all confused. You have to be careful with these backup clients and not trust them absolutely.

Again, I had Amazon randomly delete my DVDs: the family movies that were on VHS and got converted to DVD. For whatever reason, they completely disappeared from my account on Amazon, and luckily I had them stored locally, because I'd had a feeling that if I kept them only in the cloud they might vanish one day. Sure enough, I went to re-send a link I had previously shared with my family for those DVDs, clicked it, and found a bunch of empty folders. So the folders weren't deleted; if I had gone in and deleted the videos myself, the folders would have gone with them. The folders remained intact, and inside each one was a text file describing the video and what was on that DVD; each DVD had its own folder, but there was nothing else inside. So some automated process at Amazon, or something, went in, saw those folders, and for whatever reason emptied them out and deleted the data. I don't understand why or how that could have happened; maybe a backup client, I have no idea. I didn't actually lose data, but only because I had those DVDs stored locally in my personal backups.

If anybody needs help with rclone or rsync and doing things the right way, here's what I do with rsync and rclone to check things out. I'll create a specific file called, say, testfile, put it in the destination folder, and check that it exists; if that test file isn't there, the disk isn't mounted or there's some other issue with it. That way I'm not clobbering data by copying from a source to a destination that's empty or not mounted or something. So you can set up file checks against the source and the destination before you do any copying, especially if you're going to run rsync with --delete, to make sure those files check out. Maybe the space has to check out too, right? You say: okay, I want to back up my music folder, but the music folder has to be over a terabyte for the backup to start, and the secret file has to exist on the destination before anything runs with the delete flag. That way you're ensuring the size is at least roughly what you expect and the destination is, in theory, properly mounted (there's a sketch of this guard below). Because I've had issues where the source or destination was improperly mounted, or blank, or empty, and then the rsync runs with --delete.
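Here's a minimal sketch of that sentinel-plus-size guard around rsync --delete. The source, destination, and threshold are all made-up placeholders:

    #!/usr/bin/env bash
    set -euo pipefail

    SRC="$HOME/music/"        # hypothetical source
    DST="/mnt/backup/music/"  # hypothetical destination (a mount point)
    MIN_GB=1000               # refuse to run if the source looks too small

    # Sentinel: if this file is missing, the backup disk isn't really mounted.
    [ -f "$DST/.testfile" ] || { echo "destination not mounted; aborting"; exit 1; }

    # Size check: an unexpectedly small source probably means a bad mount.
    size_gb=$(du -s --block-size=1G "$SRC" | cut -f1)
    [ "$size_gb" -ge "$MIN_GB" ] || { echo "source only ${size_gb}G; aborting"; exit 1; }

    # Only now is --delete reasonably safe.
    rsync -a --delete "$SRC" "$DST"

Create the sentinel once, with something like touch /mnt/backup/music/.testfile, while the disk is definitely mounted; after that, an unmounted mount point fails the check instead of getting mirrored over.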
It deletes everything, because there's nothing on the unmounted side to mirror, or you run out of disk space because your destination isn't mounted. So you have to be careful with rsync, especially if you're using delete flags or anything like that.

What else? I think that's pretty much it, except that the restore and backup operations in these backup clients are also kind of finicky. They're much, much slower than a traditional rsync. If I had a terabyte and a half of data to rsync, it'll get done as fast as the bandwidth can possibly carry it. Whereas these backup clients have to go to the server side and make requests: pull a ticket, pass some messages, request a chunk of data. If you have a million tiny files, it pulls down its cache and says, I want to download these 10,000 files; then it sits there and churns; then the server says, oh, here are your 10,000 files, digs them out from the spider webs of the internet, and takes five minutes to deliver them. Then there's a delay in the transfers, and then it says, oh, here are your next 10,000 files. I observed this with multiple clients, especially SpiderOak: the traffic would die for two or three minutes, then peak and download maybe a gig or two, then die off for another five minutes, then peak again, all at maybe a third of the speed rsync manages. If you're using rsync, it's going to be fast and quick.

I've even had IO issues doing local-to-local copies with rsync, because it's so fast and effective that the IO can get confused and out of sync between disks. That's why filesystems have sync and async options, and you can mount with special options to make things quicker, faster, and better-behaved while you're doing this stuff. EXT4 has features that help protect the integrity of your data, and when it comes to media, you can turn some of them off. Let me see if I can find the ones I used: defaults, nofail, x-systemd.device-timeout=1, noatime, nodiratime, data=writeback, barrier=0, noblock_validity, nodelalloc, nouser_xattr, noacl. Those are all options I was playing around with when I was having IO issues between disks during backups (a sample fstab line follows below).

So when you're doing local backups, you've got to observe and make sure you're not having sync issues: watch dmesg and make sure stuff isn't timing out and crapping out in the middle of the backup. There are just so many moving parts in backing up large amounts of data and trusting that it's doing what it's supposed to be doing. I keep running into issues whether I use backup clients or local tools or whatever, and I continue to have issues with backups to this day. Half of it's because of me, and half of it's because the clients are awful. But hopefully this method helps, and I'll be a little more careful about how I do my backups and how I store the local ones, because having only cloud backups is not an option; it is not worth it. Even at around a terabyte and a half, it's still not worth it.
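For context, here's what a full /etc/fstab entry using those options might look like. This is my reconstruction, not a line from the show: the UUID and mount point are placeholders, and note that data=writeback and barrier=0 trade crash-safety for speed, so they only make sense for media or backup data you could re-copy:

    # Hypothetical ext4 media/backup drive -- adjust UUID and mount point.
    UUID=<your-disk-uuid>  /mnt/media  ext4  defaults,nofail,x-systemd.device-timeout=1,noatime,nodiratime,data=writeback,barrier=0,noblock_validity,nodelalloc,nouser_xattr,noacl  0  2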
I probably spent three days flapping around with the client, trying to get it to download half of what it was supposed to have backed up. So that's about all I can tell you. It's kind of a rant, kind of a deeper dive into how I've been doing backups and the issues I've been having. That's pretty much it; I hope it helps somebody out.

As far as backup clients and solutions go, there are some S3-style options out there. There's one called Wasabi, which is about a third of the price of an S3 bucket; I was looking at those guys, and it's something like three to five bucks per terabyte. I was looking at S3 storage and realized: just find a cheap one somewhere and put your backups on it. When Christmas comes around, I'll go buy two 12-terabyte drives, do my local backups there, and call it a day, because it's just not worth the hassle and the time with these backup clients, waiting for them to download and whatever. You've got to have those local backups; use the cloud backups for when you accidentally delete some shit that you should not have deleted, and make sure your backups to the cloud aren't mangled and screwed up either. Build some checks in there to verify the validity of your backups automatically when you run them.

Anyway, if you have any questions, I've got notes for fdisk, or, what do you call it, fstab, if you want, and I've got notes for rsync and things like that; in general, you shouldn't have any issues doing local backups. Okay, hope this helps somebody out. Take it easy, and, you know, check your backups, man, because you don't want to lose 20 years' worth of stuff like I did.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.