Episode: 544
Title: HPR0544: HPR: A private data cloud
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0544/hpr0544.mp3
Transcribed: 2025-10-07 22:50:37
---
My name is Ken Fallon, and in today's episode of Hacker Public Radio I'm going to talk about setting
up a secure private data cloud solution, where I use rsync over SSH to back up my data
to a remote location.
Over the last few years I've stopped using an analog camera and an analog video camera.
And as a result, you know, I don't print out photos anymore because, you know, the reel
is never full as such, which kind of explains why my parents don't have any recent pictures
of the kids. But you've got to realize, if you're in this situation as well, you've got to ask
yourself the question: well, how secure and safe is my data?
Because up until this point, basically, if your hard disk crashed, there wasn't really
that much stuff on it that was going to be important.
Maybe you had a backup CD of a few documents that you had, but, you know, maybe you needed
to re-create your CV, or whatever, your resume.
Things have changed now, though, because the only copies of your cherished photos are going
to be on that computer, and the only memories of your kids, the videos of your kids,
are going to be on that hard disk.
And over time, as the capacity of hard disks has increased, the reliability has decreased,
and they're doing an awful lot of tricks on the controller to make sure that, as
data sectors fail, they're moved seamlessly to another location, but you can't really
trust that.
I've included a PDF of a study done at Google, and it shows the failure rate of hard
disks over time, you know, from 2% up to 50%, and within five years you can more or less
be guaranteed that your hard disk is going to fail.
So, you know, what was worrying there as well in that study was that the SMART monitoring
tools that are supposed to report the health of the hard disk, in many of the cases,
didn't.
So you probably won't even have any warning until it's too late that something's going
wrong.
And traditionally the way we backed up stuff was that we copied the important files onto a CD
or, you know, copied a few documents and a few spreadsheets onto a USB stick, and we're happy
out.
But those media themselves, as I've already said, are unreliable and degrade
over time.
DV video from four years ago is already bad; DVDs and CDs get corrupted and it's difficult
to read them.
Worse are the SD cards: if they fail, that's it.
There's nothing left, you're in big trouble. But the fact of the matter is the capacity
of these backup media is okay for a few documents, but when you're talking about
photos and videos, especially when we're talking about HD stuff, there isn't enough
capacity to be able to cover this.
So the problem is that data capacity has increased, hard disk capacity has increased, but
the means to back this up hasn't.
So of course the solution to this problem is to go out and buy yourself
even more hard disks. You can pick up a 1.5 terabyte hard disk from Amazon, I just
checked there, and it was 95 dollars, and the equivalent over here in Europe is one terabyte for
70 euros from MyCom. So basically the solution to this problem of hard disks failing
is a juggling act where you move data from one to the other and, as hard disks fail,
you replace them.
So you're looking at something like a mirroring array, where everything is written to both
disks and if one fails you still have the backup, or a RAID 5 array, where you've got
three or more disks and if one fails there's enough information on the other two to
be able to recreate the third.
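
The idea that the surviving disks hold enough information to recreate a failed one can be illustrated with XOR parity. This is only a toy sketch on single byte values, and the numbers are made up; a real RAID 5 array stripes data and spreads the parity across all the disks.

```shell
#!/bin/sh
# Toy illustration of RAID 5 style parity using XOR.
# The "disk" values are made-up bytes, not real data.
disk1=173                     # data block on disk 1
disk2=92                      # data block on disk 2
parity=$(( disk1 ^ disk2 ))   # parity block kept on disk 3

# Pretend disk 2 has failed: rebuild its block from the two survivors.
rebuilt=$(( disk1 ^ parity ))

echo "disk1=$disk1 disk2=$disk2 parity=$parity rebuilt=$rebuilt"
```

The rebuilt value matches the lost block because XOR-ing the same value twice cancels it out.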
Now there are also proprietary solutions out there like a Drobo, which is an all-in-one
unit that you can just connect to the network, but there are also sub-100 euro or dollar
NAS solutions, network attached storage solutions. My brother has an HP one that runs Debian, and
I see a few of them every week: small, low-power devices that can take
Serial ATA hard disks and automatically mirror them, no thinking required.
So if you've got that in your home then at least the risk of one hard disk failing is kind
of eliminated, or at least minimized somewhat. You've got to keep an eye on it to make
sure that if one hard disk fails you notice it, so that you can replace it in time,
so you've still got to do your homework.
Okay, so you've got a good solution, but as any good sysadmin will tell you, there is no such
thing: RAID and mirroring are not backup solutions. You still have to have a backup solution,
and in this case you've got to have an offsite backup solution, because what if, you know,
there's a power outage and it accidentally fries all the disks, or your
network attached storage blows up and goes on fire, or somebody comes in and steals them,
or there's a flood, or whatever. So you really want to have an offsite strategy. A lot of this
stuff, you know, these concepts, have been around for years, but they've been
limited to the likes of banks. Now that you have important data that's stored only electronically,
you've got to make sure that it is replicated to offsite locations. What you can do, the first
thing that springs to mind, is something like a Gmail drive or Ubuntu One or Dropbox, and
the TWiT network even had ads for Carbonite there for a while. But the problem with those
solutions is they're good for USB-stick-type amounts of data, but when you start talking about
the amount of data I'm talking about, and I've got a quarter to a half of a terabyte, you're looking
at $100 per month, and, you know, the price of a one terabyte hard disk itself
is $100. So it's far more economical to simply buy one or more disks and have them in an
offsite location. By offsite location I mean a family member or friend, or some
work acquaintance or colleague: your parents if they have broadband, your brother if he's got
broadband, your sister, whatever. It makes a lot more sense to put your data over there rather than
paying these prices. And what I've heard about some of these solutions is that the cost of uploading
data is quite cheap, but you really pay through the nose when your data is gone and you need to
download it, because then they have you by the short and curlies, basically, and you have to
pay outrageous amounts to get your data back. So the solution then is to buy more hard disks,
put them in different locations, and use rsync to replicate the data from your
NAS or your RAID solution or your server or whatever, we'll call it your NAS, and replicate that
via rsync over SSH to the remote location. Now, before I get into anything else, I want to say there's
like a degree of trust involved in this on both sides. First of all, you've got to trust that on the
other side they're not going to be looking at your private and confidential data. And on the other side,
you've got to take into account the fact that the other person is going to have
access to a machine on the inside of your network, and they may be able to
access your files as well, or they may be able to set off processes that might get you
in trouble with the law. Now the way around this, of course, the way around the looking at the files,
would be to use encrypted disks and, instead of shipping just a hard disk, to ship one of these
low-cost NAS servers with just enough of an OS that it boots up and, you know, gets an IP address,
registers with dynamic DNS or whatever, and notifies you, so that you can SSH in and mount an
encrypted drive. So that gets over that problem, but it doesn't actually get you over the problem
of somebody having a machine and doing nasty stuff on your side of the network. So
proceed with caution. But, you know, if it's a family member there's a level of trust there, so I wasn't
actually that concerned about that, to be honest with you. So what you're going to do is, you know,
if you both have newly provisioned disks and you don't have a lot
to synchronize, you can probably just go ahead and begin the synchronization.
But if you do, you might want to consider buying the disk that's going to be in the
other location, putting the data on it, and then shipping it physically, because
the way rsync works is that it's a bit like a copy on the initial run:
you copy all the data from the source to the destination, and then subsequently only the changes are
sent by rsync. So you're never going to copy everything over again, or at least you shouldn't.
If you are copying stuff over again, it could be that something's wrong.
I had a case where the parameter I was using was specifying that rsync
should check the user ID and the group ID to make sure that they were correct and, if not,
modify them, so it was recopying files over because the user ID and group ID I was using on
the other side were different. So anyway, I got that fixed. So what I suggest you do on the first
run is to use sneakernet, where you physically put the data on a physical drive, put the
physical drive in the post, the post person, DHL or whoever, ships it off to the other side, and then
you put it in and you can just synchronize the changes after that. So that's a nice
convenient solution. Another tip: if you're synchronizing from a Unix system to a Windows system,
you're going to lose some of the file attributes, so users, permissions and that
sort of thing will be lost. So if you can, it's better to keep it between Unix systems.
Of course, if you're not worried about permissions or whatever,
and you're just worried about the files themselves, you can happily go ahead:
rsync will support syncing between Unix and Windows. Okay, the command. So now in our
hypothetical situation here we've bought an additional one terabyte disk, it's been shipped to your home,
you've plugged it into your NAS server, and it's just been mounted as /media/disk. So the
rsync command is, and I'll go through these one by one afterwards: rsync -vva
--dry-run --delete --force, and then the source and destination, which in my
case is /data/autosync, space, /media/disk. The first one is always the source,
so /data/autosync is the source, and the second one is always the destination, /media/disk.
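
Written out as it would be typed, the command just read aloud looks roughly like this. The paths are the ones from the episode as best they can be made out from the audio, so treat `/data/autosync` as an assumption and substitute your own. The guard around the call is an addition, so that the snippet does nothing on a machine where those directories do not exist.

```shell
#!/bin/sh
# Source and destination as spoken in the episode (assumed spelling); override via the environment.
SRC="${SRC:-/data/autosync}"
DEST="${DEST:-/media/disk}"

# Source comes first, destination second. --dry-run only reports what would happen.
if command -v rsync >/dev/null 2>&1 && [ -d "$SRC" ] && [ -d "$DEST" ]; then
    rsync -vva --dry-run --delete --force "$SRC" "$DEST"
else
    echo "Skipping: rsync, $SRC or $DEST is not available on this machine"
fi
```

One thing worth knowing before relying on it: rsync treats a trailing slash on the source specially. `/data/autosync/` copies the contents of the directory into the destination, while `/data/autosync` copies the directory itself, creating `autosync` under the destination.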
Now, the dry run in there, as you can probably guess, is to make sure that
you only go through the motions of copying: it'll give you all the
messages about what it's going to do, but it doesn't actually do it. That's very, very useful on your
first synchronization. It's a very good idea, when you're starting this for the first time,
to put in some test directories both on the source and on the destination, and use those
until you're sure you've got the syntax down and correct. In those directories I'd usually
have identical files on both sides, files with the same name but a different file size,
directories with different contents, and directories on one side that aren't on the other.
Then run the command, and with that you should see what's going on on both sides.
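
A throwaway sandbox along those lines might be set up as follows. The file and directory names are invented for the example, everything lives under a temporary directory, and rsync is only invoked if it is installed.

```shell
#!/bin/sh
# Build a throwaway sandbox; nothing outside the temporary directory is touched.
SANDBOX=$(mktemp -d)
mkdir -p "$SANDBOX/src/only-on-src" "$SANDBOX/dest/only-on-dest"

echo "same" > "$SANDBOX/src/identical.txt"
cp -p "$SANDBOX/src/identical.txt" "$SANDBOX/dest/identical.txt"   # identical on both sides
echo "longer contents" > "$SANDBOX/src/changed.txt"                # same name, different size
echo "short" > "$SANDBOX/dest/changed.txt"
echo "new" > "$SANDBOX/src/new.txt"                                # only on the source
echo "stale" > "$SANDBOX/dest/stale.txt"                           # only on the destination

# Trailing slashes: compare the contents of src with the contents of dest.
if command -v rsync >/dev/null 2>&1; then
    rsync -vva --dry-run --delete --force "$SANDBOX/src/" "$SANDBOX/dest/"
fi
```

Because of `--dry-run`, the destination is left exactly as it was. Drop that option to watch the real behaviour inside the sandbox, and remove the directory with `rm -rf "$SANDBOX"` when finished.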
So let's go through the command here. With rsync, the -vv sets the
verbosity level, which increases the amount of information given during the transfer: one v is
a small amount of information, and the more v's, the more information you get. Then --dry-run is, you know,
don't perform any changes. The --delete will delete any files on the destination that are not
on the source, and the --force will also delete directories on the destination that are not
on the source, even if they are not empty. This is where my warning comes: you've got to be careful when transferring this data
that you don't accidentally overwrite anything, because if you accidentally put /media/disk, space,
/data/autosync, what rsync will do is look: oh, there's a lovely new empty hard disk
here, and he wants me to synchronize it over with that big location over there with all those
files and folders, and, yeah, I've been told to force and delete, so, okay, I'll
go and delete everything over there. And then all my videos and photos are now gone, with officially
no way to get them back. So be careful about what you're doing. It might be no harm at this point
to remount your data, do it through a loopback, and mount that drive or that section of your
device via a read-only share, so that the chance of that happening is minimized. In this
setup, rsync supports, you know, synchronizing both ways and keeping things in synchronization,
but in this case what you're doing is just replicating it out. Your NAS server is always
going to be the master, you know, unless you need to get that information
back, but it's always going to be the master. So the assumption here is that if a file is not on the
master, then delete it from the slave; if it's on the master, then put it to the slave, that
sort of thing. Okay, the only other one that's left is the -a option, which
actually stands for -rlptgo, all in lowercase, and then capital D. Those are: recurse into directories,
recreate the links and symlinks on the destination, preserve permissions, times, groups and owners on
the destination, and the capital D is transfer character and block device files, named sockets and
FIFO files, so basically all the special files. So once you're happy you know what you're doing, you've
done your test directory, and you're now doing a dry run against the read-only copy of
your data drive, then the thing that you can do is drop the dry run and actually do
the synchronization to your hard disk. Depending on the amount of data it might take a while to
synchronize, because it needs to do checks on everything, but actually it might not take that long, because
there's nothing on the destination, so there's no comparison going on. The next step might be to ship
off the disk to the remote location and then set up rsync over SSH, but I prefer to have an
additional testing step where I rsync over SSH to a PC in the home. So I'll take the disk that
we've already rsynced, put it into a laptop, and set all the SSH stuff up just like I want it on the
remote location, just to make sure everything's working. The steps to do this are going to be exactly
the same as what we've done before, so you can work along. Now, what you need to do on your
NAS server is generate a new SSH public and private key pair that has no
password associated with it. The reason you're going to do this is that you want the synchronization to
occur automatically, so you need to be able to access the remote system without having to enter
a password. If you do have an SSH key already, it has probably got a password on it, and you don't want to
be there in the middle of the night trying to type your password to get this script going.
Now, there are security concerns about passwordless SSH keys: anyone
who's root on that NAS device of yours will be able to get to that key and will then be able to
SSH to the other location. But seeing as you're the user and it's your NAS device, I'm assuming
that this is a minor security concern. Once you've generated your keys, you probably want to call
them something different than the normal ones. I use rsync-key so I know what they are, so I'll have
rsync-key and then rsync-key.pub, and I just take the contents of that .pub
file and add it to the end of the authorized_keys file on my laptop and on the remote PC.
Do both at the same time, so you get the SSH issues worked out. I've got a link in the show notes
to Jeremy Mates's website, which has more information on how you can go through all this.
This has also been covered on the HPR network before, so I'm not going to go into it too much.
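
As a rough sketch of the key generation step: the key name `rsync-key` comes from the episode, while the RSA key type and the scratch directory are assumptions made so the example is harmless to run. On the NAS the key would normally be created under `~/.ssh`.

```shell
#!/bin/sh
# Work in a scratch directory here; on the NAS the key would normally live in ~/.ssh.
KEYDIR=$(mktemp -d)
KEY="$KEYDIR/rsync-key"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -f names the key files, -q keeps the output quiet.
    ssh-keygen -q -t rsa -N "" -f "$KEY"
    ls -l "$KEY" "$KEY.pub"
fi

# On the laptop or remote PC, append the public half to the account's authorized_keys:
#   cat rsync-key.pub >> ~/.ssh/authorized_keys
```

Only the `.pub` file is copied to the other machines; the private half stays on the NAS.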
So once you have the keys generated and the public key copied over to the authorized_keys files
on both machines, you're going to want to SSH into them. The only difference here from a normal SSH is
that you need to use the -i option to specify the new name of the SSH key, so it's ssh -i
/home/user/.ssh/rsync-key and then the normal, space, user@example.com. If this is the
first time you've logged into that machine from your NAS server as the user
that's going to do the rsync, you're going to need to type in yes so that the sshd host keys from
the other side are added to the .ssh/known_hosts file. So it makes sense to
actually log in as that user so that that is done. So now you've got the keys on your laptop and
you've also got them on the remote location, and you've got a console up, so that's very good.
You might want to just create a file on both locations to make sure you can create files, and delete that
file again to make sure you can delete files. So that's the SSH part
done. We're going to put both of those two commands together into the rsync command, and
the end command, this will all be in the show notes as well, is rsync -va --delete
--force -e, and this is the new bit: the -e tells rsync to use this shell, which is ssh
-i /home/user/.ssh/rsync-key in double quotes, space, and then the source, which
is /data/autosync, that's the same, and the destination has changed, so it'll be, space,
user@example.com, then a colon, and autosync. So that means SSH as user at the DNS name
or IP address.
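
Typed out, the combined command might look like the sketch below. The key path, `user@example.com` and the `autosync` paths follow the episode as far as the audio allows and are placeholders for your own values. The command is wrapped in a function, which is an addition here, so that nothing touches the network until it is called on purpose.

```shell
#!/bin/sh
# Values from the episode; user@example.com stands in for the real account and host.
KEY=/home/user/.ssh/rsync-key
SRC=/data/autosync
REMOTE=user@example.com:autosync

# Define the sync as a function so nothing runs until it is called explicitly.
sync_to_remote() {
    rsync -va --delete --force -e "ssh -i $KEY" "$SRC" "$REMOTE"
}

# First confirm that the key login works by hand, then run the sync:
#   ssh -i "$KEY" user@example.com
#   sync_to_remote
```

A destination without a leading slash, as in `user@example.com:autosync`, is relative to the remote user's home directory.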
If all goes well there should be no updates, but you may want to try adding,
deleting and modifying files at both ends to make sure the process is working correctly. When
you're happy, you can ship the disk to the other side. The only requirement is that on the other network
SSH is allowed through the firewall to your server and that you've got a well-known public IP address.
If you don't have a static IP address, then you can use services like dynamic DNS;
there's a range, and again there'll be a link in the show notes to where you can get that.
You should then be able to SSH to your server like before. If you're not able to
connect to the server over port 22, say for example the person you're peering with has already got
port 22 in use, you can specify a port option in your ssh string to connect on a different port.
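
One way to handle a non-standard port is to build the string handed to rsync's `-e` option with ssh's `-p` option included. The helper name `ssh_transport` and port 2222 are made up for this sketch; use whatever port the remote firewall actually forwards.

```shell
#!/bin/sh
# Build the remote-shell string for rsync's -e option.
# $1 = private key path, $2 = SSH port (defaults to 22).
ssh_transport() {
    printf 'ssh -i %s -p %s' "$1" "${2:-22}"
}

# 2222 is only an example port.
ssh_transport /home/user/.ssh/rsync-key 2222
echo

# Usage:
#   rsync -va --delete --force -e "$(ssh_transport /home/user/.ssh/rsync-key 2222)" \
#       /data/autosync user@example.com:autosync
```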
But the whole point of this synchronization is that it should be seamless,
so you want your rsync to be running constantly. The easiest way to do this is just to start a
screen session and then run the command that we were given above in a simple loop. That has the
advantage of getting you going quickly, but it's not very resilient to reboots. So I've created
a simple script, which I put in /usr/local/bin/autosync, and it's got the hash-bang /bin/bash,
while true, do, date, and then the rsync string exactly as it was before, then date, and then sleep
3600, and then done. What that does is the while true do loop, look at my last
episode on bash loops about that, so it puts it into an infinite loop. It puts a date at the
beginning of the rsync and a date at the end of the rsync, so that I can see, you know, when it
ran last, and then it sleeps for an hour so I'm not flooding either side of the connection.
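
The script described above might look like this. To keep the example safe to run, it writes the script to a scratch location and only checks its syntax instead of starting the infinite loop; in the episode the script is installed as `/usr/local/bin/autosync`. The rsync line inside uses the same assumed paths and placeholder host as before.

```shell
#!/bin/sh
# Write the loop script to a scratch location; the episode installs it as /usr/local/bin/autosync.
SCRIPT="$(mktemp -d)/autosync"

cat > "$SCRIPT" <<'EOF'
#!/bin/bash
# Sync once an hour, printing a timestamp before and after each run.
while true
do
    date
    rsync -va --delete --force -e "ssh -i /home/user/.ssh/rsync-key" /data/autosync user@example.com:autosync
    date
    sleep 3600
done
EOF
chmod +x "$SCRIPT"

# Check the syntax without starting the infinite loop (skipped if bash is absent).
if command -v bash >/dev/null 2>&1; then
    bash -n "$SCRIPT" && echo "syntax ok: $SCRIPT"
fi
```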
So I take that and I put it into a crontab, and you should see my last episode about cron
for more information on cron. My crontab file is available with crontab -l, and I've got
uppercase MAILTO equals double quote double quote, so an empty MAILTO line. Then I've got 0, space, 1,
asterisk, asterisk, asterisk, space, timeout, space, 54000, space, /usr/local/bin/autosync,
and then I redirect standard output to /tmp/autosync.log and I
redirect standard error to /tmp/autosync.log as well. Now, those of you among us may be thinking: well,
he's put a script with an infinite loop into a crontab, so surely the script is going to be
respawned every night, and at the end of the year I'm going to have 365 of these scripts running
and chewing up my resources. And the answer to that would be yes, that is correct, except for the fact
that cron isn't calling the script directly. The command that's actually calling the
script is the IBM command, written by the IBM dudes, which is called timeout. What timeout does
is terminate any application after a particular period of time. So if you want to
play a movie or a playlist for a particular period of time and then stop at a particular time,
you put timeout and then the number of seconds after which you want it to stop, and it will kill the process.
It's actually a very nice utility: short, sweet, does one thing.
The reason I did this is that rsync does have a patch that allows you to specify what times
it runs, but this is a very quick and simple means to prevent
rsync running during the evening. That allows my brother to come home and
browse the network, you know, from four o'clock in the evening to midnight, and then my
rsync script starts again at one o'clock and continues to four o'clock in the afternoon: one o'clock in
the morning to four o'clock in the afternoon, and then it terminates. So the script will
run infinitely from one o'clock: sleep after one o'clock, start at two, at three, four, five, six,
all the way around, and then it gets terminated.
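
The window described above, one o'clock in the morning to four o'clock in the afternoon, is 15 hours. A small sketch that works that out in seconds and prints matching crontab lines follows; the variable names are invented, and the printed lines are meant to be pasted in with `crontab -e` rather than taken as the exact file from the show notes.

```shell
#!/bin/sh
# Run window from the episode: start at 01:00, stop at 16:00.
START_HOUR=1
STOP_HOUR=16
RUN_SECONDS=$(( (STOP_HOUR - START_HOUR) * 3600 ))

# Print the two crontab lines: no mail from cron, then the timed job itself.
printf 'MAILTO=""\n'
printf '0 %s * * * timeout %s /usr/local/bin/autosync > /tmp/autosync.log 2>&1\n' \
    "$START_HOUR" "$RUN_SECONDS"
```

The `2>&1` after the output redirection sends standard error to the same log file as standard output.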
Now, the reason for the MAILTO equals
double quotes is that although I'm redirecting the output of autosync to a log file, the timeout
itself is run in a separate process, because, yeah, you don't want to terminate the application
and have timeout get terminated without killing the other one. So that will then generate
output from cron every time that runs. It's not too serious, so the MAILTO
basically means you're not getting emailed. If, you know, you're doing this with somebody else, they might not
have a problem with you running 24/7, as long as you can throttle the bandwidth. And rsync does
have a switch called --bwlimit, and then an equals sign, and then you put in the kilobytes
per second as a value, and that will limit all rsync copies to that bandwidth. So you
might set it at 10% of the total download bandwidth of the person that you're syncing to.
So that is quite nice.
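
As a worked example of the 10% suggestion, here is the arithmetic for a hypothetical 8000 kbit/s downlink. Both figures are invented for illustration. Broadband speeds are usually quoted in bits, whereas `--bwlimit` counts in units of roughly a kilobyte per second, so the value is divided by 8.

```shell
#!/bin/sh
# Hypothetical figures: the remote side has an 8000 kbit/s downlink and we take 10% of it.
DOWNLINK_KBIT=8000
SHARE_PERCENT=10

# Convert kilobits to kilobytes (divide by 8) after taking the share.
BWLIMIT=$(( DOWNLINK_KBIT * SHARE_PERCENT / 100 / 8 ))

echo "add this option to the rsync command: --bwlimit=$BWLIMIT"
```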
So that's basically that with regard to rsync. I've probably made the whole
thing sound a lot more scary and complicated than it is. Whatever, you can choose to do this or
not. If there's something you're taking away from the show, it's that you should have some
system where, when anybody in your home network saves a file, it's going to get saved to some
mirrored or backed-up solution, and then that is going to be sent to an offsite location.
In this case I've done it with one brother, and I'm in the process of doing it with two
or possibly three of my other brothers, so there's no reason why this
rsync script can't replicate to more locations; the more the better. Now, I would like to take some time
out at the end of this podcast to mention another podcast which I'd like to recommend, and that's
called Screencasters, and it's at heathenx.org. I'll just give you a little extract from
their About page: the goal of screencasters.heathenx.org is to provide a means, through a simple
website, of allowing new users in the Inkscape community to watch some basic and intermediate
tutorials by the authors of this website. So heathenx and Richard Querin have produced a show
that puts a lot of professional tutorials to shame. I have sat through some very, very sad episodes
of videos that I've purchased through work, but these are an actual joy to watch. I've been
watching them from episode one on the train on my Aspire One every evening coming home,
and quite often you see people looking over my shoulder at what I'm looking
at. They go through the whole gamut of everything that Inkscape can do,
which is a drawing program, if you didn't know. They even have mini tutorials that will
get you started, like: this is the interface, this is the menu bar, this is where you can find this,
this is where you can find that. After watching all the episodes, I'm now looking at posters
at work and ads and all sorts of things, going: oh, that's how they did that, and you see that
effect there, and this logo or this icon in this application was done like that. So it's really,
really cool. Even if you're not interested in graphics, it's something you want to download.
If you know somebody who's using Photoshop or something else, you know, burn some of these
onto a DVD and hand it over to them, and put a copy of Inkscape for Windows on there as well. So,
yeah, good stuff. That is my recommendation for a podcast for this month, and with that I'll
bid you adieu and wrap this one up, and hope that you will take some time out of your busy day
to record a show. I'm very interested in hearing other episodes from other people. With that, thank
you very much, talk to you, bye. Thank you for listening to Hacker Public Radio.
HPR is sponsored by caro.net, so head on over to c-a-r-o dot n-e-t for all of those