Episode: 544
Title: HPR0544: HPR: A private data cloud
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0544/hpr0544.mp3
Transcribed: 2025-10-07 22:50:37

---

My name is Ken Fallon, and in today's episode of Hacker Public Radio I'm going to talk about setting up a secure private data cloud solution, where I use rsync over SSH to back up my data to a remote location.

Over the last few years I've stopped using an analog camera and an analog video camera. And as a result, you know, I don't print out photos anymore because the reel is never full as such, which kind of explains why my parents don't have any recent pictures of the kids. But if you're in this situation as well, you've got to ask yourself the question: how secure and safe is my data? Because up until this point, basically, if your hard disk crashed, there wasn't really that much stuff on it that was going to be important. Maybe you had a backup CD of a few documents, and maybe you needed to re-create your CV or your resume.

Things have changed now, though, because the only copy of your cherished photos is going to be on that computer, and the only videos of your kids are going to be on that hard disk. And over time, as the capacity of hard disks has increased, the reliability has decreased. They're doing an awful lot of tricks on the controller to make sure that as data sectors fail, they're moved seamlessly to another location, but you can't really trust that. I've included a PDF of a study done at Google, and it shows the failure rate of hard disks over time, you know, from 2% up to 50%, and within five years you can more or less be guaranteed that your hard disk is going to fail. What was worrying in that study as well was that the SMART monitoring tools that are supposed to report the health of the hard disk in many cases didn't.
So you probably won't even have any warning until it's too late that something's going wrong.

Traditionally, the way we backed up stuff was that we copied the important files onto a CD, or copied a few documents and a few spreadsheets onto a USB stick, and we were happy out. But those mediums themselves, like I've already spoken about, are unreliable and degrade over time. DV video from four years ago is already bad, and DVDs and CDs get corrupted and become difficult to read. Worse are SD cards: if they fail, that's it. There's nothing left, and you're in big trouble. The fact of the matter is that the capacity of these backup mediums is okay for a few documents, but when you're talking about photos and videos, especially HD stuff, there isn't enough capacity to cover it. So the problem is that the amount of data has increased and hard disk capacity has increased, but the means to back it up hasn't.

So of course the solution to this problem is to go out and buy yourself even more hard disks. You can pick up a 1.5 terabyte hard disk from Amazon, I just checked there and it was 95 dollars, and the equivalent over here in Europe is a one terabyte for 70 euros from mycom.nl. So basically the solution to this problem of hard disks failing is a juggling act, where you move data from one to the other and, as hard disks fail, you replace them. So you're looking at something like a mirroring array, where everything is written to both disks and if one fails you still have the backup, or a RAID 5 array, where you've got three or more disks and if one fails, there's enough information on the other two to be able to recreate the third.
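To illustrate how a RAID 5 style array can recreate a failed disk from the survivors, here is a toy sketch in bash using XOR parity. The byte values and variable names are made up for illustration; a real array works on whole stripes and rotates the parity across all the disks, so treat this as the idea only, not as how any particular NAS implements it.

```shell
#!/bin/bash
# Toy RAID 5 style parity: two data "disks" and one parity "disk".
# The byte values are invented for this example.
disk_a=(72 80 82)   # data block on disk A
disk_b=(53 52 52)   # data block on disk B

# Parity is the XOR of the data blocks, byte by byte.
parity=()
for i in "${!disk_a[@]}"; do
    parity[$i]=$(( ${disk_a[$i]} ^ ${disk_b[$i]} ))
done

# Pretend disk B has died: rebuild it from disk A and the parity.
rebuilt_b=()
for i in "${!disk_a[@]}"; do
    rebuilt_b[$i]=$(( ${disk_a[$i]} ^ ${parity[$i]} ))
done

echo "original disk B: ${disk_b[*]}"
echo "rebuilt  disk B: ${rebuilt_b[*]}"
```

Because XOR is its own inverse, any one of the three blocks can be rebuilt from the other two, which is why the array survives a single disk failure but not two.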
Now there are also proprietary solutions out there like a Drobo, which is an all-in-one unit that you can just connect to the network, but there are also sub-100 euro or dollar NAS (network attached storage) solutions. My brother has an HP one that runs Debian, and I see a few of them every week: small, low-power devices that can take Serial ATA hard disks, and some will automatically mirror them, no thinking required. So if you've got that in your home, then at least the risk of one hard disk failing is kind of eliminated, or at least minimized somewhat. You've still got to keep an eye on it to make sure that if one hard disk fails you notice it, so that you can replace it in time. So you've got to do your homework still.

Okay, so you think you've got a good solution, but as any good sysadmin will tell you, there is no such thing. RAID and mirroring are not backup solutions. You still have to have a backup solution, and in this case you've got to have an offsite backup solution, because what if there's a power outage and it accidentally fries all the disks, or your network attached storage blows up and goes on fire, or somebody comes in and steals it, or there's a flood or whatever?
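On the point about noticing when one disk of a mirror has failed, here is a minimal sketch of a check in bash. It assumes the NAS uses Linux software RAID (md), which the episode does not actually say, and the function name and sample text are hypothetical. On such a system the kernel reports array health in /proc/mdstat, where a healthy two-disk mirror shows [UU] and a missing member shows up as an underscore.

```shell
#!/bin/bash
# Report whether a Linux md (software RAID) array looks degraded.
# Assumption: the NAS uses Linux md; the sample text below is
# illustrative and not taken from the episode.

mdstat_degraded() {
    # Reads mdstat-style text on stdin. Succeeds if any status field
    # such as [U_] contains an underscore, meaning a member is missing.
    grep -Eq '\[[U_]*_[U_]*\]'
}

sample='md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/1] [U_]'

if printf '%s\n' "$sample" | mdstat_degraded; then
    status="DEGRADED"
else
    status="ok"
fi
echo "array status: $status"
```

On a real machine you would feed it the live file instead of the sample, for example `mdstat_degraded < /proc/mdstat`, and send yourself a message when it succeeds.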
So you really want to have an offsite strategy. These concepts have been around for years, but they've been limited to banks or wherever. Now that you have important data that's stored only electronically, you've got to make sure that it is replicated to offsite locations.

The first thing that springs to mind is something like Gmail drive or Ubuntu One or Dropbox, and even the TWiT network had ads for Carbonite on there for a while. The problem with those solutions is that they're good for USB-stick-type amounts of data, but when you start talking about the amount of data I'm talking about, a quarter to a half of a terabyte, you're looking at $100 per month. And the price of a one terabyte hard disk itself is $100, so it's far more economical to simply buy one or more disks and have them in an offsite location. By offsite location I mean a family member or friend, or some guy at work, an acquaintance or colleague; your parents, if they have broadband; your brother or your sister, whatever. It makes a lot more sense to put your data over there rather than paying these prices. And what I've heard about some of these solutions is that the cost of uploading data is quite cheap, but you really pay through the nose when your data is gone and you need to download it, because then they have you by the short and curlies, basically, and you have to pay outrageous amounts to get your data back.

So the solution, then, is to buy more hard disks, put them in different locations, and replicate the data from your NAS or your RAID solution or your server or whatever (we'll call it your NAS) via rsync over SSH to the remote location.

Now, before I get into anything else, I want to say there's a degree of trust involved in this on both sides. First of all, you've got to trust that on the other side they're
not going to be looking at your private and confidential data. And on the other side, you've got to take into account the fact that the other person is going to have access to a machine on the inside of your network, and they may be able to access your files as well, or they may be able to set off processes that might get you in trouble with the law.

Now, the way around the looking-at-the-files problem would be to use encrypted disks and, instead of shipping just a hard disk, to place one of these low-cost NAS servers there with just enough of an OS that it boots up, gets an IP address, registers with dynamic DNS or whatever, and notifies you, so that you can SSH in and mount an encrypted drive. That gets over that problem, but it doesn't actually get you over the problem of somebody having a machine and doing nasty stuff on your side of the network. So proceed with caution. But, you know, if it's a family member there's a level of trust there, so I wasn't actually that concerned about that, to be honest with you.

So what you're going to do is this: if you both have newly provisioned disks and you don't have a lot to synchronize, you can probably just go ahead and begin the synchronization. But if you do have a lot, you might want to consider buying the disk that's going to be in the other location, putting the data on it, and then shipping it physically. The way rsync works is that it's a bit like a copy on the initial run, so you copy all the data from the source to the destination, and then subsequently only the changes are sent by rsync. So you're never going to copy everything over again, or at least you shouldn't. If you are copying stuff over again, it could be that something's wrong. I had a case where the parameter I was using was specifying that you should check the
user ID and the group ID to make sure that they were correct, and if not, modify them. So it was recopying files over because the user ID and group ID I was using on the other side were different. Anyway, that all got fixed.

So what I suggest you do on the first run is to use sneakernet, where you physically put the data on a drive, put the drive in the post, the post person or DHL or whoever ships it off to the other side, it gets plugged in there, and then you can just synchronize the changes after that. So that's a nice, convenient solution.

Another tip: if you're synchronizing from a Unix system to a Windows system, you're going to lose some of the file attributes, so users, permissions and that sort of thing will be lost. If you can, it's better to keep it between Unix systems. Of course, if you're not worried about permissions and you're just worried about the files themselves, you can happily go ahead; rsync will support syncing between Unix and Windows.

Okay, the command. In our hypothetical situation here, you've bought an additional one terabyte disk, it's been shipped to your home, you've plugged it into your NAS server, and it's been mounted as /media/disk. So the rsync command is (and I'll go through these one by one afterwards) rsync -vva --dry-run --delete --force and then the source and destination, which in my case is /data/autosync space /media/disk. The first one is always the source, so /data/autosync is the source, and the second one is always the destination, /media/disk.

Now the dry run in there, as you can probably guess, is to make sure that you're only going through the motions of copying. It'll give you all the messages about what it's going to do, but it doesn't actually do it, and that's very, very useful on your first synchronization. It's a very good
idea when you're starting this for the first time to put in some test directories, both on the source and on the destination, and use those until you're sure you've got the syntax down and correct. In those directories I'd usually have identical files on both sides, files with the same name but different file sizes, directories with different contents, and directories on one side that aren't on the other. Then run the command, and you should see what's going on on both sides.

So let's go through the command here. rsync: the -vv sets the verbosity level, which increases the amount of information given during the transfer. One v is a small amount of information; the more v's, the more information you get. Then the --dry-run is, you know, don't perform any changes. The --delete will delete any files on the destination that are not on the source, and the --force will force the deletion of directories on the destination even if they're not empty.

This is where my warning comes in. You've got to be careful when transferring this data that you don't accidentally overwrite anything, because if you accidentally put /media/disk space /data/autosync, what rsync will do is look: oh, there's a lovely new empty hard disk here, and he wants me to synchronize it over to that big location over there with all those files and folders, and I've been told to force and delete, so okay, I'll go and delete everything over there. And then, you know what, all my videos and photos are now gone, and there's effectively no way to get them back. So be careful about what you're doing.

It might be no harm at this point to mount your data drive through a loopback, or mount that drive or that section of your device via a read-only share, so that the chance of that happening is minimized. In this setup, rsync supports, you know, synchronizing in two ways and keeping things in
synchronization, but in this case what you're doing is just replicating it out. Your NAS server is always going to be the master, you know, unless you need to get that information back, but it's always going to be the master. So the assumption here is that if a file is not on the master, then delete it from the slave; if it's on the master, then put it on the slave; that sort of thing.

Okay, the only other one that's left is the -a option, which actually stands for -rlptgoD: r, l, p, t, g, o, all in lower case, and then a capital D. Those are: recurse into directories; recreate the links and symlinks on the destination; keep the permissions, times, groups and owners on the destination; and the capital D is to transfer character and block device files, named sockets and FIFOs, so basically all the special files.

So once you're happy that you know what you're doing, you've done your test directories, and you're now doing a dry run against the read-only copy of your data drive, then the thing you can do is drop the --dry-run and actually do the synchronization to your hard disk. Depending on the amount of data it might take a while to synchronize, because it needs to do checks on everything, but actually it might not take that long, because there's nothing on the destination, so there's no comparison going on.

The next step might be to ship off the disk to the remote location and then set up rsync over SSH, but I prefer to have an additional testing step where I rsync over SSH to a PC in the home. So I'll take the disk that we've already rsynced, put it into a laptop, and set all the SSH stuff up just like I want it on the remote location, just to make sure everything's working. The steps to do this are going to be exactly the same as what we've done before, so you can work along.

Now, what you need to do on your NAS server is generate a new SSH public and private key pair that has no password associated with it, and the reason
you're going to do this is that you want the synchronization to occur automatically, so you need to be able to access the remote system without having to enter a password. If you do have an SSH key already, it's probably got a password on it, and you don't want to be there in the middle of the night trying to type your password to get this script going. Now, there are security concerns about passwordless SSH keys: anyone who's root on that NAS device of yours will be able to get to that key and then will be able to SSH to the other location. But seeing as you're the user and it's your NAS device, I'm assuming that this is a minor security concern.

So when you generate your keys, you probably want to call them something different than the normal ones. I use rsync-key so I know what they are, so I'll have rsync-key and then rsync-key.pub. Then I just take the contents of that .pub file and add it to the end of the authorized_keys file on my laptop and on the remote PC. Do both at the same time so you get the SSH issues worked out. I've got a link in the show notes to Jeremy Mates's website, which has more information on how you can go through all this. This has also been covered on the HPR network before, so I'm not going to go into it too much.

So once you have the keys generated and the public key copied over to the authorized_keys files on both machines, you're going to want to SSH into them. The only trick here compared to a normal SSH is that you need to use the -i to specify the new name of the SSH key, so it's ssh -i /home/user/.ssh/rsync-key and then the normal space user@example.com. If this is the first time you've logged into that machine from your NAS server as the user that's going to do the rsync, you're going to need to type in yes, so that the sshd host keys from the other side are added to the .ssh/known_hosts file. So it makes sense to actually
log in as that user so that that is done. So now you've got the keys on your laptop and also on the remote location, and you've got a console up, so that's very good. You might want to just create a file on both locations to make sure you can create files, and delete that file again to make sure you can delete files. So that's the SSH part done.

Now we're going to put those two commands together into the rsync command, and the end command (this will all be in the show notes as well) is rsync -va --delete --force -e, and this is the new bit: the -e tells rsync to use this as the remote shell, which is, in double quotes, ssh -i /home/user/.ssh/rsync-key, close double quote, space, and then the source, which is /data/autosync, that's the same, and the destination has changed, so it'll be space user@example.com, then a colon and autosync. So that means the SSH user at the DNS name or IP address.

If all goes well there should be no updates, but you may want to try adding, deleting and modifying files at both ends to make sure the process is working correctly. And when you're happy, you can ship the disk to the other side. The only requirement is that on the other network SSH is allowed through the firewall to your server, and that you've got a well-known public IP address. If you don't have a static IP address, then you can use services like dynamic DNS, there's a range of them, and again there'll be a link in the show notes to where you can get that. Then you should be able to SSH to your server like before. If you're not able to connect to the server over port 22, say for example the person you're peering with has already got port 22 in use, you can specify a port option in your SSH string to connect on a different port.

But the whole point of this synchronization is that it should be seamless, so you want your rsync to be running constantly. The easiest way to do this is to just start a screen session and then run the command that
we've given above in a simple loop. That has the advantage of getting you going quickly, but it's not very resilient to reboots. So I've created a simple script, which I put in /usr/local/bin/autosync, and it's got #!/bin/bash, while true, do, date, and then the rsync string exactly as it was before, then date, and then sleep 3600, and then done. What that does is the while true do loop (look at my last episode on bash loops about that) puts it into an infinite loop. It puts a date at the beginning of the rsync and a date at the end of the rsync, so that I can see when it ran last, and then it sleeps for an hour so I'm not flooding either side.

So I take that and I put it into a crontab, and you should see my last episode about cron for more information on cron. My crontab file is available with crontab -l, and I've got uppercase MAILTO equals double quote double quote, so an empty MAILTO line. Then I've got 0 1 * * * timeout 54000 /usr/local/bin/autosync, and then I redirect standard output to /tmp/autosync.log, and I redirect standard error to /tmp/autosync.log as well.

Now, the astute among you may be thinking: well, he's put a script with an infinite loop into a crontab, so surely the script is going to be respawned every night, and at the end of the year I'm going to have 365 of these scripts running and chewing up my resources. And the answer to that would be yes, that is correct, except for the fact that I'm not calling the script directly from cron. The command that's actually calling the script is the GNU command, written by the GNU dudes, which is called timeout. What timeout does is terminate any application after a particular period of time. So if you want to play a movie or a playlist for a particular period of time and then stop, you put timeout and then the number of seconds
until you want it to stop, and it will then kill the process. It's actually a very nice utility: short and sweet, and it does one thing. The reason I did this is that rsync does have a patch that allows you to specify what times it runs at, but this is a very quick and simple means to prevent rsync running during the evening. That allows my brother to come home and browse the network from four o'clock in the evening to midnight, and then my rsync script starts again at one o'clock in the morning and continues to four o'clock in the afternoon, and then it terminates. So the script itself will loop indefinitely: it runs at one o'clock, stops, runs at two, stops, at three, four, five, six, all the way around, and then it gets terminated.

Now, the reason for the MAILTO equals double quotes is that although I'm redirecting the output of autosync to a log file, the timeout itself is run in a separate process, because you don't want to terminate the application and then have timeout get terminated without killing the other one. So that will then produce output from cron every time it runs. It's not too serious, and the MAILTO basically means you're not getting emailed.

If you're doing this with somebody else, they might not have a problem with you running 24/7 as long as you can throttle the bandwidth, and rsync does have a switch called --bwlimit, and then equals, and then you put in a value in kilobytes per second, and that will limit all rsync copies to that bandwidth. So you might set it at 10% of the total download bandwidth of the person that you're syncing to. That is quite nice.

So that's basically that with regard to rsync. I've probably made the whole thing sound a lot more scary and complicated than it is. You can choose to do this or not. If there's something you're
taking away from the show, it's that you should have some system where, when anybody in your home network saves a file, it's going to get saved to some mirrored or backed-up solution, and then that is going to be sent to an offsite location. In this case I've done it with one brother, and I'm in the process of doing it with two or possibly three of my other brothers. There's no reason why this rsync script can't replicate to more locations; the more the better.

Now I would like to take some time out at the end of this podcast to mention another podcast which I'd like to recommend, and that's called Screencasters, and it's at heathenx.org. I'll just give you a little extract from their about page: the goal of screencasters.heathenx.org is to provide a means, through a simple website, of allowing new users in the Inkscape community to watch some basic and intermediate tutorials by the authors of this website. So heathenx and Richard Querin have produced a show that puts a lot of professional tutorials to shame. I have sat through some very, very sad videos that I've purchased through work, but these are an actual joy to watch. I've been watching them from episode one on my Acer Aspire One every evening coming home on the train, and quite often you see people looking over my shoulder at what I'm looking at. They go through the whole gamut of everything that Inkscape can do (it's a drawing program, if you didn't know). They even have many tutorials that will get you started, like: this is the interface, this is the menu bar, this is where you can find this, this is where you can find that. After watching all the episodes, I'm now looking at posters at work and ads and all sorts of things, going: oh, that's how they did that, and you see that effect there, and this logo or this icon in this application was done like that. So it's really, really
cool. Even if you're not interested in graphics, it's something you want to download. If you know somebody who's using Photoshop or something else, burn some of these onto a DVD and hand it over to them, and put a copy of Inkscape on there; it runs on Windows as well. So yeah, good stuff. That is my recommendation for a podcast for this month, and with that I'll bid you adieu and wrap this one up, and hope that you will take some time out of your busy day to record a show. I'm very interested in hearing other episodes from other people. With that, thank you very much. Talk to you, bye.

Thank you for listening to Hacker Public Radio. HPR is sponsored by caro.net, so head on over to c-a-r-o dot n-e-t for all of those
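For reference, the sync loop and crontab entry described in the episode can be sketched roughly as follows. This is a reconstruction from the audio, not a copy of the show notes: the host, user, key path and directory names are the example values as best they can be recovered, and the trailing slashes on the rsync paths are an assumption added here (with a trailing slash, rsync copies the contents of the source directory rather than the directory itself). The sketch only generates the script in a temporary directory and prints the crontab lines; it does not install or run anything.

```shell
#!/bin/bash
# Sketch: generate the sync-loop script and print the crontab entry
# described in the episode, without installing or running either.
# Assumptions: host, user, key path and directory names are example
# values; the trailing slashes on the rsync paths are added here.

out_dir=$(mktemp -d)
script="$out_dir/autosync"

# The loop script: timestamp, sync, timestamp, sleep an hour, repeat.
cat > "$script" <<'EOF'
#!/bin/bash
while true
do
    date
    rsync -va --delete --force -e "ssh -i /home/user/.ssh/rsync-key" \
        /data/autosync/ user@example.com:autosync/
    date
    sleep 3600
done
EOF

# Check the generated script for syntax errors without executing it.
bash -n "$script" && echo "syntax ok: $script"

# The cron window: start at 01:00 and have timeout kill the loop at 16:00.
start_hour=1
stop_hour=16
window=$(( (stop_hour - start_hour) * 3600 ))   # seconds the loop may run

cron_line="0 ${start_hour} * * * timeout ${window} /usr/local/bin/autosync > /tmp/autosync.log 2>&1"
echo 'MAILTO=""'
echo "$cron_line"
```

The window from one o'clock in the morning to four o'clock in the afternoon is fifteen hours, which is where the value of 54000 seconds passed to timeout comes from. To throttle the transfer as discussed, a `--bwlimit=` option with a value in kilobytes per second could be added to the rsync line.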