Saga of data-loss-induced paranoia.

My current setup leaves a lot to be desired, mainly because I’ve yet to purchase a proper RAID controller and am instead using softRAID, which has many shortfalls. One of them is that while I do have a live redundant copy of all data stored on that array, when one disk dies, as recently happened to me, there is no way to rebuild the array through the software management. Instead, I have to boot up, umount the now-single disk, dd if=<working disk> of=<new disk>, reboot, enter the RAID setup, rebuild the array, and then remount the new array in the old position. Not something I’m too worried about, since it doesn’t happen that often, but since I can’t leave well enough alone, when it happened this time I decided to reorganize my entire file system, install the server version of Ubuntu, and move everything around in a way that will, ostensibly, give me much more room to expand my storage space.
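For reference, the dd step of that rebuild dance looks something like this; the device names are examples, not my actual disks:

# clone the surviving half of the mirror onto the blank replacement; bs=4M just speeds things up
dd if=/dev/sdb of=/dev/sdc bs=4M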

tl;dr

Of course, while doing this, being relatively new to CLI-only work, I fucked up my syntax and accidentally made only symlinks to files, rather than copies. This is my fault for confusing Windows and Linux copy syntax and passing -s when I meant -r, which resulted in losing about 150GB of personal shit like photos, and about 100GB of music. Thankfully I was smart enough to break the three arrays I had in the PC prior to copying and reformatting/rebuilding, so I was able to copy everything but the movies to its new home by mounting the old drives via USB, letting that all run overnight, and then diff’ing the two disks the next day to verify everything was there.
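For anyone else who hasn’t burned the difference into their fingers yet, here it is with made-up paths:

# what I meant to type: recursively copy the actual files
cp -r /mnt/old/music /mnt/new/music
# what I actually typed: symlinks pointing back at a source that was about to be reformatted
cp -s /mnt/old/music/* /mnt/new/music/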

Add to that the half-crash of the stupid hdd, which left the entire 1TB array in an essentially useless state and necessitated a thorough lesson in fsck.reiserfs before I got back all the data that was on it, and my data-retention paranoia has been thoroughly renewed.
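For the record, the lesson boiled down to roughly this; the device name is a placeholder, and --rebuild-tree is destructive, so the read-only check comes first:

# read-only pass to see how bad the damage is
fsck.reiserfs --check /dev/md1
# the nuclear option, only if --check reports the tree is broken
fsck.reiserfs --rebuild-tree /dev/md1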

Before my previous setup, I was backing everything up to DVD once every whenever-I-got-around-to-it. This has left me with about 60 DVDs full of files that are inconsistently named, organized, and tagged, with some outright fucked after the backup program I was using renamed any file name over 64 characters to a random 64-character string. If I were still using that program, those files might have been usable; however, when my last hdd died, I found that the fucking thing stored the original file names in a scheme that changed with every backup, so only the most recent batch of DVDs could be renamed properly, despite being part of one long-running job. So now, sprinkled through the rest of my files, I have a large number of files that are totally readable on Linux but useless to my Windows-based HTPC.

After all that, I started backing up via some two-string-long tar command that tossed everything into an 800GB tar file on my external drive. Not only did that take about 16 hours to run, I was never really, totally sure that I hadn’t fucked up my tar syntax somehow and ended up with a somehow totally empty 800GB file.
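If I’d known then what I know now, a quick listing would have answered the is-it-empty question without extracting anything; something along these lines, with placeholder paths:

# create the archive: c = create, p = preserve permissions, f = target file
tar -cpf /mnt/external/everything.tar /array0 /array1
# sanity check: list the first few entries instead of trusting the file size
tar -tf /mnt/external/everything.tar | head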

So my current solution is absolute fucking overkill for a home network, and it looks, or at least will look once I’ve got all of the hardware, like this:

  1. Ubuntu server with two 1TB RAID1 arrays holding all data. Array0 contains all personal documents, photos, writing, rants, and music. Array1 contains all very large (~1GB and up) files, such as movies, ISOs, .rars, and the like. The system disk is a single 300GB drive, partitioned into /, /home, /var, /etc, /tmp. Since the only things stored on that disk are easily replaced and, since I’m new to Linux, easily improved upon, I’m not very concerned about losing them.
  2. Windows 7 Desktop 1. Main computer, holding local copies of everything stored on Array0. This will be rsync’d six times a day, since it will contain all newly downloaded music, new writing edits, new whatever. I do this through a cron job that mounts the Windows folders, rsyncs the files, and then unmounts the folders; a sketch of the whole job follows the rsync explanation below.

This is done by sharing the matching folders on the Windows box, then mounting them via:
mount -t cifs -o credentials=/path/to/credentials/file,iocharset=utf8 //winpc/share /cifs/share

-t specifies the file system type.
-o introduces the list of options that follows.
credentials is there so that I can automate this, with the username and password passed along with the mount command rather than embedded in the command itself, which I can never get to work, especially against Windows 7. If this were going across the internet, credentials would be replaced with whatever it is that lets you use an encrypted file instead of plain text. The plain-text file that credentials points at looks like this:

username=<username>
password=<password>
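Since that file is sitting there in plain text, it’s worth making sure only root can read it:

chmod 600 /path/to/credentials/file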

iocharset is set because some file names contain special characters, and when copying local files to the samba share, those characters were translated incorrectly and came out garbled everywhere. On the Ubuntu box they were odd, and on Windows they were lines and squiggles. After setting that, the files were readable by Windows.

I then run the following rsync command for each of the shares, against the directory that is being synced:
rsync -ru --delete-after --iconv=UTF8,UTF8 /local/path/to/data/on/linux/box /cifs/share

-r is for recursive, so that it travels into all subdirectories.
-u is for update, so that any file with a more recent timestamp on the receiver gets skipped.
--delete-after means that instead of deleting files as it goes, it deletes them all in a batch at the end. I found that when there were a lot of local copies that had to be deleted, deleting as it went caused headaches and slowdowns. For what reason, I’m not sure, but it did.
--iconv=UTF8,UTF8 is there for the same reason as the iocharset option on the Samba mount above. I’m not sure it’s actually needed, because I tried both at the same time, but copying worked properly after adding it, so I’m leaving it in for now.
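Putting the pieces together, the whole cron job is roughly the following sketch; the share name, paths, and script name are placeholders rather than my actual layout:

#!/bin/sh
# mount the Windows share, sync it, unmount it -- run from root's crontab
mount -t cifs -o credentials=/path/to/credentials/file,iocharset=utf8 //winpc/music /cifs/music
rsync -ru --delete-after --iconv=UTF8,UTF8 /array0/music /cifs/music
umount /cifs/music

Scheduled with a crontab entry along these lines, which works out to six runs a day:

0 */4 * * * /usr/local/bin/sync-music.sh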

Before getting that all working, I ran the same rsync command with the option --dry-run, to make sure that it didn’t copy to the wrong directory and that there weren’t other issues. One issue kept recurring until I found the -i switch, which itemizes everything rsync is doing: the same 100 or 200 files were continually being copied over and subsequently deleted. Windows ignores case differences in file names; Linux does not. So the files in /T/tragedy/ were being copied into the same Windows directory as /T/Tragedy/. Rsync saw both directories on the sending side, didn’t see matching ones on the receiving side, copied everything from /T/tragedy into Windows’ /T/Tragedy (and the reverse), and then deleted half of the files it had just copied because of that. So, off to the Linux box, move a few files around, and done.
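For reference, the test run was just the production command with the two extra switches bolted on:

# -n (--dry-run) makes no changes; -i itemizes exactly what would be copied or deleted
rsync -runi --delete-after --iconv=UTF8,UTF8 /local/path/to/data/on/linux/box /cifs/share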

I still have to install and configure Cygwin, so that I can make changes to files locally on the Windows box and then run a similar cron job to copy the new files over to the Linux box. That job will also pick up game saves and whatever assortment of downloaded crap I’ve accumulated during the day.

  3. A cold copy of everything on the Ubuntu server, updated once every three months. Stored on a single 2TB hdd that will be spun up for the copy, then turned off and moved somewhere that is not my house.
  4. Possibly, a big backup box, something like THIS, with two 2TB drives in a simple mirror, backing everything up every other day.

So all in all, I’m mental, and have stored on various hard drives:
2 live copies (the two separate RAID1 arrays on the Ubuntu box) of everything but my OSes.
3 copies (the array plus the Windows box) of my music and important files, synced six times a day.
3 copies, one to two days old, of everything on the Ubuntu box, on a separate power bar and, if I can get away with it, a separate outlet.
1 offsite copy, up to three months old, of everything on the Ubuntu box.