This post originated from an RSS feed registered with Python Buzz
by Aaron Brady.
Original Post: How I Backup
Feed Title: insom.me.uk
Feed URL: http://feeds2.feedburner.com/insommeuk
Feed Description: Posts related to using Python. Some tricks and tips, observations, hacks, and the Brand New Things.
I really hate losing things, and I am a little obsessive about my photos in
particular. I’m a pack-rat - I keep a lot of photos because they increase in
value over time. That random test shot with a new camera shows how our house
used to look (remember how the plaster was coming off that wall?); a
throwaway shot turns out to be the only picture of one of the kids with that haircut.
I’m also still reeling from losing almost all of the photos of Jack from before
he was two. There was a partition table incident on Helen’s computer. Those
photos were digital and are now gone. Friends came to our aid and sent us back
photos that we had emailed them, but it’s a tiny fraction of that two years.
So that’s why I back up. This is how:
iPhone Photos
I run BitTorrent Sync on my iPhone, which syncs my camera roll with my
desktop and HP Microserver at home and my Macbook at work. You need to remember to
launch it in order to sync: it won’t sync in the background (iOS won’t let it).
Assuming you can handle tapping on the icon every few days, you’ll be
covered. It’s also the most convenient way to get your photos off your phone
without a Lightning cable.
I import these into Lightroom, which I have decided to entrust with all
of my photos. This means that there are now two copies on my desktop, on separate
drives, and that iPhone photos enter the same workflow as my “proper” camera
photos.
Camera Photos
I do the usual things with Lightroom: I run the scheduled backups of the
meta-data in case of massive corruption. Meta-data is data too, after all.
This isn’t that big a deal for me because I’m not actually very organised and
I rely on the EXIF data in the original files for most of the meta-data - and
that is easily reconstructed if something terrible happens.
At the end of every month, I copy that month’s folder out of Lightroom over
to my Microserver, where I import it into my git-annex
repository.
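The end-of-month copy is the manual step the post later calls the most annoying, so it is a natural candidate for a script. Below is a minimal sketch, assuming (hypothetically) that Lightroom imports into a "by date" folder layout of `ROOT/YYYY/YYYY-MM-DD` and that the Microserver share is reachable as an ordinary path. The helper names `previous_month`, `month_folders` and `copy_month` are made up for illustration and are not part of Lightroom or git-annex.

```python
import datetime
import shutil
from pathlib import Path


def previous_month(today: datetime.date) -> tuple[int, int]:
    """Return (year, month) for the calendar month before `today`."""
    last_day_of_previous = today.replace(day=1) - datetime.timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month


def month_folders(root: Path, year: int, month: int) -> list[Path]:
    """List day folders such as root/2014/2014-03-15 that fall in year-month.

    Assumes a hypothetical YYYY/YYYY-MM-DD import layout.
    """
    year_dir = root / f"{year:04d}"
    if not year_dir.is_dir():
        return []
    prefix = f"{year:04d}-{month:02d}-"
    return sorted(p for p in year_dir.iterdir() if p.is_dir() and p.name.startswith(prefix))


def copy_month(src_root: Path, dest_root: Path, year: int, month: int) -> list[Path]:
    """Copy each of that month's day folders under dest_root, keeping the YYYY/ level."""
    copied = []
    for folder in month_folders(src_root, year, month):
        target = dest_root / folder.parent.name / folder.name
        shutil.copytree(folder, target, dirs_exist_ok=True)  # re-running is harmless
        copied.append(target)
    return copied
```

For example, `copy_month(Path("D:/Pictures/Lightroom"), Path("//play/incoming"), *previous_month(datetime.date.today()))`, with both paths hypothetical. A `git annex add` on the Microserver would then bring the new folders into the repository.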
git-annex gives you all the power of Haskell with the ease of use of git.
(ha ha). What it actually does is play very cleverly to git’s strengths at
managing meta-data, like what a file is called, where it’s stored and what its
hash is, while filling in the things git is not good at: large files,
encryption, and storing content in places that don’t have git installed.
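To make the "hash" part concrete: in its classic mode, git-annex commits a symlink rather than the file itself, and the link target contains a key derived from the file's size and checksum. The sketch below builds a key shaped like those of git-annex's `SHA256E` backend (`SHA256E-s<size>--<sha256><extension>`). Treat it as an approximation: the real tool has extra rules about which extensions it keeps, and `annex_style_key` is a hypothetical helper, not a git-annex API.

```python
import hashlib
from pathlib import Path


def annex_style_key(path: Path, chunk_size: int = 1 << 20) -> str:
    """Build a key shaped like git-annex SHA256E keys: SHA256E-s<size>--<hash><ext>.

    Simplified: git-annex applies extra rules to extensions that this ignores.
    """
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        # Stream the file in chunks so large videos don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"
```

Because the key depends only on the content (plus the extension), renaming a photo leaves its key unchanged. This is what lets git track a small pointer while the large content lives elsewhere.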
Git Annex
Apart from photos, other things that are large, or that I just want to keep, go
into git-annex too: historical backups of my blog, email archives and my entire
iTunes library. I don’t necessarily want all of these things on every
computer that I own, but git-annex lets me choose which files I
have on which computer.
If I need an MP3 from my iTunes library on my Macbook, for example, I can ask
git-annex where it’s stored and then ask it to connect to any of those places
and get it for me. Once it’s done, it’ll update its records to say that my
Macbook also has a copy.
It will let me remove my local copy of files if I’m low on space, but won’t
let me accidentally delete the last copy of my file (more accurately, it will
let me specify a minimum number of copies that are in circulation - one is
rarely enough for the paranoid).
(git-annex has an interface only a mother could love, but there’s a friendlier UI
called git-annex Assistant. I prefer the control the CLI gives me.)
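The whereis/get/drop cycle described above (on the command line, `git annex whereis`, `git annex get` and `git annex drop`) boils down to a location log plus a minimum-copies rule. Here is a toy Python model of that bookkeeping, using a hypothetical `LocationLog` class and repository names borrowed from this post. It illustrates the idea only; it is not how git-annex stores or verifies anything.

```python
class LocationLog:
    """Toy model of git-annex location tracking plus a numcopies check.

    Real git-annex keeps these records in its git-annex branch and, before a
    drop, confirms that enough other repositories really hold the content.
    This sketch simply trusts its own records.
    """

    def __init__(self, numcopies: int = 2) -> None:
        self.numcopies = numcopies
        self._where: dict[str, set[str]] = {}

    def add(self, key: str, repo: str) -> None:
        """Record that `repo` holds the content for `key`."""
        self._where.setdefault(key, set()).add(repo)

    def whereis(self, key: str) -> set[str]:
        """Return the repositories believed to hold `key`."""
        return set(self._where.get(key, set()))

    def get(self, key: str, here: str) -> str:
        """Pick a repository to fetch `key` from and record that `here` now has it."""
        holders = self._where.get(key, set())
        if here in holders:
            return here
        if not holders:
            raise LookupError(f"no known copy of {key}")
        source = sorted(holders)[0]  # a real tool would prefer the cheapest remote
        holders.add(here)
        return source

    def drop(self, key: str, here: str) -> None:
        """Remove `here`'s copy unless fewer than numcopies would remain."""
        holders = self._where.get(key, set())
        if here not in holders:
            return
        if len(holders) - 1 < self.numcopies:
            raise RuntimeError(f"refusing to drop {key}: fewer than {self.numcopies} copies would remain")
        holders.discard(here)
```

In this model, fetching an MP3 that lives on `play` and `bob` onto the Macbook records a third copy. Dropping the Macbook's copy is then allowed, but dropping `play` afterwards would leave a single copy and is refused when `numcopies` is 2.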
My git-annex remotes are:
beta: my Manchester virtual machine keeps most of the small data, but doesn’t have
the iTunes library, family videos or my photo collection.
play: my Microserver has a copy of absolutely everything.
bob: Bob the Server at work has a GPG-encrypted copy of everything. It
doesn’t even run git-annex - it’s an rsync special remote.
macbook: my Macbook ebbs and flows, as it should. It’s often where large
assets (like edited video) are created - then I add them to git-annex, make
sure they are distributed around my remotes, and drop my local copy, giving
me back SSD space. It has some, but not many, albums from iTunes.
iwebftp: I wrote an iWeb FTP special remote - this was very, very
easy - and I happen to know that iWeb are keeping at least three copies of
every file on iWeb FTP. This remote uses git-annex encryption as well, even though
the transfer is already encrypted and the data is stored encrypted at rest. Paranoia, right?
(In case you’re not keeping count, that means that so far any photo I take is
in seven places, eight if it’s an iPhone photo.)
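The post doesn't include the code for the iwebftp remote. git-annex supports "external special remotes": an executable named `git-annex-remote-<name>` on the `PATH` that answers requests from git-annex one line at a time over stdin and stdout. When such a remote is set up with encryption enabled, git-annex encrypts content before handing it over, so the remote only ever sees encrypted data. Below is a minimal, hypothetical sketch of the request handling for a core subset of that protocol, with a local directory standing in for FTP. `DirectoryRemote` and its storage path are invented for this example, so check the protocol documentation before relying on any detail.

```python
import shutil
import sys
from pathlib import Path


class DirectoryRemote:
    """Answers a core subset of git-annex external special remote requests.

    A stand-in for an FTP-backed remote: a real one would swap the
    shutil/Path calls for FTP uploads, downloads, existence checks and deletes.
    """

    def __init__(self, root: Path) -> None:
        self.root = root

    def handle(self, line: str) -> str:
        # Split into at most 4 fields so a file path containing spaces stays whole.
        parts = line.rstrip("\n").split(" ", 3)
        cmd = parts[0]
        if cmd == "INITREMOTE":
            self.root.mkdir(parents=True, exist_ok=True)
            return "INITREMOTE-SUCCESS"
        if cmd == "PREPARE":
            return "PREPARE-SUCCESS"
        if cmd == "TRANSFER" and len(parts) == 4:
            direction, key, filename = parts[1], parts[2], parts[3]
            try:
                if direction == "STORE":
                    shutil.copyfile(filename, self.root / key)
                elif direction == "RETRIEVE":
                    shutil.copyfile(self.root / key, filename)
                else:
                    return "UNSUPPORTED-REQUEST"
            except OSError as exc:
                return f"TRANSFER-FAILURE {direction} {key} {exc}"
            return f"TRANSFER-SUCCESS {direction} {key}"
        if cmd == "CHECKPRESENT" and len(parts) >= 2:
            key = parts[1]
            status = "SUCCESS" if (self.root / key).exists() else "FAILURE"
            return f"CHECKPRESENT-{status} {key}"
        if cmd == "REMOVE" and len(parts) >= 2:
            key = parts[1]
            try:
                # Removing a key that is already absent still counts as success.
                (self.root / key).unlink(missing_ok=True)
            except OSError as exc:
                return f"REMOVE-FAILURE {key} {exc}"
            return f"REMOVE-SUCCESS {key}"
        # Anything not implemented here (costs, extensions, ...) is declined.
        return "UNSUPPORTED-REQUEST"


def main() -> None:
    """Protocol loop: announce the version, then answer one request per line."""
    remote = DirectoryRemote(Path.home() / "annex-store")  # hypothetical location
    print("VERSION 1", flush=True)
    for line in sys.stdin:
        print(remote.handle(line), flush=True)
```

The executable would call `main()` at startup. Keeping the protocol logic in `handle()` makes it easy to test without git-annex in the loop.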
rsync from Manchester
beta is periodically backed up to my house. Nothing fancy.
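The post doesn't say how the Manchester backup is run. "Nothing fancy" suggests a scheduled rsync pull, so here is a hedged sketch that builds and runs such a command from Python. It assumes that `beta` resolves as an SSH host, and the paths in the usage example are hypothetical. `rsync_pull_command` and `pull` are illustrative helper names.

```python
import subprocess


def rsync_pull_command(host: str, remote_dir: str, local_dir: str, delete: bool = False) -> list[str]:
    """Build an rsync command that pulls host:remote_dir into local_dir over SSH."""
    cmd = ["rsync", "--archive", "--compress", "--rsh=ssh"]
    if delete:
        # Also remove local files that were deleted on the remote side.
        cmd.append("--delete")
    # Trailing slash on the source: copy the directory's contents, not the directory itself.
    cmd += [f"{host}:{remote_dir.rstrip('/')}/", local_dir]
    return cmd


def pull(host: str, remote_dir: str, local_dir: str, delete: bool = False) -> None:
    """Run the pull; raises subprocess.CalledProcessError if rsync fails."""
    subprocess.run(rsync_pull_command(host, remote_dir, local_dir, delete), check=True)
```

For example, `pull("beta", "/srv", "/backup/beta")` could run from cron on the Microserver. Leaving `delete` off means files removed on `beta` linger in the backup, which is a small hedge against the lack of historical copies discussed below.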
rsync on the Microserver
Everything on the Microserver is periodically backed up to other disks in the
same server. Rather than RAID (which I don’t need, as my SLA with my wife and
kids is “no nines”), I use three disks in my Microserver as an original and two
copies.
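The "original and two copies" arrangement can be pictured as one-way mirroring from the first disk to the other two. The post uses rsync for this. The sketch below is a pure-Python stand-in with hypothetical mount points and a made-up `mirror` helper. Like rsync's default quick check, it compares file size and modification time. Unlike `rsync --delete`, it never removes anything from the copies.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Quick check in the spirit of rsync's default: compare size and mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(original: Path, copies: list[Path]) -> int:
    """Copy new or changed files from `original` into every root in `copies`.

    Returns the number of file copies made across all destinations.
    """
    copied = 0
    for src in original.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        for root in copies:
            dst = root / rel
            if needs_copy(src, dst):
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps mtime, so the next run skips it
                copied += 1
    return copied
```

Something like `mirror(Path("/mnt/disk1"), [Path("/mnt/disk2"), Path("/mnt/disk3")])` would be the hypothetical nightly job.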
Everything good on the Microserver runs inside an LXC container, which is
stored in an LVM logical volume, and backed up from outside of the container. The
minimal Debian install for the host runs off a USB pen and changes infrequently.
I have a second USB pen with this install on it, in case the flash dies.
Because beta is downloaded to the Microserver, it is now present on four
disks total. Because play is a container on Microserver, its git-annex
install is also copied to another two disks. (10 copies of that iPhone
photo, now).
Windows
For stuff that I don’t even think to put into git-annex, but that I’d kick
myself for losing, I back up the Users and Music folders on my desktop.
They go to an external Western Digital MyBook via the truly
excellent Bvckup 2. Bvckup handles things like differential updates
and bypassing file locking (with Volume Shadow Copies) while still being
a beautiful application, rather than just a port of rsync to Windows.
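Differential (delta) updates are what make repeated backups of large files cheap: instead of rewriting a whole file, the tool writes only the parts that changed. The sketch below shows the general idea and is not Bvckup's actual implementation. It splits files into fixed-size blocks, compares SHA-256 hashes block by block and rewrites only the blocks that differ. The 64 KiB block size and the helper names are arbitrary assumptions.

```python
import hashlib
from pathlib import Path

BLOCK_SIZE = 64 * 1024  # arbitrary choice for this sketch


def block_hashes(path: Path, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size block of `path`."""
    hashes = []
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            hashes.append(hashlib.sha256(block).digest())
    return hashes


def delta_update(src: Path, dst: Path, block_size: int = BLOCK_SIZE) -> int:
    """Rewrite only the blocks of `dst` that differ from `src`; return how many were written."""
    if not dst.exists():
        dst.write_bytes(b"")
    old = block_hashes(dst, block_size)
    written = 0
    with src.open("rb") as fin, dst.open("r+b") as fout:
        for index, block in enumerate(iter(lambda: fin.read(block_size), b"")):
            if index >= len(old) or hashlib.sha256(block).digest() != old[index]:
                fout.seek(index * block_size)
                fout.write(block)
                written += 1
        fout.truncate(src.stat().st_size)  # drop any tail left over from a longer old version
    return written
```

This version reads the old backup to get its block hashes. A real tool would more likely cache the hashes from the previous run so that it never has to read the destination back.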
While I have the space, I also back this up to a Samba share on play.
You’re probably getting the picture by now, but because that Samba
share is on the Microserver, it’s going to a further two disks, too.
This backup includes the BitTorrent Sync folders and the Lightroom folders
because - well, why not?
OSX
Finally, there’s my Macbook. I really strive to keep everything important off
my Mac. All of my code lives in git and I push regularly, my documents are
in BitTorrent Sync or Google Drive, my bookmarks are in the cloud and my
email is in Gmail. I’m reasonably confident that if my Macbook took an
unexpected bath, I would not lose more than half a day’s work.
But confidence isn’t control, so I use Bombich Software’s also-excellent
Carbon Copy Cloner to create a bootable copy of my whole Macbook.
I can plug this USB drive into a brand-new-in-box Mac and be up and running
with my entire environment in minutes. It also covers me in case I suddenly
realise I created an important file and just left it in my home folder.
Needless to say, this also includes another copy of my BitTorrent Sync folder
and my git-annex remote for the Macbook.
Downsides and Weaknesses
Nothing is perfect. I’m happy with this backup situation, but it has some
shortcomings.
If I do want to purge a file, it’s remarkably difficult to make sure I have
gotten all of the copies. This hasn’t really happened yet, but if you want
a laugh at other people’s expense you should read about amateur pornographers
not being able to delete shots from their Apple Photo Stream.
(“I took a picture of my wife in lingerie and my in laws saw. Turning off the stream.”).
There are several manual steps. I don’t leave the MyBook connected to my PC
(it’s in a safe), so I have to remember to do all of this. The Lightroom
end-of-month copy to git-annex is the most annoying.
Despite there being up to 20 copies of some photos, they’re actually only
split among three real locations: home, work, Manchester. It’s easy to
be overconfident. Lower priority data only exists at home, though on
several machines. It would be lost to a fire, for example.
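The gap between "copies" and "places" is easy to quantify. After a fire, what matters is how many copies survive the loss of the single site that holds the most of them. Here is a small sketch with a made-up inventory that is loosely modelled on this setup. It is not the author's exact tally, and the function names are illustrative.

```python
from collections import Counter


def copies_per_site(inventory: dict[str, str]) -> Counter:
    """Count copies at each physical site; inventory maps storage location -> site."""
    return Counter(inventory.values())


def worst_case_after_site_loss(inventory: dict[str, str]) -> int:
    """Copies left if the site holding the most copies is lost entirely."""
    per_site = copies_per_site(inventory)
    return len(inventory) - max(per_site.values(), default=0)


# Hypothetical inventory for one photo.
photo = {
    "desktop-drive-1": "home",
    "desktop-drive-2": "home",
    "play-disk-1": "home",
    "play-disk-2": "home",
    "play-disk-3": "home",
    "bob": "work",
    "macbook": "work",
    "beta": "manchester",
}
# Lower-priority data that only ever lives in the house.
low_priority = {"play-disk-1": "home", "play-disk-2": "home", "play-disk-3": "home"}
```

For this hypothetical photo, eight copies shrink to three if the house is lost, and the home-only data drops to zero.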
Apart from the built-in delay of requiring manual steps, there’s no provision
for keeping historical copies. If you find out that you corrupted a file
six months ago, you will not be able to get it back. You’ll just have a
dozen copies of the corrupted version.
Obviously, you require a lot of storage to keep all of these copies of things.
BitTorrent Sync and Bvckup 2 are proprietary software. This is an issue for some
people, and to a limited degree, it is for me. That said, I prefer a proprietary
piece of software to a cloud service (cough Dropbox), and Bvckup keeps my
data in an open format (its original format, really), so I’m okay with it.