What’s in your tool bag?

Published on May 29, 2012 in UNIX/Linux

You have a tool bag, right?

If you are like me and spend time troubleshooting IT issues throughout the day (and night), you have probably accumulated a few favorite programs and scripts that you keep close at hand. In this article, I’ll talk about one such tool and how I recently had a couple of occasions to use it in different ways.

I help maintain our external Linux DNS servers, and it recently came time for me to apply patches to bring them up to date. There were no kernel updates required, so I could have just applied the patches to all three servers and quickly been done with it, but one should always think about a back-out plan. And if you have ever been burnt by a project going south on you, a back-out plan is forever at the forefront of your brain.

So, having been burnt a time or two in my career, I built my rollback plan around backing up the three servers before installing the updates, and I chose to make cold backups. Why a cold backup instead of using our normal company backup solution? Well, if things go just a little wrong, like a few files needing to be restored, it would be no big deal to replace them and cold backups really wouldn’t be needed. But what if that squirrel that’s been playing outside my building gets hungry, decides to go looking for acorns, and takes the scenic route to the acorn tree by climbing the closest telephone pole and scooting as fast as he can across a power line to the next pole? Normally he would be able to stop on a dime and maneuver around any dangers, but unfortunately for him, this time it had just rained and it was a little slick up there, causing the poor little thing to run right into a ground wire and BANG! (What? You don’t think it happens? “Squirrels that fry themselves on power lines and transformers cause tens of thousands of blackouts every year.” http://www.usatoday.com/news/nation/2007-03-11-suicide-squirrels_N.htm)

It “could” happen to you. Imagine power has been cut to your building and, just before your servers went quiet, you heard the hard drives make that noise you never want to hear. You know, that screechy, high-pitched, whining noise like metal on metal.

- At a time like this, it’s good to have cold backups. -

I always plan for the worst and hope for the best, so I did the cold backups, which gave me the first occasion to use the tool I keep in my laptop bag: SystemRescueCD (http://www.sysresccd.org/SystemRescueCd_Homepage). It’s a bootable Linux CD (or USB stick) that provides numerous programs to help you recover your system after a crash, as well as help with other administrative tasks, including backups. I chose to use a program on the CD called “FSArchiver” (http://www.fsarchiver.org/Main_Page), which has worked well for me in the past.

Per the fsarchiver web site, it “saves the contents of a file-system to a compressed archive file. The file-system can be restored on a partition which has a different size and it can be restored on a different file-system. Unlike tar/dar, FSArchiver also creates the file-system when it extracts the data to partitions. Everything is checksummed in the archive in order to protect the data”.

After booting my first server from the SystemRescueCD, the first thing I did was attach an external USB disk and mount it at /BACKUP. I chose to store the backups on an external disk, but I could just as easily have mounted a network drive and used that. The first fsarchiver command I ran was “fsarchiver probe simple”, which listed all partitions and filesystems on the server and let me verify that fsarchiver recognized the filesystems correctly. I then used the commands below to perform the backups. Across my three servers I ran more commands than these, but the ones shown represent each type of storage layout I had to back up.

  • ext3:  fsarchiver savefs /BACKUP/051812/ns1_boot.fsa /dev/sda1
  • LVM:   fsarchiver savefs /BACKUP/051812/ns1_root.fsa /dev/mapper/system-root
  • RAID:  fsarchiver savefs /BACKUP/051812/ns2_md0_boot.fsa /dev/md122
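If you would rather script the session than type it, the steps so far could be sketched roughly as follows. This is only an illustration: the USB device name /dev/sdb1 and the `run` dry-run helper are assumptions of mine, not part of fsarchiver, and by default the script only prints the commands so they can be reviewed before anything touches a disk.

```shell
#!/bin/sh
# Sketch of the cold-backup session; meant to be run as root from the
# SystemRescueCD shell.

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

USB_DEV=/dev/sdb1      # assumption: partition on the external USB disk
DEST=/BACKUP/051812    # dated backup directory used in this article

backup_ns1() {
    # Each step only runs if the previous one succeeded.
    run mkdir -p /BACKUP &&
    run mount "$USB_DEV" /BACKUP &&
    run mkdir -p "$DEST" &&
    run fsarchiver probe simple &&
    run fsarchiver savefs "$DEST/ns1_boot.fsa" /dev/sda1 &&
    run fsarchiver savefs "$DEST/ns1_root.fsa" /dev/mapper/system-root
}

backup_ns1
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands instead of printing them.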

As you can see, the format of the command is “what” you want to do (a backup, savefs, in this case), “where” to store the backup, and finally “what” to back up. After the backup has completed, you will see statistics including the number of files and directories backed up but, more importantly, you will be able to see if there were any errors. A few things to take note of for the LVM and RAID backups:

  1. I did not need to manually start LVM as it automatically started on bootup and recognized the logical volumes in my volume group.  A couple of quick LVM commands, “vgs” and “lvs”, verified this.
  2. I did not need to start software RAID as it was also automatically started at bootup and my configuration was recognized.
  3. The device files used for the software RAID backups did not use the normal numbering convention. One might think the above command has a typo in it because I used /dev/md122 instead of /dev/md0 (which I actually tried, and received an error), but I found I needed to use /dev/md122 for md0, /dev/md124 for md2, /dev/md125 for md3, and /dev/md126 for md4. If it looks like a hassle to make the connection between the normal naming convention and the one the rescue CD assigned, it actually wasn’t. I used a combination of looking at /proc/partitions and the device files for /dev/md*, and I may have even quickly mounted /dev/md122 temporarily to verify (sorry, it’s been a week since I performed these updates and this detail apparently is no longer retrievable from my memory). The point is, it wasn’t hard.
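One way to make that connection, offered here as a sketch rather than as the exact steps I took, is to list which partitions belong to each array as the rescue system sees it. The helper name `list_md_arrays` is made up for this example; it simply reads /proc/mdstat-style text and prints each md device followed by its members.

```shell
#!/bin/sh
# Print each md array with its member partitions, reading mdstat-format
# text on stdin. Fields such as "sda1[0]" are the array members.
list_md_arrays() {
    awk '/^md[0-9]+ :/ {
        line = "/dev/" $1
        for (i = 2; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) line = line " " $i
        print line
    }'
}

# On the rescue system itself, feed it the live kernel status.
if [ -r /proc/mdstat ]; then
    list_md_arrays < /proc/mdstat
fi
```

Once you know which partitions sit behind, say, /dev/md122, matching it to the array the installed system calls md0 is straightforward, and “mdadm --detail /dev/md122” gives more detail if you need it.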

After each backup, I verified the archive file using the below commands:

  • ext3:  fsarchiver archinfo /BACKUP/051812/ns1_boot.fsa
  • LVM:   fsarchiver archinfo /BACKUP/051812/ns1_root.fsa
  • RAID:  fsarchiver archinfo /BACKUP/051812/ns2_md0_boot.fsa
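With more than a handful of archives, the same check could be looped over the whole backup directory. The sketch below uses a made-up helper, `check_archives`, and assumes that “fsarchiver archinfo” exits with a non-zero status when it cannot read an archive; confirm that on your version before relying on it. Also keep in mind that archinfo reports on the archive’s description, so treat this as a sanity check rather than a full test restore.

```shell
#!/bin/sh
# Run "fsarchiver archinfo" on every .fsa file in a directory and report
# which ones could not be read. Returns non-zero if any archive fails.
# Assumption: archinfo exits non-zero when an archive is unreadable.
check_archives() {
    dir=$1
    failed=0
    for archive in "$dir"/*.fsa; do
        [ -e "$archive" ] || continue    # glob matched no .fsa files
        if fsarchiver archinfo "$archive" >/dev/null 2>&1; then
            echo "OK   $archive"
        else
            echo "FAIL $archive"
            failed=1
        fi
    done
    return "$failed"
}

# Usage: check_archives /BACKUP/051812
```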

There are numerous options to the fsarchiver command, such as excluding files and directories, setting the compression level, and splitting an archive into multiple files. Restoring your data is just a matter of passing the “restfs” parameter to fsarchiver instead of savefs, along with the archive and the destination device, for example /dev/sda1.
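I didn’t need to restore anything this time, so take the following as a sketch of what the restore commands would look like rather than something I ran during this maintenance window. fsarchiver’s restfs takes the archive followed by “id=N,dest=DEVICE”, where id=0 is the first filesystem stored in the archive. The `restfs_cmd` helper is made up for this example and only prints the command.

```shell
#!/bin/sh
# Print the restfs command for restoring one archive to one device.
# id=0 selects the first (here, the only) filesystem in the archive.
restfs_cmd() {
    archive=$1
    device=$2
    printf 'fsarchiver restfs %s id=0,dest=%s\n' "$archive" "$device"
}

restfs_cmd /BACKUP/051812/ns1_boot.fsa /dev/sda1
restfs_cmd /BACKUP/051812/ns1_root.fsa /dev/mapper/system-root

# The savefs options mentioned above look something like this
# (values are illustrative; check "man fsarchiver" on your version):
#   fsarchiver savefs -z 7 -s 4000 --exclude='*.tmp' ARCHIVE DEVICE
```

Remember that restfs recreates the filesystem on the destination, so whatever is on that device is overwritten; double-check the device name first.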

There were a couple of other steps in my process. Prior to shutting down each server, I recorded the disk partition layouts using the command “fdisk -l” and the currently mounted disks using “df -h”.
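Capturing that output to files keeps it alongside the rest of your notes. A minimal sketch, with a made-up helper name and output location:

```shell
#!/bin/sh
# Save the partition table and the mounted-filesystem listing to files,
# so the layout can be recreated if a disk has to be replaced.
record_layout() {
    outdir=$1
    host=$2
    fdisk -l > "$outdir/${host}_fdisk.txt" 2>&1
    df -h    > "$outdir/${host}_df.txt" 2>&1
}

# Usage (as root, before shutting the server down); the directory is
# hypothetical and must already exist:
#   record_layout /root/layout ns1
```

Copy the resulting files somewhere off the server, since they are of little use if they only live on the disk you are trying to rebuild.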

In the end, it turned out I didn’t need the cold backups because the updates installed without any issues as I expected.  But if disaster had struck (squirrel or no squirrel), I was ready.

The second occasion for me to use SystemRescueCD came when I was in a customer’s data center a week ago, preparing to update firmware in a few HP-UX servers, and needed to make a serial connection from my laptop (running openSUSE) to the MP (iLO) management console. Normally I would connect to the MP console via the network, but for security reasons the network option had been disabled. I then discovered I hadn’t installed minicom on my laptop yet, and quickly installing it was not an option: the wireless connection to my MiFi wasn’t working for some reason, and connecting to the local wired network would have been a security violation. There was no way my “zypper install minicom” command was going to help me. So I booted my laptop from the SystemRescueCD, fired up minicom, modified the appropriate comm and tty settings, and was on my way with the task at hand instead of wasting time trying to come up with a plan B.
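For reference, the serial settings can also be passed to minicom on the command line instead of through its setup menu (“minicom -s”). The device name and speed below are assumptions for illustration; use whatever your serial port and console actually require. The `minicom_cmd` helper is made up and only prints the command.

```shell
#!/bin/sh
# Print a minicom invocation for a given serial device and baud rate.
minicom_cmd() {
    port=${1:-/dev/ttyS0}   # assumption: first on-board serial port
    baud=${2:-9600}         # assumption: console speed
    printf 'minicom -D %s -b %s\n' "$port" "$baud"
}

minicom_cmd                      # defaults
minicom_cmd /dev/ttyUSB0 9600    # e.g. a USB-to-serial adapter
```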

SystemRescueCD has been a part of my tool bag for quite a while now and will continue to stay close at hand. Truth be told, it’s even helped out my co-worker more than a few times – something about wiping disks so Windows will recognize them again. I don’t know, I just give him the CD when he asks for it.

So give it a test drive and see what you think.  Maybe you’ll add it to your tool bag as well.

 

 
