Protecting Data with mdadm

Published on September 1, 2012 in UNIX/Linux


It’s all about the data, specifically your data.  You’ve worked hard to create it and it drives your business; you just can’t afford to have it corrupted, lost, or stolen.

The protection of data takes different forms: physical and digital security, encryption, backups, etc. We have protection at the perimeter with firewalls, and tough password policies that require your users to at least use passwords that won’t be cracked within 60 seconds. But protecting data at the source is important too.

Last month I wrote about HP’s MirrorDisk/UX product and how I’ve used it to protect the HP-UX operating system and Oracle data. This article is about protecting data on a Linux server using software RAID (mdadm). I’ve used software RAID for our DNS servers for two years now without any issues. It’s nice when software “just works” as designed.

Below are the steps I used to proactively replace one disk (sdb) of a RAID1 set when it started generating errors in the system log file. The disk hadn’t failed yet, but I didn’t see any reason to wait for it to fail; who needs that headache?
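
(If you’re wondering how I spotted the errors in the first place: a quick grep of the kernel log is usually all it takes. The exact log file and message text vary by distribution and controller, so treat these as a sketch rather than the exact commands I ran.)

# dmesg | grep -i sdb
# grep -i sdb /var/log/messages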

Step one was to collect information about the RAID status, and the commands below did just that. By looking at /proc/mdstat, I was able to see the devices within the RAID (sda and sdb), and the “[UU]” let me know both disks were still in use. I also knew there were six software RAID devices (md0, md1, md2, md3, md4, and md5) and the underlying disk partitions they were using.

# cat /proc/mdstat

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md5 : active raid1 sda7[0] sdb7[1]
26217976 blocks super 1.0 [2/2] [UU]
bitmap: 0/201 pages [0KB], 64KB chunk

md3 : active raid1 sda5[0] sdb5[1]
20972752 blocks super 1.0 [2/2] [UU]
bitmap: 2/161 pages [8KB], 64KB chunk

md2 : active raid1 sda3[0] sdb3[1]
10482340 blocks super 1.0 [2/2] [UU]
bitmap: 4/160 pages [16KB], 32KB chunk

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
4192952 blocks super 1.0 [2/2] [UU]
bitmap: 0/8 pages [0KB], 256KB chunk

md0 : active raid1 sda1[0] sdb1[1]
208800 blocks super 1.0 [2/2] [UU]
bitmap: 0/7 pages [0KB], 16KB chunk

md4 : active raid1 sda6[0] sdb6[1]
52428024 blocks super 1.0 [2/2] [UU]
bitmap: 0/200 pages [0KB], 128KB chunk
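
A quick aside: if you just want a pass/fail answer instead of eyeballing the whole file, a one-liner along these lines (my own habit, not part of the original procedure) prints only degraded arrays, i.e. any status string containing an underscore:

# grep -E '\[[U_]+\]' /proc/mdstat | grep '_' || echo "all arrays show [UU]"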

I knew I needed to partition the new disk exactly like the old one, so I recorded the partition configuration of sdb using the fdisk command:

# /sbin/fdisk -l /dev/sdb

Disk /dev/sdb: 250.0 GB, 250000000000 bytes
255 heads, 63 sectors/track, 30394 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb4a14953

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          26      208813+  fd  Linux raid autodetect
/dev/sdb2              27         548     4192965   fd  Linux raid autodetect
/dev/sdb3             549        1853    10482412+  fd  Linux raid autodetect
/dev/sdb4            1854       30394   229255582+   f  W95 Ext'd (LBA)
/dev/sdb5            1854        4464    20972826   fd  Linux raid autodetect
/dev/sdb6            4465       10991    52428096   fd  Linux raid autodetect
/dev/sdb7           10992       14255    26218048+  fd  Linux raid autodetect
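
As a belt-and-suspenders step (my own habit, and the file name here is just an example), I also like to dump the partition table to a file with sfdisk so it can be restored later even if I lose my notes:

# /sbin/sfdisk -d /dev/sdb > /root/sdb-partition-table.txt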

The mdadm command with the --detail parameter is great for getting information about a RAID set. While I collected information on each RAID device, only the output for md0 is shown below to give you an idea of what it looks like:

# /sbin/mdadm --detail /dev/md0

/dev/md0:
Version : 1.00
Creation Time : Tue Mar 30 14:30:19 2010
Raid Level : raid1
Array Size : 208800 (203.94 MiB 213.81 MB)
Used Dev Size : 208800 (203.94 MiB 213.81 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Mon Oct 17 23:16:38 2011
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : linux:0
UUID : 22aa50b3:50c19701:352a3a66:1a4c2b19
Events : 2093

Number   Major   Minor   RaidDevice State
0       8        1        0      active sync   /dev/sda1
1       8       17        1      active sync   /dev/sdb1
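
Rather than running the command six times by hand, a small shell loop (just a sketch; the output file name is whatever you like) captures the details for every array in one file:

# for md in md0 md1 md2 md3 md4 md5; do /sbin/mdadm --detail /dev/$md; done > /root/raid-details.txt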

After collecting disk information, the next step was to determine which physical disk was actually sdb. I’ve learned that when replacing a disk in a RAID1 set, it’s always best to pull the bad disk, not the good disk. Since the disk had not failed yet, I couldn’t use the LED method of identifying it because all the status lights were green. My method was to use the dd command to perform a lot of reads from the disk and watch the access LEDs to see which one lit up and stayed lit. The command below worked perfectly. If the access LED does not stay on long enough for you to identify the disk, take the count=100 parameter and start adding zeros until you’re satisfied you’ve identified the correct disk to be replaced; you can use Ctrl-C to cancel the command.

# dd if=/dev/sdb of=/dev/null bs=1024 count=100
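
For example, something like the following (purely illustrative; pick whatever count keeps the light on long enough) reads roughly 10 GB and keeps the LED busy for a good while:

# dd if=/dev/sdb of=/dev/null bs=1024 count=10000000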

Once I was 100% sure which disk was sdb, I manually put each sdb partition into a failed state in its array. (Note that sdb4 is just the extended partition container and isn’t part of any array, so it’s skipped.) If the disk had already failed on its own, this step would not be required.

# /sbin/mdadm /dev/md0 -f /dev/sdb1
# /sbin/mdadm /dev/md1 -f /dev/sdb2
# /sbin/mdadm /dev/md2 -f /dev/sdb3
# /sbin/mdadm /dev/md3 -f /dev/sdb5
# /sbin/mdadm /dev/md4 -f /dev/sdb6
# /sbin/mdadm /dev/md5 -f /dev/sdb7
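
If you’d rather not type each command, the same thing can be done with a loop over the partition-to-array mapping shown in /proc/mdstat above (a sketch; double-check the mapping on your own system before running it):

# for p in md0:sdb1 md1:sdb2 md2:sdb3 md3:sdb5 md4:sdb6 md5:sdb7; do /sbin/mdadm /dev/${p%:*} -f /dev/${p#*:}; done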

The last step prior to swapping out the bad disk was to remove the sdb partitions from the RAID sets:

# /sbin/mdadm /dev/md0 -r /dev/sdb1
# /sbin/mdadm /dev/md1 -r /dev/sdb2
# /sbin/mdadm /dev/md2 -r /dev/sdb3
# /sbin/mdadm /dev/md3 -r /dev/sdb5
# /sbin/mdadm /dev/md4 -r /dev/sdb6
# /sbin/mdadm /dev/md5 -r /dev/sdb7
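
Before pulling the drive I like to confirm (an extra check of my own, not strictly required) that each array now reports only one active member; repeating this for md0 through md5 only takes a few seconds:

# /sbin/mdadm --detail /dev/md0 | grep -E 'State|Devices'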

After swapping in the new disk, the first step was to partition it exactly the same as the other disk in the RAID1 set (sda). I could have done this manually by running “fdisk /dev/sdb” and creating the individual partitions according to the information I collected at the beginning of the process, but why go through all those manual steps when a pipe will get the job done in one command? The sfdisk command below reads the partition table of the sda device, and the result is piped into a second sfdisk command that creates the same partitions on sdb – it’s like magic I tell you, magic.

# /sbin/sfdisk -d /dev/sda | /sbin/sfdisk /dev/sdb
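
It’s worth a quick sanity check that the copy landed as expected; re-running the listing and comparing it by eye against the output recorded earlier is enough:

# /sbin/fdisk -l /dev/sdb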

The last step in the process was to add the sdb partitions back into the RAID sets:

# /sbin/mdadm /dev/md0 -a /dev/sdb1
# /sbin/mdadm /dev/md1 -a /dev/sdb2
# /sbin/mdadm /dev/md2 -a /dev/sdb3
# /sbin/mdadm /dev/md3 -a /dev/sdb5
# /sbin/mdadm /dev/md4 -a /dev/sdb6
# /sbin/mdadm /dev/md5 -a /dev/sdb7
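
The arrays rebuild onto the new disk in the background, and you can watch the resync progress until every set shows [UU] again (watch is optional; re-running cat works just as well):

# watch -n 10 cat /proc/mdstat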

A final status check with “cat /proc/mdstat” confirmed a healthy, two-disk software RAID1 set back in business.
