Monitoring Software RAID1 with mdadm

Don't die on me!

So you've created your RAID1 with two drives using mdadm.  Now, you need to monitor it.  How?

With mdadm, of course!

Run the following, replacing the example address with your own. The -f flag makes mdadm run as a background daemon.

[root@localhost ~]# mdadm --monitor /dev/md1 --mail=your.email@address.com -f -t

The -t flag also sends a test email message containing the array's current status.

Test the notification with a degraded array

Now the fun part: let's mark one of the disks as faulty so that the array runs degraded.

[root@localhost ~]# mdadm --manage /dev/md1 --fail /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md1

You should receive an email with something similar to the following:

This is an automatically generated mail message from mdadm
running on localhost.localdomain

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdc1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md1 : active raid1 sdc1[2](F) sdb1[0]
      8381760 blocks super 1.2 [2/1] [U_]

From here, you can investigate and replace or repair as needed.
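
If you want a quick check outside of email, the [U_] field in /proc/mdstat can be scanned directly. The following is only a sketch: the function name degraded_arrays is made up for this example, and it assumes the mdstat layout shown above, where an underscore in the final bracketed field marks a missing member.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose member-status
# field (for example [U_]) shows a missing member.
# Usage: degraded_arrays [mdstat-file]   (defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        # Array header line, e.g. "md1 : active raid1 sdc1[2](F) sdb1[0]"
        /^md[^ ]* :/ { name = $1 }
        # Status line ends with a field such as [UU] or [U_]
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name   # "_" means a member is missing
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads /proc/mdstat. It prints nothing when every array is healthy.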

"Undo" your test and re-add /dev/sdc1 to the RAID

You will first have to remove the drive from the array:

[root@localhost ~]# mdadm --manage /dev/md1 --remove /dev/sdc1
mdadm: hot removed /dev/sdc1 from /dev/md1

Then, re-add it.

[root@localhost ~]# mdadm --manage /dev/md1 --add /dev/sdc1
mdadm: added /dev/sdc1
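
The remove and add steps can be wrapped in one small function. This is a sketch rather than a recommended tool: readd_member and the DRY_RUN variable are names invented for this example, and the function only runs the same two mdadm --manage commands shown above.

```shell
#!/bin/sh
# Hypothetical helper: remove a failed member from an array, then add it back.
# With DRY_RUN=1 the mdadm commands are printed instead of executed.
# Usage: readd_member /dev/md1 /dev/sdc1
readd_member() {
    array=$1
    member=$2
    for action in --remove --add; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "mdadm --manage $array $action $member"
        else
            # Stop at the first failing step so --add never runs after a failed --remove
            mdadm --manage "$array" "$action" "$member" || return 1
        fi
    done
}
```

Running DRY_RUN=1 readd_member /dev/md1 /dev/sdc1 first lets you confirm the device names before touching the array.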

Courtesy of /proc/mdstat, you will see the RAID recovering:

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[2] sdb1[0]
      8381760 blocks super 1.2 [2/1] [U_]
      [===============>.....]  recovery = 76.3% (6402048/8381760) finish=0.1min speed=200064K/sec

unused devices: <none>
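
To pull just the rebuild percentage out of that output, a short awk filter is enough. This is a minimal sketch that assumes the progress line looks like the one above ("recovery = 76.3% ..."); rebuild_progress is a name chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent>" for each array that is
# currently recovering or resyncing.
# Usage: rebuild_progress [mdstat-file]   (defaults to /proc/mdstat)
rebuild_progress() {
    awk '
        # Remember the most recent array header, e.g. "md1 : active raid1 ..."
        /^md[^ ]* :/ { name = $1 }
        # Progress line, e.g. "[====>...]  recovery = 76.3% (...) finish=..."
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

It prints nothing once the recovery has finished, so it can be polled from a loop until the output is empty.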
