Monitoring Software RAID1 with mdadm
Don't die on me!
So you've created your RAID1 with two drives using mdadm. Now, you need to monitor it. How?
With mdadm, of course!
Run the following, replacing the email address with your own.
[root@localhost ~]# mdadm --monitor /dev/md1 --mail=email@example.com -f -t
The -f flag runs the monitor in the background as a daemon, and -t sends a test email message with the array's current status at startup.
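A monitor started by hand like this does not survive a reboot. One common way to keep the address around, sketched here under assumptions, is a MAILADDR line in the mdadm configuration file, which mdadm consults when it is run with --monitor --scan. The helper name set_mailaddr is hypothetical (it is not part of mdadm), and the config path in the usage comment is an assumption: it is typically /etc/mdadm.conf on Red Hat style systems and /etc/mdadm/mdadm.conf on Debian style ones.

```shell
#!/bin/sh
# Sketch: set (or replace) the MAILADDR line in an mdadm config file.
# set_mailaddr is a hypothetical helper, not an mdadm command.
set_mailaddr() {
    conf=$1
    addr=$2
    tmp=$(mktemp) || return 1
    # Copy everything except an existing MAILADDR line, if the file exists.
    if [ -f "$conf" ]; then
        grep -v '^MAILADDR[[:space:]]' "$conf" > "$tmp" || true
    fi
    # Append the new address and write the result back in place.
    printf 'MAILADDR %s\n' "$addr" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage, as root (the path is an assumption, check your distribution):
#   set_mailaddr /etc/mdadm.conf email@example.com
#   mdadm --monitor --scan --daemonise --test
```

Many distributions also ship a service that starts the monitor at boot, so it is worth checking for one before running your own daemon.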
Test the notification with a degraded array
Now for the fun part. Let's mark one of the disks as faulty so that the array becomes degraded.
[root@localhost ~]# mdadm --manage /dev/md1 --fail /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md1
You should receive an email with something similar to the following:
This is an automatically generated mail message from mdadm
running on localhost.localdomain

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdc1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md1 : active raid1 sdc1(F) sdb1
      8381760 blocks super 1.2 [2/1] [U_]
From here, you can investigate and replace or repair as needed.
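When investigating, the status map at the end of the "blocks" line is the quickest signal: each U is a healthy member and each _ is a missing or failed one. As a rough sketch, the hypothetical helper below (degraded_arrays is an illustrative name, not an mdadm command) lists arrays whose map contains an underscore. It reads /proc/mdstat by default and assumes the layout shown in the email above.

```shell
#!/bin/sh
# Sketch: print the names of md arrays that are missing a member.
# degraded_arrays is a hypothetical helper; it reads /proc/mdstat
# unless another file is passed as the first argument.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }           # remember the array whose block we are in
        /blocks/ && /\[[U_]*_[U_]*\]/ {      # status map contains "_": a member is missing
            print name
        }
    ' "${1:-/proc/mdstat}"
}

# Demo on the mdstat text from the notification email.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md1 : active raid1 sdc1(F) sdb1
      8381760 blocks super 1.2 [2/1] [U_]
EOF
degraded_arrays "$sample"   # prints: md1
rm -f "$sample"
```

For a fuller picture of a single array, including the state of each member, mdadm --detail /dev/md1 is the usual next step.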
"Undo" your test and re-add /dev/sdc1 to the RAID
You will first have to remove the drive from the array:
[root@localhost ~]# mdadm --manage /dev/md1 --remove /dev/sdc1
mdadm: hot removed /dev/sdc1 from /dev/md1
Then, re-add it.
[root@localhost ~]# mdadm --manage /dev/md1 --add /dev/sdc1
mdadm: added /dev/sdc1
Courtesy of /proc/mdstat, you will see the RAID recovering:
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1 sdb1
      8381760 blocks super 1.2 [2/1] [U_]
      [===============>.....]  recovery = 76.3% (6402048/8381760) finish=0.1min speed=200064K/sec

unused devices: <none>
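If you would rather script the check than re-run cat, the progress figure can be pulled out of the same file. The following is only a sketch: recovery_progress is a hypothetical helper name, and it assumes the "recovery = NN.N%" (or "resync = NN.N%") layout shown above.

```shell
#!/bin/sh
# Sketch: print "<array> <percent>" for each array that is rebuilding.
# recovery_progress is a hypothetical helper; it reads /proc/mdstat
# unless another file is passed as the first argument.
recovery_progress() {
    awk '
        /^md[^ ]* :/ { name = $1 }            # remember the array whose block we are in
        /(recovery|resync) *=/ {              # progress line for that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i # the field ending in "%" is the progress
        }
    ' "${1:-/proc/mdstat}"
}

# Demo on the mdstat output shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md1 : active raid1 sdc1 sdb1
      8381760 blocks super 1.2 [2/1] [U_]
      [===============>.....]  recovery = 76.3% (6402048/8381760) finish=0.1min speed=200064K/sec
EOF
recovery_progress "$sample"   # prints: md1 76.3%
rm -f "$sample"
```

To simply block until the rebuild is done, mdadm --wait /dev/md1 returns once any resync or recovery on the array has finished.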