I don't know much about the inner workings of Linux software RAID, but I'm wondering if some ID got borked that makes them look like halves of two separate arrays.<div><br></div><div><div>[root@bob ~]# mdadm --detail /dev/md0 | grep UUID</div>
<div> UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8</div><div>[root@bob ~]# mdadm --query --examine /dev/sdb1 | egrep '(Magic|UUID)'</div><div> Magic : a92b4efc</div><div> UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8</div>
<div>[root@bob ~]# mdadm --query --examine /dev/sda2 | egrep '(Magic|UUID)'</div><div> Magic : a92b4efc</div><div> UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8</div><div><br></div><div>Just poking around the mdadm command doesn't show anything specific to a single device. My guess would be that there's some algorithm that reconstructs the array based on what's found on the member devices. So devices belong to the same RAID set as long as they have the same UUID (which is also repeated in /etc/mdadm.conf on my system). Then it would look at device-specific metadata to figure out the sync status. Browsing the source to dm-raid1.c and some other files shows there's a notion of a primary device in the RAID set and some sync tables.</div>
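A quick way to see that grouping by hand — a minimal sketch, where `examine` is a stand-in that just echoes the superblock fields from above; on a real system you would call `mdadm --examine "$dev"` as root instead:

```shell
# Sketch of grouping members by superblock UUID.  "examine" is a stand-in
# that echoes the fields seen above; on a real box you would run
# `mdadm --examine "$dev"` (as root) instead.
examine() {
    cat <<'EOF'
          Magic : a92b4efc
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
EOF
}

# Members that report the same UUID belong to the same array, so two
# degraded copies of one array are easy to spot side by side.
uuids=""
for dev in /dev/sdb1 /dev/sda2; do
    uuid=$(examine "$dev" | awk '/UUID/ {print $3}')
    echo "$dev $uuid"
    uuids="$uuids $uuid"
done
```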
<div><br></div><div>Sean</div><br><div class="gmail_quote">On Mon, May 14, 2012 at 2:55 PM, Trevor Cordes <span dir="ltr"><<a href="mailto:trevor@tecnopolis.ca" target="_blank">trevor@tecnopolis.ca</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I love linux software ("md") raid. I use md raid1 on a zillion systems.<br>
I've never had issues. Until today...<br>
<br>
I get a call that a customer has lost all their emails for about a month<br>
and their apps' data appears to be old and/or missing. Strange.<br>
<br>
I login to the linux server and see:<br>
<br>
cat /proc/mdstat<br>
Personalities : [raid1]<br>
md122 : active raid1 sda1[0]<br>
409536 blocks [2/1] [U_]<br>
<br>
md123 : active raid1 sda2[0]<br>
5242816 blocks [2/1] [U_]<br>
<br>
md124 : active raid1 sda3[0]<br>
1939865536 blocks [2/1] [U_]<br>
<br>
md125 : active raid1 sdb1[1]<br>
409536 blocks [2/1] [_U]<br>
<br>
md126 : active raid1 sdb2[1]<br>
5242816 blocks [2/1] [_U]<br>
<br>
md127 : active raid1 sdb3[1]<br>
1939865536 blocks [2/1] [_U]<br>
<br>
<br>
That's not correct. These systems should have 3 md arrays, not 6. Ah,<br>
md has done something really goofball with this pathological case. It's<br>
split each mirror into its two halves and assembled each half as a<br>
separate degraded array! Whoa!<br>
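As far as I understand it (my assumption, not something I've verified), md picks the fresher half by comparing the "Events" counter in each member's superblock — the higher count is newer. The values below are made up just to show the shape of the check; on the real box it would be `mdadm --examine /dev/sdX3 | grep Events`:

```shell
# Illustrative check: "examine" fakes the Events line of `mdadm --examine`
# output for the two halves (sample values, not from the real system).
examine() {
    case "$1" in
        /dev/sda3) echo "         Events : 0.100" ;;  # stale half (sample value)
        /dev/sdb3) echo "         Events : 0.250" ;;  # current half (sample value)
    esac
}

# The member with the higher Events count is the one md should prefer.
for dev in /dev/sda3 /dev/sdb3; do
    echo "$dev $(examine "$dev" | awk '{print $3}')"
done
```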
<br>
They said they had an accidental reboot today (kid hitting reset button).<br>
And it booted/rooted off the wrong schizo set (sda).<br>
<br>
There appears to have been a drive failure/kick a month ago:<br>
Apr 4 10:10:32 firewall kernel: [1443781.218260] md/raid1:md127: Disk failure on sda3, disabling device.<br>
Apr 4 10:10:32 firewall kernel: [1443781.218262] <1>md/raid1:md127: Operation continuing on 1 devices.<br>
<br>
And it hadn't rebooted since then, before today.<br>
<br>
It gets stranger... I rebooted the system to try out a few recovery<br>
ideas (offsite). On the next reboot it came up using the good/current<br>
sdb drive for boot/root! Huh? It's like it's picking which one to use at<br>
random! It still shows 6 md arrays, but it's using the proper 3 this<br>
time.<br>
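In hindsight, rebooting until it picked the right half may not have been necessary. A sketch of what I think would have worked, assuming root wasn't mounted from the stale set (or running from rescue media) — DRY_RUN=1 just prints the commands instead of executing them:

```shell
# Sketch: stop the three stale arrays (the sda-backed md122/123/124 from
# the first boot) and leave the current sdb-backed ones running.
# DRY_RUN=1 makes this print the commands instead of executing them.
DRY_RUN=1
run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

for md in /dev/md122 /dev/md123 /dev/md124; do
    run mdadm --stop "$md"    # mdadm refuses to stop an array that is in use
done
```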
<br>
So is all this a bug?<br>
<br>
1. Shouldn't the system have marked sda as failed/bad PERMANENTLY so<br>
on next reboot it would ignore it. OK, I can understand that if it<br>
thought the whole drive was bad, it wouldn't be able to write to the sda<br>
superblock to survive the reboot. But couldn't it have written the info<br>
to sdb's superblock? If a system can't remember what has failed, then I<br>
don't see how this behaviour can be avoided.<br>
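For what it's worth, the failure can be made permanent by hand once you know which half is stale: wiping the stale member's superblock stops it from ever auto-assembling again. A sketch using the device names from above (in the normal case, where the stale member still sits inside the one real array, you'd --fail and --remove it first); DRY_RUN=1 just prints the commands:

```shell
# Sketch: permanently evict the stale member so a reboot can't bring it back.
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=1
run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

run mdadm --stop /dev/md124            # stop the array the stale half assembled into
run mdadm --zero-superblock /dev/sda3  # erase its md superblock so it can't auto-assemble
```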
<br>
2. Why did linux md bring up both sets of arrays? It can see they are the<br>
same array. Why on earth would it ever split them? That seems majorly<br>
screwy to me.<br>
<br>
<br>
Still, thank God it didn't try to start syncing the stale set to the good<br>
set! We had backups, but it's a pain to recover. In the end, just<br>
rebooting until luck gives us the current set was all it took. I'll head<br>
on-site to replace the bad disk and do a proper resync.<br>
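The proper resync I have in mind, sketched out — MBR disks assumed, and the md names/devices are the sdb-backed ones from the good boot above, so double-check against /proc/mdstat before doing this for real; DRY_RUN=1 just prints the commands:

```shell
# Sketch of replacing the bad disk and re-adding it to the live arrays.
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=1
run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

# Copy the good disk's partition table to the new disk (MBR layout).
# (If reusing the old disk, mdadm --zero-superblock each sda partition first.)
run sh -c 'sfdisk -d /dev/sdb | sfdisk /dev/sda'

# Re-add each partition; md resyncs each one from the live sdb member.
run mdadm /dev/md125 --add /dev/sda1
run mdadm /dev/md126 --add /dev/sda2
run mdadm /dev/md127 --add /dev/sda3
```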
<br>
I have had hardware RAID systems (ARAID99) in this exact situation go into<br>
a schizo state where the disks were unsynched yet both were being used for<br>
writes! The problems always seem to revolve around a disk going "soft"<br>
bad and then coming alive after reboot.<br>
_______________________________________________<br>
Roundtable mailing list<br>
<a href="mailto:Roundtable@muug.mb.ca">Roundtable@muug.mb.ca</a><br>
<a href="http://www.muug.mb.ca/mailman/listinfo/roundtable" target="_blank">http://www.muug.mb.ca/mailman/listinfo/roundtable</a><br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Sean Walberg <<a href="mailto:sean@ertw.com" target="_blank">sean@ertw.com</a>> <a href="http://ertw.com/" target="_blank">http://ertw.com/</a><br>
</div>