5.1-rel deleted its own MBR

From: Harald Schmalzbauer <h_at_schmalzbauer.de>
Date: Thu, 18 Sep 2003 04:25:08 +0200
Hi all,

A big, mysterious bug is lurking somewhere. (Machine: C3, 256MB, 2x 30GB 2.5" 
IDE, SIL0680 controller)
One of my drives failed; the following was recovered from the messages log:

Sep 16 01:47:44 tek kernel: ad4: WRITE command timeout tag=0 serv=0 - resetting
Sep 16 01:47:45 tek kernel: ata2: resetting devices ..
Sep 16 01:47:45 tek kernel: ad4: removed from configuration
Sep 16 01:47:45 tek kernel: ar0: WARNING - mirror lost
Sep 16 01:47:45 tek kernel: ad4: deleted from ar0 disk0
Sep 16 01:47:45 tek kernel: done


This happened at 01:47, but the machine kept running until about 05:30. Then 
it died (no message!).
When I tried to reboot, the BIOS complained about a missing MBR. And indeed, 
when I opened the server and connected the drives to another box, BOTH drives 
had no partition table!
I got a correct bsdlabel from both ad6 and ad6s1.
How can this happen?
Bug in ata?
Bug in GEOM?
Nobody was logged in, and nobody else is able to log in, so the machine must 
have deleted it by itself. I'm really sure of that!
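
A missing partition table like the one described above can be confirmed by 
reading sector 0 directly. The following is only a minimal sketch, assuming 
the standard PC MBR layout (four 16-byte entries starting at offset 446, boot 
signature 0x55 0xAA at offset 510). The function name inspect_mbr is 
hypothetical and not part of any FreeBSD tool.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446      # start of the four 16-byte partition entries
SIGNATURE_OFFSET = 510  # boot signature bytes 0x55 0xAA


def inspect_mbr(sector: bytes) -> dict:
    """Report the boot signature and the non-empty partition entries of sector 0."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes of sector 0")
    has_signature = sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] == b"\x55\xaa"
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + 16 * index
        entry = sector[start:start + 16]
        # Entry layout: boot flag, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), LBA start and sector count (both
        # 32-bit little-endian).
        flag, ptype, lba_start, count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "slot": index + 1,
                "active": flag == 0x80,
                "type": ptype,
                "start": lba_start,
                "sectors": count,
            })
    return {"signature": has_signature, "partitions": partitions}
```

The 512 bytes would have to be read from the raw disk device as root (the 
device name depends on the box). An empty "partitions" list corresponds to 
the state both drives were found in. A FreeBSD slice would show up with type 
165 (0xA5).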

My fix was to boot the fixit CD and write a new MBR with:

fdisk -I -B -b /boot/boot1 ar0
fdisk -u ar0 (to change the starting sector from 63 to 0)

fsck found a few errors, but the server is up and running again.

Søren: I remember you were planning better RAID management support. Will it 
be possible to control ar0 from the controller's BIOS in the future?
When I rebuilt the array with the BIOS (which took 6 hours!), FreeBSD still 
reported a degraded RAID1! This was really annoying.

Thanks,

-Harry

Received on Wed Sep 17 2003 - 17:25:18 UTC
