As I reported on the "Vinum status" thread a few days ago, gvinum does not handle it gracefully when a disc disappears or dies in a RAID-5 array during operation. The machine I was testing this on is an SMP machine, but when I recompiled the kernel with ddb/kdb support I removed "options SMP" just to take that out of the equation.

The system:
-----------
ASUS P2B-DS mobo, 2 x P3/700, 1GB ECC RAM, latest BIOS (1014, beta 003)
Some old 9GB ATA disc for the OS on the onboard 440BX UDMA33 chipset
4 x 120GB Maxtor SATA discs on a HighPoint RocketRAID 1540 (HPT374 chipset)

The four 120GB discs were put together in a RAID-5 array using "classic" vinum (so they have overlapping slices and VINUM slice types).

Kernel/userland dated: 2004.08.10.21.00.00 (no CFLAGS/COPTFLAGS, but I use "CPUTYPE?=p3")
gvinum started from /boot/loader.conf (geom_vinum_load="YES")

The crash:
----------
Stop all non-essential processes (to protect against unnecessary file corruption; normally I've got Apache+PHP4+MySQL running, among other things).
Set up one process to write to the RAID-5 array (I have used a simple sftp to another machine, pulling down big files).
Pull one of the SATA cables.
*boom*

I have a vmcore dump and a kernel.debug, but I can't seem to get gdb53 to do what I want (I'm not very familiar with gdb), so here is the output I copied down on paper from within ddb.
The first 5 lines are exactly what I would expect to happen (and what "classic" vinum also did), but then something goes wrong and the machine page-faults:

  ad8: TIMEOUT - READ_DMA retrying (2 retries left) LBA = 196849498
  ad8: WARNING - removed from configuration
  gvinum: lost drive 'vinumdrive2'
  FOO: sd raid5.p0.s2 is down
  FOO: plex raid5.p0 is degraded

  Fatal trap 12: page fault while in kernel mode
  fault virtual address   = 0x64
  fault code              = supervisor read, page not present
  instruction pointer     = 0x8:0xc08580fe
  stack pointer           = 0x10:0xdcf9dc00
  frame pointer           = 0x10:0xdcf9dc20
  code segment            = base 0x0, limit 0xfffff, type 0x1b
                          = DPL 0, pres 1, def32 1, gran 1
  processor eflags        = interrupt enabled, resume, IOPL = 0
  current process         = 2 (g_event)
  [thread 100038]
  Stopped at      gv_drive_access+0x1e:   movl    0x64(%eax),%ecx
  db> where
  gv_drive_access(c2608b00,ffffffff,ffffffff,0,0) at gv_drive_access+0x1e
  g_access(c260a8c0,ffffffff,ffffffff,0,c260a900) at g_access+0x16b
  gv_plex_orphan(c260a8c0, .......)
  g_orphan_register
  one_event
  g_run_events
  g_event_procbody
  fork_exit
  fork_trampoline
  --- trap 0x1, eip = 0, esp = 0xdcf9dd7c, ebp = 0 ---

Attached are the kernel config file and the dmesg. Let me know if there is anything you want me to do with gdb to track this down further. I could probably even arrange a guest account on this machine if someone wants to take a closer look at the vmcore file.

/Daniel Eriksson