I've been able to crash a server (usenet news server) running 5.2R. The crash
happens with and without ACPI. The attached info is with ACPI enabled. I would
be very pleased if someone could tell me why the box crashed and how to prevent
it from happening. I tried searching the list archives and googling without any
positive result.

The hardware is an IBM x345 with two CPUs (Pentium 4), an internal LSI
SCSI/RAID controller and an external IBM SCSI controller (which is really an
Adaptec SCSI Card 29320LP). An IBM ESX400 disk array is connected to the
Adaptec controller. All the disks are U320 disks. The root filesystem is
mirrored with the LSI adapter (which only supports mirroring of two drives).
There are three other mirrored filesystems created with vinum. On all
filesystems except root, I've enabled soft updates. I've tested all the
filesystems (mirrored root, vinum mirrors and filesystems created on single
disks) with bonnie++ and iozone, and the server has behaved well.

The disk layout is as follows (da0-6 are connected to the Adaptec controller
and the rest to the LSI controller):

#df
Filesystem          512-blocks      Used      Avail Capacity  Mounted on
/dev/da16s1a           2025948    150456    1713420     8%    /
devfs                        2         2          0   100%    /dev
/dev/da16s1h          12172084    147124   11051196     1%    /home
/dev/da16s1g           4052060     17744    3710152     0%    /tmp
/dev/da16s1d          52808984   4025312   44558956     8%    /usr
/dev/da16s1e          40616796    172624   37194832     0%    /news
/dev/da16s1f          10154076     15368    9326384     0%    /var
/dev/vinum/news_db   138862504    284336  127469168     0%    /news/db
/dev/vinum/overview  138862504    168888  127584616     0%    /overview
/dev/vinum/fispool   138862504 125891528    1861976    99%    /cnfs/fispool
/dev/da2a            138862772 125891536    1862216    99%    /cnfs/altspool
/dev/da3a            138862772 125891528    1862224    99%    /cnfs/altspool/bin1
/dev/da4a            138862772 125891528    1862224    99%    /cnfs/altspool/bin2
/dev/da5a            138862772 125891528    1862224    99%    /cnfs/therest/1
/dev/da6a            138862772 126916072     837680    99%    /cnfs/therest/2
procfs                       8         8          0   100%    /proc

#vinum l
6 drives:
D vinumdrive3           State: up       /dev/da21a      A: 0/70006 MB (0%)
D vinumdrive2           State: up       /dev/da20a      A: 0/70006 MB (0%)
D vinumdrive1           State: up       /dev/da19a      A: 0/70006 MB (0%)
D vinumdrive0           State: up       /dev/da18a      A: 0/70006 MB (0%)
D vinumdrive5           State: up       /dev/da1a       A: 0/70006 MB (0%)
D vinumdrive4           State: up       /dev/da0a       A: 0/70006 MB (0%)

3 volumes:
V news_db               State: up       Plexes:       2 Size:         68 GB
V overview              State: up       Plexes:       2 Size:         68 GB
V fispool               State: up       Plexes:       2 Size:         68 GB

6 plexes:
P news_db.p0          C State: up       Subdisks:     1 Size:         68 GB
P news_db.p1          C State: up       Subdisks:     1 Size:         68 GB
P overview.p0         C State: up       Subdisks:     1 Size:         68 GB
P overview.p1         C State: up       Subdisks:     1 Size:         68 GB
P fispool.p0          C State: up       Subdisks:     1 Size:         68 GB
P fispool.p1          C State: up       Subdisks:     1 Size:         68 GB

6 subdisks:
S news_db.p0.s0         State: up       D: vinumdrive0  Size:         68 GB
S news_db.p1.s0         State: up       D: vinumdrive1  Size:         68 GB
S overview.p0.s0        State: up       D: vinumdrive2  Size:         68 GB
S overview.p1.s0        State: up       D: vinumdrive3  Size:         68 GB
S fispool.p0.s0         State: up       D: vinumdrive4  Size:         68 GB
S fispool.p1.s0         State: up       D: vinumdrive5  Size:         68 GB

Now, I installed INN (cnfs + ovdb) and a test newsfeed which puts stress
mainly on /news/db, /overview and /cnfs/fispool. When I started the feed to
the server, everything worked fine for a couple of minutes, and then a crash.
The logs show the following:

(da0:ahd0:0:0:0): Retrying Command
(da0:ahd0:0:0:0): Queue Full
(da0:ahd0:0:0:0): tagged openings now 128
(da0:ahd0:0:0:0): Retrying Command

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc07bcafe
stack pointer           = 0x10:0xe7b96784
frame pointer           = 0x10:0xe7b967c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 79 (syncer)

Attached below are the verbose boot logs from the server and the kernel
debugger output.

Cheers,
--
- Matti -
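P.S. For reference, each of the mirrored volumes in the `vinum l` listing
above would typically be described by a config along these lines. This is a
sketch reconstructed from the listing, not my actual config file; the `length
68g` values are assumptions taken from the reported sizes:

```
# Hypothetical vinum config matching the news_db volume above:
# two concatenated plexes, one subdisk each, on separate drives.
drive vinumdrive0 device /dev/da18a
drive vinumdrive1 device /dev/da19a
volume news_db
  plex org concat
    sd length 68g drive vinumdrive0
  plex org concat
    sd length 68g drive vinumdrive1
```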
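On the "Queue Full" / "tagged openings now 128" messages: as I understand it,
these just mean the ahd driver is adapting the tag queue depth after the
drive reported its queue full, and they may be unrelated to the panic. One
speculative thing to try (an assumption on my part, not a known fix) is to
pin the tag depth lower on the busy Adaptec-attached disks:

```
# Speculative workaround, not a confirmed fix: cap the number of
# outstanding tagged commands on da0 at runtime.
camcontrol tags da0 -N 64
```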
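To see what the kernel was doing at the faulting instruction pointer
(0xc07bcafe), the usual approach is to take a debug kernel, run `nm -n` on
it, and find the nearest text symbol at or below that address (or load the
crash dump in gdb -k). The lookup itself can be sketched in a small script;
this is illustrative only, and the sample symbol names in it are fabricated
for demonstration:

```python
#!/usr/bin/env python
# Illustrative helper (not from the original post): given lines from
# `nm -n kernel.debug`, find the text symbol nearest below an address,
# e.g. the EIP 0xc07bcafe from the trap message above.

def nearest_symbol(nm_lines, addr):
    """Return (symbol, offset) for the closest text symbol at or below addr."""
    best = None
    for line in nm_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        value, kind, name = parts
        if kind.lower() != 't':        # only consider text (code) symbols
            continue
        try:
            sym_addr = int(value, 16)
        except ValueError:
            continue
        if sym_addr <= addr and (best is None or sym_addr > best[1]):
            best = (name, sym_addr)
    if best is None:
        return None
    return best[0], addr - best[1]

if __name__ == "__main__":
    # Fabricated sample nm output for demonstration only.
    sample = [
        "c07bc000 T some_sync_routine",
        "c07bd200 T some_other_routine",
    ]
    sym, off = nearest_symbol(sample, 0xc07bcafe)
    print("%s+0x%x" % (sym, off))      # prints: some_sync_routine+0xafe
```

The resulting symbol+offset is what you would then inspect with
`gdb -k` (`list *0xc07bcafe`) against the matching debug kernel.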
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:38 UTC