Re: 5.2R: panic (syncer) on IBM x345 (SMP and Vinum)

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Mon, 19 Jan 2004 03:39:13 -0800 (PST)

On 19 Jan, Matti Saarinen wrote:
> 
> I've been able to crash a server (usenet news server) running 5.2R.
> The crash happens with and without ACPI. The attached info is with
> ACPI enabled. I would be very pleased if someone could tell me why the
> box crashed and how to prevent it from happening. I tried searching
> the list archives and googling without any positive result.
> 
> The hardware is IBM x345 with two CPUs (Pentium4), internal LSI
> SCSI/RAID controller and external IBM SCSI controller (which is really
> Adaptec SCSI Card 29320LP). There is IBM ESX400 disk array connected
> to the Adaptec controller. All the disks are U320 disks.
> 
> The root filesystem is mirrored with the LSI adapter (which only
> supports mirroring of two drives). There are three other mirrored
> filesystems created with vinum. On all file systems except root, I've
> enabled soft updates. I've tested all the filesystems (mirrored root, 
> vinum mirrors and filesystems created on single disks) with bonnie++
> and iozone and the server has behaved well. 

> (da0:ahd0:0:0:0): Retrying Command
> (da0:ahd0:0:0:0): Queue Full
> (da0:ahd0:0:0:0): tagged openings now 128
> (da0:ahd0:0:0:0): Retrying Command


Try using the camcontrol modepage command to turn off write caching on
each of the drives (set the WCE bit to 0).  This should eliminate the
need for the driver to crank down the number of tagged openings. Less
stress on the error recovery code may keep the bug from being triggered.
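
As a rough sketch of the above (not tested on this hardware): the WCE field
lives in the SCSI caching mode page, which is page 8. The helper name
show_caching_page is made up for illustration, and da0 is simply the drive
named in the log; the other drives would need to be listed as well.

```sh
#!/bin/sh
# Hypothetical helper: show the caching mode page (page 8, which holds
# the WCE field) for each drive given on the command line.
show_caching_page() {
    for drive in "$@"; do
        if command -v camcontrol >/dev/null 2>&1; then
            camcontrol modepage "$drive" -m 8
        else
            # Not on a system with camcontrol: just print what would be run.
            echo "camcontrol not found; would run: camcontrol modepage $drive -m 8"
        fi
    done
}

# da0 is the drive that appears in the log; add the remaining drives here.
show_caching_page da0

# To change the setting, edit the page interactively and set the WCE
# field to 0 (run by hand, since -e opens an editor):
#   camcontrol modepage da0 -m 8 -e -P 3
```

Saving and leaving the editor that -e opens should write the page back to
the drive. -P 3 is meant to select the saved values so that the change
survives a power cycle, assuming the drive supports saved pages.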


> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code              = supervisor write, page not present
> instruction pointer     = 0x8:0xc07bcafe
> stack pointer           = 0x10:0xe7b96784
> frame pointer           = 0x10:0xe7b967c0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 79 (syncer)
> 
> 
> 
> Attached below are the verbose boot logs from the server and the
> kernel debugger output.

> trap_fatal(e7b96744,0,c0837ed0,2cd,cafe9500) at trap_fatal+0x326
> trap_pfault(e7b96744,0,0,1ea30e7,0) at trap_pfault+0x1c2
> trap(e7b90018,10,e7b90010,0,d9a46000) at trap+0x2fd
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc07bcafe, esp = 0xe7b96784, ebp = 0xe7b967c0 ---
> generic_bcopy(d78de930,0,d78de930,e7b967e4,c06590e1) at generic_bcopy+0x1a
> vinumstrategy(d78de930,cafe9500,e7b9680c,c05da937,d78de930) at vinumstrategy+0xa6
> dev_strategy(d78de930,0,2ee,1,c077dc95) at dev_strategy+0x41
> spec_xstrategy(cb6d071c,d78de930,e7b96828,c05d9c38,e7b96854) at spec_xstrategy+0x1d7

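Looks like vinum is passing a NULL pointer to bcopy: the trap is a
supervisor write to address 0x0 inside generic_bcopy, which was called
from vinumstrategy.

Received on Mon Jan 19 2004 - 02:39:24 UTC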