Re: arcmsr crash

From: Areca lusa <lusa_at_areca.com.tw> Date: Mon, 16 Jul 2007 19:12:41 +0800

Hi all,

I am on the Areca test team; sorry for the delayed reply. I tried to
reproduce this problem by running

    dd if=/dev/zero of=/dev/da1 bs=1024

on FreeBSD 7.0-CURRENT. It ran normally and I did not see any errors. I have
attached a file with the FreeBSD messages. Note that I tested with only one
CPU, because that is all I have. Could you please test again with one CPU?

Test configuration:
MB: Supermicro X7DB8
BIOS: 05/29/07
RAID card: Arc 1220, F/W: 1.43
Two RAID6 volumes created, four HDDs attached:
first volume -- FreeBSD 7.0
second volume -- free space

If you have any new suggestions, please reply.
Thank you.

Best Regards,
Lusa Sue

Areca Technology Test Engineer
Tel : 886-2-87974060 Ext. 233
Fax : 886-2-87975970
Http://www.areca.com.tw

----- Original Message ----- 
From: "erich" <erich_at_areca.com.tw>
To: "(廣安科技)蘇莉嵐" <lusa_at_areca.com.tw>
Sent: Monday, July 16, 2007 4:36 PM
Subject: Fw: arcmsr crash

>
> ----- Original Message ----- 
> From: "Matt Reimer" <mattjreimer_at_gmail.com>
> To: "Scott Long" <scottl_at_samsco.org>
> Cc: "John Baldwin" <jhb_at_freebsd.org>; <freebsd-current_at_freebsd.org>;
> "erich" <erich_at_areca.com.tw>
> Sent: Saturday, July 14, 2007 4:46 AM
> Subject: Re: arcmsr crash
>
>
>> On 7/13/07, Scott Long <scottl_at_samsco.org> wrote:
>>> John Baldwin wrote:
>>> > On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>>> >> Once a week or so we're seeing a panic with a -current kernel built
>>> >> just before the gcc 4.2 import (maybe three weeks ago). The box has a
>>> >> Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>>> >> an Areca 1220 controller with eight 500G disks connected.
>>> >>
>>> >> Does this indicate that the arcmsr driver is at fault:
>>> >>
>>> >> Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>>> >> cpustop_handler() at cpustop_handler+0x35
>>> >> ipi_nmi_handler() at ipi_nmi_handler+0x2e
>>> >> trap() at trap+0x365
>>> >> nmi_calltrap() at nmi_calltrap+0x8
>>> >> --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
>>> >> siocnclose() at siocnclose+0x21
>>> >> sio_cnputc() at sio_cnputc+0x89
>>> >> cnputc() at cnputc+0x6a
>>> >> putchar() at putchar+0x5f
>>> >> kvprintf() at kvprintf+0xd45
>>> >> printf() at printf+0xe1
>>> >> panic() at panic+0x145
>>> >> xpt_done() at xpt_done+0x14a
>>> >> arcmsr_interrupt() at arcmsr_interrupt+0x2df
>>> >> ithread_loop() at ithread_loop+0x108
>>> >> fork_exit() at fork_exit+0xaa
>>> >> fork_trampoline() at fork_trampoline+0xe
>>> >> --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>>> >
>>> > Looks like it has panic'd here:
>>> >
>>> >                 switch (done_ccb->ccb_h.path->periph->type) {
>>> >                 case CAM_PERIPH_BIO:
>>> >                         mtx_lock(&cam_bioq_lock);
>>> >                         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>>> >                                           sim_links.tqe);
>>> >                         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>>> >                         mtx_unlock(&cam_bioq_lock);
>>> >                         swi_sched(cambio_ih, 0);
>>> >                         break;
>>> >                 default:
>>> >                         panic("unknown periph type %d",
>>> >                             done_ccb->ccb_h.path->periph->type);
>>> >                 }
>>> >
>>> > which would seem to indicate that, yes, it is a driver bug.
>>> >
>>>
>>> The doneq has gotten corrupted somehow.  The only real way that this
>>> could happen is if xpt_done() was called twice on the same ccb.  Whether
>>> this is a hardware bug (hardware completing the same command twice) or
>>> a driver bug is unknown.  I'll try to add some seatbelts to CAM to
>>> detect this kind of condition.  But yes, it's ultimately something in
>>> the arcmsr subsystem that is at fault.
>>
>> Do you have any suggestions of instrumentation printfs I could add to
>> zero in on what part of the driver is at fault?
>>
>> Matt
>
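
The failure mode Scott describes (a completion routine called twice on the
same ccb, which corrupts the done queue) can be illustrated outside the
kernel. What follows is only a minimal user-space sketch of one possible
"seatbelt", assuming a per-request flag is acceptable; it is not the CAM code
and not whatever check was eventually added to CAM. The names struct request,
request_done(), request_dequeue() and double_completions are all hypothetical
and exist only for this illustration. The only library facility used is the
TAILQ macro family from <sys/queue.h>, the same list type the quoted
xpt_done() snippet uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a CAM ccb header; NOT the real struct ccb_hdr. */
struct request {
	int id;
	bool on_doneq;			/* seatbelt: true while queued */
	TAILQ_ENTRY(request) links;
};

TAILQ_HEAD(doneq, request);

/* Hypothetical counter recording how often the seatbelt fired. */
static unsigned double_completions;

/*
 * Queue a finished request. Returns true if it was queued, false if the
 * request was already on the done queue (a double completion). Without
 * the flag check, the second TAILQ_INSERT_TAIL would corrupt the list.
 */
static bool
request_done(struct doneq *q, struct request *r)
{
	assert(q != NULL && r != NULL);
	if (r->on_doneq) {
		double_completions++;
		fprintf(stderr, "request %d completed twice, ignoring\n",
		    r->id);
		return false;
	}
	r->on_doneq = true;
	TAILQ_INSERT_TAIL(q, r, links);
	return true;
}

/* Take the next finished request off the queue, or NULL if it is empty. */
static struct request *
request_dequeue(struct doneq *q)
{
	struct request *r = TAILQ_FIRST(q);

	if (r != NULL) {
		TAILQ_REMOVE(q, r, links);
		r->on_doneq = false;
	}
	return r;
}
```

A caller would TAILQ_INIT() a struct doneq, zero-initialize each struct
request, and treat a false return from request_done() as the point to log
which code path completed the request a second time. In a real driver the
flag would also need to be protected by whatever lock guards the queue; this
sketch ignores locking entirely.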