Re: arcmsr crash

From: Scott Long <scottl_at_samsco.org>
Date: Fri, 13 Jul 2007 17:29:38 -0600

Matt Reimer wrote:
> On 7/13/07, Scott Long <scottl_at_samsco.org> wrote:
>> Matt Reimer wrote:
>> > On 7/13/07, John Baldwin <jhb_at_freebsd.org> wrote:
>> >> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>> >> > Once a week or so we're seeing a panic with a -current kernel built
>> >> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
>> >> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>> >> > an Areca 1220 controller with eight 500G disks connected.
>> >> >
>> >> > Does this indicate that the arcmsr driver is at fault:
>> >> >
>> >> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>> >> > cpustop_handler() at cpustop_handler+0x35
>> >> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
>> >> > trap() at trap+0x365
>> >> > nmi_calltrap() at nmi_calltrap+0x8
>> >> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
>> >> > siocnclose() at siocnclose+0x21
>> >> > sio_cnputc() at sio_cnputc+0x89
>> >> > cnputc() at cnputc+0x6a
>> >> > putchar() at putchar+0x5f
>> >> > kvprintf() at kvprintf+0xd45
>> >> > printf() at printf+0xe1
>> >> > panic() at panic+0x145
>> >> > xpt_done() at xpt_done+0x14a
>> >> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
>> >> > ithread_loop() at ithread_loop+0x108
>> >> > fork_exit() at fork_exit+0xaa
>> >> > fork_trampoline() at fork_trampoline+0xe
>> >> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>> >>
>> >> Looks like it has panic'd here:
>> >>
>> >>                 switch (done_ccb->ccb_h.path->periph->type) {
>> >>                 case CAM_PERIPH_BIO:
>> >>                         mtx_lock(&cam_bioq_lock);
>> >>                         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>> >>                                           sim_links.tqe);
>> >>                         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>> >>                         mtx_unlock(&cam_bioq_lock);
>> >>                         swi_sched(cambio_ih, 0);
>> >>                         break;
>> >>                 default:
>> >>                         panic("unknown periph type %d",
>> >>                             done_ccb->ccb_h.path->periph->type);
>> >>                 }
>> >>
>> >> which should seem to indicate that, yes, it is a driver bug.
>> >
>> > That code in -CURRENT looks a bit different (cam_simq_lock instead of
>> > cam_bioq_lock, etc.). Is that relevant to your analysis?
>> >
>> > Matt
>>
>> The locking is different, but the problem is basically the same.  Are
>> you using 7-CURRENT or 6.x?
> 
> 7-CURRENT from right before the gcc upgrade.
> 
> Matt

Crud.... now that I look closer, I can definitely see the locking
problems in the driver.  I think the locking will have to be completely
overhauled.  Can I use you as a guinea pig for testing?

Scott
Received on Fri Jul 13 2007 - 21:29:56 UTC
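
For readers following the analysis quoted in this thread, here is a rough user-space sketch of the dispatch that the panic comes from. It only models the shape of the quoted switch statement: a completed request whose peripheral type is the expected one gets queued, and any other value falls through to the default branch, which is where the kernel calls panic(). All names here (PERIPH_BIO, struct ccb_hdr, dispatch_done, the queued flag) are hypothetical stand-ins, not the real CAM definitions, and the sketch reports the bad type instead of panicking. How the type comes to be unexpected is not established in the thread; a locking problem in the driver is the suspicion raised there.

```c
#include <stdio.h>

/* Hypothetical stand-in for CAM_PERIPH_BIO; the numeric value is an assumption. */
enum periph_type {
    PERIPH_BIO = 1
};

/* Simplified, hypothetical versions of the structures the switch walks through. */
struct periph  { int type; };
struct path    { struct periph *periph; };
struct ccb_hdr { struct path *path; int queued; };

enum dispatch_result { DISPATCH_QUEUED, DISPATCH_UNKNOWN_TYPE };

/*
 * Models the switch in the quoted xpt_done() excerpt: a known peripheral
 * type is queued for completion, anything else takes the default branch.
 * The kernel panics there; this sketch prints the message and returns.
 */
static enum dispatch_result
dispatch_done(struct ccb_hdr *h)
{
    switch (h->path->periph->type) {
    case PERIPH_BIO:
        h->queued = 1;          /* stands in for the TAILQ insert */
        return DISPATCH_QUEUED;
    default:
        fprintf(stderr, "unknown periph type %d\n",
            h->path->periph->type);
        return DISPATCH_UNKNOWN_TYPE;
    }
}
```

The point of the sketch is that the default branch is reached only when the value read through path->periph->type is not the one expected type, so hitting it suggests the completed request, or something it points to, was not in the state the dispatch code assumes.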
