Matt Reimer wrote:
> On 7/13/07, Scott Long <scottl_at_samsco.org> wrote:
>> Matt Reimer wrote:
>> > On 7/13/07, John Baldwin <jhb_at_freebsd.org> wrote:
>> >> On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>> >> > Once a week or so we're seeing a panic with a -current kernel built
>> >> > just before the gcc 4.2 import (maybe three weeks ago). The box has a
>> >> > Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>> >> > an Areca 1220 controller with eight 500G disks connected.
>> >> >
>> >> > Does this indicate that the arcmsr driver is at fault:
>> >> >
>> >> > Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>> >> > cpustop_handler() at cpustop_handler+0x35
>> >> > ipi_nmi_handler() at ipi_nmi_handler+0x2e
>> >> > trap() at trap+0x365
>> >> > nmi_calltrap() at nmi_calltrap+0x8
>> >> > --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
>> >> > siocnclose() at siocnclose+0x21
>> >> > sio_cnputc() at sio_cnputc+0x89
>> >> > cnputc() at cnputc+0x6a
>> >> > putchar() at putchar+0x5f
>> >> > kvprintf() at kvprintf+0xd45
>> >> > printf() at printf+0xe1
>> >> > panic() at panic+0x145
>> >> > xpt_done() at xpt_done+0x14a
>> >> > arcmsr_interrupt() at arcmsr_interrupt+0x2df
>> >> > ithread_loop() at ithread_loop+0x108
>> >> > fork_exit() at fork_exit+0xaa
>> >> > fork_trampoline() at fork_trampoline+0xe
>> >> > --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>> >>
>> >> Looks like it has panic'd here:
>> >>
>> >>         switch (done_ccb->ccb_h.path->periph->type) {
>> >>         case CAM_PERIPH_BIO:
>> >>                 mtx_lock(&cam_bioq_lock);
>> >>                 TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>> >>                     sim_links.tqe);
>> >>                 done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>> >>                 mtx_unlock(&cam_bioq_lock);
>> >>                 swi_sched(cambio_ih, 0);
>> >>                 break;
>> >>         default:
>> >>                 panic("unknown periph type %d",
>> >>                     done_ccb->ccb_h.path->periph->type);
>> >>         }
>> >>
>> >> which should seem to indicate that, yes, it is a driver bug.
>> >
>> > That code in -CURRENT looks a bit different (cam_simq_lock instead of
>> > cam_bioq_lock, etc.). Is that relevant to your analysis?
>> >
>> > Matt
>>
>> The locking is different, but the problem is basically the same. Are
>> you using 7-CURRENT or 6.x?
>
> 7-CURRENT from right before the gcc upgrade.
>
> Matt

Crud.... now that I look closer, I can definitely see the locking
problems in the driver. I think the locking will have to be completely
overhauled. Can I use you as a guinea pig for testing?

Scott

Received on Fri Jul 13 2007 - 21:29:56 UTC