Hi all,

I am on the Areca test team. Sorry for the delayed reply.

I tried to reproduce this problem by running

dd if=/dev/zero of=/dev/da1 bs=1024

on FreeBSD 7.0-CURRENT. It ran normally and I did not see any errors. I have attached a file containing the FreeBSD messages. Note that I tested with only one CPU, because that is all I have. Could you please run the test again with one CPU?

Test environment:

MB: Supermicro X7DB8
BIOS: 05/29/07
RAID card: Arc 1220
F/W: 1.43

Two RAID6 volumes were created with four HDDs attached:
first volume: FreeBSD 7.0
second volume: free space

If you have any new suggestions, please reply.

Thank you,

Best Regards,
Lusa Sue
Areca Technology
Test Engineer
Tel : 886-2-87974060 Ext. 233
Fax : 886-2-87975970
Http://www.areca.com.tw

----- Original Message -----
From: "erich" <erich_at_areca.com.tw>
To: "(廣安科技)蘇莉嵐" <lusa_at_areca.com.tw>
Sent: Monday, July 16, 2007 4:36 PM
Subject: Fw: arcmsr crash

>
> ----- Original Message -----
> From: "Matt Reimer" <mattjreimer_at_gmail.com>
> To: "Scott Long" <scottl_at_samsco.org>
> Cc: "John Baldwin" <jhb_at_freebsd.org>; <freebsd-current_at_freebsd.org>;
> "erich" <erich_at_areca.com.tw>
> Sent: Saturday, July 14, 2007 4:46 AM
> Subject: Re: arcmsr crash
>
>
>> On 7/13/07, Scott Long <scottl_at_samsco.org> wrote:
>>> John Baldwin wrote:
>>> > On Tuesday 05 June 2007 05:22:38 pm Matt Reimer wrote:
>>> >> Once a week or so we're seeing a panic with a -current kernel built
>>> >> just before the gcc 4.2 import (maybe three weeks ago). The box has a
>>> >> Supermicro X7DBE/X7DBE+ motherboard with two Xeon 5160s, 16G RAM, and
>>> >> an Areca 1220 controller with eight 500G disks connected.
>>> >>
>>> >> Does this indicate that the arcmsr driver is at fault:
>>> >>
>>> >> Tracing command irq16: arcmsr0 pid 26 tid 100018 td 0xffffff040fc5b000
>>> >> cpustop_handler() at cpustop_handler+0x35
>>> >> ipi_nmi_handler() at ipi_nmi_handler+0x2e
>>> >> trap() at trap+0x365
>>> >> nmi_calltrap() at nmi_calltrap+0x8
>>> >> --- trap 0x13, rip = 0xffffffff8041ab11, rsp = 0xffffffffab59eff0, rbp = 0xffffffffac0a37d0 ---
>>> >> siocnclose() at siocnclose+0x21
>>> >> sio_cnputc() at sio_cnputc+0x89
>>> >> cnputc() at cnputc+0x6a
>>> >> putchar() at putchar+0x5f
>>> >> kvprintf() at kvprintf+0xd45
>>> >> printf() at printf+0xe1
>>> >> panic() at panic+0x145
>>> >> xpt_done() at xpt_done+0x14a
>>> >> arcmsr_interrupt() at arcmsr_interrupt+0x2df
>>> >> ithread_loop() at ithread_loop+0x108
>>> >> fork_exit() at fork_exit+0xaa
>>> >> fork_trampoline() at fork_trampoline+0xe
>>> >> --- trap 0, rip = 0, rsp = 0xffffffffac0a3d30, rbp = 0 ---
>>> >
>>> > Looks like it has panic'd here:
>>> >
>>> >     switch (done_ccb->ccb_h.path->periph->type) {
>>> >     case CAM_PERIPH_BIO:
>>> >         mtx_lock(&cam_bioq_lock);
>>> >         TAILQ_INSERT_TAIL(&cam_bioq, &done_ccb->ccb_h,
>>> >             sim_links.tqe);
>>> >         done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
>>> >         mtx_unlock(&cam_bioq_lock);
>>> >         swi_sched(cambio_ih, 0);
>>> >         break;
>>> >     default:
>>> >         panic("unknown periph type %d",
>>> >             done_ccb->ccb_h.path->periph->type);
>>> >     }
>>> >
>>> > which should seem to indicate that, yes, it is a driver bug.
>>> >
>>>
>>> The doneq has gotten corrupted somehow. The only real way that this
>>> could happen is if xpt_done() was called twice on the same ccb. Whether
>>> this is a hardware bug (hardware completing the same command twice) or
>>> a driver bug is unknown. I'll try to add some seatbelts to CAM to
>>> detect this kind of condition. But yes, it's ultimately something in
>>> the arcmsr subsystem that is at fault.
>>
>> Do you have any suggestions of instrumentation printfs I could add to
>> zero in on what part of the driver is at fault?
>>
>> Matt
>
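
The double-completion scenario described in the quoted thread (xpt_done() being called twice on the same ccb) and the kind of instrumentation asked about can be illustrated with a small user-space sketch. This is not CAM or arcmsr code: `mock_ccb`, `mock_doneq`, and `mock_done()` are hypothetical names invented for illustration, and the `on_doneq` flag is only an assumed way of marking a request as already completed. The idea is simply to record completion state on each request and report, rather than silently re-queue, a request that is completed a second time.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a CAM ccb; NOT the real struct ccb_hdr. */
struct mock_ccb {
    int id;                  /* illustrative request identifier */
    int on_doneq;            /* 1 once the request has been queued as done */
    struct mock_ccb *next;   /* singly linked done queue */
};

/* Hypothetical done queue with a counter for detected double completions. */
struct mock_doneq {
    struct mock_ccb *head;
    struct mock_ccb *tail;
    unsigned int double_completions;
};

/*
 * Queue a completed request. Returns 0 on success, or -1 if the same
 * request was already completed, in which case it is NOT queued again
 * (re-inserting it would corrupt the list) and a diagnostic is printed.
 */
static int mock_done(struct mock_doneq *q, struct mock_ccb *ccb)
{
    if (ccb->on_doneq) {
        q->double_completions++;
        fprintf(stderr, "mock_done: ccb %d completed twice\n", ccb->id);
        return -1;
    }
    ccb->on_doneq = 1;
    ccb->next = NULL;
    if (q->tail != NULL)
        q->tail->next = ccb;
    else
        q->head = ccb;
    q->tail = ccb;
    return 0;
}
```

In a real driver the equivalent check would presumably sit at the point where the interrupt handler hands a finished request back to CAM, so that the diagnostic identifies which request came back twice; where exactly that belongs in arcmsr is not something this sketch can answer.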