> -----Original Message-----
> From: Bjoern A. Zeeb [mailto:bz_at_FreeBSD.org]
> Sent: Tuesday, April 26, 2005 3:26 AM
> To: Vinod Kashyap
> Subject: RE: Problem with twa in HEAD
>
>
> On Mon, 25 Apr 2005, Vinod Kashyap wrote:
>
> Hi,
>
> > > -----Original Message-----
> > > From: Bjoern A. Zeeb [mailto:bz_at_FreeBSD.org]
> > > Sent: Monday, April 25, 2005 6:45 AM
> > > To: Vinod Kashyap
> > > Subject: Re: Problem with twa in HEAD
> > >
> > >
> > > On Fri, 22 Apr 2005, Bjoern A. Zeeb wrote:
> > >
> > > Hi,
> > >
> > > > scottl redirected me to you.
> > > >
> > > > I am currently debugging "hangs" on reboot and shutdown on a
> > > > SMP machine with 12 discs at a
> > > >
> > > > 3ware device driver for 9000 series storage controllers, version: 3.60.00.016
> > > > twa0: <3ware 9000 series Storage Controller> port 0x9800-0x98ff mem 0xfe8ffc00-0xfe8ffcff,0xfb800000-0xfbffffff irq 28 at device 6.0 on pci3
> > > > twa0: [FAST]
> > > > twa0: INFO: (0x15: 0x1300): Controller details:: 12 ports, Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051
> > > >
> > > >
> > > > What I know so far is that Giant is held by sync.
> > > >
> > > > Things are "spinning" in cam/cam_xpt.c around:
> > > >
> > > > --- cam_xpt.c   31 Mar 2005 21:42:49 -0000      1.152
> > > > +++ cam_xpt.c   22 Apr 2005 18:42:43 -0000
> > > > @@ -3643,6 +3643,7 @@ xpt_polled_action(union ccb *start_ccb)
> > > >  			    != CAM_REQ_INPROG)
> > > >  				break;
> > > >  			DELAY(1000);
> > > > +			printf("XXX status=%02x\n", start_ccb->ccb_h.status);
> > > >  		}
> > > >  		if (timeout == 0) {
> > > >  			/*
> > > >
> > > >
> > > > with status being 0x200.
> > > >
> > > > Seems the twa has a command stuck in it.
> > > >
> > > > I have seen the comment in dev/twa/tw_osl_cam.c ~ line 253 about
> > > > queuing and CAM_SIM_QUEUED but I don't know enough about cam.
> > > > It seems not all paths out of this function clear that from
> > > > status?
> > > >
> > > > Any help appreciated ;) I can try patches; as long as I can break
> > > > to db> to reboot.
> > >
> > > further debugging shows that it seems to be spinning in twa_poll.
> > > see debug output from TWA_DEBUG 3. The problem is that at this point
> > > I am no longer able to break to debugger.
> > >
> > > twa0: tw_osli_execute_scsi: XPT_SCSI_IO: Single virtual address!
> > > twa0: tw_osli_execute_scsi: XPT_SCSI_IO: Single virtual address!
> > > unmount of /dev failed (BUSY)
> > > twa0: tw_osli_execute_scsi: XPT_SCSI_IO: Single virtual address!
> > > twa0: tw_osli_execute_scsi: XPT_SCSI_IO: Single virtual address!
> > > Uptime: 2m57s
> > > twa0: tw_osli_execute_scsi: XPT_SCSI_IO: Single virtual address!
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > twa0: twa_poll: entering; sc = 0xc57bb200
> > > twa0: twa_poll: exiting; sc = 0xc57bb200
> > > ...
> > >
> >
> > I am in the middle of an office move right now.
> > I will get back to you once I have some time to look into this.
> >
> thanks for the information; I'll be able to test at least until end of
> this week and hopefully next week too.
>

I looked into this, and this is what is happening:

On reboot/halt, the following function calling sequence happens:
... --> dashutdown --> xpt_polled_action --> twa_poll.
But the interrupt handler in twa is still active at this time, since
twa_detach/twa_shutdown hasn't been called yet.
Before twa_poll can fetch the response for the posted command, the ISR
gets called when the firmware posts the response. The ISR clears the
interrupt bit on the controller, registers a taskqueue handler like it
always does, and exits. Meanwhile, xpt_polled_action continues to call
twa_poll, which cannot determine that the command has completed, since
the interrupt bit on the controller is already cleared. So we get into
a (near) never-ending loop: the timeout for scsi_synchronize_cache,
which is what is being tried here, is, for whatever reason, 60 minutes,
so the system is as good as hung.

Now, does anyone know why xpt_polled_action is being called from
dashutdown even before the ISR has been unregistered (via twa_detach)?

Bjoern, this patch should work around your problem, although it's not
the fix. Also, it still leaves a window for the race condition
described above.


diff -u -r ../twa.cur/tw_osl.h ./tw_osl.h
--- ../twa.cur/tw_osl.h	Fri Apr  8 12:43:45 2005
+++ ./tw_osl.h	Thu Apr 28 20:28:40 2005
@@ -71,6 +71,7 @@
 /* Possible values of sc->state. */
 #define TW_OSLI_CTLR_STATE_OPEN		(1<<0)	/* control device is open */
 #define TW_OSLI_CTLR_STATE_SIMQ_FROZEN	(1<<1)	/* simq frozen */
+#define TW_OSLI_CTLR_STATE_POLLING	(1<<2)	/* polling for ctlr response */

 #ifdef TW_OSL_DEBUG
diff -u -r ../twa.cur/tw_osl_cam.c ./tw_osl_cam.c
--- ../twa.cur/tw_osl_cam.c	Fri Apr  8 12:43:57 2005
+++ ./tw_osl_cam.c	Thu Apr 28 20:29:22 2005
@@ -482,6 +482,7 @@
 	struct twa_softc *sc = (struct twa_softc *)(cam_sim_softc(sim));

 	tw_osli_dbg_dprintf(3, sc, "entering; sc = %p", sc);
+	sc->state |= TW_OSLI_CTLR_STATE_POLLING;
 	if (tw_cl_interrupt(&(sc->ctlr_handle)))
 		tw_cl_deferred_interrupt(&(sc->ctlr_handle));
 	tw_osli_dbg_dprintf(3, sc, "exiting; sc = %p", sc);
diff -u -r ../twa.cur/tw_osl_freebsd.c ./tw_osl_freebsd.c
--- ../twa.cur/tw_osl_freebsd.c	Fri Apr  8 12:44:12 2005
+++ ./tw_osl_freebsd.c	Thu Apr 28 20:31:25 2005
@@ -964,6 +964,8 @@
 	struct twa_softc *sc = (struct twa_softc *)arg;

 	tw_osli_dbg_dprintf(10, sc, "entered");
+	if (sc->state & TW_OSLI_CTLR_STATE_POLLING)
+		return;
 	if (tw_cl_interrupt(&(sc->ctlr_handle)))
 		taskqueue_enqueue_fast(taskqueue_fast,
 			&(sc->deferred_intr_callback));

Received on Fri Apr 29 2005 - 01:50:52 UTC
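
For readers following the thread, the race between the ISR and twa_poll
described in the message can be illustrated with a small stand-alone
model. This is only a sketch under stated assumptions: fake_ctlr,
claim_interrupt, fake_isr, fake_poll and run_shutdown are hypothetical
names invented here, not part of the twa driver or CAM, and the model
assumes the firmware posts its response right after the first poll and
that queued taskqueue work never runs during shutdown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag mirroring TW_OSLI_CTLR_STATE_POLLING from the patch. */
#define STATE_POLLING (1 << 2)

/* Hypothetical stand-in for the controller plus driver softc. */
struct fake_ctlr {
	bool intr_pending;	/* "interrupt bit" set when firmware posts a response */
	int state;		/* driver state flags */
	bool cmd_done;		/* response was processed by the poll path */
	int deferred_queued;	/* deferred work queued by the ISR, never run here */
};

/* Report whether the interrupt bit was set, and clear it. */
static bool claim_interrupt(struct fake_ctlr *c)
{
	bool was_set = c->intr_pending;

	c->intr_pending = false;
	return was_set;
}

/* ISR model: with the workaround it backs off once polling has started. */
static void fake_isr(struct fake_ctlr *c, bool workaround)
{
	if (workaround && (c->state & STATE_POLLING))
		return;
	if (claim_interrupt(c))
		c->deferred_queued++;	/* bit is gone; the poll path never sees it */
}

/* Poll model: completes the command only if the bit is still set. */
static void fake_poll(struct fake_ctlr *c, bool workaround)
{
	if (workaround)
		c->state |= STATE_POLLING;
	if (claim_interrupt(c))
		c->cmd_done = true;
}

/* Returns the poll count at which the command completed, or -1 on timeout. */
static int run_shutdown(bool workaround, int timeout)
{
	struct fake_ctlr c = { false, 0, false, 0 };

	assert(timeout > 0);
	for (int i = 1; i <= timeout; i++) {
		fake_poll(&c, workaround);
		if (c.cmd_done)
			return i;
		if (i == 1) {
			/* Firmware posts the response; the ISR fires before the next poll. */
			c.intr_pending = true;
			fake_isr(&c, workaround);
		}
	}
	return -1;
}
```

In this model, run_shutdown(false, 1000) returns -1 because the ISR
consumes the interrupt bit before the second poll, while
run_shutdown(true, 1000) returns 2. The model does not cover the
remaining window mentioned in the message, where the ISR runs before
the first poll has set the flag.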