On Mon, Dec 14, 2009 at 4:47 PM, Alexander Sack <pisymbol_at_gmail.com> wrote:
> Hello Again:
>
> I guess I have a technical question/concern that I was looking for
> feedback. During the probe sequence, aac(4) conditionally responds
> to INQUIRY commands depending on target LUN:
>
> aac_cam.c/aac_cam_complete():
> 532     if (command == INQUIRY) {
> 533         if (ccb->ccb_h.status == CAM_REQ_CMP) {
> 534             device = ccb->csio.data_ptr[0] & 0x1f;
> 535             /*
> 536              * We want DASD and PROC devices to only be
> 537              * visible through the pass device.
> 538              */
> 539             if ((device == T_DIRECT) ||
> 540                 (device == T_PROCESSOR) ||
> 541                 (sc->flags & AAC_FLAGS_CAM_PASSONLY))
> 542                 ccb->csio.data_ptr[0] =
> 543                     ((device & 0xe0) | T_NODEVICE);
> 544         } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
> 545             ccb->ccb_h.target_lun != 0) {
> 546             /* fix for INQUIRYs on Lun>0 */
> 547             ccb->ccb_h.status = CAM_DEV_NOT_THERE;
> 548         }
> 549     }
>
> Why is CAM_DEV_NOT_THERE skipped on LUN 0? This is true on my target
> 6.1-amd64 machine as well as CURRENT. The reason why I ask this is
> because now that aac(4) is sequential scanned, there are a lot of cam
> interrupts that come in on my 6.x machine where the threshold is only
> 500 and I get the interrupt storm threshold warning for swi2 pretty
> quickly:
>
> Interrupt storm detected on "swi2:"; throttling interrupt source
>
> Obviously it's contingent on the number of adapters you have on your
> system. On CURRENT I didn't see this because the threshold is double
> (I think it's 1000 by default).
>
> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
> the scan. The probe sequence in CURRENT as well as 6.1 handles
> CAM_SEL_TIMEOUT a little differently depending on context.
>
> scsi_xpt.c/probedone():
> 1090         } else if (cam_periph_error(done_ccb, 0,
> 1091                    done_ccb->ccb_h.target_lun > 0
> 1092                    ? SF_RETRY_UA|SF_QUIET_IR
> 1093                    : SF_RETRY_UA,
> 1094                    &softc->saved_ccb) == ERESTART) {
> 1095             return;
> 1096         } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) {
> 1097             /* Don't wedge the queue */
> 1098             xpt_release_devq(done_ccb->ccb_h.path, /*count*/1,
> 1099                              /*run_queue*/TRUE);
> 1100         }
> 1101         /*
> 1102          * If we get to this point, we got an error status back
> 1103          * from the inquiry and the error status doesn't require
> 1104          * automatically retrying the command. Therefore, the
> 1105          * inquiry failed. If we had inquiry information before
> 1106          * for this device, but this latest inquiry command failed,
> 1107          * the device has probably gone away. If this device isn't
> 1108          * already marked unconfigured, notify the peripheral
> 1109          * drivers that this device is no more.
> 1110          */
> 1111         if ((path->device->flags & CAM_DEV_UNCONFIGURED) == 0)
> 1112             /* Send the async notification. */
> 1113             xpt_async(AC_LOST_DEVICE, path, NULL);
> 1114
> 1115         xpt_release_ccb(done_ccb);
> 1116         break;
> 1117     }
>
> But on cam_periph_error(), this will issue a xpt_async(AC_LOST_DEVICE,
> path, NULL) regardless of whether or not the device has been seen
> already (as per the comment above), i.e. on every initial bus scan,
> you will get into (on an aac(4) card with LUN > 0):
>
> cam_periph.c/cam_periph_error():
> 1697     case CAM_SEL_TIMEOUT:
> 1698     {
> .
> .
> 1729         /*
> 1730          * Let peripheral drivers know that this device has gone
> 1731          * away.
> 1732          */
> 1733         xpt_async(AC_LOST_DEVICE, newpath, NULL);
> 1734         xpt_free_path(newpath);
> 1735         break;
>
> Is this really right? This generates A LOT of interrupt noise when no
> devices are attached during the initial scan, i.e. we are treating the
> initial scan of failed INQUIRY commands on the SCSI BUS as if we
> really lost a device during a selection timeout. (we even generate a
> path to issue the async event).
I should have given this thread a better title, but the basic problem is that we always generate a ton of CAM software interrupts during a LUN scan for targets on aac(4) that do not exist (i.e. nothing is truly there). This happens because a failed initial INQUIRY is treated as a selection timeout rather than as a device that is simply not there. There seems to be a historical workaround for part of this issue, but I am trying to dig deeper in order to do the *right thing* for our 6.1 deployments (as well as 7.x and CURRENT).

-aps

Received on Mon Dec 14 2009 - 21:09:10 UTC
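
To make the LUN 0 question concrete, here is a minimal userland sketch of the status remap in the quoted aac_cam_complete() branch, next to a hypothetical variant that remaps on every LUN. It is not kernel code and not a proposed patch: the enum values are stand-ins rather than the constants from cam.h, every name in it is invented for illustration, and it assumes the scan probes every LUN of every empty target, which a real scan may not do.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes for this sketch only; NOT the values from cam.h. */
enum sketch_status {
    SKETCH_REQ_CMP,
    SKETCH_SEL_TIMEOUT,
    SKETCH_DEV_NOT_THERE
};

/* Mirrors the quoted aac_cam_complete() branch: remap only when LUN != 0. */
static enum sketch_status
remap_quoted(enum sketch_status status, unsigned lun)
{
    if (status == SKETCH_SEL_TIMEOUT && lun != 0)
        return SKETCH_DEV_NOT_THERE;
    return status;
}

/* Hypothetical variant (an assumption, not existing code): remap on all LUNs. */
static enum sketch_status
remap_all_luns(enum sketch_status status, unsigned lun)
{
    (void)lun;  /* the LUN is deliberately ignored in this variant */
    if (status == SKETCH_SEL_TIMEOUT)
        return SKETCH_DEV_NOT_THERE;
    return status;
}

typedef enum sketch_status (*remap_fn)(enum sketch_status, unsigned);

/*
 * Count INQUIRY completions that still look like selection timeouts
 * after the remap, assuming every target/LUN on the bus is empty and
 * every INQUIRY initially completes with a selection timeout.
 */
static unsigned
count_sel_timeouts(remap_fn remap, unsigned targets, unsigned luns)
{
    unsigned t, l, n = 0;

    assert(remap != NULL);
    for (t = 0; t < targets; t++)
        for (l = 0; l < luns; l++)
            if (remap(SKETCH_SEL_TIMEOUT, l) == SKETCH_SEL_TIMEOUT)
                n++;
    return n;
}
```

Under these assumptions, count_sel_timeouts(remap_quoted, 16, 8) leaves one selection timeout per target (the LUN 0 probe), while count_sel_timeouts(remap_all_luns, 16, 8) leaves none. Whether remapping LUN 0 is actually safe is exactly the open question; the sketch only shows which completions would still reach the CAM_SEL_TIMEOUT path.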