On Tue, Dec 15, 2009 at 4:54 AM, Scott Long <scottl_at_samsco.org> wrote:
> On Dec 14, 2009, at 2:47 PM, Alexander Sack wrote:
>>
>> Hello Again:
>>
>> I guess I have a technical question/concern that I was looking for
>> feedback on. During the probe sequence, aac(4) conditionally responds
>> to INQUIRY commands depending on target LUN:
>>
>> aac_cam.c/aac_cam_complete():
>>     if (command == INQUIRY) {
>>         if (ccb->ccb_h.status == CAM_REQ_CMP) {
>>             device = ccb->csio.data_ptr[0] & 0x1f;
>>             /*
>>              * We want DASD and PROC devices to only be
>>              * visible through the pass device.
>>              */
>>             if ((device == T_DIRECT) ||
>>                 (device == T_PROCESSOR) ||
>>                 (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>>                 ccb->csio.data_ptr[0] =
>>                     ((device & 0xe0) | T_NODEVICE);
>>         } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>>             ccb->ccb_h.target_lun != 0) {
>>             /* fix for INQUIRYs on Lun>0 */
>>             ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>>         }
>>     }
>>
>> Why is CAM_DEV_NOT_THERE skipped on LUN 0?
>
> In the parallel SCSI world, a selection timeout means that all LUNs within
> the entire target do not (or no longer) exist. So returning
> CAM_SEL_TIMEOUT for LUN 1 would tell CAM to invalidate LUN 0 as well.
>
> If you look higher up in this function, you'll see a note about the
> error/status codes from the AAC firmware coincidentally matching CAM's
> status codes. My guess is that somewhere along the line, someone at Adaptec
> stopped reading the SCSI spec and started returning CAM_SEL_TIMEOUT for
> LUNs greater than 0, which is why this work-around is now in the driver.

Interesting. Learn something every day. I did not know that a
selection timeout on a non-zero LUN meant no other LUN was available.
As a colleague noted, "Has Adaptec ever read the SCSI spec?" Just
kidding (somewhat)....

>> This is true on my target
>> 6.1-amd64 machine as well as CURRENT. The reason why I ask this is
>> because now that aac(4) is sequentially scanned, there are a lot of CAM
>> interrupts that come in on my 6.x machine, where the threshold is only
>> 500, and I get the interrupt storm threshold warning for swi2 pretty
>> quickly:
>>
>> Interrupt storm detected on "swi2:"; throttling interrupt source
>>
>> Obviously it's contingent on the number of adapters you have on your
>> system. On CURRENT I didn't see this because the threshold is double
>> (I think it's 1000 by default).
>>
>> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
>> the scan. The probe sequence in CURRENT as well as 6.1 handles
>> CAM_SEL_TIMEOUT a little differently depending on context.

Yeah, I spoke too soon. I think that is a red herring, though, and a
misinterpretation on my part of what that code was really doing (in this
case, just seeing the device as unconfigured and moving on). But I STILL
don't understand why it's treated as an AC_LOST_DEVICE event at scan
time. That seems like more overhead than really necessary, but perhaps I
am not thinking of all the possibilities down this code path: why create
a path, then call xpt_async(), all just to set the flag as unconfigured?
Perhaps it's more aligned with the model than anything else and I'm
reading too much into it.

> It's not at all clear to me what is going on here. Can you instrument the
> code to record the status of everything that is being issued to the aac_cam
> module?

Yes, surely. I think what might be happening is that after the INQUIRY
fails, xpt_release_ccb() is called, which I think will also check
whether any more CCBs should be sent to the device and send them.
Basically, the boot -v output shows that I am getting a CAM_SEL_TIMEOUT
for each target and just hitting the default interrupt storm threshold
of 500 on 6.1.

Let me investigate further... I'm on the right track, but I need to
instrument more... Scott, it's my first time playing with CAM (be
gentle). :D

-aps

Received on Wed Dec 16 2009 - 16:11:01 UTC
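
For readers following the thread, the two fix-ups in the quoted aac_cam_complete() fragment can be sketched as a pair of small pure functions. This is only an illustration under stated assumptions, not the driver code: every `sketch_`/`SK_` name is hypothetical, the status enum values are arbitrary stand-ins rather than the real CAM constants, and only the three device type codes (0x00, 0x03, 0x1f) follow the standard SCSI peripheral device type numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for CAM status codes; the numeric values are
 * arbitrary and NOT the ones defined by the real CAM headers. */
enum sketch_status { SK_REQ_CMP, SK_SEL_TIMEOUT, SK_DEV_NOT_THERE };

/* SCSI peripheral device type codes (low 5 bits of INQUIRY byte 0). */
#define SK_T_DIRECT    0x00u
#define SK_T_PROCESSOR 0x03u
#define SK_T_NODEVICE  0x1fu

/* A selection timeout is rewritten only for LUNs other than 0, because
 * (per Scott's explanation) a selection timeout tells CAM that the whole
 * target, LUN 0 included, is gone. */
enum sketch_status sketch_remap_status(enum sketch_status status, unsigned lun)
{
    if (status == SK_SEL_TIMEOUT && lun != 0)
        return SK_DEV_NOT_THERE;
    return status;
}

/* Hide direct-access and processor devices (or everything, when the
 * pass-only flag is set) by reporting "no device" in INQUIRY byte 0.
 * As in the quoted fragment, `device` is already masked with 0x1f, so
 * `device & 0xe0` is always zero and the qualifier bits are dropped. */
uint8_t sketch_mask_device_byte(uint8_t byte0, int pass_only)
{
    uint8_t device = byte0 & 0x1f;

    if (device == SK_T_DIRECT || device == SK_T_PROCESSOR || pass_only)
        return (uint8_t)((device & 0xe0) | SK_T_NODEVICE);
    return byte0;
}

/* Exercises both helpers; call this from a test harness or main(). */
void sketch_self_check(void)
{
    assert(sketch_remap_status(SK_SEL_TIMEOUT, 0) == SK_SEL_TIMEOUT);
    assert(sketch_remap_status(SK_SEL_TIMEOUT, 2) == SK_DEV_NOT_THERE);
    assert(sketch_mask_device_byte(0x03, 0) == SK_T_NODEVICE);
    assert(sketch_mask_device_byte(0x05, 0) == 0x05);
}
```

Keeping the two decisions separate makes the LUN 0 exception easy to see: a timeout on LUN 0 is passed through unchanged, so CAM still treats the whole target as absent, while a timeout on any higher LUN is downgraded to "this one device is not there".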