Re: aac(4) handling of probe when no devices are there

From: Scott Long <scottl_at_samsco.org>
Date: Tue, 15 Dec 2009 02:54:19 -0700

On Dec 14, 2009, at 2:47 PM, Alexander Sack wrote:
> Hello Again:
>
> I guess I have a technical question/concern that I was looking for
> feedback on.  During the probe sequence, aac(4) conditionally responds
> to INQUIRY commands depending on target LUN:
>
> aac_cam.c/aac_cam_complete():
>         if (command == INQUIRY) {
>                 if (ccb->ccb_h.status == CAM_REQ_CMP) {
>                 device = ccb->csio.data_ptr[0] & 0x1f;
>                 /*
>                  * We want DASD and PROC devices to only be
>                  * visible through the pass device.
>                  */
>                 if ((device == T_DIRECT) ||
>                     (device == T_PROCESSOR) ||
>                     (sc->flags & AAC_FLAGS_CAM_PASSONLY))
>                         ccb->csio.data_ptr[0] =
>                             ((device & 0xe0) | T_NODEVICE);
>                 } else if (ccb->ccb_h.status == CAM_SEL_TIMEOUT &&
>                         ccb->ccb_h.target_lun != 0) {
>                         /* fix for INQUIRYs on Lun>0 */
>                         ccb->ccb_h.status = CAM_DEV_NOT_THERE;
>                 }
>         }
>
> Why is CAM_DEV_NOT_THERE skipped on LUN 0?

In the parallel SCSI world, a selection timeout means that all LUNs
within the entire target do not (or no longer) exist.  So returning
CAM_SEL_TIMEOUT for LUN 1 would tell CAM to invalidate LUN 0 as well.
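
To make the rule concrete, here is a minimal userland sketch of the
remapping that the quoted work-around performs.  The enum and function
names are hypothetical stand-ins, not the real CAM definitions, and the
enum values are arbitrary; only the decision logic mirrors the quoted
code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the CAM status codes named in the thread.
 * These are NOT the real CAM definitions and the values are arbitrary.
 */
enum sk_status {
	SK_REQ_CMP,		/* stands in for CAM_REQ_CMP */
	SK_SEL_TIMEOUT,		/* stands in for CAM_SEL_TIMEOUT */
	SK_DEV_NOT_THERE	/* stands in for CAM_DEV_NOT_THERE */
};

/*
 * Remap the completion status of an INQUIRY.  A selection timeout on a
 * non-zero LUN is downgraded to "device not there" so that it only
 * describes that one LUN.  A selection timeout on LUN 0 is passed
 * through unchanged, since there it really does describe the target.
 */
static enum sk_status
sk_fix_inquiry_status(enum sk_status status, uint32_t lun)
{
	if (status == SK_SEL_TIMEOUT && lun != 0)
		return (SK_DEV_NOT_THERE);
	return (status);
}
```

With this sketch, a timeout reported for LUN 1 comes back as
SK_DEV_NOT_THERE, while the same timeout for LUN 0 is left alone.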

If you look higher up in this function, you'll see a note about the
error/status codes from the AAC firmware coincidentally matching CAM's
status codes.  My guess is that somewhere along the line, someone at
Adaptec stopped reading the SCSI spec and started returning
CAM_SEL_TIMEOUT for LUNs greater than 0, which is why this work-around
is now in the driver.

> This is true on my target
> 6.1-amd64 machine as well as CURRENT.  The reason I ask is that,
> now that aac(4) is scanned sequentially, a lot of CAM interrupts
> come in on my 6.x machine, where the threshold is only
> 500, and I get the interrupt storm threshold warning for swi2 pretty
> quickly:
>
> Interrupt storm detected on "swi2:"; throttling interrupt source
>
> Obviously it's contingent on the number of adapters you have on your
> system.  On CURRENT I didn't see this because the threshold is double
> (I think it's 1000 by default).
>
> The issue is the number of xpt_async(AC_LOST_DEVICE, ..) calls during
> the scan.  The probe sequence in CURRENT as well as 6.1 handles
> CAM_SEL_TIMEOUT a little differently depending on context.
>

It's not at all clear to me what is going on here.  Can you instrument  
the code to record the status of everything that is being issued to  
the aac_cam module?
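
Something along these lines is what I have in mind: a small ring that
records one entry per completed command, which can be dumped after the
scan.  This is only a userland sketch; every name in it is hypothetical,
and a real version would need to be hooked into the driver's completion
path and use the kernel's own printing and locking.

```c
#include <stdint.h>
#include <stdio.h>

#define SK_TRACE_SLOTS	64	/* hypothetical ring size */

/* One record per completed command (hypothetical layout). */
struct sk_trace_entry {
	uint32_t target;
	uint32_t lun;
	uint8_t	 opcode;	/* first CDB byte, e.g. 0x12 for INQUIRY */
	uint32_t status;	/* status as handed back to the caller */
};

struct sk_trace {
	struct sk_trace_entry slot[SK_TRACE_SLOTS];
	unsigned int count;	/* total entries ever recorded */
};

/* Record one completion, overwriting the oldest entry once full. */
static void
sk_trace_record(struct sk_trace *t, uint32_t target, uint32_t lun,
    uint8_t opcode, uint32_t status)
{
	struct sk_trace_entry *e = &t->slot[t->count % SK_TRACE_SLOTS];

	e->target = target;
	e->lun = lun;
	e->opcode = opcode;
	e->status = status;
	t->count++;
}

/* Print the retained entries, oldest first; returns how many were printed. */
static unsigned int
sk_trace_dump(const struct sk_trace *t, FILE *out)
{
	unsigned int n, first, i;

	n = t->count < SK_TRACE_SLOTS ? t->count : SK_TRACE_SLOTS;
	first = t->count - n;
	for (i = 0; i < n; i++) {
		const struct sk_trace_entry *e =
		    &t->slot[(first + i) % SK_TRACE_SLOTS];

		fprintf(out, "target %u lun %u opcode 0x%02x status 0x%x\n",
		    (unsigned)e->target, (unsigned)e->lun,
		    (unsigned)e->opcode, (unsigned)e->status);
	}
	return (n);
}
```

Recording target, LUN, opcode and status together should show which
INQUIRYs come back as selection timeouts and on which LUNs.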

Scott
Received on Tue Dec 15 2009 - 08:54:27 UTC
