Re: Firewire disk/tape access stopped working after recent CAM commit

From: Kenneth D. Merry <ken_at_freebsd.org>
Date: Mon, 23 Jan 2012 11:16:05 -0700
On Sun, Jan 22, 2012 at 20:52:38 -0600, Richard Todd wrote:
> Hi.  I tried upgrading my amd64 10-CURRENT box to the most recent -CURRENT code
> and found that the new kernel couldn't find my two disks and tape drive that
> are on a Firewire bus.  All the USB and AHCI-attached hardware still showed
> up okay, it's just the Firewire stuff that failed to show up properly on boot.
> Spent today doing binary search to find the responsible commit and it looks
> to be this one: 
> 
>   r230000 | ken | 2012-01-11 18:41:48 -0600 (Wed, 11 Jan 2012) | 72 lines
> 
>   Fix a race condition in CAM peripheral free handling, locking
>   in the CAM XPT bus traversal code, and a number of other periph level
>   issues.
> 
> Not sure what in this commit triggers the problem, or why it just hits 
> Firewire and not the rest of the system.   I've built kernels both right
> before and right after the r230000 commit, with CAM debugging turned on real
> high on the Firewire bus in question, bus 0 (hardwired to that number in
> device.hints, if that matters):
> 
>  options CAMDEBUG
>  options CAM_DEBUG_BUS=0
>  options CAM_DEBUG_TARGET=-1
>  options CAM_DEBUG_LUN=-1
>  options CAM_DEBUG_FLAGS=CAM_DEBUG_INFO|CAM_DEBUG_TRACE|CAM_DEBUG_CDB
> 
> and got dmesgs of both the "bad" (r230000) and "good" (pre-r230000) kernels,
> which I've put online at http://ln.servalan.com/rmtodd/bug1/dmesg.bad and
> http://ln.servalan.com/rmtodd/bug1/dmesg.good, respectively.  They're a bit
> lengthy, what with all that debug info.  Grepping out the info for one of
> the targets (disk 0, sbp0:0:0:0) and looking only at the lines for that one,
> we see that the "good" kernel does a lot more with that target than the
> "bad" kernel does, starting with the "(noperiph:sbp0:0:0:0): xpt_compile_path"
> bit, as seen in the diff below.
> 
> Not sure what's going on here, but if anyone has suggestions for more things
> I can test, or debug code I can add to track this down further, let me know.

Thanks for testing this out, and for sending all of the debugging output!

If you can, please try the attached patch and see whether it has any effect
on the problem.  That commit has a bug: we shouldn't be invalidating all
LUNs on a target when we get a status of CAM_DEV_NOT_THERE.

We may also need to do a more thorough audit of how the various SIM
drivers use the CAM_DEV_NOT_THERE status.

Thanks,

Ken
-- 
Kenneth Merry
ken_at_FreeBSD.ORG

Received on Mon Jan 23 2012 - 17:16:06 UTC
