Re: CAM problem

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Tue, 20 Oct 2009 11:36:19 +0300
Andrew Thompson wrote:
> I have a cam problem that is noticeable with usb devices. It relates to
> the ordering of xpt_release_device() and the CAM_DEV_UNCONFIGURED flag
> when yanking a device that has stalled. This then causes a problem with
> the usb explore thread which will end up waiting on simfree forever,
> blocking any further usb attach/detach on the controller.
> 
> Hopefully my printfs can show the problem. I have replaced the pointers
> returned from xpt_alloc_device() with pretty names, <dev3> is the one in
> question.
> 
> <...unplug...>
> 
> ugen1.3: <KINGSTON> at usbus1 (disconnected)
> umass0: at uhub2, port 1, addr 3 (disconnected)
> umass_detach:
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> xpt_find_bus: ref=6 -> 7
> usb_cam_action, device GONE
> usb_cam_action, device GONE

As far as I can see, you are returning the CAM_TID_INVALID error here.
Unlike CAM_SEL_TIMEOUT, this error gets no special handling. If you
return CAM_SEL_TIMEOUT there instead, the device will be killed
immediately, which will probably work around this specific problem.

> xpt_release_device dev3 failed, ref=3 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=7 -> 6
> (da0:umass-sim0:0:0:0): got CAM status 0x39
> (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
> (da0:umass-sim0:0:0:0): lost device
> (da0:umass-sim0:0:0:0): removing device entry
> 
>  ^^^ USB disk had stalled on attach

This drops a reference because the periph driver has detached itself,
but XPT still treats the device as valid.

> xpt_release_device dev3 failed, ref=1 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=6 -> 5
> xpt_release_device dev3 failed, ref=0 unconf=0
> 
>  ^^^ last reference to dev3 dropped

From the deallocation point of view, configured status is handled the
same as one more reference...

> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=5 -> 4
> xpt_release_device dev2 OK 
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> umass_cam_detach_sim: calling xpt_bus_deregister
> xpt_find_bus: ref=2 -> 3
> xpt_alloc_target: ref=3 -> 4
> xpt_alloc_device: device = dev4
> scsi_dev_async: set dev dev3 unconfigured
> 
>  ^^^ dev3 gets the CAM_DEV_UNCONFIGURED flag cleared here

... but removing the configured status does not trigger deallocation
the way dropping a reference does.
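The interaction described in the last two comments can be sketched with a small model. It assumes, hypothetically and much simplified, that a device holds one reference on its bus, that freeing the device drops it, and that the SIM detach path waits for the bus count to reach zero; in real CAM the chain runs device, then target, then bus. The model_* names and MODEL_DEV_UNCONFIGURED are illustrative stand-ins, not CAM identifiers.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DEV_UNCONFIGURED 0x01   /* stands in for CAM_DEV_UNCONFIGURED */

struct model_bus {
    int refcount;                 /* detach waits for this to hit zero */
};

struct model_dev {
    int refcount;
    unsigned flags;
    bool freed;
    struct model_bus *bus;
};

static void model_free_dev(struct model_dev *dev)
{
    dev->freed = true;
    dev->bus->refcount--;         /* freeing the device drops its bus ref */
}

/* Old rule: free only when the last reference goes away AND the device
 * is already unconfigured, so "configured" acts like one more reference. */
static void model_release_dev(struct model_dev *dev)
{
    if (--dev->refcount == 0 && (dev->flags & MODEL_DEV_UNCONFIGURED))
        model_free_dev(dev);
}

/* Old lost-device path: sets the flag but never re-checks the count. */
static void model_lost_dev(struct model_dev *dev)
{
    dev->flags |= MODEL_DEV_UNCONFIGURED;
}

/* Replays the order seen in the log: both references released while the
 * device is still configured, then the lost-device event.  Returns the
 * bus refcount the detach path is left waiting on. */
static int model_replay_old(void)
{
    struct model_bus bus = { 1 };                 /* the device's bus ref */
    struct model_dev dev = { 2, 0, false, &bus }; /* configured, 2 refs */

    model_release_dev(&dev);
    model_release_dev(&dev);
    assert(dev.refcount == 0 && !dev.freed);      /* configured: kept */
    model_lost_dev(&dev);
    assert(!dev.freed);                           /* flag set, still leaked */
    return bus.refcount;
}
```

model_replay_old() returns 1: nothing ever frees the device, so its bus reference is never dropped, which corresponds to the "waiting... ref = 1" line below.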

> xpt_bus_deregister: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_device dev4 OK 
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=2 -> 1
> umass_cam_detach_sim:
> umass-sim0: waiting... ref = 1
> 
>  ^^^ wait on "simfree" forever.

I think the correct solution would be to take an additional reference
before clearing CAM_DEV_UNCONFIGURED and to drop it again after setting
CAM_DEV_UNCONFIGURED back. The check for CAM_DEV_UNCONFIGURED inside
xpt_release_device() could then be removed or turned into an assertion.
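A sketch of the proposed ordering in the same simplified style, using hypothetical model_* helpers in place of the real XPT reference functions and the probe and lost-device paths: configuring the device takes a reference before clearing the flag, losing it sets the flag and then drops that reference, and the release path relies on the count alone, with the old flag check turned into an assertion.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DEV_UNCONFIGURED 0x01   /* stands in for CAM_DEV_UNCONFIGURED */

struct model_dev {
    int refcount;
    unsigned flags;
    bool freed;
};

static void model_acquire_dev(struct model_dev *dev)
{
    dev->refcount++;
}

/* Plain reference counting: the last release frees the device.  The old
 * flag test becomes an assertion, since a configured device always holds
 * its own reference and so can never reach zero here. */
static void model_release_dev(struct model_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        assert(dev->flags & MODEL_DEV_UNCONFIGURED);
        dev->freed = true;
    }
}

/* Device configured: take the extra reference BEFORE clearing the flag. */
static void model_configure_dev(struct model_dev *dev)
{
    model_acquire_dev(dev);
    dev->flags &= ~MODEL_DEV_UNCONFIGURED;
}

/* Device lost: set the flag FIRST, then drop the configured reference,
 * so this path can now trigger deallocation as well. */
static void model_lost_dev(struct model_dev *dev)
{
    if (dev->flags & MODEL_DEV_UNCONFIGURED)
        return;                   /* already unconfigured */
    dev->flags |= MODEL_DEV_UNCONFIGURED;
    model_release_dev(dev);
}

/* Same event order as in the log, under the new scheme. */
static bool model_replay_fixed(void)
{
    struct model_dev dev = { 1, MODEL_DEV_UNCONFIGURED, false }; /* new, 1 ref */

    model_configure_dev(&dev);    /* refcount 2, flag cleared */
    model_release_dev(&dev);      /* refcount 1 */
    model_lost_dev(&dev);         /* flag set, refcount 0: freed */
    return dev.freed;
}
```

Whichever of the last release and the lost-device event happens second now frees the device, so the order seen in the log no longer leaks it.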

-- 
Alexander Motin
Received on Tue Oct 20 2009 - 06:36:23 UTC
