Re: LOR in mpr(4)

From: Pete Wright <pete_at_nomadlogic.org> Date: Wed, 19 Oct 2016 09:39:27 -0700

On 10/19/16 8:10 AM, geoffroy desvernay wrote:
> On 11/17/2015 21:43, Pete Wright wrote:
>>
>>
>> On 11/12/15 09:44, Pete Wright wrote:
>>> Hi All,
>>> Just wanted a sanity check before filing a PR.  I am running r290688 and
>>> am seeing a LOR being triggered in the mpr(4) device:
>>>
>>> $ uname -ar
>>> FreeBSD srd0013 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r290688: Wed Nov 11
>>> 21:28:26 PST 2015     root_at_srd0013:/usr/obj/usr/src/sys/GENERIC  amd64
>>>
>>> <dmesg snip>
>>> lock order reversal:
>>>  1st 0xfffff8000d26bc60 CAM device lock (CAM device lock) _at_
>>> /usr/src/sys/cam/cam_xpt.c:784
>>>  2nd 0xfffffe00012811c0 MPR lock (MPR lock) _at_
>>> /usr/src/sys/cam/cam_xpt.c:2620
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfffffe04608ee890
>>> witness_checkorder() at witness_checkorder+0xe79/frame 0xfffffe04608ee910
>>> __mtx_lock_flags() at __mtx_lock_flags+0xa4/frame 0xfffffe04608ee960
>>> xpt_action_default() at xpt_action_default+0xb6c/frame 0xfffffe04608ee9b0
>>> scsi_scan_bus() at scsi_scan_bus+0x1d5/frame 0xfffffe04608eea20
>>> xpt_scanner_thread() at xpt_scanner_thread+0x15c/frame 0xfffffe04608eea70
>>> fork_exit() at fork_exit+0x84/frame 0xfffffe04608eeab0
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04608eeab0
>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>> <snip>
>>
>> FWIW I filed the following PR as I can still reproduce this on boot:
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204614
>>
>> cheers,
>> -pete
>>
> Hi all,
>
> Sorry for cross-posting, let me know where this should go please, I
> didn't figure it out :(
>
> On 11-RELEASE-p1 here (but replying on current_at_ where I found something
> around mpr(4))
>
> Not sure if it's related, but on a fresh new machine with Avago SAS3008
> and a 24 disks enclosure (single attached).
>
> I see a bunch of:
>
> mpr0: Found device <401<SspTarg>,End Device> <12.0Gbps> handle<0x001b>
> enclosureHandle<0x0002> slot 8
> (da0:mpr0:0:8:0): UNMAPPED
> (da0:mpr0:0:8:0): CAM status: SCSI Status Error
> (da0:mpr0:0:8:0): SCSI status: Check Condition
> (da0:mpr0:0:8:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command
> operation code)
> (da0:mpr0:0:8:0): Error 22, Unretryable error
> 10:0): UNMAPPED
> (da0:mpr0:0:8:0): READ(10). CDB: 28 00 e8 e0 88 71 00 00 04 00
> (da0:mpr0:0:8:0): CAM status: SCSI Status Error
> (da0:mpr0:0:8:0): SCSI status: Check Condition
> (da0:mpr0:0:8:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command
> operation code)
> (da0:mpr0:0:8:0): Error 22, Unretryable error
> ses0: da0: Element descriptor: 'Drive Slot 0'
> ses0: da0: SAS Device Slot Element: 2 Phys at Slot 0
> ses0:  phy 0: SAS device type 1 id 0
> ses0:  phy 0: protocols: Initiator( None ) Target( SSP )
> ses0:  phy 0: parent 520474729974b57f addr 5000c50097ce8215
> ses0:  phy 1: SAS device type 1 id 1
> ses0:  phy 1: protocols: Initiator( None ) Target( SSP )
> ses0:  phy 1: parent 520474729974b5ff addr 5000c50097ce8216
>
> (more complete dmesg.boot here: http://dgeo.perso.ec-m.fr/dmesg.boot )
>

The issue you are seeing is most likely not related to the LOR from the
original email and PR I filed.  This looks like a media error with the
disk device on your RAID controller.  A quick Google search turns up
quite a few threads on this, ranging from bad RAID/JBOD controllers to
out-of-date firmware.

Cheers,
-pete

-- 
Pete Wright
pete_at_nomadlogic.org
nomadlogicLA