Re: Panic with ataintel and not ready CD on a Dell r710_at_r357958

From: Warner Losh <imp_at_bsdimp.com>
Date: Mon, 17 Feb 2020 13:33:29 -0700
> On Feb 17, 2020, at 1:18 PM, Larry Rosenman <ler_at_lerctr.org> wrote:
> 
> On 02/17/2020 1:46 pm, Larry Rosenman wrote:
>> Unread portion of the kernel message buffer:
>> panic: aprobe1: freed with 1 active CCBs
>> cpuid = 22
>> time = 1581771571
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01fb9a11a0
>> vpanic() at vpanic+0x185/frame 0xfffffe01fb9a1200
>> panic() at panic+0x43/frame 0xfffffe01fb9a1260
>> cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfffffe01fb9a1780
>> cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfffffe01fb9a17a0
>> probedone() at probedone+0x186/frame 0xfffffe01fb9a1c60
>> xpt_done_process() at xpt_done_process+0x358/frame 0xfffffe01fb9a1ca0
>> xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe01fb9a1cf0
>> fork_exit() at fork_exit+0x80/frame 0xfffffe01fb9a1d30
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01fb9a1d30
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> Uptime: 1m8s
>> Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
>> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393
>> #2  0xffffffff804bdf80 in kern_reboot (howto=260)
>>    at /usr/src/sys/kern/kern_shutdown.c:480
>> #3  0xffffffff804be3dd in vpanic (fmt=<optimized out>, ap=<optimized out>)
>>    at /usr/src/sys/kern/kern_shutdown.c:910
>> #4  0xffffffff804be133 in panic (fmt=<unavailable>)
>>    at /usr/src/sys/kern/kern_shutdown.c:836
>> #5  0xffffffff823c5bc2 in camperiphfree (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:685
>> #6  cam_periph_release_locked_buses (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:450
>> #7  0xffffffff823c5bfb in cam_periph_release_locked (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:461
>> #8  0xffffffff8240dce6 in probedone (periph=0xfffff80115da2300,
>>    done_ccb=<optimized out>) at /usr/src/sys/cam/ata/ata_xpt.c:1352
>> #9  0xffffffff823cee08 in xpt_done_process (ccb_h=0xfffff8015013e800)
>>    at /usr/src/sys/cam/cam_xpt.c:5488
>> #10 0xffffffff823d0db5 in xpt_done_td (arg=0xffffffff8243d780 <cam_doneqs+128>)
>>    at /usr/src/sys/cam/cam_xpt.c:5515
>> #11 0xffffffff80483200 in fork_exit (callout=0xffffffff823d0cc0 <xpt_done_td>,
>>    arg=0xffffffff8243d780 <cam_doneqs+128>, frame=0xfffffe01fb9a1d40)
>>    at /usr/src/sys/kern/kern_fork.c:1059
>> #12 <signal handler called>
>> (kgdb)
>> Core IS available as is the kernel
>> I do load the ataintel driver as a module.  Removing it allows me to boot.
>> What info do you all need?
> 
> Forgot to include, the previous working version was r356506

I’ve fixed this in r357969 which reverted r357897.

Looks like you tried 11 revs too soon. The commit message for r357969 says it all:

    The KASSERT is too strict: revert r357897

    It's valid for a periph to be removed with outstanding transactions on the
    device. In CAM, multiple periphs attach to a single device. There's no interlock
    to prevent one of these going away while other periphs have outstanding CCBs and
    it's not an error either. Remove this overly aggressive KASSERT to prevent
    false-positive panics when devices depart.
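
To make the commit message concrete, here is a minimal userland sketch of the situation it describes. It is not the CAM code: the toy_* types and functions are hypothetical stand-ins, and it assumes that outstanding CCBs are counted on the shared device rather than on each periph. Under that assumption, freeing one periph while another still has a CCB in flight is a legitimate state that a strict "no active CCBs" check would wrongly reject.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real struct cam_ed / struct cam_periph. */
struct toy_device {
	int active_ccbs;	/* assumed: outstanding CCBs are counted per device */
	int nperiphs;		/* periphs currently attached to this device */
};

struct toy_periph {
	const char *name;
	struct toy_device *dev;
};

/* Attach a periph to a device; several periphs may share one device. */
static void
toy_attach(struct toy_periph *p, const char *name, struct toy_device *d)
{
	p->name = name;
	p->dev = d;
	d->nperiphs++;
}

/* A periph queues one CCB; the count lands on the shared device. */
static void
toy_start_ccb(struct toy_periph *p)
{
	p->dev->active_ccbs++;
}

/* One CCB completes. */
static void
toy_done_ccb(struct toy_periph *p)
{
	assert(p->dev->active_ccbs > 0);
	p->dev->active_ccbs--;
}

/*
 * Detach a periph. Returns 1 if a strict "device has no active CCBs"
 * check would have fired at this point, 0 otherwise. The detach itself
 * succeeds either way, which is the point of the revert.
 */
static int
toy_free(struct toy_periph *p)
{
	int strict_check_would_fire = (p->dev->active_ccbs != 0);

	p->dev->nperiphs--;
	p->dev = NULL;
	return (strict_check_would_fire);
}
```

Attaching two periphs to one toy_device, starting a CCB on the second, and then calling toy_free() on the first returns 1 even though the first periph did nothing wrong. That is the false positive the commit message refers to.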

Sorry for the hassle. I’ve been trying to find a way to trap a race we’re seeing at work sooner, and I thought this assertion would do it. But I tested my kernel on a non-INVARIANTS tree, where KASSERTs are compiled out, so everything looked fine; only a little later did I discover it wasn’t. :(
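
For readers wondering why a non-invariants kernel hid the problem: KASSERT only does anything when the kernel is built with INVARIANTS. The sketch below is a simplified userland imitation, not the kernel macro. TOY_KASSERT and toy_release are hypothetical names, and the real KASSERT takes a parenthesized printf-style message rather than a plain string.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Simplified stand-in for the kernel's KASSERT. Build with -DINVARIANTS
 * to enable the check; without it the macro expands to nothing, so the
 * condition is never evaluated.
 */
#ifdef INVARIANTS
#define TOY_KASSERT(exp, msg) do {					\
	if (!(exp)) {							\
		fprintf(stderr, "panic: %s\n", (msg));			\
		abort();						\
	}								\
} while (0)
#else
#define TOY_KASSERT(exp, msg) do { } while (0)
#endif

/*
 * Hypothetical release path guarded by the assertion. It returns its
 * argument, so a caller can see that control got past the check.
 */
static int
toy_release(int active_ccbs)
{
	TOY_KASSERT(active_ccbs == 0, "freed with active CCBs");
	return (active_ccbs);
}
```

Compiled without -DINVARIANTS, toy_release(1) simply returns 1. Compiled with it, the same call prints the panic message and aborts, which mirrors how the assertion went unnoticed in testing and then fired on an INVARIANTS kernel.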

Warner
Received on Mon Feb 17 2020 - 19:33:33 UTC
