Re: GPF on boot with devmatch

From: Warner Losh <imp_at_bsdimp.com>
Date: Mon, 12 Oct 2020 12:13:21 -0600
On Mon, Oct 5, 2020 at 3:39 PM Alexander Motin <mav_at_freebsd.org> wrote:

> On 05.10.2020 17:20, Warner Losh wrote:
> > On Mon, Oct 5, 2020 at 12:36 PM Alexander Motin <mav_at_freebsd.org> wrote:
> >
> >     I can add that we've received a report of an identical panic on
> >     FreeBSD releng/12.2 at r365436, AKA TrueNAS 12.0-RC1:
> >     https://jira.ixsystems.com/browse/NAS-107578 .  So it looks a) pretty
> >     rare (one report from thousands of early adopters and none in our
> >     lab), and b) it is in stable/12 too, not only head.
> >
> > Thanks! I'll see if I can recreate here....  But we're accessing the
> > sysctl tree from devmatch to get some information, which should always
> > be OK (the fact that it isn't suggests either a bug in some driver
> > leaving bad pointers, or some race or both)...  It would be nice to know
> > which nodes they were, or to have a kernel panic I can look at...
>
> All we have now in this case is a screenshot you may see in the ticket.
>  Also previously the same user on some earlier version of stable/12
> reported other very weird panics on process lock being dropped where it
> can't be in some other sysctls inside kern.proc, so if we guess those
> are related, I suspect there may be some kind of memory corruption
> happening, but have no clue where.  Unfortunately we have only textdumps
> for those.  So if Xin is able to reproduce it locally, it may be our
> best chance to debug it, at least this specific issue.
>

That's totally weird.

Xin Li's traceback led to code I just rewrote in current, while this report
leads to code that's been there for a long time and hasn't been MFC'd. This
suggests that either Xin Li's backtrace isn't to be trusted or there are two
issues at play. Both are plausible. I've fixed a minor signedness bug and a
possible one-byte overflow in the code I just rewrote. But I suspect this is
due to something else, related to how children are handled after we've
raced. Maybe there's something special about how USB does things, because
other buses create the child early, so their child list is stable. If USB's
discovery code is adding something while racing with devd's walking of the
tree, that might explain it...  It would be nice if there were some way to
provoke the race on a system I could get a core from for deeper analysis....

Warner


> --
> Alexander Motin
>
Received on Mon Oct 12 2020 - 16:13:34 UTC
