Re: atomic changes break drm-next-kmod?

From: Niclas Zeising <zeising+freebsd_at_daemonic.se>
Date: Fri, 6 Jul 2018 09:52:24 +0200
On 07/06/18 00:02, Warner Losh wrote:
> 
> 
> On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> 
>     On 7/5/18 12:36 PM, Konstantin Belousov wrote:
>      > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
>      >> On 07/05/18 20:59, Hans Petter Selasky wrote:
>      >>> On 07/05/18 19:48, Pete Wright wrote:
>      >>>>
>      >>>>
>      >>>> On 07/05/2018 10:10, John Baldwin wrote:
>      >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>      >>>>>>
>      >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>      >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>      >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>      >>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can you use
>      >>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>      >>>>>>>>> and get their backtraces?  You could also just do 'thread apply
>      >>>>>>>>> all bt' and put that file at a URL if that is easiest.
>      >>>>>>>>>
>      >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>      >>>>>>>>
>      >>>>>>>>
>      >>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>      >>>>>>> That doesn't look right at all.  Are you sure the kernel matches the
>      >>>>>>> vmcore?  Also, which kgdb version are you using?
>      >>>>>>>
>      >>>>>> yea i agree that doesn't look right at all.  here is my setup:
>      >>>>>>
>      >>>>>> $ which kgdb
>      >>>>>> /usr/bin/kgdb
>      >>>>>> $ kgdb
>      >>>>>> GNU gdb 6.1.1 [FreeBSD]
>      >>>>>> $ ls -lh /var/crash/vmcore.1
>      >>>>>> -rw-------  1 root  wheel   1.6G Jul  3 15:03 /var/crash/vmcore.1
>      >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>      >>>>>> -r-xr-xr-x  1 root  wheel  87840496 Jul  3 13:54 /usr/lib/debug/boot/kernel/kernel.debug
>      >>>>>>
>      >>>>>> and i invoke kgdb like so:
>      >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.1
>      >>>>>>
>      >>>>>> here's a gist of my full gdb session:
>      >>>>>> http://termbin.com/krsn
>      >>>>>>
>      >>>>>> dunno - maybe i have a bad core dump?  regardless, more than happy to
>      >>>>>> help so let me know if i should try anything else or patches etc..
>      >>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>      >>>>>
>      >>>>
>      >>>> that seems to have done the trick, at least the output looks more
>      >>>> encouraging.
>      >>>>
>      >>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>      >>>> KDB: enter: panic
>      >>>>
>      >>>> __curthread () at ./machine/pcpu.h:231
>      >>>> 231        __asm("movq %%gs:%1,%0" : "=r" (td)
>      >>>>
>      >>>>
>      >>>> here's my full kgdb session:
>      >>>> http://termbin.com/qa4f
>      >>>>
>      >>>> i don't see any threads not in "sched_switch" though :(
>      >>>
>      >>> Hi,
>      >>>
>      >>> The problem may be that the patch to enable atomic inlining of all
>      >>> macros forgot to set the SMP keyword which means SMP is not defined at
>      >>> all for KLD's so all non-kernel atomic usage is with MPLOCKED empty!
>      > Problem is that out-of-tree modules build does not have opt*.h files
>      > from the kernel.  UP config is a valid one, flipping some option's
>      > default value does not solve the problem.
> 
>     Yes, but using the lock prefix in a generic module is ok (it will still
>     work, just not quite as fast) whereas the lack of lock is fatal on
>     SMP.  I would amend Hans' patch slightly to honor the opt_* setting
>     for KLD_TIED (but that is only true if KLD_TIED means "built as part of
>     a kernel build, so has valid opt_foo.h headers" and not
>     'a standalone module where someone put MODULES_TIED=1 on the command line
>     to make').
> 
> 
> I agree with this default. It's sensible to default to (a) the most 
> popular thing and (b) thing that always works, especially when (a) and 
> (b) are identical.
> 
> Don't make me start the "Do we really need an SMP option, why not make 
> it always on" thread :) The number of relevant uniprocessor x86 boxes 
> that benefit from omitting SMP is so small as to be irrelevant, IMHO. A 
> MP kernel runs just fine on them...
> 
> Warner

Where are we on this?
It is important to get this fixed. It has already been 4 days, which means 4 
days of all modern FreeBSD desktop systems being broken, and possibly 
other systems with kernel modules from ports as well.


Another question: how hard would it be to expose how the kernel was 
built to modules built from ports, so that they can detect settings 
such as SMP that might affect the module build?
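
To make the default John describes above concrete, here is a minimal 
sketch in C. It is not the actual atomic.h change: every DEMO_* name is 
hypothetical, standing in for the SMP option, the MPLOCKED macro, and 
"this build has valid opt_*.h headers". The only point is that a module 
which cannot see the kernel configuration falls back to the lock prefix, 
which works on both UP and SMP kernels.

```c
/*
 * Sketch only. The DEMO_* names are hypothetical stand-ins, not
 * FreeBSD's real macros:
 *   DEMO_HAVE_KERNEL_OPTS  - this build has valid opt_*.h headers
 *   DEMO_SMP               - the kernel config enabled SMP
 *   DEMO_MPLOCKED          - plays the role of MPLOCKED
 */
#if defined(DEMO_HAVE_KERNEL_OPTS) && !defined(DEMO_SMP)
/* Built against a known uniprocessor config: the prefix can be omitted. */
#define DEMO_MPLOCKED ""
#else
/* SMP kernel, or config unknown (out-of-tree module): always lock. */
#define DEMO_MPLOCKED "lock ; "
#endif

/* Returns the prefix that would be placed in front of atomic instructions. */
static inline const char *
demo_lock_prefix(void)
{
	return (DEMO_MPLOCKED);
}
```

Compiled with no -D flags, as a module from ports would be in this 
sketch, demo_lock_prefix() returns "lock ; ". Only a build that defines 
DEMO_HAVE_KERNEL_OPTS without DEMO_SMP gets the empty prefix.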

Regards
-- 
Niclas
Received on Fri Jul 06 2018 - 05:52:29 UTC
