Re: atomic changes break drm-next-kmod?

From: Pete Wright <pete_at_nomadlogic.org>
Date: Sat, 7 Jul 2018 19:54:59 -0700

On 07/06/2018 03:15, Hans Petter Selasky wrote:
> On 07/06/18 11:14, Johannes Lundberg wrote:
>> On Fri, Jul 6, 2018 at 9:49 AM Konstantin Belousov <kostikbel_at_gmail.com> wrote:
>>
>>> On Fri, Jul 06, 2018 at 09:52:24AM +0200, Niclas Zeising wrote:
>>>> On 07/06/18 00:02, Warner Losh wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin <jhb_at_freebsd.org> wrote:
>>>>>
>>>>>      On 7/5/18 12:36 PM, Konstantin Belousov wrote:
>>>>>       > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
>>>>>       >> On 07/05/18 20:59, Hans Petter Selasky wrote:
>>>>>       >>> On 07/05/18 19:48, Pete Wright wrote:
>>>>>       >>>>
>>>>>       >>>>
>>>>>       >>>> On 07/05/2018 10:10, John Baldwin wrote:
>>>>>       >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>>>>>       >>>>>>
>>>>>       >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>>>>>       >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>>>>>       >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>>>>>       >>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can you use
>>>>>       >>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>>>>>       >>>>>>>>> and get their backtraces?  You could also just do 'thread apply
>>>>>       >>>>>>>>> all bt' and put that file at a URL if that is easiest.
>>>>>       >>>>>>>>>
>>>>>       >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>>>>>       >>>>>>>>
>>>>>       >>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>>>>>       >>>>>>>>
>>>>>       >>>>>>> That doesn't look right at all.  Are you sure the kernel matches the
>>>>>       >>>>>>> vmcore?  Also, which kgdb version are you using?
>>>>>       >>>>>>>
>>>>>       >>>>>> yea i agree that doesn't look right at all.  here is my setup:
>>>>>       >>>>>>
>>>>>       >>>>>> $ which kgdb
>>>>>       >>>>>> /usr/bin/kgdb
>>>>>       >>>>>> $ kgdb
>>>>>       >>>>>> GNU gdb 6.1.1 [FreeBSD]
>>>>>       >>>>>> $ ls -lh /var/crash/vmcore.1
>>>>>       >>>>>> -rw-------  1 root  wheel 1.6G Jul  3 15:03 /var/crash/vmcore.1
>>>>>       >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>>>>>       >>>>>> -r-xr-xr-x  1 root  wheel 87840496 Jul  3 13:54 /usr/lib/debug/boot/kernel/kernel.debug
>>>>>       >>>>>>
>>>>>       >>>>>> and i invoke kgdb like so:
>>>>>       >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.1
>>>>>       >>>>>>
>>>>>       >>>>>> here's a gist of my full gdb session:
>>>>>       >>>>>> http://termbin.com/krsn
>>>>>       >>>>>>
>>>>>       >>>>>> dunno - maybe i have a bad core dump?  regardless, more than happy to
>>>>>       >>>>>> help so let me know if i should try anything else or patches etc..
>>>>>       >>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>>>>>       >>>>>
>>>>>       >>>>
>>>>>       >>>> that seems to have done the trick, at least the output looks more
>>>>>       >>>> encouraging.
>>>>>       >>>>
>>>>>       >>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>>>       >>>> KDB: enter: panic
>>>>>       >>>>
>>>>>       >>>> __curthread () at ./machine/pcpu.h:231
>>>>>       >>>> 231        __asm("movq %%gs:%1,%0" : "=r" (td)
>>>>>       >>>>
>>>>>       >>>>
>>>>>       >>>> here's my full kgdb session:
>>>>>       >>>> http://termbin.com/qa4f
>>>>>       >>>>
>>>>>       >>>> i don't see any threads not in "sched_switch" though :(
>>>>>       >>>
>>>>>       >>> Hi,
>>>>>       >>>
>>>>>       >>> The problem may be that the patch to enable atomic inlining of all
>>>>>       >>> macros forgot to set the SMP keyword, which means SMP is not defined at
>>>>>       >>> all for KLDs, so all non-kernel atomic usage is with MPLOCKED empty!
>>>>>       > Problem is that the out-of-tree module build does not have opt*.h files
>>>>>       > from the kernel.  UP config is a valid one, flipping some option's
>>>>>       > default value does not solve the problem.
>>>>>
>>>>>      Yes, but using the lock prefix in a generic module is ok (it will still
>>>>>      work, just not quite as fast) whereas the lack of lock is fatal on
>>>>>      SMP.  I would amend Hans' patch slightly to honor the opt_* setting
>>>>>      for KLD_TIED (but that is only true if KLD_TIED means "built as part of
>>>>>      a kernel build, so has valid opt_foo.h headers" and not
>>>>>      'a standalone module where someone put MODULES_TIED=1 on the command
>>>>>      line to make').
>>>>>
>>>>>
>>>>> I agree with this default. It's sensible to default to (a) the most
>>>>> popular thing and (b) the thing that always works, especially when (a) and
>>>>> (b) are identical.
>>>>>
>>>>> Don't make me start the "Do we really need an SMP option, why not make
>>>>> it always on" thread :) The number of relevant uniprocessor x86 boxes
>>>>> that benefit from omitting SMP is so small as to be irrelevant, IMHO. An
>>>>> MP kernel runs just fine on them...
>>>>>
>>>>> Warner
>>>>
>>>> Where are we on this?
>>>> It is important to get it fixed; it's already been 4 days, which means 4
>>>> days of all modern FreeBSD desktop systems being broken, and possibly
>>>> other systems with kernel modules from ports as well.
>>>>
>>>>
>>>> Another question, how hard would it be to expose how the kernel was
>>>> built to modules built from ports, so that they can figure out stuff
>>>> like SMP and others, that might affect the module build?
>>> Point the KERNBUILDDIR variable to the directory of the kernel build.
>>> This is the directory where *.o and opt*.h are located.  Then everything
>>> would just work.
>>>
>>
>> Is the solution that we require everyone to build a kernel before they can
>> build the standalone modules or am I missing something here?
>>
>
> Hi,
>
> Here is a temporary fix:
> https://svnweb.freebsd.org/changeset/base/336025
>
> Like Konstantin says this issue needs to be revisited.
>

this patch has been stable for me for a couple of days now, after rebuilding
drm-next-kmod against the new kernel containing this update.  we may want to
kick off an update of the drm-next-kmod pkg if that hasn't happened already.
the old package caused periodic kernel panics on my end.
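
for anyone finding this thread later, here is a minimal sketch in C of the
failure mode discussed above: a module built outside the kernel build never
sees the kernel's opt*.h, so SMP is undefined and the old selection leaves
MPLOCKED empty.  the DEMO_* macros and the mplocked_old()/mplocked_new()
helpers are hypothetical stand-ins, not the real <machine/atomic.h> code, and
the "new" condition is only my reading of the "default modules to the lock
prefix" idea from this thread, not a copy of the committed change.

```c
/*
 * Hypothetical stand-ins for the real build-time macros.  A module built
 * from ports is compiled as kernel code and as a KLD, but without the
 * kernel's opt*.h headers, so nothing ever defines the SMP stand-in.
 */
#define DEMO_KERNEL     1
#define DEMO_KLD_MODULE 1
/* DEMO_SMP is deliberately left undefined. */

/* Old selection: emit the lock prefix only if SMP is known, or in userland. */
#if defined(DEMO_SMP) || !defined(DEMO_KERNEL)
#define DEMO_MPLOCKED_OLD "lock ; "
#else
#define DEMO_MPLOCKED_OLD ""
#endif

/* Safer default (assumption): modules always get the lock prefix. */
#if defined(DEMO_SMP) || !defined(DEMO_KERNEL) || defined(DEMO_KLD_MODULE)
#define DEMO_MPLOCKED_NEW "lock ; "
#else
#define DEMO_MPLOCKED_NEW ""
#endif

/* Instruction text an atomic add would be assembled from, old selection. */
const char *
mplocked_old(void)
{
	return (DEMO_MPLOCKED_OLD "addl %1,%0");
}

/* Same instruction text under the module-safe selection. */
const char *
mplocked_new(void)
{
	return (DEMO_MPLOCKED_NEW "addl %1,%0");
}
```

with these definitions mplocked_old() yields a plain addl with no lock prefix,
which is not atomic across CPUs, while mplocked_new() yields the locked form,
which per John's note still works on a UP kernel, just not quite as fast.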

cheers,
-pete

-- 
Pete Wright
pete_at_nomadlogic.org
_at_nomadlogicLA
Received on Sun Jul 08 2018 - 00:55:08 UTC
