Re: couple of nvidia-driver issues

From: Aaron Plattner <aplattner_at_nvidia.com>
Date: Thu, 7 Dec 2017 08:00:40 -0800
On 12/07/2017 07:35 AM, Alan Somers wrote:
> On Thu, Dec 7, 2017 at 2:33 AM, Andriy Gapon <avg_at_freebsd.org 
> <mailto:avg_at_freebsd.org>> wrote:
> 
> 
>     [cc-ing current_at_ to raise more awareness]
> 
>     On 05/12/2017 16:03, Alexey Dokuchaev wrote:
>      > On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
>      >>
>      >> I have reported a couple of nvidia-driver issues in the FreeBSD section
>      >> of the nVidia developer forum, but have received no replies so far.
>      >>
>      >> Well, the first issue is not with the driver, but with a utility that
>      >> comes with it, nvidia-smi:
>      >> https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
>      >> I wonder whether I am the only one affected, or whether I see the
>      >> problem because I am on head or for some other reason.
>      >> I am pretty sure that the problem is caused by a programming bug related
>      >> to strtok_r.
>      >
>      > I'll try to reproduce it and report back.
> 
>     I've done some work with a debugger, and it seems that there is code
>     that does something like this:
> 
>     char *last = NULL;
> 
>     while (1) {
>              if (last == NULL)
>                      p = strtok_r(str, sep, &last);
>              else
>                      p = strtok_r(NULL, sep, &last);
>              if (p == NULL)
>                      break;
>              ...
>     }
> 
>     The problem is that when 'p' points to the last token, 'last' is NULL
>     (in the FreeBSD implementation of strtok_r).  That means that on the
>     next iteration the parsing starts all over again, leading to an endless
>     loop.  The code is incorrect from the standard's point of view, because
>     the value of 'last' is completely opaque and should not be used for
>     anything other than passing it back to strtok_r.
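
For comparison, here is a minimal sketch of a tokenizing loop that never inspects 'last', assuming a POSIX strtok_r(). The helper split_tokens() and its parameters are hypothetical and not taken from nvidia-smi. The string is passed only on the first call, and that decision comes from the loop structure itself, so the FreeBSD behaviour of leaving 'last' NULL after the final token does no harm.

```c
#define _POSIX_C_SOURCE 200809L /* expose strtok_r() in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split str (modified in place) on any character in
 * sep and store at most max token pointers in out.  Returns the count.
 *
 * str is passed only on the first call and NULL on every later call; which
 * call is first is decided by the loop structure, never by looking at
 * 'last'.  Its contents are opaque, and on FreeBSD it is NULL after the
 * final token, so testing it is exactly what made the quoted loop restart.
 */
static size_t
split_tokens(char *str, const char *sep, char **out, size_t max)
{
	char *last;	/* opaque state: only ever handed back to strtok_r() */
	char *p;
	size_t n = 0;

	assert(str != NULL && sep != NULL && (out != NULL || max == 0));

	for (p = strtok_r(str, sep, &last); p != NULL && n < max;
	    p = strtok_r(NULL, sep, &last))
		out[n++] = p;
	return (n);
}
```

For example, splitting "name,driver_version,,temperature.gpu" on "," yields three tokens, because strtok_r() treats a run of separators as a single delimiter.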
> 
>     I used gdb -w to change the logic to:
> 
>     char *last = (char *)1;
> 
>     while (1) {
>              if (last == (char *)1)
>                      p = strtok_r(str, sep, &last);
>              else
>                      p = strtok_r(NULL, sep, &last);
>              ...
>     }
> 
>     where 1 is used as an "impossible" pointer value which is neither NULL
>     nor a valid pointer that can be set by strtok_r.  It's not ideal, but
>     editing binary code is not as easy as editing source code.
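
The same sentinel idea can be expressed in portable source form. In this hypothetical sketch (count_tokens_sentinel() and first_call_sentinel are illustrative names, not nvidia-smi's), the sentinel is the address of a private object rather than the integer 1, so no integer-to-pointer conversion is needed. It still peeks at 'last', and it assumes strtok_r() only ever stores NULL or a pointer into the string there; a loop that passes str only on its first call and never inspects 'last' avoids that dependency entirely.

```c
#define _POSIX_C_SOURCE 200809L /* expose strtok_r() in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical source-level analogue of the binary patch.  The sentinel is
 * the address of a private object instead of the integer 1: it needs no
 * integer-to-pointer conversion and cannot equal NULL or any pointer into
 * str, which are the values common strtok_r() implementations store in
 * 'last'.
 */
static char first_call_sentinel;

static size_t
count_tokens_sentinel(char *str, const char *sep)
{
	char *last = &first_call_sentinel;
	char *p;
	size_t n = 0;

	assert(str != NULL && sep != NULL);

	for (;;) {
		/* Still peeks at 'last', so it depends on strtok_r() internals. */
		if (last == &first_call_sentinel)
			p = strtok_r(str, sep, &last);
		else
			p = strtok_r(NULL, sep, &last);
		if (p == NULL)
			break;
		n++;
	}
	return (n);
}
```

On FreeBSD, once the final token has been returned 'last' becomes NULL, which no longer matches the sentinel, so the next call takes the NULL branch and the loop ends instead of restarting.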
> 
>     The binary patch is here:
>     https://people.freebsd.org/~avg/nvidia-smi.bsdiff
> 
>      >> The second issue is with the FreeBSD support for the kernel driver:
>      >> https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
>      >> I would like to get some feedback on my analysis.
>      >> I am testing this patch right now:
>      >> https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
>      >
>      > Unfortunately, I'm not enough of an expert on kernel locking
>      > primitives to give you a proper review; let's see what others have
>      > to say.
> 
>     It's been a while since I posted the patch, and there are no comments
>     yet.  I can only add that I run an INVARIANTS- and WITNESS-enabled
>     kernel all the time, and before the patch I was getting kernel panics
>     every now and then.  Since I started using the patch I haven't had a
>     single nvidia panic.
> 
>      >> Also, what's the best place or who are the best people with whom to
>      >> discuss such issues?
>      >
>      > Yes, this is a problem now: since Christian Zander left nVidia, he
>      > could not tell me who their next liaison to the FreeBSD community
>      > would be. :-(
> 
>     Oh, I didn't know about Christian's departure.
>     So, we are not in a very good position now.
> 
> 
> How about Aaron Plattner (CC'd).  Aaron, are you still working on 
> FreeBSD driver issues?

Thanks for the heads up, Alan. I filed bug 2032249 to track this.

-- Aaron
Received on Thu Dec 07 2017 - 15:05:49 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:14 UTC