Re: panic: sbuf_vprintf called with a NULL sbuf pointer

From: Don Lewis <truckman_at_FreeBSD.org> Date: Wed, 2 Dec 2015 13:25:56 -0800 (PST) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:01 UTC

On  2 Dec, John Baldwin wrote:
> On Sunday, November 29, 2015 04:16:11 PM Don Lewis wrote:
>> I got this strange panic the other day:
>> 
>> 	panic: sbuf_vprintf called with a NULL sbuf pointer
>> 
>> The machine was running this:
>> 
>>   FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #31 r290224: Sat Oct 31 02:56:36 PDT 2015     dl_at_zipper.catspoiler.org:/usr/obj/usr/src/sys/GENERIC  amd64
>> 
>> 
>> The trigger was:
>> 
>> 	swap_pager: indefinite wait buffer: bufobj: 0 blkno: 1737153,
>>         size 4096
>> 
>> That triggered an attempt to kill a process and create a core file for
>> it.
>> 
>> Where things get strange is at #16 and #17 in the backtrace:
>> 
>> #13 0xffffffff80a20676 in kassert_panic (
>>     fmt=0xffffffff8137bf84 "%s called with a NULL sbuf pointer")
>>     at /usr/src/sys/kern/kern_shutdown.c:647
>> #14 0xffffffff80a66c51 in _assert_sbuf_integrity (
>>     fun=0xffffffff8137bdf8 "sbuf_vprintf", s=0x0)
>>     at /usr/src/sys/kern/subr_sbuf.c:103
>> #15 0xffffffff80a6742e in sbuf_vprintf (s=0x0, fmt=<optimized out>, 
>>     ap=<optimized out>) at /usr/src/sys/kern/subr_sbuf.c:583
>> #16 sbuf_printf (s=0x0, fmt=0xffffffff80fd8ad3 "%s")
>>     at /usr/src/sys/kern/subr_sbuf.c:677
>> #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=<optimized out>, 
>>     uid=<optimized out>, pid=<optimized out>, td=<optimized out>, 
>>     vpp=<optimized out>, namep=<optimized out>)
>>     at /usr/src/sys/kern/kern_sig.c:3188
>> #18 coredump (td=0x0) at /usr/src/sys/kern/kern_sig.c:3337
>> #19 sigexit (td=0x0, sig=40) at /usr/src/sys/kern/kern_sig.c:2976
>> #20 0xffffffff80a6fcb8 in userret (td=0xfffff8057b0a59a0, 
>>     frame=<optimized out>) at /usr/src/sys/kern/subr_trap.c:147
>> #21 0xffffffff80e68619 in syscallret (td=0xfffff8057b0a59a0, 
>>     error=<optimized out>, sa=<optimized out>)
>>     at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
>> #22 amd64_syscall (td=0xfffff8057b0a59a0, traced=0)
>>     at /usr/src/sys/amd64/amd64/trap.c:956
>> #23 0xffffffff80e47a3b in Xfast_syscall ()
>>     at /usr/src/sys/amd64/amd64/exception.S:394
>> #24 0x000000080cc9f12a in ?? ()
>> 
>> In #16, s is 0x0, which is strange because kern_sig.c:3188 is:
>> 
>> #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=<optimized out>, 
>>     uid=<optimized out>, pid=<optimized out>, td=<optimized out>, 
>>     vpp=<optimized out>, namep=<optimized out>)
>>     at /usr/src/sys/kern/kern_sig.c:3188
>> 3188					sbuf_printf(&sb, "%s", comm);
>> 
>> I don't see how &sb can be NULL ...
> 
> Agreed on that.
> 
>> Also interesting is:
>> 
>> (kgdb) print sb
>> $1 = <optimized out>
>> 
>> I don't see how sb can be optimized out since its address is passed to
>> sbuf_printf().
> 
> This is just what gdb says when it doesn't think it can find the value
> becuase the regsiter holding it (according to the debug info) has been
> overwritten.  Sometimes if you look in the disassembly you can find another
> copy of the value in another register and figure out the value thay way.
> If you use kgdb710 from the devel/gdb port it sometimes does a better job
> here, but even with it I often run into this.

I am using kgdb710.

>> While format is optimized out, the corefilename isn't and
>> format=corefilename
>> 
>> (kgdb) print format
>> $2 = <optimized out>
>> (kgdb) print corefilename
>> $3 = "%N.core", '\000' <repeats 1016 times>
>> 
>> Since this is the first iteration of the loop, and the crash happens on
>> the entry into sbuf_printf()->sbuf_vprintf(), it doesn't look like it
>> could be caused by stack smash.
>> 
>> Under other circumstances I've seen other processes dump core, and I can
>> trigger core dumps by sending SIGQUIT to processes ...
>> 
>> %sleep 10 &
>> [1] 2456
>> %kill -QUIT %1
>> %
>> [1]    Quit                          sleep 10 (core dumped)
>> 
>> Another other oddity is td=0x0 in frames 18 and 19.  I'm guessing that
>> kgdb is wrong about this because that should cause a NULL pointer
>> dereference panic in sigexit().
> 
> Probably the saved register on the stack that gdb thinks td is saved in has
> been overwritten (so this could be a buffer overrun onto the stack after
> all).
> 
>> Also sig=40 in sigexit() is strange since the highest numbered signal is
>> 33.
>> 
>> Lastly, I don't see how we get from userret() to sigexit().  Line 147 in
>> subr_trap.c (userret()) is:
>>          WITNESS_WARN(WARN_PANIC, NULL, "userret: returning");
> 
> I suspect there is some tail-call optimization or some such so that the function
> userret() called branched go sigexit() instead of calling it (since it is marked
> __dead2, meaning it never returns).  kgdb710 might cope with this better (it
> constructs "fake" frames for inlined functions for example).
> 
> If you want to look at this further, try going to frame 16 and dissassembling the
> instructions before the call to see if you can spot which register the first
> parameter (saved in %rdi IIRC) comes from.

Dump of assembler code for function sbuf_printf:
   0xffffffff80a673e0 <+0>:	push   %rbp
   0xffffffff80a673e1 <+1>:	mov    %rsp,%rbp
   0xffffffff80a673e4 <+4>:	push   %r14
   0xffffffff80a673e6 <+6>:	push   %rbx
   0xffffffff80a673e7 <+7>:	sub    $0x50,%rsp
   0xffffffff80a673eb <+11>:	mov    %rsi,%r14
   0xffffffff80a673ee <+14>:	mov    %rdi,%rbx
   0xffffffff80a673f1 <+17>:	mov    %r9,-0x38(%rbp)
   0xffffffff80a673f5 <+21>:	mov    %r8,-0x40(%rbp)
   0xffffffff80a673f9 <+25>:	mov    %rcx,-0x48(%rbp)
   0xffffffff80a673fd <+29>:	mov    %rdx,-0x50(%rbp)
   0xffffffff80a67401 <+33>:	lea    -0x60(%rbp),%rax
   0xffffffff80a67405 <+37>:	mov    %rax,-0x20(%rbp)
   0xffffffff80a67409 <+41>:	lea    0x10(%rbp),%rax
   0xffffffff80a6740d <+45>:	mov    %rax,-0x28(%rbp)
   0xffffffff80a67411 <+49>:	movl   $0x30,-0x2c(%rbp)
   0xffffffff80a67418 <+56>:	movl   $0x10,-0x30(%rbp)
   0xffffffff80a6741f <+63>:	mov    $0xffffffff8137bdf8,%rdi
   0xffffffff80a67426 <+70>:	mov    %rbx,%rsi
   0xffffffff80a67429 <+73>:	callq  0xffffffff80a66c00 <_assert_sbuf_integrity>

   0xffffffff80a237b9 <+825>:	jmpq   0xffffffff80a236fd <sigexit+637>
   0xffffffff80a237be <+830>:	mov    $0xffffffff80fd8ad3,%rsi
   0xffffffff80a237c5 <+837>:	xor    %eax,%eax
   0xffffffff80a237c7 <+839>:	mov    %r12,%rdi
   0xffffffff80a237ca <+842>:	mov    -0x228(%rbp),%rdx
   0xffffffff80a237d1 <+849>:	callq  0xffffffff80a673e0 <sbuf_printf>
=> 0xffffffff80a237d6 <+854>:	inc    %r14d
   0xffffffff80a237d9 <+857>:	jmpq   0xffffffff80a236fd <sigexit+637>