Re: was: CURRENT [r308087] still crashing: Backtrace provided

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de> Date: Sat, 5 Nov 2016 18:45:09 +0100

On Sun, 30 Oct 2016 11:25:09 -0700
Mark Johnston <markj_at_FreeBSD.org> wrote:

> On Sun, Oct 30, 2016 at 06:55:00PM +0100, O. Hartmann wrote:
> > On Sun, 30 Oct 2016 09:39:34 -0700
> > Mark Johnston <markj_at_FreeBSD.org> wrote:
> >   
> > > On Sun, Oct 30, 2016 at 08:25:25AM +0100, O. Hartmann wrote:  
> > > > On Sat, 29 Oct 2016 18:33:45 -0700
> > > > Mark Johnston <markj_at_FreeBSD.org> wrote:
> > > >     
> > > > > On Sat, Oct 29, 2016 at 04:33:36PM +0200, O. Hartmann wrote:    
> > > > > > On Sun, 23 Oct 2016 15:18:57 -0400 (EDT)
> > > > > > Benjamin Kaduk <kaduk_at_MIT.EDU> wrote:
> > > > > >       
> > > > > > > On Sun, 23 Oct 2016, O. Hartmann wrote:
> > > > > > >       
> > > > > > > > How can I track a memory leak?        
> > > > > > > 
> > > > > > > I think I did not read enough of the context, but vmstat and top can track
> > > > > > > memory usage as a general thing.
> > > > > > >       
> > > > > > > > How can I write the backtrace given by the debugger to disk when
> > > > > > > > the box crashes? The box I can freely test on uses the nVidia BLOB and vt(), so
> > > > > > > > I cannot see the backtrace. I got a very bad screenshot on one of my
> > > > > > > > laptops, but it's so ugly/unreadable that I think it is unusable for
> > > > > > > > posting to this list at a reasonable size (the 200 kB maximum is too
> > > > > > > > small).
> > > > > > > 
> > > > > > > The backtrace should be part of the crash dump that is written to the
> > > > > > > (directly connected, non-encrypted, non-USB) swap device.  "call doadump"
> > > > > > > at the debugger prompt (even typing blind) is supposed to make sure
> > > > > > > there's a dump taken.
> > > > > > > 
> > > > > > > With respect to the screenshot, you should be able to post the image on an
> > > > > > > external site and send a link to the list, at least.
> > > > > > > 
> > > > > > > -Ben
> > > > > > > _______________________________________________
> > > > > > > freebsd-current_at_freebsd.org mailing list
> > > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > > > > > To unsubscribe, send any mail to
> > > > > > > "freebsd-current-unsubscribe_at_freebsd.org"      
> > > > > > 
> > > > > > Hello Benjamin,
> > > > > > 
> > > > > > thank you for your response. Attached, you'll find the backtrace the developers
> > > > > > seem to have requested. It was a bit hard, since the combination of FreeBSD, vt() and nVidia
> > > > > > is broken (black or distorted console; on UEFI it is black/locked as long as
> > > > > > the nvidia-modeset.ko module is loaded). I figured out that I could blindly
> > > > > > type "dump" when the box had crashed and was sitting at the debugger prompt.
> > > > > > 
> > > > > > I hope this time I have provided enough information to fix this really nasty problem. On
> > > > > > more recent hardware, Haswell and beyond, I was able to run CURRENT even with
> > > > > > ZFS and poudriere under heavy memory pressure for three days without a crash.
> > > > > > On older machines (one older Fujitsu dual-socket Core2Duo XEON (2x 4 core, 2x
> > > > > > 16 GB RAM banks) as well as two of my private boxes (1x IvyBridge XEON, one
> > > > > > i3-3220, both with an ASROCK Z77 Pro4 board on which UEFI does not work)), FreeBSD
> > > > > > crashes if it is newer than r307157. Staying on r307157 leaves those machines
> > > > > > "rock-solid": the XEON box has now been up for a week.
> > > > > 
> > > > > In kgdb, could you execute:
> > > > > 
> > > > > (kgdb) frame 12
> > > > > (kgdb) p *ifp
> > > > > (kgdb) p *ro
> > > > > 
> > > > > and reply with the output?    
> > > > 
> > > > Besides, is there any way to investigate the crashed vmcore.X files?    
> > > 
> > > Besides examining the state contained in the vmcore? Not really.  
> > 
> > Just so I don't misunderstand you (I'm not familiar with debugging!): I can investigate
> > the saved core files (vmcore.X) with kgdb? My first attempts failed: I simply referred
> > to the specific vmcore.0 via option -n 0 and typed the commands requested above,
> > but the output looked like an error to me.
> 
> Oh, sorry. Indeed, you should be able to execute
> 
> # kgdb $(sysctl -n kern.bootfile) vmcore.0
> 
> to open the core with kgdb.
> 
> > 
> > 
> >   
> > > 
> > > Based on the stack trace and affected range of revisions, it may be that
> > > reverting r307887 or r307234 helps, but I have no specific evidence for
> > > this without the requested output.  
> > 
> > So far I have also had crashes with revisions later than r307300, so that leaves me with r307233 ... I
> > will continue with that revision and report back.
> 
> Hm, I don't see why this excludes r307887? In any case, r307234 looks to
> be the more likely culprit.

Here I am again.

This time, it was r308329 or r308331. WITHOUT the debug options compiled into the kernel,
it took approximately 5 minutes to provoke the crash. WITH the debug options set, it
took more than 45 minutes before the system crashed and dumped core. I really hope we
can fix the problem this time. For the moment, I have put the system back to r307233 to see whether
r307234 is causing the crash, as you suspect.

Attached, you'll find the backtrace report, as last time. I had to type in "dump" blindly
on the system, as a dark screen or a stuck X11 screen blocked the console (I use vt() and
the nVidia BLOB with my nVidia GPUs, and this combination is still broken on FreeBSD).

Please let me know how I can assist further. I saved both the core AND, this time, the
crashing kernel.

Kind regards,

Oliver