was: CURRENT [r307305]: r307823 still crashing

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de> Date: Sun, 23 Oct 2016 18:24:36 +0200

On Sat, 15 Oct 2016 12:13:21 +0200
"O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:

> On Sat, 15 Oct 2016 10:22:42 +0200
> "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
> 
> > On Fri, 14 Oct 2016 10:48:33 +0200
> > "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
> >   
> > > Systems I updated to recent CURRENT have started crashing spontaneously.
> > > 
> > > The most recently crashing system is on
> > > 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r307305: Fri Oct 14 08:37:59 CEST 2016
> > > 
> > > Another system (remote, and not accessible until later today) was updated
> > > ~12 hours ago and is also rebooting/crashing without any warnings. The
> > > crash can be triggered under heavy load.
> > > 
> > > The only system that is stable so far, at r307263, is an older two-socket
> > > Core2Duo-based XEON machine; all crashing boxes have IvyBridge or newer
> > > CPUs.
> > > 
> > > Is anyone else seeing these crashes? I tried to compile a debug kernel on one
> > > host, but that is the remote machine I only have access to later; it failed
> > > to compile the kernel, and under load it crashed often. After ZFS scrubbing
> > > kicked in, it vanished from the net ;-/
> > > 
> > > kind regards,
> > > oh
> > > _______________________________________________
> > > freebsd-current_at_freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"    
> > 
> > r307341 is still crashing unpredictably (FreeBSD 12.0-CURRENT #5 r307341: Sat Oct 15
> > 09:36:16 CEST 2016).
> > 
> > I'm back to r307157, which seems to be "stable".
> >   
> 
> It seems I'm the only one having these problems at the moment :-(
> 
> I now have a laptop available and have started putting debugging options into the
> kernel. But the laptop, so far, doesn't exhibit the crashes described above. The laptop
> is the only system so far without ZFS!
> 
> The most frequently crashing box is a CURRENT server with the largest ZFS volume. On
> the most recent CURRENT (>r307157, see above), starting a scrub on a RAIDZ volume of
> ~12 TB raw size AND running a poudriere job triggers the crash within 1 - 18 minutes.
> Another box with only /home as a ZFS volume on a dedicated hdd crashes after minutes or
> hours. A laptop, also on CURRENT (now at r307349) but without ZFS, is stable as long as
> I do not pull the LAN wire (a problem I also described on the list; I am trying to
> capture the screen when it crashes right now).

I have now spent the last three days trying to figure out whether my custom config is
faulty or CURRENT has a serious bug. Even with GENERIC and in single-user mode (where it
takes longer), CURRENT, now at r307823, is crashing. The crashes seem to be unrelated to
X11, but I can trigger the crash faster when using firefox. I can also trigger it faster
by running "svn update" on a ZFS pool containing /usr/ports. Everyone who uses ZFS
on /usr/src or /usr/ports and updates via subversion knows that over time the update
process takes 10 - 15 minutes on ZFS volumes, compared to several minutes on UFS. And
while svn traverses the /usr/ports directory, the crash occurs.

I'm still wondering why nobody else is seeing such periodic crashes. As I already
reported, the crash occurs with CURRENT on several boxes, with or without ZFS.

How can I track a memory leak?

How can I write the backtrace given by the debugger to disk when the system crashes? The
box I can freely test on uses the nVidia BLOB and vt(), so I cannot see the backtrace. I
got a very bad screenshot on one of my laptops, but it is so ugly/unreadable that I think
it is unusable for presentation on this list at a reasonable size (200 kB max is too small).