Re: panic after resume, triggered by vt_switch_timer() ?

From: John Baldwin <jhb_at_freebsd.org> Date: Mon, 10 Mar 2014 14:35:50 -0400 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:47 UTC

On Sunday, March 09, 2014 9:39:36 pm Glen Barber wrote:
> I think I have hit a panic in the code path for vt_switch_timer() when
> resuming from suspend.
> 
> In both cases it happened, the laptop was suspended for >2 hours.
> 
> In both cases, c_func is vt_switch_timer(), and c_arg is a negative
> value (both times it is -2133611368).  I am not sure if these are valid
> values for vt_switch_timer(), but it caught my eye.
> 
> The amount of time the laptop has been suspended may be unrelated, but
> so far has been the only constant in numerous attempts to reproduce the
> crash.
> 
> The machine is running 11.0-CURRENT #202 r262562, and the only recent
> change to the kernel configuration is switching from sc(4) to vt(4)
> a few weeks ago.  Prior to this, I could leave the machine suspended for
> several hours (sometimes up to 4 when traveling), without issue.
> 
> 
> Script started on Sun Mar  9 20:22:44 2014
> command: /bin/sh
> # kgdb ./kernel.debug /var/crash/vmcore.last
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x30
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8064378f
> stack pointer           = 0x28:0xfffffe01e47b98f0
> frame pointer           = 0x28:0xfffffe01e47b99b0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8066c280 at kdb_backtrace+0x60
> #1 0xffffffff8062bc65 at panic+0x155
> #2 0xffffffff8095a78f at trap_fatal+0x38f
> #3 0xffffffff8095aac1 at trap_pfault+0x321
> #4 0xffffffff8095a160 at trap+0x4a0
> #5 0xffffffff8093f682 at calltrap+0x8
> #6 0xffffffff80643b94 at softclock+0x94
> #7 0xffffffff805f6aa8 at intr_event_execute_handlers+0x1b8
> #8 0xffffffff805f6ea6 at ithread_loop+0x96
> #9 0xffffffff805f3c9a at fork_exit+0x9a
> #10 0xffffffff8093fbbe at fork_trampoline+0xe
> Uptime: 6h12m9s
> Dumping 3121 out of 7951 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/i915kms.ko.symbols...done.
> Loaded symbols for /boot/kernel/i915kms.ko.symbols
> Reading symbols from /boot/kernel/drm2.ko.symbols...done.
> Loaded symbols for /boot/kernel/drm2.ko.symbols
> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
> 219             __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  doadump (textdump=<value optimized out>) at pcpu.h:219
> #1  0xffffffff8062b808 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
> #2  0xffffffff8062bca4 in panic (fmt=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:759
> #3  0xffffffff8095a78f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>)
>     at /usr/src/sys/amd64/amd64/trap.c:874
> #4  0xffffffff8095aac1 in trap_pfault (frame=0xfffffe01e47b9840, 
>     usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:691
> #5  0xffffffff8095a160 in trap (frame=0xfffffe01e47b9840)
>     at /usr/src/sys/amd64/amd64/trap.c:455
> #6  0xffffffff8093f682 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
> #7  0xffffffff8064378f in softclock_call_cc (c=0xffffffff80d3ae78, cc=0xffffffff80eace80, 
>     direct=0) at /usr/src/sys/kern/kern_timeout.c:703
> #8  0xffffffff80643b94 in softclock (arg=0xffffffff80eace80)
>     at /usr/src/sys/kern/kern_timeout.c:812
> #9  0xffffffff805f6aa8 in intr_event_execute_handlers (p=<value optimized out>, 
>     ie=0xfffff800031f3a00) at /usr/src/sys/kern/kern_intr.c:1263
> #10 0xffffffff805f6ea6 in ithread_loop (arg=0xfffff8000323cf80)
>     at /usr/src/sys/kern/kern_intr.c:1276
> #11 0xffffffff805f3c9a in fork_exit (callout=0xffffffff805f6e10 <ithread_loop>, 
>     arg=0xfffff8000323cf80, frame=0xfffffe01e47b9ac0) at /usr/src/sys/kern/kern_fork.c:977
> #12 0xffffffff8093fbbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605
> #13 0x0000000000000000 in ?? ()
> Current language:  auto; currently minimal
> (kgdb) frame 7
> #7  0xffffffff8064378f in softclock_call_cc (c=0xffffffff80d3ae78, cc=0xffffffff80eace80, 
>     direct=0) at /usr/src/sys/kern/kern_timeout.c:703
> 703                     class->lc_unlock(c_lock);
> (kgdb) l
> 698                     lastfunc = c_func;
> 699             }
> 700     #endif
> 701             CTR1(KTR_CALLOUT, "callout %p finished", c);
> 702             if ((c_flags & CALLOUT_RETURNUNLOCKED) == 0)
> 703                     class->lc_unlock(c_lock);
> 704     skip:
> 705             CC_LOCK(cc);
> 706             KASSERT(cc->cc_exec_entity[direct].cc_curr == c, ("mishandled cc_curr"));
> 707             cc->cc_exec_entity[direct].cc_curr = NULL;
> (kgdb) p *c
> $1 = {c_links = {le = {le_next = 0x0, le_prev = 0xffffffff80eacf90}, sle = {sle_next = 0x0}, 
>     tqe = {tqe_next = 0x0, tqe_prev = 0xffffffff80eacf90}}, c_time = 95902818571375, 
>   c_precision = 4026531562, c_arg = 0xffffffff80d3adc0, 
>   c_func = 0xffffffff80510850 <vt_switch_timer>, c_lock = 0x0, c_flags = 128, c_cpu = 0}
> (kgdb) l *0xffffffff80510850
> 0xffffffff80510850 is in vt_switch_timer (/usr/src/sys/dev/vt/vt_core.c:150).
> 145     static int vt_proc_alive(struct vt_window *);
> 146     static void vt_resize(struct vt_device *);
> 147     
> 148     static void
> 149     vt_switch_timer(void *arg)
> 150     {
> 151     
> 152             vt_late_window_switch((struct vt_window *)arg);
> 153     }
> 154     
> (kgdb) p *0xffffffff80d3adc0
> $2 = -2133611368
> (kgdb) quit
> # ^D
> Script done on Sun Mar  9 20:27:43 2014
> 
> Thoughts?

This has been reported multiple times by both adrian_at_ and myself.  I would try
disabling vt switching during suspend as a workaround for now.  I found that
only ttyv0's timer object is zero'd when this happens.  The timers for the
other VTY's are all fine.

-- 
John Baldwin