Re: panic: spin lock held too long (while rebooting)

From: John Baldwin <jhb_at_freebsd.org>
Date: Mon, 23 Jan 2006 15:51:07 -0500

On Saturday 21 January 2006 01:05, Thierry Herbelot wrote:
> On Wednesday 4 January 2006 14:38, John Baldwin wrote:
> > On Wednesday 04 January 2006 02:06 am, Thierry Herbelot wrote:
>
> [SNIP previous similar panic]
>
> > Next time you get this, can you use 'show threads' to figure out the tid
> > for the thread whose pointer is in the printf (0xc16de480 in this case)
> > and then do a trace of that thread?
>
> Hello,
>
> Here is a more detailed crash session:
>
> Is this (zomb) entry problematic? (from ps):
>     8 c182e228    0     1     0 0002204 zomb[INACTIVE] g_mirror gm0s1
>
> I am keeping the machine in DDB in case there are more detailed commands
> to run to investigate the panic (the machine is an SMP BP6, runs a GENERIC
> current kernel, and stores its local files in two g_mirror partitions).
>
> The problematic spinlock is held by 0xc16de340 which is cpustop_handler.
>
> 	TfH
>
> PS: printout of the crash:
>
> # reboot
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...3 2 2 2 0 0 done
> All buffers synced.
> Uptime: 39m52s
> GEOM_MIRROR: Device files1: provider mirror/files1 destroyed.
> GEOM_MIRROR: Device files1 destroyed.
> GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 destroyed.
> GEOM_MIRROR: Device gm0s1 destroyed.
> Rebooting...
> cpu_reset: Stopping other CPUs
> spin lock sched lock held by 0xc16de340 for > 5 seconds
> panic: spin lock held too long

OK, this isn't a fatal panic, in that your disks should already be clean at
this point. You can try this hack to see if it fixes it:

Index: vm_machdep.c
===================================================================
RCS file: /usr/cvs/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.267
diff -u -r1.267 vm_machdep.c
--- vm_machdep.c        14 Nov 2005 00:43:44 -0000      1.267
+++ vm_machdep.c        23 Jan 2006 20:49:21 -0000
@@ -533,6 +533,7 @@
                ;       /* Wait for other cpu to see that we've started */
        stop_cpus((1<<cpu_reset_proxyid));
        printf("cpu_reset_proxy: Stopped CPU %d\n", cpu_reset_proxyid);
+       disable_intr();
        DELAY(1000000);
        cpu_reset_real();
 }
@@ -581,6 +582,7 @@
                        /* NOTREACHED */
                }

+               disable_intr();
                DELAY(1000000);
        }
 #endif

The better fix is to take CPUs offline more gracefully during a shutdown
(at least during an orderly shutdown).
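
For context on the panic message itself, here is a minimal userspace sketch
of a spin lock whose acquire path gives up when the lock stays held for too
long and reports the holder, in the spirit of "spin lock sched lock held by
0xc16de340". It is only an illustrative model and not the kernel code: the
names model_spinlock, model_spin_init, model_spin_acquire and
model_spin_release are made up for this example, and a retry count stands in
for the 5 second timeout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the FreeBSD kernel implementation. */
struct model_spinlock {
	atomic_uintptr_t owner;	/* 0 when free, otherwise the holder's id */
};

static void
model_spin_init(struct model_spinlock *l)
{
	atomic_init(&l->owner, (uintptr_t)0);
}

/*
 * Try to take the lock, giving up after max_spins failed attempts.
 * Returns true on success.  On failure, *holder is set to the owner id
 * observed on the last attempt (or 0 if max_spins was 0), which is the
 * value a "held by 0x..." message would print.
 */
static bool
model_spin_acquire(struct model_spinlock *l, uintptr_t self,
    unsigned long max_spins, uintptr_t *holder)
{
	assert(self != 0);	/* 0 is reserved for "unlocked" */
	*holder = 0;
	for (unsigned long i = 0; i < max_spins; i++) {
		uintptr_t expected = 0;

		/* The strong variant never fails spuriously. */
		if (atomic_compare_exchange_strong(&l->owner, &expected, self))
			return (true);
		*holder = expected;	/* current owner, for diagnostics */
	}
	return (false);
}

static void
model_spin_release(struct model_spinlock *l)
{
	atomic_store(&l->owner, (uintptr_t)0);
}
```

In this model, a caller that gets false back would be the one printing the
holder and panicking. If the holder is a CPU that has already been stopped,
the lock is never released, so every later acquire attempt times out.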

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org