Re: sysctl lock, system lockup

From: Tim Robbins <tim_at_robbins.dropbear.id.au>
Date: Mon, 31 May 2004 14:22:34 +1000
On Sun, May 30, 2004 at 11:31:14PM -0400, Don Bowman wrote:
> From: Tim Robbins [mailto:tim_at_robbins.dropbear.id.au]
> > On Sun, May 30, 2004 at 10:18:34PM -0400, Don Bowman wrote:
> > > From: Tim Robbins [mailto:tim_at_robbins.dropbear.id.au]
> > > > On Sun, May 30, 2004 at 04:35:55PM -0400, Don Bowman wrote:
> > > > > From: Don Bowman [mailto:don_at_sandvine.com]
> > > > > > On the console I ran 'top', but it wouldn't start,
> > > > > > giving:
> > > > > > 
> > > > > > load: 0.00  cmd: top 4282 [sysctl lock] 0.00u 0.00s 0% 180k
> > > > > > 
> > > > > > as the status. I can't ^C it and can't ssh in, but I
> > > > > > can still ping the device.
> > > > > > 
> > > > > > It was doing a background fsck after an earlier hang.
> > > > > > 
> > > > > > I have called panic from ddb; not sure whether the core will
> > > > > > work properly or not.
> > > > > 
> > > > > As a follow-up: I did get a vmcore and a matching kernel.debug.
> > > > > Can someone suggest what I should look at?
> > > > 
> > > > print sysctllock (or just sysctllock.sx_xholder if you don't have a
> > > > serial console set up.)
> > > 
> > > (kgdb) print sysctllock
> > > $1 = {sx_object = {lo_class = 0xc070dacc, lo_name = 0xc06ce43d "sysctl lock",
> > >     lo_type = 0xc06ce43d "sysctl lock", lo_flags = 3866624, lo_list = {
> > >       tqe_next = 0xc074f9e0, tqe_prev = 0xc0747ab0}, lo_witness = 0xc0751410},
> > >   sx_lock = 0xc0748e80, sx_cnt = -1, sx_shrd_cv = {
> > >     cv_description = 0xc06ce43d "sysctl lock", cv_waiters = 0},
> > >   sx_shrd_wcnt = 0, sx_excl_cv = {cv_description = 0xc06ce43d "sysctl lock",
> > >     cv_waiters = 9}, sx_excl_wcnt = 9, sx_xholder = 0xc8ee2150}
> > 
> > Hmm. How about the value of sysctllock.sx_xholder->td_proc? Then, if possible,
> > switch to that process (with gdb's proc command) and try to get a backtrace.
> > (I admit to not having used this feature recently; I'm not completely sure
> > that it still works. You may need to pass it a thread pointer instead.)
> 
> 
> (kgdb) p sysctllock.sx_xholder->td_proc
> $1 = (struct proc *) 0xc8eddc08
> (kgdb) proc 0xc8eddc08
> (kgdb) bt
> #0  0xc0550340 in sched_switch (td=0xc8ee2150)
>     at /usr/src/sys/kern/sched_4bsd.c:666
> #1  0xc0545dfe in mi_switch (flags=1945947512)
>     at /usr/src/sys/kern/kern_synch.c:359
> #2  0xc055d382 in sleepq_switch (wchan=0x0)
>     at /usr/src/sys/kern/subr_sleepqueue.c:374
> #3  0xc055d53f in sleepq_wait (wchan=0xe15dbc28)
>     at /usr/src/sys/kern/subr_sleepqueue.c:478
> #4  0xc0545ac6 in msleep (ident=0xe15dbc28, mtx=0xc0774a00, priority=76, 
>     wmesg=0xc06d4ad5 "biord", timo=0) at /usr/src/sys/kern/kern_synch.c:250
> #5  0xc058193f in bwait (bp=0xe15dbc28, pri=76 'L', wchan=0xc06d4ad5 "biord")
>     at /usr/src/sys/kern/vfs_bio.c:3766
> #6  0xc0580525 in bufwait (bp=0xe15dbc28) at /usr/src/sys/kern/vfs_bio.c:3048
> #7  0xc057c9be in breadn (vp=0xc937ba28, blkno=-18688012, size=16384, 
>     rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, bpp=0x0)
>     at /usr/src/sys/kern/vfs_bio.c:749
> #8  0xc057c724 in bread (vp=0xc937ba28, blkno=-18688012, size=16384, cred=0x0,
>     bpp=0xf835e9d8) at /usr/src/sys/kern/vfs_bio.c:684
> #9  0xc061ab93 in ffs_balloc_ufs2 (vp=0xc937ba28, startoffset=0, size=16384,
>     cred=0xc53d5180, flags=131072, bpp=0xf835eadc)
>     at /usr/src/sys/ufs/ffs/ffs_balloc.c:702
> #10 0xc0621191 in ffs_snapremove (vp=0xc937ba28)
>     at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1463
> #11 0xc0626a70 in softdep_releasefile (ip=0xc9309460)
>     at /usr/src/sys/ufs/ffs/ffs_softdep.c:3266
> #12 0xc063303d in ufs_inactive (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_inode.c:88
> #13 0xc063a21f in ufs_vnoperate (ap=0x0)
>     at /usr/src/sys/ufs/ufs/ufs_vnops.c:2819
> #14 0xc058c60e in vput (vp=0xc937ba28) at vnode_if.h:953
> #15 0xc0618992 in sysctl_ffs_fsck (oidp=0x0, arg1=0xf835ec90, arg2=0, req=0x0)
>     at /usr/src/sys/ufs/ffs/ffs_alloc.c:2292
> #16 0xc0547553 in sysctl_root (oidp=0x0, arg1=0xf835ec90, arg2=0, 
>     req=0xf835ec08) at /usr/src/sys/kern/kern_sysctl.c:1220
> #17 0xc0547714 in userland_sysctl (td=0x0, name=0xf835ec84, namelen=3, 
>     old=0xf835ec08, oldlenp=0x0, inkernel=0, new=0x8059f00, newlen=0, 
>     retval=0xf835ec80) at /usr/src/sys/kern/kern_sysctl.c:1317
> #18 0xc05475d5 in __sysctl (td=0xc8ee2150, uap=0xf835ed14)
>     at /usr/src/sys/kern/kern_sysctl.c:1254
> #19 0xc06813a7 in syscall (frame=
>       {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 3, tf_esi = 0,
>       tf_ebp = -1077941560, tf_isp = -130683532, tf_ebx = 1746122828,
>       tf_edx = 134584952, tf_ecx = 0, tf_eax = 202, tf_trapno = 12, tf_err = 2,
>       tf_eip = 1745649783, tf_cs = 31, tf_eflags = 658, tf_esp = -1077941620,
>       tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:1004
> #20 0x680c8077 in ?? ()
> Cannot access memory at address 0xbfbfeac8
> (kgdb) 

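I'm not sure where to go from here. A deadlock doesn't seem likely, but
it's possible that background fsck could tie up the system for quite
some time through this sysctl: the thread holding the sysctl lock is the
one running sysctl_ffs_fsck, and it is asleep waiting for a disk read
("biord"), so every other sysctl caller, top included, queues behind it.
How long (approximately) did you wait before dropping to ddb?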
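A rough userland analogy of that stall, assuming the pattern the backtrace shows (sx_xholder 0xc8ee2150 is the same thread as frame #18, asleep in "biord" under sysctl_ffs_fsck, with sx_excl_wcnt = 9 callers waiting): one thread takes an exclusive lock and then blocks on slow I/O, so any other caller of that lock cannot proceed until the I/O finishes. This is a pthreads sketch, not kernel code; model_lock, fsck_like_holder, top_like_caller_can_proceed and run_model are hypothetical names, and the "disk read" is just a condition variable that the driving thread signals.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sysctllock (an exclusive lock). */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
/* Protects the two flags below; io_cv signals changes to them. */
static pthread_mutex_t io_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_cv = PTHREAD_COND_INITIALIZER;
static bool lock_taken = false; /* holder has acquired model_lock */
static bool io_done = false;    /* simulated disk read has completed */

/* Plays the background-fsck sysctl: takes the lock, then blocks on "I/O"
 * while still holding it (analogous to sleeping in "biord"). */
static void *fsck_like_holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&model_lock);
    pthread_mutex_lock(&io_mtx);
    lock_taken = true;
    pthread_cond_broadcast(&io_cv);
    while (!io_done)
        pthread_cond_wait(&io_cv, &io_mtx);
    pthread_mutex_unlock(&io_mtx);
    pthread_mutex_unlock(&model_lock); /* released only after the I/O */
    return NULL;
}

/* Plays top(1): another caller that needs the same lock. Returns true if
 * the lock was free right now, false if someone else holds it. */
static bool top_like_caller_can_proceed(void) {
    int rc = pthread_mutex_trylock(&model_lock);
    if (rc == EBUSY)
        return false;
    assert(rc == 0); /* any other error is unexpected in this model */
    pthread_mutex_unlock(&model_lock);
    return true;
}

/* Runs the scenario once. *during gets the caller's result while the
 * simulated read is outstanding, *after once it has completed.
 * Returns 0 on success, -1 if the holder thread could not be created. */
int run_model(bool *during, bool *after) {
    pthread_t holder;

    lock_taken = false; /* no other thread is running yet */
    io_done = false;
    if (pthread_create(&holder, NULL, fsck_like_holder, NULL) != 0)
        return -1;

    /* Wait until the holder owns model_lock. */
    pthread_mutex_lock(&io_mtx);
    while (!lock_taken)
        pthread_cond_wait(&io_cv, &io_mtx);
    pthread_mutex_unlock(&io_mtx);

    *during = top_like_caller_can_proceed();

    /* The disk read "finishes"; the holder can now drop the lock. */
    pthread_mutex_lock(&io_mtx);
    io_done = true;
    pthread_cond_broadcast(&io_cv);
    pthread_mutex_unlock(&io_mtx);

    pthread_join(holder, NULL);
    *after = top_like_caller_can_proceed();
    return 0;
}
```

Built with `cc -pthread` and called from a small driver, run_model() should set *during to false (the lock stays busy while the simulated read is outstanding) and *after to true once the holder has finished. In this model the stall lasts exactly as long as the "I/O" does, which is why the time spent before dropping into ddb is a useful data point.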


Tim
Received on Sun May 30 2004 - 19:21:45 UTC
