On Thu, Jul 09, 2009 at 06:52:42PM +1000, John Marshall wrote:
> On Thu, 09 Jul 2009, 17:30 +1000, John Marshall wrote:
> > On Thu, 09 Jul 2009, 10:42 +0400, pluknet wrote:
> > > 2009/7/9 John Marshall <john.marshall_at_riverwillow.com.au>:
> > > > After upgrading...
> > > > - boot new kernel to single-user
> > > > - make installworld
> > > > - make delete-old
> > > > - make delete-old-libs
> > > > - mergemaster
> > > > - reboot
> > > >
> > > > I re-built a few of my applications. I noticed a problem with ntpd
> > > > 4.2.4p7. The build was fine, it started fine, but got stuck in vmmaps
> > > > and I couldn't kill it. Stopping the operating system appears to be the
> > > > only remedy. I have re-built a few times (starting with 'make
> > > > distclean') just to make sure.
> > > >
> > > > UID PID PPID CPU PRI NI  VSZ  RSS MWCHAN STAT TT     TIME COMMAND
> > > >   0 791    1   0  44  0 4944 4920 vmmaps Ds   ??  0:00.01 ntpd
> > > >
> > >
> > > Can you place here 'procstat -k 791', where 791 is pid of ntpd?
> > > It'd be nice also if you go through all ddb steps described in
> > > Debugging Deadlocks chapter of FreeBSD Developers' Handbook.
> >
> > Here is some procstat output. I'm just rebuilding the kernel with the
> > debugging options enabled - not something I've ever done before.
> >
> > rwsrv05# procstat 2788
> >   PID  PPID  PGID   SID  TSID THR LOGIN     WCHAN     EMUL          COMM
> >  2788     1  2788  2788     0   1 john      vmmaps    FreeBSD ELF32 ntpd
> > rwsrv05# procstat -k 2788
> >   PID    TID COMM             TDNAME           KSTACK
> >  2788 100164 ntpd             -                mi_switch sleepq_switch sleepq_wait _sleep vm_map_unlock_and_wait vm_map_delete vm_map_fixed vm_mmap mmap syscall Xint0x80_syscall
> > rwsrv05# procstat -v 2788
> >   PID      START        END PRT  RES PRES REF SHD FL TP PATH
> >  2788  0x8048000  0x807e000 r-x   54   60   2   1 CN vn /usr/local/bin/ntpd
> >  2788  0x807e000  0x8080000 rw-    2    0   1   0 C- vn /usr/local/bin/ntpd
> >  2788  0x8080000  0x8100000 rw-  128    0   1   0 C- df
> >  2788 0x2807e000 0x280ab000 r-x   45    0 171  75 CN vn /libexec/ld-elf.so.1
> >  2788 0x280ab000 0x280ad000 rw-    2    0   1   0 C- vn /libexec/ld-elf.so.1
> >  2788 0x280ad000 0x280c0000 rw-   19    0   1   0 C- df
> >  2788 0x280c0000 0x280d7000 r-x   23    0   1   0 CN vn /lib/libm.so.5
> >  2788 0x280d7000 0x280d8000 r-x    1    0   1   0 CN vn /lib/libm.so.5
> >  2788 0x280d8000 0x280d9000 rw-    1    0   1   0 C- vn /lib/libm.so.5
> >  2788 0x280d9000 0x28211000 r-x  312    0   1   0 CN vn /lib/libcrypto.so.5
> >  2788 0x28211000 0x28212000 r-x    1    0   1   0 CN vn /lib/libcrypto.so.5
> >  2788 0x28212000 0x2822a000 rw-   24    0   1   0 C- vn /lib/libcrypto.so.5
> >  2788 0x2822a000 0x2822c000 rw-    2    0   1   0 C- df
> >  2788 0x2822c000 0x28232000 r-x    6    0   1   0 CN vn /lib/libkvm.so.4
> >  2788 0x28232000 0x28233000 r-x    1    0   1   0 CN vn /lib/libkvm.so.4
> >  2788 0x28233000 0x28234000 rw-    1    0   1   0 C- vn /lib/libkvm.so.4
> >  2788 0x28234000 0x2824c000 r-x   24    0   1   0 CN vn /usr/lib/libelf.so.1
> >  2788 0x2824c000 0x2824d000 r-x    1    0   1   0 CN vn /usr/lib/libelf.so.1
> >  2788 0x2824d000 0x2824e000 rw-    1    0   1   0 C- vn /usr/lib/libelf.so.1
> >  2788 0x2824e000 0x28251000 r-x    3    0  15  10 CN vn /usr/lib/librt.so.1
> >  2788 0x28251000 0x28252000 r-x    1    0   1   0 CN vn /usr/lib/librt.so.1
> >  2788 0x28252000 0x28253000 rw-    1    0   1   0 C- vn /usr/lib/librt.so.1
> >  2788 0x28253000 0x28260000 r-x   13    0   1   0 CN vn /lib/libmd.so.4
> >  2788 0x28260000 0x28261000 r-x    1    0   1   0 CN vn /lib/libmd.so.4
> >  2788 0x28261000 0x28262000 rw-    1    0   1   0 C- vn /lib/libmd.so.4
> >  2788 0x28262000 0x28351000 r-x  239    0   1   0 CN vn /lib/libc.so.7
> >  2788 0x28351000 0x28352000 r-x    1    0   1   0 CN vn /lib/libc.so.7
> >  2788 0x28352000 0x28358000 rw-    6    0   1   0 C- vn /lib/libc.so.7
> >  2788 0x28358000 0x2836e000 rw-   22    0   1   0 C- df
> >  2788 0x2836e000 0x2837a000 ---    0    0   0   0 -- --
> >  2788 0x28400000 0x28500000 rw-  256    0   1   0 C- df
> >  2788 0xbfbe0000 0xbfc00000 rwx   32    0   1   0 C- df
> > rwsrv05#
>
> OK, now that I've rebuilt the kernel with the debugging options not
> commented out, I'm getting a number of 'lock order reversal' messages
> printed on the console: is that normal?
>
> From the Debugging Deadlocks chapter to which I was referred by pluknet
> (above) it appears that I need to enter 'sysctl debug.kdb.enter=1' or
> 'sysctl debug.kdb.panic=1' after I get the process into the desired
> 'stuck' state. If I enter either of those commands, the system reboots.
> Now *I'm* stuck.

Since you have a mostly working system, and the interesting information
is most easily accessible through kgdb, attach it to the live kernel:
	kgdb <path to kernel.debug> /dev/mem
From there, switch to the stuck process,
	process <pid>
do
	bt
find the frame for vm_map_delete, and print the entry:
	p entry

Also, I need to see the information you posted earlier, namely, the
procstat -k and -v output for the process.
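
The steps above might look roughly like the session below. This is only
a sketch, not a verified transcript. The pid 2788 is the stuck ntpd from
the procstat output quoted earlier and will differ after a reboot. <N>
stands for whatever frame number bt reports for vm_map_delete. The
command is written here as 'proc', which is how kgdb(1) spells the
process switch as far as I know; treat that as an assumption. The final
'p *entry' is also an assumption: entry is a pointer in vm_map_delete,
so dereferencing it should show the fields of the map entry itself.

	# kgdb <path to kernel.debug> /dev/mem
	(kgdb) proc 2788
	(kgdb) bt
	(kgdb) frame <N>
	(kgdb) p entry
	(kgdb) p *entry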