current freeze from time to time

From: Denis Shaposhnikov <dsh_at_vlink.ru>
Date: Fri, 23 Dec 2005 16:32:13 +0300
Hi!

I have a problem with CURRENT. It freezes from time to time on SMP
dual Xeon system. For some processes I see MWCHAN = ufs and "D" in the
STAT. And I can't kill such processes even with -9. And system can't
kill them too on shutdown. So, system can't do shutdown and wait
forever after "All buffers synced" message. At this moment I've
entered to KDB do "show lockedvnods":

Locked vnodes

0xc687cb58: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    v_object 0xcb5b1934 ref 0 pages 0
     lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending
        ino 2072602, on dev ad4s1g

0xc687ca50: tag ufs, type VDIR
    usecount 31, writecount 0, refcount 32 mountedhere 0
    flags ()
    v_object 0xc85d2744 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending
        ino 2072603, on dev ad4s1g

0xc687c948: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc875d000 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending
        ino 2072615, on dev ad4s1g

0xc691f420: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc8a773e0 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending
        ino 2072680, on dev ad4s1g

0xc691f318: tag ufs, type VDIR
    usecount 3, writecount 0, refcount 4 mountedhere 0
    flags ()
    v_object 0xc8a7b2e8 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc7019780 (pid 74103) with 2 pending
        ino 2072795, on dev ad4s1g

0xc69bb528: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc7890744 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc91f4600 (pid 74129) with 1 pending
        ino 2072767, on dev ad4s1g
Locked vnodes

0xc687cb58: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    v_object 0xcb5b1934 ref 0 pages 0
     lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending
        ino 2072602, on dev ad4s1g

0xc687ca50: tag ufs, type VDIR
    usecount 31, writecount 0, refcount 32 mountedhere 0
    flags ()
    v_object 0xc85d2744 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending
        ino 2072603, on dev ad4s1g

0xc687c948: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc875d000 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending
        ino 2072615, on dev ad4s1g

0xc691f420: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc8a773e0 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending
        ino 2072680, on dev ad4s1g

0xc691f318: tag ufs, type VDIR
    usecount 3, writecount 0, refcount 4 mountedhere 0
    flags ()
    v_object 0xc8a7b2e8 ref 0 pages 1
     lock type ufs: EXCL (count 1) by t(kgdb) 

After that I've done "call doadump" and got vmcore.

ps show me:

(kgdb) ps
During symbol reading, Incomplete CFI data; unspecified registers at 0xc04d97eb.
  pid    proc   uid  ppid  pgrp   flag stat comm         wchan
74686 c9464000    0     1     1  000000  1  sh           ufs c687caa8
74195 c970d000    0  3074 74195  4000100  1  sshd         ufs c687caa8
74178 c7682adc    0  3074 74178  004000  1  sshd         ufs c687c9a0
74129 c9b82adc 1008     1  5504  004000  1  parser3.cgi   ufs c691f370
74103 c70b5458 1008     1  5504  000000  1  httpd        ufs c69bb580
65610 c92c0458 1005     1 65610  004000  1  sftp-server   ufs c691f478
 5518 c6247458 1008     1  5516  004002  1  perl5.8.7    ufs c687caa8
 3081 c7523d08    0     1  3081  000000  1  cron         ufs c687caa8
 3074 c7682d08    0     1  3074  000100  1  sshd         ufs c687caa8
 3016 c7523adc    0     1  3016  000000  1  syslogd      ufs c687caa8
  519 c68e4d08   80     1   518  000100  1  nginx        ufs c691f370
   34 c6260000    0     0     0  000204  1  schedcpu     - e88b3cf0
   33 c62438b0    0     0     0  000204  1  syncer       ktsusp c6243938
   32 c6243adc    0     0     0  000204  1  vnlru        ktsusp c6243b64
   31 c6243d08    0     0     0  000204  1  bufdaemon    ktsusp c6243d90
   30 c6244000    0     0     0  00020c  1  pagezero     pgzero c06c21a0
   29 c624422c    0     0     0  000204  1  vmdaemon     psleep c06c1d08
   28 c6244458    0     0     0  000204  1  pagedaemon   psleep c06c1cc8
   27 c602e684    0     0     0  000204  1  irq1: atkbd0   
   26 c602e8b0    0     0     0  000204  1  swi0: sio    
   25 c602eadc    0     0     0  000204  1  irq18: atapci1   
   24 c602ed08    0     0     0  000204  1  irq15: ata1   
   23 c6074000    0     0     0  000204  1  irq14: ata0   
   22 c607422c    0     0     0  000204  1  irq27: em1   
   21 c6074458    0     0     0  000204  1  irq26: em0   
   20 c6074684    0     0     0  000204  1  irq9: acpi0   
   19 c60748b0    0     0     0  000204  1  swi2: cambio   
   18 c6074adc    0     0     0  000204  1  swi6: task queue   
    9 c5fd822c    0     0     0  000204  1  acpi_task2   - c6061e40
    8 c5fd8458    0     0     0  000204  1  acpi_task1   - c6061e40
    7 c5fd8684    0     0     0  000204  1  acpi_task0   - c6061e40
   17 c5fd88b0    0     0     0  000204  1  swi6: Giant taskq   
    6 c5fd8adc    0     0     0  000204  1  thread taskq   - c6062040
   16 c5fd8d08    0     0     0  000204  1  swi5: Fast taskq   
    5 c602e000    0     0     0  000204  1  kqueue taskq   - c6062100
   15 c602e22c    0     0     0  000204  1  yarrow       - c06b1720
    4 c602e458    0     0     0  000204  1  g_down       - c06b1fc0
    3 c5fd3000    0     0     0  000204  1  g_up         - c06b1fbc
    2 c5fd322c    0     0     0  000204  1  g_event      - c06b1fb4
   14 c5fd3458    0     0     0  000204  1  swi3: vm     
   13 c5fd3684    0     0     0  00020c  1  swi4: clock sio   
   12 c5fd38b0    0     0     0  000204  1  swi1: net    
   11 c5fd3adc    0     0     0  00020c  1  idle: cpu0   
   10 c5fd3d08    0     0     0  00020c  1  idle: cpu1   
    1 c5fd8000    0     0     1  004200  1  init         ufs c687cbb0
    0 c06b20c0    0     0     0  000200  1  swapper      
There is no member named p_pptr.

Note, I've tried to check v_object addresses from "show lockedvnodes":

(kgdb) p/x *0xcb5b1934
$1 = 0xc068f280
(kgdb) p/x *0xc85d2744
$2 = 0xc068f280
(kgdb) p/x *0xc875d000
$3 = 0xc068f280
(kgdb) p/x *0xc8a773e0
$4 = 0xc068f280

... and so on.

May be that's important?

Could anybody help me to fix this problem?

PS. The system works fine with debug.mpsafevfs=0.

-- 
DSS5-RIPE DSS-RIPN 2:550/5068_at_fidonet 2:550/5069_at_fidonet
mailto:dsh_at_vlink.ru http://neva.vlink.ru/~dsh/
Received on Fri Dec 23 2005 - 12:32:26 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:49 UTC