Re: snapshot dump hangs

From: Randy Bush <randy_at_psg.com>
Date: Tue, 8 Jun 2004 22:12:18 -0700

> (1) Could you do a "ps awxl" and see what wait channel dump is blocked on?

  UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
    0   942   925   0   8  0  1332 1152 wait   Ss    p0    0:00.05 -bash (bash)
    0  1701   942   0   8  0  1248 1052 wait   S+    p0    0:00.01 /usr/local/bin/bash /do-dump
    0  1745  1701   0   8  0  1496 1220 wait   S+    p0    0:00.01 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1748  1745   0   4  0  1624 1268 sbwait S+    p0    0:02.55 rdump: /dev/twed0s1d: pass 4: 60.19% done, finished in 0:00 (rdump)
    0  1749  1748   0  20  0  1496 1220 pause  S+    p0    0:03.99 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1750  1748   0  20  0  1496 1220 pause  S+    p0    0:03.95 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1751  1748   0  20  0  1496 1220 pause  S+    p0    0:03.95 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1766  1764   0   8  0  1328 1148 wait   Ss    p1    0:00.03 -bash (bash)
    0  1995  1766   0  76  0  1460  968 -      R+    p1    0:00.00 /bin/ps -l
    0   690     1   0  -8  0  2324 1812 piperd S     d0-   0:00.02 /usr/local/bin/perl /usr/local/sbin/exim-pop/popwatch
    0   691     1   0   8  0  2260 1764 nanslp S     d0-   0:00.01 /usr/local/bin/perl /usr/local/sbin/exim-pop/popauth
    0   692     1   0   8  0  2260 1704 nanslp S     d0-   0:00.01 /usr/local/bin/perl /usr/local/sbin/exim-pop/popclean
    0   703   690   0   4  0  1240  652 kqread S     d0-   0:00.00 /usr/bin/tail -f /var/log/poplog
    0   743     1   0   8  0  1676 1220 wait   S     d0-   0:00.02 /bin/sh /usr/local/bin/mysqld_safe --user=mysql --datadir=/var/db/mysql --pid-file=/var/db/mysql/psg.pid
    0   904     1   0   5  0  1288  944 ttyin  Ss+   d0    0:00.01 /usr/libexec/getty std.9600 ttyd0
    0   896     1   0   5  0  1288  944 ttyin  Ss+   v0    0:00.01 /usr/libexec/getty Pc ttyv0
    0   897     1   0   5  0  1288  944 ttyin  Ss+   v1    0:00.01 /usr/libexec/getty Pc ttyv1
    0   898     1   0   5  0  1288  944 ttyin  Ss+   v2    0:00.01 /usr/libexec/getty Pc ttyv2
    0   899     1   0   5  0  1288  944 ttyin  Ss+   v3    0:00.01 /usr/libexec/getty Pc ttyv3
    0   900     1   0   5  0  1288  944 ttyin  Ss+   v4    0:00.01 /usr/libexec/getty Pc ttyv4
    0   901     1   0   5  0  1288  944 ttyin  Ss+   v5    0:00.01 /usr/libexec/getty Pc ttyv5
    0   902     1   0   5  0  1288  944 ttyin  Ss+   v6    0:00.01 /usr/libexec/getty Pc ttyv6
    0   903     1   0   5  0  1288  944 ttyin  Ss+   v7    0:00.01 /usr/libexec/getty Pc ttyv7
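
For anyone repeating this check, here is a minimal Python sketch for pulling the wait channel (the MWCHAN column) for the dump processes out of "ps -l" style output. It is not part of the original exchange: the function name wait_channels and the SAMPLE text (four rows copied from the listing above) are illustrative assumptions, and it assumes the column layout shown here, with COMMAND as the last column.

```python
def wait_channels(ps_text, needle):
    """Return (pid, wait channel) for each process whose command contains needle.

    Assumes "ps -l" style output whose header row names the columns and
    whose last column is COMMAND (the only column that may contain spaces).
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    chan_col = header.index("MWCHAN")
    cmd_col = header.index("COMMAND")

    found = []
    for line in lines[1:]:
        # Split only up to the COMMAND column so the command stays whole.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # malformed or truncated row
        if needle in fields[cmd_col]:
            found.append((int(fields[pid_col]), fields[chan_col]))
    return found


# Four rows copied from the listing above (hypothetical variable name).
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
    0   942   925   0   8  0  1332 1152 wait   Ss    p0    0:00.05 -bash (bash)
    0  1745  1701   0   8  0  1496 1220 wait   S+    p0    0:00.01 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
    0  1748  1745   0   4  0  1624 1268 sbwait S+    p0    0:02.55 rdump: /dev/twed0s1d: pass 4: 60.19% done, finished in 0:00 (rdump)
    0  1749  1748   0  20  0  1496 1220 pause  S+    p0    0:03.99 /sbin/rdump 0Luaf raid0.psg.com:/data/backup/psg.2004-06-09/var /dev/twed0s1d (rdump)
"""

for pid, chan in wait_channels(SAMPLE, "rdump"):
    print(pid, chan)
```

On the sample rows this lists pid 1745 in wait, 1748 in sbwait, and 1749 in pause, matching the listing above.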

> (2) Could you break into DDB and generate a stack trace for dump?

db> trace 1745
sched_switch(c66cb150,8b088ded,52586aad,ffc00014,c66cb150) at sched_switch+0x145
mi_switch(1,c04d9400,c66ce6e0,c04d8e52,0) at mi_switch+0x1ab
sleepq_switch(c66ce6e0,0,e3bdcc40,c04b97e1,c66ce6e0) at sleepq_switch+0x16f
sleepq_wait_sig(c66ce6e0,5c,c66ce74c,c05fc4b5,0) at sleepq_wait_sig+0x14
msleep(c66ce6e0,c66ce74c,15c,c05fc4b5,0) at msleep+0x511
kern_wait(c66cb150,ffffffff,e3bdcc8c,0,e3bdcc90) at kern_wait+0xa19
wait4(c66cb150,e3bdcd14,10,7,4) at wait4+0x32
syscall(2f,2f,2f,6d1,bfbfe0c8) at syscall+0x320
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (7, FreeBSD ELF32, wait4), eip = 0x280c970f, esp = 0xbfbfe08c, ebp = 0xbfbfe0a8 ---

db> trace 1748
sched_switch(c61e97e0,ba59592b,b60532c8,ffc06014,c61e97e0) at sched_switch+0x145
mi_switch(1,c04d9400,e3997b74,c04a61ef,0) at mi_switch+0x1ab
sleepq_switch(c64466fc,0,e3997ba8,c04b97e1,c64466fc) at sleepq_switch+0x16f
sleepq_wait_sig(c64466fc,58,0,0,0) at sleepq_wait_sig+0x14
msleep(c64466fc,0,158,c05ff4a1,0) at msleep+0x511
sbwait(c64466dc,c613e900,c64fcd20,e3997bfc,4) at sbwait+0x4b
soreceive(c6446690,0,e3997c80,0,0) at soreceive+0x2a5
soo_read(c61ff990,e3997c80,c5fcde80,0,c61e97e0) at soo_read+0x93
dofileread(c61e97e0,c61ff990,7,bfbddb98,4) at dofileread+0xdc
read(c61e97e0,e3997d14,c,c61e97e0,3) at read+0x6b
syscall(807002f,805002f,bfbd002f,7,bfbddb98) at syscall+0x320
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (3, FreeBSD ELF32, read), eip = 0x280c978f, esp = 0xbfbddb5c, ebp = 0xbfbddb78 ---

whoops!  how do i tell it which process?

> (3) Could you run "show lockedvnods" in DDB and show the results?

db> show lockedvnods
Locked vnodes

> (4) Could you run "show locks <pid>" on the dump process?

db> show locks 1745
No such command

Received on Wed Jun 09 2004 - 03:12:20 UTC