zfs stuck, cannot do any I/O, processes in Disk Wait

From: Adam McDougall <mcdouga9_at_egr.msu.edu>
Date: Sun, 28 Oct 2007 22:43:56 -0400
I think I have had this happen at least once before, but someone else
rebooted the system before I could see it.  I have a server with a
number of zfs filesystems mounted from a raidz, but it won't transfer
any data.  I'm not sure why it's stuck.  It is running 7.0-PRERELEASE
from Wed Oct 17 and I'm pretty sure it is WITHOUT vm_kern.c.2.patch.
The system is amd64 and I have not seen a kmem panic since I raised
kmem to 1.5G.

I logged in to scp a file off of zfs and was able to ls -l to see the
file, but the scp hung before transferring any bytes.  Now I cannot do
an ls -l in that directory, /z.  I noticed several days' worth of rsync
processes stuck in disk wait, so it must have been in this state for
several days.  I have no urgent need to reboot this system; it's more
important to try to get a permanent fix.  Please let me know what other
information I can provide.
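
For anyone wanting to collect the same kind of list, here is a minimal
sketch of one way to pick the disk-wait processes out of ps output.  The
helper name filter_disk_wait is made up for illustration and is not a
system command; it assumes the process state is in the second column,
as it would be with the ps invocation shown in the comment.

```shell
# Hypothetical helper: keep the header row plus any row whose state
# column (2nd field) starts with "D", i.e. uninterruptible disk wait.
# Reads ps-style output on stdin and writes the matching rows to stdout.
filter_disk_wait() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumed usage on the affected host (pid, state, wait channel,
# start time, command line):
#   ps -axo pid,state,wchan,lstart,command | filter_disk_wait
```

The wait channel column may help show what the stuck processes are
sleeping on, which is probably the next thing worth reporting.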

10:34PM  up 10 days, 12:37, 3 users, load averages: 0.00, 0.00, 0.00

Stuck rsync processes (started from cron):
1:01AM
4:00AM
5:00AM
Fri01AM
Fri04AM
Fri05AM
Mon05AM
Sat01AM
Sat04AM
Sat05AM
Thu01AM
Thu04AM
Thu05AM
Tue01AM
Tue04AM
Tue05AM
Wed01AM
Wed04AM
Wed05AM

# more /boot/loader.conf 
vm.kmem_size=1610612736
vm.kmem_size_max=1610612736
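
As a quick check that these values match the 1.5G mentioned above, the
byte count can be derived with shell arithmetic.  This is only a sketch
of the conversion, taking 1.5G to mean 1536 MiB; the variable name
kmem_bytes is an illustration and not something loader.conf uses.

```shell
# 1.5 GiB in bytes: 1536 MiB * 1024 KiB per MiB * 1024 bytes per KiB
kmem_bytes=$((1536 * 1024 * 1024))

# Print the value in the same form as the loader.conf lines
echo "vm.kmem_size=${kmem_bytes}"
echo "vm.kmem_size_max=${kmem_bytes}"
```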

# sysctl -a | grep vnodes
kern.maxvnodes: 100000
kern.minvnodes: 25000
vfs.freevnodes: 25000
vfs.wantfreevnodes: 25000
vfs.numvnodes: 49996

No errors in dmesg, and I can dd from the drives in the raidz1 fine.

z/backups             101508480         0 101508480     0%    /backups
z/backups/a           101508480         0 101508480     0%    /backups/a
z/backups/b           149992448  48483968 101508480    32%    /backups/b
z/backups/c           219571968 118063488 101508480    54%    /backups/c
z/backups/d           105923968   4415488 101508480     4%    /backups/d
z/data                199868032  98359552 101508480    49%    /data
z                     103982976   2474496 101508480     2%    /z
z/data4               206146688 104638208 101508480    51%    /z/data4
z/mysqldb             102015488    507008 101508480     0%    /z/mysqldb

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
z                       696G    540G    156G    77%  ONLINE     -

# zpool status
  pool: z
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        z           ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0

errors: No known data errors
Received on Mon Oct 29 2007 - 17:48:59 UTC
