Re: Filesystem wedge, SUJ-related?

From: Gavin Atkinson <gavin_at_FreeBSD.org>
Date: Sun, 18 Jul 2010 14:02:24 +0100 (BST)
On Sat, 17 Jul 2010, Gavin Atkinson wrote:
> Semi-regularly (every two-three days) I'm seeing what appears to be some 
> sort of filesystem wedge.  I usually see it initially with web browsers, 
> but it's possible that's only because it's what produces most disk 
> activity on this machine.  I've seen it with both Opera and Firefox.
> 
> What happens is that the process will just wedge.  A "procstat -kk" on it 
> shows the following stack backtrace:
> 
>  9012 100243 firefox-bin      initial thread   mi_switch+0x21d 
> sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x357 getdirtybuf+0x21e 
> flush_deplist+0x6f softdep_sync_metadata+0x153 ffs_syncvnode+0x213 
> ffs_fsync+0x43 fsync+0x148 syscallenter+0x1b5 syscall+0x4c 
> Xfast_syscall+0xe2 

A bit more detail: it does look like whatever is supposed to periodically 
flush the journal just stops doing it's job.  Presumably this is also the 
root cause of the "softdep: Out of journal space!" messages I have been 
seeing in the past, which I had assumed may have been fixed by r209717.

(I'm running r209723 at the moment)

While processes are starting to hang, "sh ffs" from ddb shows:

db> sh ffs
mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0 su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl 0 su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl 0 su_wl_in 0 su_deps 17345 su_req 0
mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl 0 su_wl_in 0 su_deps 55 su_req 0

Leaving it another couple of hours, I then see:

db> sh ffs
mp 0xffffff0002c45be0 / devvp 0xffffff0002c51000 fs 0xffffff0002c67000 su_wl 0 su_wl_in 0 su_deps 0 su_req 0
mp 0xffffff0002d705f0 /tmp devvp 0xffffff0002d48780 fs 0xffffff0002c64800 su_wl 0 su_wl_in 0 su_deps 36 su_req 0
mp 0xffffff0002c458e8 /usr devvp 0xffffff0002d485a0 fs 0xffffff0002c66000 su_wl 0 su_wl_in 0 su_deps 31899 su_req 0
mp 0xffffff0002c455f0 /var devvp 0xffffff0002d483c0 fs 0xffffff0002c66800 su_wl 0 su_wl_in 0 su_deps 95 su_req 0

so, su_deps is increasing significantly.

During reboot, vnlru failed to stop within 60 seconds, and gave up on 
syncing 125 vnodes and 140 buffers (no idea if these are related).  On 
reboot, SU+J fsck shows for /usr: 

** SU+J Recovering /dev/ad4s1f
** Reading 33554432 byte journal from inode 150.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
** 405991 journal records in 18194944 bytes for 71.40% utilization
** Freed 3872 inodes (0 dirs) 48157 blocks, and 8744 frags.

So it seems clear that somehow the journal is filling up, and never being 
written.

Any other suggestions as to where I should go from here?

Thanks,

Gaivn
Received on Sun Jul 18 2010 - 11:02:29 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:05 UTC