5.3-STABLE hangs under load (by bufdaemon)

From: Mikhail Teterin <mi+kde_at_aldan.algebra.com>
Date: Sat, 23 Oct 2004 00:46:23 -0400
5.3-STABLE amd64. Under heavy load (database dumps over NFS to a local
disk), there comes a point when the system process `bufdaemon' starts
taking almost the entire CPU.

The machine stops doing anything else, but a previously started `systat'
continues to work. After two hours of this, even that stops, and
systat's display remains frozen at:

------------------------------------------------------------------------
    4 users    Load  3.07  2.90  2.86                  Oct 23 00:05
Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
        Tot   Share      Tot    Share    Free         in  out     in out
Act   29220    8624    66660    13100 1356976 count
All  657940   11900  1333164    18692         pages
                                                          zfod Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt        cow    1490 total
           5 41      1628    4  162 1857    9      248892 wire        1: atkb
                                                    21364 act    1026 0: clk
93.9%Sys   0.8%Intr  0.0%User  0.0%Nice  5.3%Idl   390852 inact       6: fdc0
|    |    |    |    |    |    |    |    |    |          8 cache   128 8: rtc
===============================================   1356968 free    160 9: acpi
                                                          daefr 14: ata
Namei         Name-cache    Dir-cache                     prcfr 15: ata
    Calls     hits    %     hits    %                     react     8 16: ahc
                                                          pdwak   160 17: pcm
                                                          pdpgs     8 24: bge
Disks  afd0   ad6 amrd0   sa0 pass0                       intrn 26: amr
KB/t   0.00 16.00  0.00  0.00  0.00                218832 buf
tps       0   161     0     0     0                  3106 dirtybuf
MB/s   0.00  2.51  0.00  0.00  0.00                100000 desiredvnodes
% busy    0     7     0     0     0                   807 numvnodes
Showing vmstat, refresh every 1 seconds.              247
------------------------------------------------------------------------

The ad6 is the disk in question. What it is doing at 2.51 MB/s for two
hours remains a mystery: as far as the NFS client can tell, the server
stopped responding long ago.
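
For what it's worth, the three ad6 columns in the display are related.
Assuming systat's disk statistics use binary units (1 MB = 1024 KB),
throughput is KB/t x tps / 1024. With the displayed 16.00 KB/t and 161 tps
that gives about 2.52 MB/s, which matches the 2.51 shown once the rounding
of the tps column is taken into account: ad6 was completing a steady ~160
small (16 KB) transfers per second. A minimal sketch of the arithmetic
(the helper names are hypothetical):

```python
# Assumption: systat's disk columns use binary units, so
# MB/s = (KB per transfer) * (transfers per second) / 1024.
def mb_per_sec(kb_per_transfer: float, transfers_per_sec: float) -> float:
    """Throughput in MB/s from systat's KB/t and tps columns."""
    return kb_per_transfer * transfers_per_sec / 1024.0


def tps_range_for(mb_shown: float, kb_per_transfer: float) -> tuple[float, float]:
    """Range of true tps values that would display as `mb_shown` at 2 decimals."""
    low = (mb_shown - 0.005) * 1024.0 / kb_per_transfer
    high = (mb_shown + 0.005) * 1024.0 / kb_per_transfer
    return low, high


# ad6 figures from the frozen systat display.
print(f"{mb_per_sec(16.00, 161):.2f} MB/s")  # 2.52 with the rounded tps value
low, high = tps_range_for(2.51, 16.00)
print(f"2.51 MB/s implies roughly {low:.1f} to {high:.1f} tps")
```

A true rate between about 160.5 and 161 tps would display as both
"161" and "2.51", so the figures are self-consistent.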

Any advice on tuning this? The machine has 2GB of RAM and runs on a
single Opteron. Shortly before going into this coma, the system reports
write errors on ad6:

Oct 22 21:31:24 pandora kernel: ad6: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=370211679
Oct 22 21:31:32 pandora kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=373975135

But why would a device's trouble cause bufdaemon to freak out?


 -mi
Received on Sat Oct 23 2004 - 02:46:26 UTC
