Re: dump is stuck

From: Danny Braniss <danny_at_cs.huji.ac.il>
Date: Fri, 28 Jan 2005 22:30:13 +0200
> In message <E1CuXry-000JJd-OV_at_cs1.cs.huji.ac.il>, Danny Braniss writes:
> >> In message <20050128144733.GA91982_at_green.homeunix.org>, Brian Fundakowski Feldman writes:
> >> >On Fri, Jan 28, 2005 at 09:08:54AM +0200, Danny Braniss wrote:
> >> >> hi,
> >> >> 	while running 'dump 0f - /dist | restore rf -'
> >> >> the dump proc. got stuck, it seems it's waiting on some lock:
> >> >> 
> >> >> UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
> >> >>  0 30924 30922   0   4  0  3396 2852 sbwait T     p1    1:00.88 dump: /dev/amrd0s3h: ...
> >> >>  0 30925 30924   1  -8  0  3268 2784 physrd TL    p1    0:53.84 dump 0f - /dist (dump)
> >> >>  0 30926 30924   1  20  0  3268 2784 pause  T     p1    0:53.69 dump 0f - /dist (dump)
> >> >>  0 30927 30924   1  20  0  3268 2784 pause  T     p1    0:54.12 dump 0f - /dist (dump)
> >> >> 
> >> >> (this is  5.3-STABLE, cvs'ed about a week ago, and it's a SMP system).
> >> >> how can i find which lock? or who is holding it?
> >> >
> >> >Is the one in physrd not actually reading anything from the disk right
> >> >now?  I would suspect that should be how you really determine if it's
> >> >hung or not.  You should be able to see how long it's been waiting
> >> >and how long it's due to wait still, using kgdb.
> >> 
> >> Check also with gstat(8) if there is I/O activity going on and/or if any
> >> I/O requests are stuck.
> >
> >it's stuck. i.e. not doing anything. i've been monitoring it via
> >iostat, and nothing is moving, nada, the machine is very idle :-(
> 
> Please use gstat(8) and look for stuck I/O requests.

ok, I had to run it again. this time it worked several times, but it finally
got stuck. btw, it has now been stuck for several hours.

gstat:
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| fd0
    2      0      0      0    0.0      0      0    0.0    0.0| amrd0
    0      0      0      0    0.0      0      0    0.0    0.0| amrd0s1
    0      0      0      0    0.0      0      0    0.0    0.0| amrd0s2
    2      0      0      0    0.0      0      0    0.0    0.0| amrd0s3

the rest is all zero.
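
For anyone who wants to watch for this automatically: the pattern above is a
non-zero queue length, L(q), with zero ops/s. Below is a rough Python sketch
that scans pasted gstat text for that pattern. The function name and the
"queued but idle" rule are my own assumptions for illustration, not anything
gstat provides, and a single snapshot only hints at a stuck request. It would
need to hold across several samples to mean much.

```python
def find_stuck_providers(gstat_text):
    """Return names of providers that have queued requests but no throughput.

    Assumes the gstat layout shown in this message: numeric columns,
    then '|', then the provider name. L(q) is column 0, ops/s is column 1.
    """
    stuck = []
    for line in gstat_text.splitlines():
        if "|" not in line:
            continue  # header or blank line, no provider row here
        stats, name = line.split("|", 1)
        fields = stats.split()
        try:
            queue_len = int(fields[0])
            ops_per_sec = float(fields[1])
        except (IndexError, ValueError):
            continue  # row does not look like a gstat data row
        if queue_len > 0 and ops_per_sec == 0:
            stuck.append(name.strip())
    return stuck


# The snapshot from this message, used as sample input.
SAMPLE = """\
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| fd0
    2      0      0      0    0.0      0      0    0.0    0.0| amrd0
    0      0      0      0    0.0      0      0    0.0    0.0| amrd0s1
    0      0      0      0    0.0      0      0    0.0    0.0| amrd0s2
    2      0      0      0    0.0      0      0    0.0    0.0| amrd0s3
"""

print(find_stuck_providers(SAMPLE))  # prints ['amrd0', 'amrd0s3']
```

On the snapshot above it reports amrd0 and amrd0s3, i.e. the two rows with
L(q) = 2 and no activity.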

top says:
last pid:  1678;  load averages:  0.00,  0.00,  0.00                            up 0+05:22:18  22:28:00
42 processes:  1 running, 41 sleeping
CPU states:  0.0% user,  0.0% nice,  0.1% system,  0.3% interrupt, 99.6% idle
Mem: 231M Active, 3121M Inact, 203M Wired, 132M Cache, 112M Buf, 73M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
 1165 root      -8    0 50420K 49924K piperd 3   4:51  0.00%  0.00% restore
 1168 root       4    0 36168K 35624K sbwait 3   1:20  0.00%  0.00% dump
 1170 root      -8    0 36040K 35552K physrd 2   0:51  0.00%  0.00% dump
 1169 root      20    0 36040K 35552K pause  1   0:51  0.00%  0.00% dump
 1171 root      20    0 36040K 35552K pause  3   0:51  0.00%  0.00% dump

awaiting further instructions :-)
danny
Received on Fri Jan 28 2005 - 19:30:18 UTC
