Re: firefox is stuck in getbuf()

From: Gavin Atkinson <gavin_at_FreeBSD.org>
Date: Thu, 29 Jul 2010 21:47:20 +0100 (BST)
On Wed, 21 Jul 2010, Gavin Atkinson wrote:
> On Tue, 2010-07-20 at 16:29 +0300, Kostik Belousov wrote:
> > On Tue, Jul 20, 2010 at 10:58:00AM +0800, David Xu wrote:
> > > With newest -HEAD code, firefox is stuck in getbuf().
> > > 
> > > top
> > > 
> > > last pid:  1814;  load averages:  0.00,  0.05,  0.07    up 0+00:37:11  10:54:01
> > > 135 processes: 1 running, 134 sleeping
> > > CPU:  3.7% user,  0.0% nice,  0.6% system,  0.0% interrupt, 95.7% idle
> > > Mem: 259M Active, 393M Inact, 151M Wired, 1484K Cache, 111M Buf, 186M Free
> > > Swap: 2020M Total, 2020M Free
> > > 
> > >   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > >  1427 davidxu       1  45    0   114M   101M select  0   1:24  0.29% Xorg
> > >  1588 davidxu      10  44    0   279M   145M getbuf  0   2:15  0.00% firefox-bin
> > > 
> > > 
> > > procstat  -k 1588
> > >   PID    TID COMM             TDNAME           KSTACK 
> > > 
> > >  1588 100200 firefox-bin      initial thread   mi_switch sleepq_switch 
> > > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata 
> > > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall 
> > > Xint0x80_syscall
> > >  1588 100207 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100208 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100209 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait 
> > > _umtx_op syscallenter syscall Xint0x80_syscall
> > >  1588 100210 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait 
> > > _umtx_op syscallenter syscall Xint0x80_syscall
> > >  1588 100216 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100220 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata 
> > > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall 
> > > Xint0x80_syscall
> > >  1588 100238 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100239 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > >  1588 100240 firefox-bin      -                mi_switch sleepq_switch 
> > > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op 
> > > syscallenter syscall Xint0x80_syscall
> > 
> > Can you, please, do the following:
> > - show the backtraces for the system processes, in particular, syncer,
> >   bufdaemon, softdepflush daemon, pagedaemon and vm?
> > - for the stuck firefox thread, find the address of the buffer
> >   supplied as an argument to getdirtybuf, and print the *(struct buf *)addr?
> > This can be done on the live/stuck system using kgdb on /dev/mem.
> 
> I can relatively easily recreate this, see my thread on -current on the
> 17th July ("Filesystem wedge, SUJ-related?"), which (and the followup
> emails) contain additional info.  I'm currently trying to find the
> commit responsible for introducing this, and have established that a

OK, sorry for the delay.  I have the information requested.

Please see http://people.freebsd.org/~gavin/rho-fs-hang.txt

I've started to try and narrow down where exactly the hangs started:

r208700 - June 1st  - seems to work fine
r209425 - June 22nd - hangs occur
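
Narrowing this down further is a plain bisection between those two
revisions. A minimal sketch in Python of how the remaining search could
be organised, under some loud assumptions: is_bad() is a hypothetical
callback standing in for "build that revision, boot it, and try to
provoke the hang"; the hang is assumed to reproduce reliably; and every
revision after the first bad one is assumed to hang as well. None of
these names come from the tree or from any tool. With 725 revisions
between the two endpoints, this needs at most 10 test builds.

```python
def first_bad_revision(good, bad, is_bad):
    """Bisect the range (good, bad] and return (first_bad, tested).

    good   -- revision known NOT to hang (here r208700)
    bad    -- revision known to hang (here r209425)
    is_bad -- hypothetical callback: True if the given revision hangs
    """
    tested = []
    # Invariant: `good` does not hang, `bad` hangs.
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid    # the first bad revision is at or before mid
        else:
            good = mid   # the first bad revision is after mid
    return bad, tested


def make_oracle(culprit):
    """Stand-in for a real build-and-test cycle, for illustration only."""
    return lambda rev: rev >= culprit


if __name__ == "__main__":
    # 209000 is a made-up culprit used purely to exercise the function;
    # the real offending commit is not yet known.
    found, tested = first_bad_revision(208700, 209425, make_oracle(209000))
    print("first bad revision:", found)
    print("revisions tested:", tested)
```

In practice is_bad() would be a manual step, and revisions that do not
touch the relevant part of the tree can simply be skipped to the nearest
one that does.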

If you need any more info, let me know.

Thanks,

Gavin
Received on Thu Jul 29 2010 - 19:17:36 UTC
