On Wed, Jun 05, 2013 at 01:50:43PM +0400, Gleb Smirnoff wrote:
> On Wed, Jun 05, 2013 at 10:18:21AM +0200, Ian FREISLICH wrote:
> I> I have the following recurring panic on all my heavily network
> I> loaded -CURRENT routers. The current process is always different.
> I>
> I> Gleb, can you please chime in with what you've managed to uncover.
>
> The panics occur on a selfd mutex. The mtx_lock value is that of a free
> mutex, but it has 1 extra bit set:
>
> (kgdb) p/x sfp->sf_mtx->mtx_lock
> $3 = 0x1000004
>
> Rarely (only one panic observed) more than one bit is set:
>
> $3 = 0x21000004
>
> It is important that selfd mutexes are taken from mtxpool(9), which
> is allocated at an early boot stage. Thus, across reboots all possible
> sfp->sf_mtx mutexes usually fall into the same virtual memory region.
> I'm not sure, but I suppose they fall into the same physical region.
>
> This can lead one to the idea that the RAM in the box has problems. But
> it is running ECC memory, and it doesn't experience other random panics.
>
> The only special thing about the box is that it is running pf(4) with a
> huge ruleset and a lot of traffic. So pf(4) is the number one suspect,
> albeit it isn't closely related to selfds.
>

So is the virtual address of the corrupted word the same for each panic?
If yes, set up a hardware watchpoint in ddb.
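
To make the "extra bit" observation concrete, here is a minimal sketch that XORs the two mtx_lock values quoted above against the free-mutex cookie and lists the bit positions that differ. It assumes MTX_UNOWNED is 0x4, as defined in sys/mutex.h; the helper name stray_bits is made up for illustration and is not part of any kernel or kgdb interface.

```python
# Assumption: the "free mutex" cookie MTX_UNOWNED is 0x4 (sys/mutex.h).
MTX_UNOWNED = 0x00000004


def stray_bits(lock_word, expected=MTX_UNOWNED):
    """Return the bit positions where lock_word differs from expected."""
    diff = lock_word ^ expected
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# The two values reported from kgdb in the quoted message.
for value in (0x1000004, 0x21000004):
    print(hex(value), "differs from MTX_UNOWNED at bits", stray_bits(value))
```

If the same bit position turns up in every dump, that is one more data point for deciding whether a watchpoint on a fixed address is likely to catch the writer.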