Re: panic on one cpu leaves others running...

From: Robert Watson <rwatson_at_freebsd.org>
Date: Thu, 8 Apr 2004 11:51:24 -0400 (EDT)

On Thu, 8 Apr 2004, Marcel Moolenaar wrote:

> On Thu, Apr 08, 2004 at 09:43:06AM -0400, Robert Watson wrote:
> > 
> > On Wed, 7 Apr 2004, Marcel Moolenaar wrote:
> > 
> > > On Thu, Apr 08, 2004 at 12:13:39AM -0400, Robert Watson wrote:
> > > > 
> > > > Funky, eh?  I thought we used to have code to ipi the other cpu's and halt
> > > > them until the cpu in ddb was out again.  I guess I mis-remember, or that
> > > > code is broken...
> > > 
> > > You remember correctly.
> > 
> > And it's still going this morning:
> *snip*
> > Apr  8 13:39:30  sm-mta[4707]: i3879Tjc003922: SYSERR(root): cannot
> > flock(/etc/mail/aliases, fd=5, type=1, omode=40000, euid=0): Operation not
> > supported
> > 
> > Debugger(c07c3990) at Debugger+0x46
> > db> 
> 
> Do you have SMP and/or made modifications to <machine/smptests.h>? 
> What's pcpu->pc_other_cpus and what is stopped_cpus currently? 

No changes to smptests.h.

Unfortunately, I don't have access to serial gdb for this box, and causing
a dump might well change all that, so I only have the value of
stopped_cpus:

db> print stopped_cpus
c0950594
db> print *stopped_cpus
       d
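
For anyone reading that value: a minimal sketch of decoding it, assuming
stopped_cpus is a per-CPU bitmask (bit N set means CPU N is marked stopped)
and that ddb printed it in hex. The helper name cpus_in_mask is made up for
illustration and is not a kernel or ddb function.

```python
def cpus_in_mask(mask):
    """Return the CPU ids whose bits are set in a per-CPU bitmask.

    Assumption: bit N of the mask corresponds to CPU N.
    """
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:          # lowest bit set: this CPU is in the mask
            cpus.append(cpu)
        mask >>= 1            # move on to the next CPU's bit
        cpu += 1
    return cpus


# 0xd is the value ddb printed for *stopped_cpus above (assumed hex).
stopped = cpus_in_mask(0xd)
print(stopped)
```

Under those assumptions, 0xd is binary 1101, so bits 0, 2 and 3 are set and
bit 1 is clear.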

> > Presumably in large part because I'm in code that doesn't require Giant,
> > so there are no lock conflicts.
> 
> I don't think that's the case. I think we're just not stopping the CPUs
> or keeping them stopped. 

I agree with that interpretation -- I was suggesting that the reason this
problem might not be noticed is that a lot of our code paths require
Giant, and it's only when you panic in code without Giant that the other
CPUs visibly keep running.

> 
> This is all a hunch and I have no way to test this myself... 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Senior Research Scientist, McAfee Research