SMP opteron system freezes, Was: Re: Freezing or stalling current system

From: Willem Jan Withagen <wjw_at_withagen.nl>
Date: Mon, 20 Oct 2008 17:33:56 +0200
Willem Jan Withagen wrote:
> Steve Kargl wrote:
>> On Sat, Oct 18, 2008 at 12:20:49AM +0200, Willem Jan Withagen wrote:
>>> I'm sort of assuming that the bge0: timeouts and coalesced links are 
>>> due to the freezing.
>>>
>>
>> Does the following help?
>>
> 
> Just a little...
> It now takes a little longer for the system to freeze, but eventually it 
> will.
> The coalesced messages did not return.
> 
> Just out of curiosity I also plugged in an fxp card.
> And there it takes even longer for the system to freeze, but in the end 
> it does freeze.
> 
> The "funny" part is that once in a while it can be revived by dropping 
> into the kernel debugger and then just continuing.
> Sometimes a long wait (10 sec) will suffice, during which there is no 
> keyboard response whatsoever.
> But on other instances the system is dead in the water, and only a 
> hardware reset gets it back.
> 
> Something I'm still wondering is whether this happens only with NFS 
> traffic or with other types of network traffic too. But I haven't 
> tested this.

Well I tested something different.

This is an (older) dual Opteron 244 system, so each chip has only one core.
And I removed one of the processors...

Guess what:
	It just runs without any problems as far as I could test.

With 2 processors, just letting init start all the NFS-related stuff in 
/etc/rc.d is enough to lock up the system.

So I guess we need to look at totally different things.
Given enough time, I'll check and see whether 7.x does run without trouble.

If somebody thinks this thread should go to amd64, just say so.
But I am running the i386 stuff.

dmesg and stuff in http://www.tegenbosch28.nl:/FreeBSD/toy
(although I see I have to fire up the system again to get a correct dmesg.)

--WjW
Received on Mon Oct 20 2008 - 14:06:31 UTC
