Re: Functional RAID controller?

From: Scott Long <scottl_at_samsco.org>
Date: Tue, 08 May 2007 17:08:01 -0600
Barrett Lyon wrote:
>> If you have "a good idea what's wrong with the twa driver", would you mind
>> sharing a stack trace or other information?  So far I have only been told
>> that "system hangs when I do heavy I/O".  This is _not_ reproducible here.
>> Have you run memtest86 on the machine?  Have you run a PCI analyzer on
>> your machine to see who is on the PCI bus before/during the hang?
> 
> We have done everything including asking to bring the machines that are 
> crashing to AMCC's offices which are down the street.  I have not been 
> doing the technical debugging but a few members of AMCC's staff have 
> been trying to help.  We've been running memtest, etc.  When the 
> machines hang there are no debugging options, it's completely frozen 
> without any details pointing to why.  It's not clear from that condition
> whether the problem is due to an unacknowledged interrupt or a mutex 
> deadlock of some sort.  We are assuming that in this case it is due to 
> the driver trying to do work assuming the interrupt is valid and getting 
> stuck or returning early before the interrupt is acknowledged, causing 
> it to trigger over and over and over.
> 
> If you want to see it reproduced, we are more than happy to provide you 
> two machines that both have this condition.
> 
>> You claim the hang doesn't happen with the 6.2 series twa driver, but
>> the driver changes between the 6.x and 7.x twa driver are _very_ minimal:
>> some simple timekeeping changes and some XPT_* path inquiry handling
>> changes.
> 
> Under 6.x the systems, as built, are completely stable.
> 
>> I am really surprised that you are trying to design servers around the
>> FreeBSD un-stable kernel.
> 
> There are other reasons for this which I don't want to discuss here, but 
> the other components we are using work very well within 7.0 and we have 
> a lot of performance gains that make it worth using a development 
> kernel.  The 10GbE drivers such as mxge are getting a lot of development
> work in HEAD, and as a result 6.x is falling behind for some of the work
> we are doing.  At the very least, I want to make sure I deploy hardware
> that will function beyond 6.x.
> 
> 
> -Barrett
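
The unacknowledged-interrupt hypothesis above can be illustrated with a minimal userspace model. This is a sketch in Python, not kernel code, and it assumes a level-triggered line: as long as the controller's status bit stays set, the line stays asserted and the handler is re-entered immediately, which from the outside looks like a silent hard hang. `FakeController`, `dispatch` and both handlers are hypothetical names; the `max_dispatches` cap exists only so the model terminates, since real hardware has no such cap.

```python
class FakeController:
    """Hypothetical controller with a level-triggered interrupt line."""

    def __init__(self):
        self.status_pending = False  # line stays asserted while this is set
        self.completed = 0

    def raise_interrupt(self):
        self.status_pending = True

    def ack(self):
        self.status_pending = False  # clearing the status deasserts the line


def handler_acks_first(ctl):
    # Acknowledge before doing any work, so the line drops.
    ctl.ack()
    ctl.completed += 1


def handler_returns_early(ctl, work_ready=lambda: False):
    # Hypothetical buggy path: bail out before acknowledging.
    if not work_ready():
        return  # status bit still set, so the interrupt fires again
    ctl.completed += 1
    ctl.ack()


def dispatch(ctl, handler, max_dispatches=1000):
    """Re-invoke the handler while the line is asserted (capped for the model)."""
    count = 0
    while ctl.status_pending and count < max_dispatches:
        handler(ctl)
        count += 1
    return count
```

With `handler_acks_first`, one interrupt yields one dispatch; with `handler_returns_early`, the dispatcher spins until it hits the cap and the status bit is still pending:

```python
ctl = FakeController(); ctl.raise_interrupt()
dispatch(ctl, handler_acks_first)      # → 1
bad = FakeController(); bad.raise_interrupt()
dispatch(bad, handler_returns_early)   # → 1000 (the cap), still pending
```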

The biggest difference between 7-CURRENT and 6-STABLE right now in this
space is the MPSAFE work in CAM.  It should have been a complete NO-OP
for the 3ware driver, but it's always possible that either I overlooked
something, or the driver was doing something screwy before that was 
unsafe, and it's now being caught.
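
One generic way a pattern that used to be tolerated can break once a subsystem starts holding a lock around driver callbacks is a driver that takes the same non-recursive lock itself. The sketch below models that class of bug in userspace Python; it is an assumption-laden illustration, not a diagnosis of twa or a description of the CAM API. `sim_lock`, `framework_call` and the two driver actions are hypothetical names. Python's `threading.Lock` is non-reentrant, so a second acquire by the same thread never succeeds; the timeout is there only so the model reports the problem instead of freezing.

```python
import threading

# Hypothetical stand-in for a per-SIM mutex.
sim_lock = threading.Lock()


def driver_action_relocks(lock):
    # Older-style driver code that takes the lock itself. Without the
    # timeout this would block forever if the caller already holds it.
    got = lock.acquire(timeout=0.1)
    if got:
        lock.release()
    return got


def driver_action_assumes_held(lock):
    # Driver code written for a framework that already holds the lock.
    return lock.locked()


def framework_call(lock, action):
    # Hypothetical framework that holds the lock while calling the driver.
    with lock:
        return action(lock)
```

Called without the framework holding the lock, the relocking driver works; called through `framework_call`, the same code cannot get the lock:

```python
driver_action_relocks(sim_lock)                    # → True
framework_call(sim_lock, driver_action_relocks)    # → False (would hang without the timeout)
```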

I'll look at this tonight, as well as look at committing the update that
Adam mentioned (sorry Adam!).  My 3ware hardware inventory is very 
limited, so if I can't spot the problem by code inspection then I'll 
need to work with you and Adam to help narrow it down.

Scott
Received on Tue May 08 2007 - 21:08:17 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:09 UTC