Re: Panics after AHCI timeouts

From: C. P. Ghost <cpghost_at_cordula.ws>
Date: Mon, 24 Oct 2011 20:27:49 +0200

On Tue, Oct 18, 2011 at 3:13 PM, Alexey Shuvaev
<shuvaev_at_physik.uni-wuerzburg.de> wrote:
> On Tue, Oct 18, 2011 at 06:19:19AM +0800, Adrian Chadd wrote:
>> On 18 October 2011 03:00, Alexey Shuvaev
>> <shuvaev_at_physik.uni-wuerzburg.de> wrote:
>> > On Sat, Oct 08, 2011 at 10:14:56PM +0200, Alexey Shuvaev wrote:
>> >> Hello list!
>> >>
>> > Errr... Replying to myself... Ping? Should I file a PR and put it
>> > on the back burner? :)
>>
>> I think filing a PR is a good move. Then just be proactive and poke
>> people about it. It'd be good to get this fixed. :)
>>
> Done, kern/161768.
>
> Question to the list: does anybody see successful recovery from an AHCI
> timeout on a recent CURRENT? Recent means June 2011 or newer, so the 9.0
> branch counts too. That is, there are some kernel messages like this:
>
> ahcich0: Timeout on slot 29 port 0
> ahcich0: is 00000000 cs 00000000 ss ffffffff rs ffffffff tfd 40 serr 00000000 cmd 0000fc17
>
> but then AHCI recovers and the system does not panic?

I'm seeing these timeouts too on an 8.2-STABLE amd64 r222832
from June 7. The system hangs partially -- or, more precisely, all
processes that attempt to access the disk on this channel hang,
while everything else continues as normal.
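
For anyone who wants to check their own logs for the same pattern, here is
a minimal sketch in Python that picks out the two message lines quoted
above and pairs each timeout with the register dump that follows it. The
function names (parse_dump, find_timeouts) are hypothetical, and the line
format is assumed to match the quoted sample exactly; other driver
versions may print something different. The register names are kept as
printed and not interpreted.

```python
import re

# Patterns are modelled only on the two sample lines quoted in this thread.
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+) port (\d+)")
DUMP_RE = re.compile(r"(ahcich\d+): (is [0-9a-f]+ cs [0-9a-f]+ .*)$")


def parse_dump(line):
    """Return (channel, {register: value}) for a register dump line, else None."""
    m = DUMP_RE.search(line)
    if m is None:
        return None
    # The dump is a flat sequence of "name hexvalue" pairs.
    tokens = m.group(2).split()
    regs = {name: int(value, 16) for name, value in zip(tokens[0::2], tokens[1::2])}
    return m.group(1), regs


def find_timeouts(lines):
    """Collect timeout events, attaching the next dump seen on the same channel."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append({
                "channel": m.group(1),
                "slot": int(m.group(2)),
                "port": int(m.group(3)),
                "regs": None,  # filled in if a dump line follows
            })
            continue
        dump = parse_dump(line)
        if dump and events and events[-1]["regs"] is None \
                and events[-1]["channel"] == dump[0]:
            events[-1]["regs"] = dump[1]
    return events


sample = [
    "ahcich0: Timeout on slot 29 port 0",
    "ahcich0: is 00000000 cs 00000000 ss ffffffff rs ffffffff "
    "tfd 40 serr 00000000 cmd 0000fc17",
]
events = find_timeouts(sample)
```

Feeding it the lines of a saved kernel log (for example, the output of
open(path) on a copy of the messages file) gives one entry per timeout,
which makes it easier to see whether the same channel and slot keep
showing up.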

I suspect a faulty cable, but I don't have physical access to the system
to replace parts right now. A panic would be a regression, so I'm holding
off updates on that server until AHCI becomes more tolerant and somewhat
self-healing. :(

> Poking Alexey.

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/