Re: Interrupt Problems

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Wed, 28 Jul 2010 21:39:40 +0300
David Naylor wrote:
> I have been having interrupt-related problems with various subsystems.  I
> suspect this is related to the changes in the event timer infrastructure.  
> 
> The subsystems that have experienced interrupt problems:
>  - hda: this is the easiest to reproduce and what I used to isolate the 
> commits.  I get ``pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt 
> timeout, channel dead'' reported and sound no longer plays.
>  - nfe: this has happened on occasion with no reliable way to reproduce.  
> ``watchdog timeouts'' are reported.  After this happens all network traffic dies 
> and doing `ifconfig nfe0 down; ifconfig nfe0 up' panics the computer.
>  - dc: same thing as above.  
>  - nvidia: has reported interrupt timeouts.  This is independent of the
> locking problem (which is fixed by a recently published patch).  No reliable way
> to reproduce; it appears to happen under heavy load.  X freezes as a result.
>  - ata: I had an HDD detach twice.  I am not sure if this is related.  I have
> two HDDs, each attached to a different controller.
> 
> I tested this by using a kernel built from a cvsup date of 2010/06/20 and 
> 2010/06/22 (at midnight for both, aka 00:00:00).  The former kernel does not 
> exhibit any problems while the latter does.  This problem is also present with 
> a kernel from today.  
> 
> The motherboard is an N650SLI-DS4L with one graphics card.  See attached for
> more system information.  
> 
> Is there anything I can do to help diagnose the problem?  

I can hardly explain how timer-related changes could cause problems with
such a long list of devices using different IRQs. MCP51 seems to have
quite a colorful history of problems (I know of at least SATA and HDA
MSI problems), so I wouldn't be very surprised if this is one more
hardware-specific issue.

Does the problem happen randomly, or can it be triggered somehow? Have
you tried looking at what happens with interrupts during/after the
problem appears? Do all of them die each time, or only selected ones?
Is there a way to restore operation after the problem? Have you tried
switching to other event timers? HPET event timers were never used
before this, so their bugs have not been studied yet.
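One way to check whether all interrupt sources stop or only some of
them do is to take two snapshots of FreeBSD's `vmstat -i` output (one
before and one after the failure) and compare the per-source totals. The
Python below is a minimal sketch, assuming each data line ends with the
integer "total" and "rate" columns and that everything before them is
the source name. The helper names and the sample snapshots (including
the IRQ numbers) are hypothetical and not taken from the reporter's
system.

```python
def parse_vmstat_i(text):
    """Map interrupt source name -> total count from `vmstat -i`-style text.

    Assumes every data line ends with two integer columns (total, rate) and
    that the preceding fields form the source name, which may contain spaces
    (e.g. "cpu0: timer"). The header line and the "Total" line are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, total, rate = " ".join(fields[:-2]), fields[-2], fields[-1]
        # The header ("interrupt total rate") fails the digit check.
        if not (total.isdigit() and rate.isdigit()) or name == "Total":
            continue
        counts[name] = int(total)
    return counts


def stalled_sources(before, after):
    """Return sources present in both snapshots whose total did not grow."""
    old, new = parse_vmstat_i(before), parse_vmstat_i(after)
    return sorted(n for n in old if n in new and new[n] <= old[n])


# Invented snapshots for illustration only.
before = """interrupt                          total       rate
irq17: hdac0                           5000         50
irq22: nfe0                            8000         80
cpu0: timer                           20000        200
Total                                 33000        330
"""
after = """interrupt                          total       rate
irq17: hdac0                           5000         40
irq22: nfe0                            9500         76
cpu0: timer                           25000        200
Total                                 39500        316
"""

print(stalled_sources(before, after))  # ['irq17: hdac0']
```

If only one or two sources show up as stalled, the failure is selective
rather than a global loss of interrupts, which is the distinction asked
about above.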

PS: A verbose dmesg would be more useful.

-- 
Alexander Motin
Received on Wed Jul 28 2010 - 16:39:46 UTC