Re: Is anything being done re: the pcm timeout issue?

From: Rusty Nejdl <rnejdl_at_ringofsaturn.com>
Date: Tue, 10 Aug 2004 22:29:34 -0500 (CDT)

>> 
>> And I have seen that these will eventually stop working one by one
>> until I have none left.  lsof and fstat don't show any programs using 
>> them, but nonetheless, programs like xmms and gaim can't use them 
>> anymore.

Try as I might, I haven't been able to reproduce this tonight.  I've got 4 vchans set up, and I ran madplay continuously on all 4 channels for 4 hours without a single failure.

> 
> The vchan code is fairly broken.  I was hoping to have some time to
> work on this (and other problems in the top half of the sound code) before
> 5.3, but it looks like the clock has just about run out.

I'm not seeing the locked channels yet, but that doesn't mean that they aren't there.

> 
> 
>> Do you have any more details on the pcm play timeout?  Are you using
>> vchans?  What program are you using?
> 
> My suspicion is that there is either a problem in ich_intr() that is
> causing it to stop receiving interrupts or to stop calling chn_intr(), or
> there is enough interrupt latency to allow the DMA pointer to wrap and
> fool chn_dmaupdate() into thinking no data was consumed.  It is possible
> that the ich_intr() problem is specific to amd64.
> 
> I previously sent out these suggestions on how to debug the problem:

I remember seeing these, but I'm learning as I go, so they are a bit more than I can manage at present.

> 
> 
> ------ Forwarded message ------
> From: Don Lewis <truckman_at_freebsd.org>
> Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
> Date: Tue, 27 Jul 2004 15:15:06 -0700 (PDT)
> To: mat_at_cnd.mcgill.ca
> Cc: freebsd-current_at_freebsd.org
> 
> 
> On 27 Jul, Mathew Kanner wrote:
> 
>> On Jul 26, John-Mark Gurney wrote:
>> 
>>> Conrad J. Sabatier wrote this message on Mon, Jul 26, 2004 at 16:35
>>> -0500:
>>> 
>>>> Why the formulaic calculation of timeout, if it's simply going to
>>>> be unconditionally set to 1 immediately afterwards anyway?  What's
>>>> going on here?
>>> 
>>> Well, if you look at the annotations, that unconditional assignment of
>>> timeout was added in rev 1.65 by cg with the commit message "tweaks to
>>> reduce latency/pauses in output".
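For context, a "formulaic" timeout of this kind would estimate how many clock ticks the hardware needs to drain one block of the buffer. The sketch below assumes the formula is roughly `hz * blocksize / (rate * bytes_per_frame)`; that formula and the names `block_timeout_ticks`, `blksz`, `rate` and `bpf` are illustrative assumptions, not the identifiers in channel.c.

```c
/*
 * Hypothetical sketch: ticks needed to play one block of audio.
 *   hz    - kernel clock ticks per second
 *   blksz - block size in bytes
 *   rate  - sample rate in frames per second (assumed > 0)
 *   bpf   - bytes per frame, e.g. 4 for 16-bit stereo (assumed > 0)
 */
static int block_timeout_ticks(int hz, int blksz, int rate, int bpf)
{
    /* seconds per block = blksz / (rate * bpf); multiply by hz first so
     * integer division does not truncate to zero prematurely. */
    long long ticks = (long long)hz * blksz / ((long long)rate * bpf);

    if (ticks < 1)
        ticks = 1;      /* never sleep for zero ticks */
    return (int)ticks;
}
```

With hz = 1000, a 2048-byte block at 48000 Hz, 16-bit stereo (4 bytes per frame), this gives 10 ticks. The rev 1.65 change then overwrites whatever was computed with `timeout = 1`, which is what prompted the question above.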
>>> 
>> 
>> 
>> I think this has been raised on the mailing list before.
>> IIRC, the logic for this is to check frequently for dead channels, but
>> CG is the authority.
>> 
> 
> My suspicion is that this change was made to reduce the consequences of
> lost wakeups from the interrupt routine.  This would have been more of a 
> problem when tsleep() was used in chn_sleep() and shouldn't be needed now
> that the top and bottom halves of the code use the channel lock and 
> chn_sleep() uses msleep() to atomically release the lock and wait for the
> wakeup from the interrupt code.  That said, setting timeout to 1 shouldn't
> hurt anything and will just waste a bit of CPU time.
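The lost-wakeup point can be illustrated with a userspace analogue. This is a sketch that substitutes POSIX threads for the kernel primitives: `pthread_cond_timedwait()` atomically releases the mutex while waiting, much as `msleep()` releases the channel lock, so a wakeup sent after the writer checks for free space cannot slip in before it goes to sleep. The names `fake_interrupt`, `wait_for_space`, `run_demo` and `space_available` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <time.h>

/* Shared state protected by `lock` (stands in for the channel lock). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t space_cv = PTHREAD_COND_INITIALIZER;
static int space_available = 0;

/* Hypothetical "interrupt handler": frees buffer space, then wakes the writer. */
static void *fake_interrupt(void *arg)
{
    (void)arg;
    struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    nanosleep(&delay, NULL);

    pthread_mutex_lock(&lock);
    space_available = 1;
    pthread_cond_signal(&space_cv);   /* plays the role of wakeup() */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Writer side: returns 1 if space became available, 0 on timeout. */
static int wait_for_space(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&lock);
    /* The condition is checked with the lock held, and the wait releases
     * the lock atomically, so the signal cannot be lost between the check
     * and the sleep. The loop also absorbs spurious wakeups. */
    while (!space_available && rc == 0)
        rc = pthread_cond_timedwait(&space_cv, &lock, &deadline);
    int ok = space_available;
    pthread_mutex_unlock(&lock);
    return ok;
}

/* Runs one writer/interrupt exchange; returns 1 on success. */
static int run_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, fake_interrupt, NULL) != 0)
        return -1;
    int ok = wait_for_space(2);
    pthread_join(tid, NULL);
    return ok;
}
```

With a plain `tsleep()`-style sleep, the lock would have to be dropped before sleeping, leaving a window in which the wakeup could arrive and be missed; polling with a 1-tick timeout papers over such a window at the cost of some CPU time.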
> 
> 
>>>> Also, at the end of the function:
>>>> 
>>>> 
>>>> if (count <= 0) {
>>>>         c->flags |= CHN_F_DEAD;
>>>>         printf("%s: play interrupt timeout, channel dead\n", c->name);
>>>> }
>>>> 
>>>> return ret;
>>>> }
>>>> 
>>> 
>>> That was changed in rev 1.52 (also by cg); previously it was just a
>>> check for count == 0.
>>> 
>>> So, I'd recommend sending a message off to cg to ask why he made these
>>> changes...
> 
> The original version of the code always set timeout to 1 and looped on
> (count > 0), so count could never go negative.  When the code was
> changed to set timeout to something larger than 1, count could go negative if
> (hz % timeout != 0), so the condition for setting CHN_F_DEAD had to
> be modified accordingly.
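A small simulation of the arithmetic described above, assuming (as a simplification) that every sleep times out, so each pass subtracts `timeout` ticks from a budget that starts at `hz`. `final_count()` is a hypothetical helper, not kernel code.

```c
/*
 * Hypothetical model of the retry budget in chn_write(): count starts at
 * hz and each timed-out sleep subtracts timeout ticks. Returns the value
 * of count when the loop gives up (assumes every sleep times out).
 */
static int final_count(int hz, int timeout)
{
    int count = hz;

    while (count > 0)
        count -= timeout;
    return count;
}
```

For example, `final_count(100, 1)` returns 0, which the old `count == 0` test catches, while `final_count(100, 3)` returns -2, which only `count <= 0` catches. The first case also hints at why changing the test to `count < 0` would silence the timeout message: with `timeout` forced to 1, `count` stops at exactly 0.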
> 
> My suspicion is that there is sometimes enough latency in executing the
> interrupt routine that the hardware DMA pointer is wrapping and 
> chn_dmaupdate() is calculating delta as zero.  This would cause 
> chn_wrfeed() not to consume any data from the software buffer (and skip 
> the wakeup()), which might be enough to cause chn_write() to time out
> while waiting for space to become available in the software buffer. It
> would be interesting to enable the debug code in chn_dmaupdate(), and add
> (delta == 0) as a condition to trigger the device_printf().
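A sketch of why late interrupt servicing can hide progress: if the only information available is the old and new hardware pointer positions in a circular DMA buffer, the distance moved is computed modulo the buffer size, so a pointer that advanced by exactly one full buffer is indistinguishable from one that did not move. The modular formula and the names `ring_delta` and `observed_delta` are assumptions for illustration, not the actual code in chn_dmaupdate().

```c
/* Distance moved around a circular buffer of bufsize bytes, computed
 * only from the old and new pointer positions (both < bufsize). */
static unsigned ring_delta(unsigned oldptr, unsigned newptr, unsigned bufsize)
{
    return (newptr + bufsize - oldptr) % bufsize;
}

/* Model the hardware consuming `played` bytes between two interrupts and
 * return the delta the software would compute from the pointers alone. */
static unsigned observed_delta(unsigned oldptr, unsigned played, unsigned bufsize)
{
    unsigned newptr = (oldptr + played) % bufsize;

    return ring_delta(oldptr, newptr, bufsize);
}
```

In this model, `observed_delta(100, 4096, 4096)` is 0 even though a whole buffer was played, and `observed_delta(100, 4196, 4096)` reports only 100 bytes. A delta of zero while audio is still flowing is exactly what the suggested `(delta == 0)` debug trigger would flag.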
> 
> 
> The bigger question is what is the cause of the latency ...
> 
> 
> 
> ------ Forwarded message ------
> From: Don Lewis <truckman_at_freebsd.org>
> Subject: Re: Questionable code in sys/dev/sound/pcm/channel.c
> Date: Tue, 27 Jul 2004 15:21:57 -0700 (PDT)
> To: conrads_at_cox.net
> Cc: freebsd-current_at_freebsd.org
> 
> 
> On 27 Jul, Conrad J. Sabatier wrote:
> 
>> 
>> On 26-Jul-2004 Conrad J. Sabatier wrote:
>> 
>>> 
>>> On 26-Jul-2004 Conrad J. Sabatier wrote:
>>> 
>>>> I'm a little perplexed by the following bit of logic in chn_write()
>>>>  (which is where the "interrupt timeout, channel dead" messages are
>>>>  being generated).
>> 
>> [snip]
>> 
>> 
>>>> Also, at the end of the function:
>>>> 
>>>> 
>>>> if (count <= 0) {
>>>>         c->flags |= CHN_F_DEAD;
>>>>         printf("%s: play interrupt timeout, channel dead\n", c->name);
>>>> }
>>>> 
>>>> return ret;
>>>> }
>>>> 
>>>> 
>>>> Could it be that the conditional test is wrong here?  Perhaps
>>>> we should be using (count < 0) instead?
>>> 
>>> I'm now running a kernel built with this last conditional test
>>> changed to "if (count < 0)" and sound is still working OK.  Have yet to
>>> see if this eliminates the interrupt timeout messages.
>> 
>> Well, that was a failure.  :-)  Didn't see any timeout error messages,
>> but the device still died eventually, nonetheless.  I've since changed 
>> back to the original code.
> 
> That's an interesting data point. At this point I'd start looking at the
> driver code for your sound hardware.  I suspect that the driver interrupt
> code is either no longer seeing interrupts, or it is no longer calling
> chn_intr().
> 
> 
>