Re: sound/pcm/* bugs (was: Re: page fault panic tracked down (selwakeuppri()) - really sound/pcm/*)

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Thu, 8 Jan 2004 12:55:43 -0800 (PST)
On  8 Jan, Mathew Kanner wrote:
> On Jan 08, Don Lewis wrote:
>> On  7 Jan, Stefan Ehmann wrote:
>> > On Wed, 2004-01-07 at 21:15, Don Lewis wrote:
>> >> Try the following patch.  I won't guarantee that you still won't see
>> >> panics, but at least it should panic cleanly instead of stomping on some
>> >> innocent block of memory that corrupts data or some critical structure
>> >> that will trigger a mysterious and hard to debug panic at some later
>> >> time.
>> >> 
>> >> The reason for changing the KASSERT to a test and an explicit call to
>> >> panic() is to make sure that the error is always caught because the
>> >> sound code is typically not compiled with INVARIANTS, so the KASSERT
>> >> will typically be ignored.  The vchan stuff really needs a proper fix,
>> >> but I don't understand the sound code well enough.
>> > 
>> > Nice patch - unfortunately sound doesn't work any longer with vchans
>> > enabled (which is set to 4 at boot time here).
>> > 
>> > I get "pcm0:virtual:0: play interrupt timeout, channel dead" whenever I
>> > try to play anything.
>> > 
>> > However, if I do "hw.snd.pcm0.vchans: 4 -> 0" it works nice again.
>> > 
>> > I'll probably try later if I'm able to get a panic with vchans disabled
>> > (Especially those that had - at least not in an obvious way - any sound
>> > stuff in backtrace)
>> >  
>> > If not, maybe the panic gets triggered if for some reason the dsp device
>> > isn't ready again after a song played and a new virtual dsp device is
>> > opened and vchan code gets in action (combined with other bad things).
>> > 
>> > It can't be vchans alone because I just had three mpg123s open and they
>> > just played fine.
>> 
>> Some other wierdness that I noticed is that if one of the
>> chn_setblocksize() is called for one of the vchans, vchan_setblocksize()
>> will get called, which will call chn_notify(parent, CHN_N_BLOCKSIZE).
>> When this happens, the parent will interate over all of its children
>> looking for the one with the minimum bufhard blksz.  It will then call
>> chn_setblocksize() on itself, and chn_setblocksize() will call
>> sndbuf_remalloc() on its bufsoft, which will set reallocate the buffer
>> with the size (blkcnt * blksz).  If this channel is the parent vchan and
>> the new size of bufsoft is smaller than the size of bufhard (which never
>> gets reallocated), feed_vchan_s16() will write past the end of bufsoft
>> and things will go boom sometime later.
>> 
>> Try the patch below in place of my previous patch.  As you might guess,
>> I'm grasping at straws.
>> 
> 
> 	Hello Don,
> 	I'm joining this discussion late and I dont understand the
> circumstances that cause the panic.  Is it reproduceable with by a regular
> user?

Yes.

The problem is that a vchan is getting tweaked so that the size of its
bufsoft is smaller that the size of its bufhard.  When we finally got to
the point where we could panic the system close to where the fatal
problem occured (by compiling with INVARIANTS so that KASSERTS are
checked), we found that the size of bufhard was 2048 and bufsoft was
688.  We tripped the KASSERT when chn_intr() called chn_wrintr(), which
called chn_wrfeed() which called sndbuf_feed(), which called
feed_vchan_s16().

chn_wrfeed() calculates "amt" as the free space in bufhard, which ends
up being 2048, then it passes bufsoft, bufhard, and amt to sndbuf_feed()
as the "from", "to", and "count" parameters.

sndbuf_feed() verifies that the free space in "to" aka bufhard is >=
"count", which is obviously going to be true.  It then feed_vchan_s16(),
passing to->tmpbuf (bufsoft's temporary buffer), "count", and "from"
to feed_vchan_s16() as the "b", "count", and "source" parameters.

If feed_vchan_s16() has been compiled with INVARIANTS, it will panic if
the size of "source" (bufsoft) is less than count, but since modules
historically not been compiled with kernel options, and/or FreeBSD
releases are not compiled with INVARIANTS, the KASSERT code here is
typically ignored.  If so, feed_vchan_s16() executes the following code:
        tmp = (int16_t *)sndbuf_getbuf(src);
        bzero(tmp, count);
If src aka bufsoft is smaller than count, bzero() will write past the
end.  The system may run for a while, but eventually some vital kernel
data structure may get stomped on and the system will panic() for some
mysterious reason that is unlikely to appear related to the sound code.

> 	Anyway, the area which you are working (vchans, feeders, etc)
> is extremely confusing to me but I will spend some time with it this
> weekend.

You are not the only one confused ...

> 	I think Orion might be a good person to ask about the role of
> buffers and channels, I've cc'ed him with this e-mail.
> 
> 	--Mat
> 	
Received on Thu Jan 08 2004 - 11:56:41 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:37 UTC