Re: sound/pcm/* bugs (was: Re: page fault panic tracked down (selwakeuppri()) - really sound/pcm/*)

From: Mathew Kanner <mat_at_cnd.mcgill.ca>
Date: Sat, 10 Jan 2004 13:39:55 -0500
On Jan 08, Don Lewis wrote:
> On  8 Jan, Mathew Kanner wrote:
> > On Jan 08, Don Lewis wrote:
> >> On  7 Jan, Stefan Ehmann wrote:
> >> > On Wed, 2004-01-07 at 21:15, Don Lewis wrote:
> >> >> Try the following patch.  I won't guarantee that you still won't see
> >> >> panics, but at least it should panic cleanly instead of stomping on some
> >> >> innocent block of memory that corrupts data or some critical structure
> >> >> that will trigger a mysterious and hard to debug panic at some later
> >> >> time.
> >> >> 
> >> >> The reason for changing the KASSERT to a test and an explicit call to
> >> >> panic() is to make sure that the error is always caught because the
> >> >> sound code is typically not compiled with INVARIANTS, so the KASSERT
> >> >> will typically be ignored.  The vchan stuff really needs a proper fix,
> >> >> but I don't understand the sound code well enough.
> >> > 
> >> > Nice patch - unfortunately sound doesn't work any longer with vchans
> >> > enabled (which is set to 4 at boot time here).
> >> > 
> >> > I get "pcm0:virtual:0: play interrupt timeout, channel dead" whenever I
> >> > try to play anything.
> >> > 
> >> > However, if I do "hw.snd.pcm0.vchans: 4 -> 0" it works nice again.
> >> > 
> >> > I'll probably try later if I'm able to get a panic with vchans disabled
> >> > (Especially those that had - at least not in an obvious way - any sound
> >> > stuff in backtrace)
> >> >  
> >> > If not, maybe the panic gets triggered if for some reason the dsp device
> >> > isn't ready again after a song played and a new virtual dsp device is
> >> > opened and vchan code gets in action (combined with other bad things).
> >> > 
> >> > It can't be vchans alone because I just had three mpg123s open and they
> >> > just played fine.
> >> 
> >> Some other wierdness that I noticed is that if one of the
> >> chn_setblocksize() is called for one of the vchans, vchan_setblocksize()
> >> will get called, which will call chn_notify(parent, CHN_N_BLOCKSIZE).
> >> When this happens, the parent will interate over all of its children
> >> looking for the one with the minimum bufhard blksz.  It will then call
> >> chn_setblocksize() on itself, and chn_setblocksize() will call
> >> sndbuf_remalloc() on its bufsoft, which will set reallocate the buffer
> >> with the size (blkcnt * blksz).  If this channel is the parent vchan and
> >> the new size of bufsoft is smaller than the size of bufhard (which never
> >> gets reallocated), feed_vchan_s16() will write past the end of bufsoft
> >> and things will go boom sometime later.
> >> 
> >> Try the patch below in place of my previous patch.  As you might guess,
> >> I'm grasping at straws.
> >> 
> > 
> > 	Hello Don,
> > 	I'm joining this discussion late and I dont understand the
> > circumstances that cause the panic.  Is it reproduceable with by a regular
> > user?
> 
> Yes.
> 
> The problem is that a vchan is getting tweaked so that the size of its
> bufsoft is smaller that the size of its bufhard.  When we finally got to
> the point where we could panic the system close to where the fatal
> problem occured (by compiling with INVARIANTS so that KASSERTS are
> checked), we found that the size of bufhard was 2048 and bufsoft was
> 688.  We tripped the KASSERT when chn_intr() called chn_wrintr(), which
> called chn_wrfeed() which called sndbuf_feed(), which called
> feed_vchan_s16().
> 
> chn_wrfeed() calculates "amt" as the free space in bufhard, which ends
> up being 2048, then it passes bufsoft, bufhard, and amt to sndbuf_feed()
> as the "from", "to", and "count" parameters.
> 
> sndbuf_feed() verifies that the free space in "to" aka bufhard is >=
> "count", which is obviously going to be true.  It then feed_vchan_s16(),
> passing to->tmpbuf (bufsoft's temporary buffer), "count", and "from"
> to feed_vchan_s16() as the "b", "count", and "source" parameters.
> 
> If feed_vchan_s16() has been compiled with INVARIANTS, it will panic if
> the size of "source" (bufsoft) is less than count, but since modules
> historically not been compiled with kernel options, and/or FreeBSD
> releases are not compiled with INVARIANTS, the KASSERT code here is
> typically ignored.  If so, feed_vchan_s16() executes the following code:
>         tmp = (int16_t *)sndbuf_getbuf(src);
>         bzero(tmp, count);
> If src aka bufsoft is smaller than count, bzero() will write past the
> end.  The system may run for a while, but eventually some vital kernel
> data structure may get stomped on and the system will panic() for some
> mysterious reason that is unlikely to appear related to the sound code.

	Hello Don,
	(I want to say that you've completely amazed me with your
deductive powers).

	This is my first impression after trying to understand the
this.
	(frag size == block size)
	I've seen problems with programs manipulating sound buffers
(frag size/block count) in stupid ways.  One patch I proposed limits
underruns by reasonably bounding the size of the fragment.

	Thinking along these lines, I propose we ignore fragment sizes
that are smaller than the parent (in the vchan case) which should the
address the problem as per your analysis.  Page 97-98 of the OSS
programmers guide says that this is perfectly legal.

	Perhaps a heavy handed approach but until someone can untangle
and own this problem...

	Whimsically, I wish some super-hacker could wrestle, unify,
unifdef and de-kobj the sound code so I could eventually comprehend
it.
	
	I hope to have more this afternoon.

	--Mat

-- 
	If you optimize everything, you will always be unhappy.
			- Don Knuth
Received on Sat Jan 10 2004 - 09:43:26 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:37 UTC