Re: Kernel memory leak in ATAPI/CAM or ATAng?

From: Scott Long <scottl_at_freebsd.org>
Date: Fri, 07 Nov 2003 00:45:47 -0700

Kevin Oberman wrote:
>>Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
>>From: Robert Watson <rwatson_at_freebsd.org>
>>
>>
>>On Thu, 6 Nov 2003, Kevin Oberman wrote:
>>
>>
>>>I have learned a bit more about the problems I have been having with
>>>the DVD drive on my T30 laptop. When I have run the drive for an
>>>extended time (like 2 or 3 hours), I invariably have my system lock up
>>>because it can't malloc kernel memory for the ATAPI/CAM or ATA
>>>device. (Usually it's both.)
>>>
>>>The only recovery seems to be to reboot the system.
>>
>>Is it possible to drop to DDB and generate a coredump at that point?  If
>>so, you can run vmstat on the core to look at memory use statistics in a
>>post-mortem way.  As to what to look for: "big numbers" is about the limit
>>of what I can suggest, I'm afraid :-).  Usually the activity of choice is
>>to compare vmstat statistics (with -m and -z) during normal operation and
>>when the leak has occurred, and look for any marked differences.  It's
>>worth observing that there are two failure modes here that appear almost
>>identical: (1) a memory leak resulting in address space exhaustion for the
>>kernel, and (2) a tunable maximum allocation being too high for the
>>available address space.  Note that (2) isn't a leak, simply a poorly
>>tuned value.  We've noticed a number of tuned memory limits were set when
>>memory sizes on systems were much lower, and so we've had to readjust the
>>tuning parameters for large memory systems.  Likewise, a number of
>>problems were observed when PAE was introduced, as some of the tuning
>>parameters scaled with the amount of physical memory, not with the
>>addressable space for the kernel.  So we probably want to be on the
>>lookout for both of these possibilities.
> 
> 
> Well, I have no details to this point, but 'vmstat -m' makes the
> problem obvious. The amount of kernel memory allocated to ATA request
> climbs forever and after enough data is transferred, it runs out of
> KVM. This is a continual leak, and monitoring it on the running system
> makes it pretty clear that something is leaking. I don't think (2) is
> the issue. Because the fields allocated in the vmstat output are not
> wide enough, this is a bit hard to read. The fields all merge into some
> REALLY large numbers. After reboot, it is <5K. When running mencode I see this
> increasing at a rate of a bit under 1.9 MB per minute.
> 
> It does not look like a tuning issue. No matter how big KVM is allowed
> to grow, it's only a matter of time until it is gone.
> 
> I am going to do some testing to see what operations seem to cause
> this. I assume it does not happen all of the time or everyone would
> have seen it. I suspect it only happens with ATAPI/CAM activity,
> possibly only with simultaneous ATA and ATAPI/CAM activity.
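
As a rough sanity check on the figures quoted above, the reported rate can be multiplied out over the reported time to lockup. This is only back-of-the-envelope arithmetic: it assumes the rate stays constant and treats "a bit under 1.9 MB per minute" as an upper bound. The function name is made up for illustration.

```python
# Rough arithmetic on the leak rate reported in the quoted message.
# Assumption: the rate is constant; 1.9 MB/min is used as an upper bound.
LEAK_MB_PER_MINUTE = 1.9


def leaked_mb(hours, rate_mb_per_minute=LEAK_MB_PER_MINUTE):
    """Return the memory leaked after `hours` at a constant rate, in MB."""
    return hours * 60 * rate_mb_per_minute


# The lockups were reported after "2 or 3 hours" of running the drive.
for hours in (2, 3):
    print(f"{hours} h: about {leaked_mb(hours):.0f} MB leaked")
```

A few hundred MB over 2 to 3 hours fits a steady leak rather than a badly tuned limit.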

Does vmstat -m show which malloc type is growing?  Knowing this will
greatly speed up the debugging process.
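
If it helps, comparing two saved 'vmstat -m' outputs could be scripted along these lines. This is only a sketch: it assumes each data line looks like "<type name> <InUse> <MemUse>K ...", and the function names and sample numbers below are made up for illustration, not taken from a real system.

```python
import re

# Assumed line shape: "<type name> <InUse> <MemUse>K ...". The type name may
# contain spaces (for example "ATA request"), so it is matched lazily.
LINE_RE = re.compile(r"^\s*(.+?)\s+(\d+)\s+(\d+)K\b")


def parse_snapshot(text):
    """Map malloc type name -> MemUse in KB for one saved `vmstat -m` output."""
    usage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # header lines and anything unrecognized are skipped
            usage[match.group(1)] = int(match.group(3))
    return usage


def growth(before, after):
    """Return (type, KB grown) pairs for types that grew, largest first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    grown = [(name, kb) for name, kb in deltas.items() if kb > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)


# Invented sample data, only to show the expected input shape.
BASELINE = """\
         Type  InUse MemUse
  ATA request      2     4K
       devbuf    100   300K
"""
LATER = """\
         Type  InUse MemUse
  ATA request  60000 120000K
       devbuf    101   302K
"""

for name, kb in growth(parse_snapshot(BASELINE), parse_snapshot(LATER)):
    print(f"{name}: +{kb}K")
```

Note that lines where the columns have run together, as described above, will not match the pattern and are silently skipped, so the output should be checked by eye in that case.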

Thanks!

Scott
Received on Thu Nov 06 2003 - 22:46:31 UTC
