Re: Crash in accounting code: encode_long(), due to bad rusage data?

From: Jeff Roberson <jroberson_at_chesapeake.net>
Date: Sun, 19 Aug 2007 16:52:12 -0700 (PDT)
On Mon, 20 Aug 2007, Diomidis Spinellis wrote:

> Robert Watson wrote:
>> I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT 
>> in anticipation of the forthcoming release.  Both of them run with 
>> accounting enabled at all times.  When a large pine session was exiting on 
>> one of the two boxes, I ran into the following panic:
>> 
>> panic: encode_long: -ve value -32749
>
> Getting rid of the panic is easy:
>
> --- kern_acct.c	2007-08-20 01:15:18.000000000 +0300
> +++ kern_acct.c.new	2007-08-20 01:16:06.000000000 +0300
> @@ -523,8 +523,7 @@
> 	int norm_exp;	/* Normalized exponent */
> 	int shift;
>
> -	KASSERT(val >= 0,  ("encode_long: -ve value %ld", val));
> -	if (val == 0)
> +	if (val <= 0)
> 		return (0);
> 	norm_exp = fls(val) - 1;
> 	shift = FLT_MANT_DIG - norm_exp - 1;
>
> However, as you wrote, this doesn't fix the root of the problem.
>
>> I find the large negative value in ru_idrss somewhat sad to contemplate, 
>> and while this could well be a problem with capturing of process runtime 
>> information, I'd like it if the accounting code were robust against this 
>> sort of bug, rather than panicking, and I guess I'd also rather that the 
>> process runtime information be correctly captured :-).
>
> Do you think it makes any sense for encode_long to be correctly encoding 
> negative numbers, or should we concentrate on locating and fixing the 
> negative ru_idrss problem?

The number overflowed.  Based on information from Robert on IRC, it 
probably wrapped more than once, so the data is meaningless.  For this to 
continue to be useful we'd have to make irss and drss 64-bit on 32-bit 
platforms.  The problem probably doesn't occur on 64-bit machines.

Basically irss/drss are kilobytes per tick.  If there are 1000 ticks per 
second, that works out to almost bytes per second of runtime.  So you can 
see how this easily overflows a 32-bit long with a long-running 
high-memory application like pine.
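
To put rough numbers on that, here is a small sketch of the arithmetic. 
It is not kernel code: the function names are made up for illustration, 
and it assumes the counter simply adds the resident size in kilobytes on 
every tick, starting from zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel: per-second growth of
 * an integral RSS counter that adds the resident size in kilobytes on
 * every tick.
 */
static int64_t
growth_per_second(int64_t rss_kb, int64_t ticks_per_second)
{
	return (rss_kb * ticks_per_second);
}

/*
 * Whole seconds of runtime before a signed 32-bit counter that starts
 * at zero would exceed INT32_MAX at the given growth rate.
 * Returns -1 if the counter never grows.
 */
static int64_t
seconds_until_wrap32(int64_t rss_kb, int64_t ticks_per_second)
{
	int64_t rate = growth_per_second(rss_kb, ticks_per_second);

	if (rate <= 0)
		return (-1);
	return (INT32_MAX / rate);
}
```

Under those assumptions, a process holding 100 MB (102400 KB) at 1000 
ticks per second grows the counter by 102,400,000 each second, so 
seconds_until_wrap32(102400, 1000) comes out to 20 seconds of runtime.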

What do you think about simply putting in a max value if we overflow?  We 
could then make a note about it in process accounting docs.  We might want 
to fix this in rusage as well.
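
A minimal sketch of what clamping could look like, assuming the counters 
stay plain longs.  The helper name sat_add_long is hypothetical and this 
is not a patch against the tree, just the shape of the idea:

```c
#include <limits.h>

/*
 * Hypothetical helper: add a per-tick increment to an accumulator,
 * clamping at LONG_MAX instead of wrapping into negative values.
 * Non-positive increments leave the accumulator unchanged.
 */
static long
sat_add_long(long acc, long inc)
{
	if (inc <= 0)
		return (acc);
	/* inc > 0 here, so LONG_MAX - inc cannot overflow. */
	if (acc > LONG_MAX - inc)
		return (LONG_MAX);
	return (acc + inc);
}
```

With something like this, a value of LONG_MAX in the accounting record 
would mean "overflowed", which is the case the docs would need to mention.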

Jeff


>
> Diomidis Spinellis - http://www.spinellis.gr
>
Received on Sun Aug 19 2007 - 21:49:33 UTC
