Re: Crash in accounting code: encode_long(), due to bad rusage data?

From: Mike Pritchard <mpp_at_freebsd.org> Date: Tue, 21 Aug 2007 01:31:08 -0500

On Mon, Aug 20, 2007 at 01:23:34AM +0300, Diomidis Spinellis wrote:
> Robert Watson wrote:
> >I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT in 
> >anticipation of the forthcoming release.  Both of them run with accounting 
> >enabled at all times.  When a large pine session was exiting on one of the two 
> >boxes, I ran into the following panic:
> >panic: encode_long: -ve value -32749
> 
> Getting rid of the panic is easy:
> 
> --- kern_acct.c	2007-08-20 01:15:18.000000000 +0300
> +++ kern_acct.c.new	2007-08-20 01:16:06.000000000 +0300
> @@ -523,8 +523,7 @@
>  	int norm_exp;	/* Normalized exponent */
>  	int shift;
> 
> -	KASSERT(val >= 0,  ("encode_long: -ve value %ld", val));
> -	if (val == 0)
> +	if (val <= 0)
>  		return (0);
>  	norm_exp = fls(val) - 1;
>  	shift = FLT_MANT_DIG - norm_exp - 1;
> 
> However, as you wrote, this doesn't fix the root of the problem.
> 
> >I find the large negative value in ru_idrss somewhat sad to contemplate, and 
> >while this could well be a problem with capturing of process runtime 
> >information, I'd like it if the accounting code were robust against this sort 
> >of bug, rather than panicking, and I guess I'd also rather that the process 
> >runtime information be correctly captured :-).

That's the exact same fix I applied to my system to work around this panic.
I started seeing it over a month ago (but that system had been running a 4
or 5 month old kernel until then), but I have just been too busy to look into
it beyond preventing the panic.
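
For anyone who wants to try the workaround outside the kernel, here is a
minimal userland sketch of an encoder that returns 0 for non-positive input
instead of asserting, modeled on the quoted patch.  It is not the kernel
source: the names highest_bit() and encode_long_tolerant() are hypothetical,
highest_bit() stands in for the kernel's fls(), and the bit layout assumes
IEEE 754 single precision with the low bits truncated rather than rounded.

```c
#include <float.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, or 0 if no bit is set.
 */
static int
highest_bit(unsigned long v)
{
	int n = 0;

	while (v != 0) {
		n++;
		v >>= 1;
	}
	return (n);
}

/*
 * Sketch: encode a long as IEEE 754 single-precision bits.  Values <= 0
 * map to 0 instead of tripping an assertion (assumption: mirrors the
 * "val <= 0" workaround in the quoted diff).
 */
static uint32_t
encode_long_tolerant(long val)
{
	unsigned long mant;
	int norm_exp;	/* Normalized exponent */
	int shift;

	if (val <= 0)
		return (0);
	norm_exp = highest_bit((unsigned long)val) - 1;
	shift = FLT_MANT_DIG - norm_exp - 1;
	mant = (unsigned long)val;
	if (shift > 0)
		mant <<= shift;
	else
		mant >>= -shift;
	/* Drop the implicit leading 1; keep only the stored mantissa bits. */
	mant &= (1UL << (FLT_MANT_DIG - 1)) - 1;
	return (((uint32_t)(FLT_MAX_EXP - 1 + norm_exp) <<
	    (FLT_MANT_DIG - 1)) | (uint32_t)mant);
}
```

With this sketch, the value from the panic message (-32749) encodes as 0, so
the accounting record is written with a zero field instead of the bogus
number.  That hides the symptom only; the bad rusage value is still there.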

I always see it from a perl process spawned by procmail/SpamAssassin.  I think
something in that code path is buggy, because a process that is only
around for 2 minutes tops shouldn't be able to generate this panic.
I think a KASSERT around the integral update for idrss might be in order
(I think that's the value that looked like it was overflowing... Sorry, it's
been a month since I looked at this).
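
A rough userland sketch of the kind of check meant here, assuming the
integral is a long that gets a non-negative sample added on each update.
Whether overflow is really the cause has not been established in this
thread, and integral_add() is a hypothetical helper, not an existing kernel
function.  In a kernel build the "return (false)" paths are where a KASSERT
could go.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Add a sample to a running integral.  Returns true on a clean update.
 * Returns false if the inputs are already negative (accumulator left
 * untouched) or if the sum would overflow (accumulator saturated at
 * LONG_MAX instead of wrapping negative).
 */
static bool
integral_add(long *acc, long delta)
{
	if (delta < 0 || *acc < 0)
		return (false);		/* bad input: candidate KASSERT site */
	if (delta > LONG_MAX - *acc) {
		*acc = LONG_MAX;	/* saturate rather than wrap */
		return (false);
	}
	*acc += delta;
	return (true);
}
```

Checking before the addition avoids relying on signed overflow, which is
undefined behavior in C, and it flags the bad update at the point where it
happens rather than later at process exit when the accounting record is
written.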

(The rest of this is off topic, sorry... I've been thinking about mothballing
that computer, but it's been so good to me...)

I'm guessing fewer and fewer of us are running 32-bit uniprocessor machines.
The machine I see it on is my server, which is a 9 year old K6-2-550
machine.  It's a workhorse.  It has lost 2 CPU fans (maybe 3) and 1 HD and it
is still kicking!  I want to replace it, but it's been so good
to me.  Its best track record (I shut it down due to a power outage
that my UPS couldn't ride out... power was out for 3 days, it turns out):

 4:14PM  up 505 days, 12:01, 2 users, load averages: 38.23, 19.59, 12.24
-- 
Mike Pritchard
mpp _at_ FreeBSD.org
"If tyranny and oppression come to this land, it will be in the guise
of fighting a foreign enemy."  - James Madison (1787)