Re: CURRENT + amd64 + user-ppp = panic

From: Victor Snezhko <snezhko_at_indorsoft.ru>
Date: Fri, 04 Nov 2005 17:01:46 +0600
Victor Snezhko <snezhko_at_indorsoft.ru> writes:

>>> (kgdb) up 11
>>> #11 0xc066e0c2 in softclock (dummy=0x0) at
>>> /usr/src/sys/kern/kern_timeout.c:220
>>> 220				if (c->c_time != curticks) {
>>> (kgdb) list
>>> 215			curticks = softticks;
>>> 216			bucket = &callwheel[curticks & callwheelmask];
>>> 217			c = TAILQ_FIRST(bucket);
>>> 218			while (c) {
>>> 219				depth++;
>>> 220				if (c->c_time != curticks) {
>>> 221					c = TAILQ_NEXT(c, c_links.tqe);
>>> 222					++steps;
>>> 223					if (steps >= MAX_SOFTCLOCK_STEPS) {
>>> 224						nextsoftcheck = c;
>>> (kgdb) print *bucket
>>> $1 = {tqh_first = 0xc1891d80, tqh_last = 0xc1891d80}
>>> (kgdb) print c
>>> $2 = (struct callout *) 0xdeadc0de
>>> (kgdb) print *(bucket->tqh_first)
>>> $3 = {c_links = {sle = {sle_next = 0xdeadc0de}, tqe = {tqe_next =
>>> 0xdeadc0de, tqe_prev = 0xdeadc0de}}, c_time = -559038242, c_arg =
>>> 0xdeadc0de, c_func = 0xdeadc0de, c_mtx = 0xdeadc0de, c_flags = -559038242}
>>> (kgdb) print steps
>>> $4 = 1
>>
>> Well, from this it seems that a callout was freed while it was still on the 
>> list.  Perhaps there is a case where callout_stop() isn't called.  Also, 
>> callout_drain() should really be used.  If the callout function is rearming, 
>> then it might have been running when callout_stop() is called, and it could 
>> have rearmed itself and then been overwritten when it was freed.  In fact, 
>> that is likely your problem.  You can try this patch, but there might be lock 
>> order problems that would require the callout_drain() to happen later when 
>> locks aren't held:
>>
>> Index: nd6.c
>> ===================================================================
>> RCS file: /usr/cvs/src/sys/netinet6/nd6.c,v
>> retrieving revision 1.62
>> diff -u -r1.62 nd6.c
>> --- nd6.c       22 Oct 2005 05:07:16 -0000      1.62
>> +++ nd6.c       3 Nov 2005 19:56:42 -0000
>> @@ -398,7 +398,7 @@
>>         if (tick < 0) {
>>                 ln->ln_expire = 0;
>>                 ln->ln_ntick = 0;
>> -               callout_stop(&ln->ln_timer_ch);
>> +               callout_drain(&ln->ln_timer_ch);
>>         } else {
>>                 ln->ln_expire = time_second + tick / hz;
>>                 if (tick > INT_MAX) {
>
> Hmmm, no, this patch didn't change anything for me. The same trap, the
> same bucket full of garbage.
>
> Tomorrow I'll try to trace all callout-related operations in nd6
> and/or the whole netinet6. 

Hmmm... the trace shows that callout_stop()/callout_drain() is always
passed a pointer that was never initialized via callout_init(), at
least not anywhere in /usr/src/sys/netinet6/*.

I'll debug this further and report the results.

-- 
WBR, Victor V. Snezhko
EMail: snezhko_at_indorsoft.ru
Received on Fri Nov 04 2005 - 10:02:00 UTC
