Re: Random processes hanging in unkillable state in -BETA6

From: Julian Elischer <julian_at_elischer.org>
Date: Wed, 29 Sep 2004 12:47:06 -0700
EEEKKK!

Damian Gerow wrote:

>Thus spake Julian Elischer (julian_at_elischer.org) [29/09/04 04:01]:
>: oh yeah, the output of the ps for that process would be good to see too.
>: (the ps in ddb)
>: 
>: there is an option to make ddb use printf()
>
>sysctl debug.ddb_use_printf, for archival purposes.
>
>: which will make its output show up in dmesg after you 'c'
>: (continue) back running again.. otherwise you'd need a serial console to
>: record it all.
>
>It's up at <http://www.afflictions.org/~dgerow/ddb.out>.  I added in the
>commands I typed, just for clarity (not needed, I know).
>
>  - Damian
>

struct kg_sched {
        struct thread   *skg_last_assigned;     == NULL
        int     skg_avail_opennings;            == 0x7d000  <----------!!!!!!!!!!!!!!!!
        int     skg_concurrency;                == 1
        int     skg_runq_kses;                  == 0
};


in the 6 ksegrp scheduler private structures we have, we see:

skg_last_assigned  skg_avail_opennings  skg_concurrency  skg_runq_kses

0                  7d000                1                0
0                  ce02                 8c5              0
0                  7d000                1                0
0                  7d000                1                0
0                  7d000                1                0
0                  1ecb0                408              0


all the values of 7d000 are impossible.. in fact all the values in that
column (skg_avail_opennings) are "impossible".

the values of 8c5 and 408 are also impossible for skg_concurrency..
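
To make the magnitudes easier to judge, here is a small userland sketch that converts the dumped values to decimal. It assumes the ddb columns are hexadecimal (as the 0x7d000 annotation on the struct suggests); the helper names hex_to_long and print_rows are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Values copied from the ddb dump, one entry per ksegrp (assumed hex). */
static const char *const avail_hex[] = {
	"7d000", "ce02", "7d000", "7d000", "7d000", "1ecb0"
};
static const char *const conc_hex[] = {
	"1", "8c5", "1", "1", "1", "408"
};

/* Parse a ddb-style hex string (no 0x prefix) into a long. */
static long
hex_to_long(const char *s)
{
	assert(s != NULL);
	return (strtol(s, NULL, 16));
}

/* Print every row in decimal. */
static void
print_rows(void)
{
	size_t i;

	for (i = 0; i < sizeof(avail_hex) / sizeof(avail_hex[0]); i++)
		printf("avail_opennings=%ld concurrency=%ld\n",
		    hex_to_long(avail_hex[i]), hex_to_long(conc_hex[i]));
}
```

Calling print_rows() from a main() shows 7d000 as 512000 and the two odd concurrency values as 2245 and 1032.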

either we have corruption of the structures, or we have a failure to
initialise the contents.. or we have a "leak" of openings.

looking at the values and the fact that 7d000 appears in several of them
I am suspicious that we didn't clear it properly at init.
 (goes to look at code..)
hmmm yep, it looks like that might be it.

try the following diff 
warning: cut'n'paste.. apply by hand.

diff -u -r1.199 kern_thread.c
--- kern/kern_thread.c  25 Sep 2004 00:53:46 -0000      1.199
+++ kern/kern_thread.c  29 Sep 2004 19:45:56 -0000
@@ -282,13 +282,13 @@
  * Initialize type-stable parts of a ksegrp (when newly created).
  */
 static int
-ksegrp_init(void *mem, int size, int flags)
+ksegrp_ctor(void *mem, int size, void *arg, int flags)
 {
        struct ksegrp   *kg;
 
        kg = (struct ksegrp *)mem;
+       bzero(mem, size);
        kg->kg_sched = (struct kg_sched *)&kg[1];
-       /* sched_newksegrp(kg); */
        return (0);
 }
 
@@ -369,7 +369,7 @@
        tid_zone = uma_zcreate("TID", sizeof(struct tid_bitmap_part),
            NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE, 0);
        ksegrp_zone = uma_zcreate("KSEGRP", sched_sizeof_ksegrp(),
-           NULL, NULL, ksegrp_init, NULL,
+           ksegrp_ctor, NULL, NULL, NULL,
            UMA_ALIGN_CACHE, 0);
        kseinit();      /* set up kse specific stuff  e.g. upcall zone*/
 }
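
One way to read the patch: a zone "init" hook only runs when fresh memory enters the zone, while a "ctor" hook runs on every allocation, so zeroing in the ctor also covers recycled items. The following is a userland toy under that assumption, not UMA itself; toy_zone, toy_alloc, toy_free, zero_hook and stale_after_recycle are hypothetical names, and the one-slot cache is a deliberate simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct item {
	int avail;		/* stands in for skg_avail_opennings */
	int concurrency;	/* stands in for skg_concurrency */
};

typedef void (*hook_t)(void *mem, size_t size);

/* Hypothetical one-slot zone: a freed item is reused by the next alloc. */
struct toy_zone {
	size_t	size;
	hook_t	ctor;		/* runs on every allocation */
	hook_t	init;		/* runs only when fresh memory enters the zone */
	void	*cached;	/* most recently freed item, or NULL */
};

static void *
toy_alloc(struct toy_zone *z)
{
	void *mem = z->cached;

	if (mem != NULL) {
		z->cached = NULL;	/* recycled: init is not run again */
	} else {
		mem = malloc(z->size);
		if (mem == NULL)
			return (NULL);
		if (z->init != NULL)
			z->init(mem, z->size);
	}
	if (z->ctor != NULL)
		z->ctor(mem, z->size);
	return (mem);
}

static void
toy_free(struct toy_zone *z, void *mem)
{
	free(z->cached);	/* drop any older cached item */
	z->cached = mem;	/* contents are left as the last owner had them */
}

static void
zero_hook(void *mem, size_t size)
{
	memset(mem, 0, size);
}

/* Allocate, dirty, free, reallocate; return what the second owner sees. */
static int
stale_after_recycle(hook_t ctor, hook_t init)
{
	struct toy_zone z = { sizeof(struct item), ctor, init, NULL };
	struct item *it;
	int seen;

	it = toy_alloc(&z);
	assert(it != NULL);
	it->avail = 0x7d000;	/* previous owner leaves a value behind */
	toy_free(&z, it);

	it = toy_alloc(&z);	/* the same memory comes back */
	assert(it != NULL);
	seen = it->avail;
	toy_free(&z, it);
	free(z.cached);
	return (seen);
}
```

With zeroing only in init, stale_after_recycle(NULL, zero_hook) returns the leftover 0x7d000; with zeroing in the ctor, stale_after_recycle(zero_hook, NULL) returns 0.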
Received on Wed Sep 29 2004 - 17:47:07 UTC