Re: panic: destroying non-empty racct: 2113536 allocated for resource 4

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Mon, 6 Jun 2016 15:04:14 +0300
I've just got the same panic again.
This time I didn't do anything unusual; I just ran a poudriere build and the
system panicked at the end of it:

Unread portion of the kernel message buffer:
panic: destroying non-empty racct: 2113536 allocated for resource 4

KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff804131eb = db_trace_self_wrapper+0x2b/frame 0xfffffe051992a7f0
kdb_backtrace() at 0xffffffff806636d9 = kdb_backtrace+0x39/frame 0xfffffe051992a8a0
vpanic() at 0xffffffff8062dd9c = vpanic+0x14c/frame 0xfffffe051992a8e0
panic() at 0xffffffff8062dae3 = panic+0x43/frame 0xfffffe051992a940
racct_destroy_locked() at 0xffffffff8061eebc = racct_destroy_locked+0xac/frame 0xfffffe051992a960
racct_destroy() at 0xffffffff8061ede5 = racct_destroy+0x35/frame 0xfffffe051992a980
prison_racct_free_locked() at 0xffffffff805fdcdc = prison_racct_free_locked+0x4c/frame 0xfffffe051992a9a0
prison_racct_free() at 0xffffffff805fdc2d = prison_racct_free+0x6d/frame 0xfffffe051992a9c0
prison_racct_detach() at 0xffffffff805fdd8e = prison_racct_detach+0x3e/frame 0xfffffe051992a9e0
prison_deref() at 0xffffffff805fb26b = prison_deref+0x23b/frame 0xfffffe051992aa10
prison_remove_one() at 0xffffffff805fc9c5 = prison_remove_one+0x125/frame 0xfffffe051992aa40
sys_jail_remove() at 0xffffffff805fc884 = sys_jail_remove+0x204/frame 0xfffffe051992aa90
syscallenter() at 0xffffffff80820cdd = syscallenter+0x31d/frame 0xfffffe051992ab00
amd64_syscall() at 0xffffffff808208af = amd64_syscall+0x1f/frame 0xfffffe051992abf0
Xfast_syscall() at 0xffffffff80808d5b = Xfast_syscall+0xfb/frame 0xfffffe051992abf0

It's interesting that the resource and the value are exactly the same as in the previous panic.
I have a crash dump this time as well.


On 17/05/2016 09:22, Andriy Gapon wrote:
> 
> To be fair, I got this panic after an exotic sequence of events: running
> poudriere, sending SIGSTOP to one of the build processes, forgetting about it,
> seeing poudriere time out that job, and then sending SIGCONT...
> 
> This is amd64 head r297350.
> 
> Some details:
> (kgdb) bt
> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:295
> #1  0xffffffff8062d7ef in kern_reboot (howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:363
> #2  0xffffffff8062de38 in vpanic (fmt=<optimized out>, ap=0xfffffe0519b73920) at /usr/src/sys/kern/kern_shutdown.c:639
> #3  0xffffffff8062db43 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:572
> #4  0xffffffff8061ef1c in racct_destroy_locked (racctp=<optimized out>) at /usr/src/sys/kern/kern_racct.c:478
> #5  0xffffffff8061ee45 in racct_destroy (racct=0xfffff802f6301518) at /usr/src/sys/kern/kern_racct.c:495
> #6  0xffffffff805fdd3c in prison_racct_free_locked (prr=0xfffff802f6301400) at /usr/src/sys/kern/kern_jail.c:4564
> #7  0xffffffff805fdc8d in prison_racct_free (prr=0xfffff802f6301400) at /usr/src/sys/kern/kern_jail.c:4583
> #8  0xffffffff805fddee in prison_racct_detach (pr=0xfffff802b0730000) at /usr/src/sys/kern/kern_jail.c:4658
> #9  0xffffffff805fb2cb in prison_deref (pr=<optimized out>, flags=3) at /usr/src/sys/kern/kern_jail.c:2663
> #10 0xffffffff805fca25 in prison_remove_one (pr=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2358
> #11 0xffffffff805fc8e4 in sys_jail_remove (td=<optimized out>, uap=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2313
> #12 0xffffffff80820ddd in syscallenter (td=0xfffff801146019e0, sa=0xfffffe0519b73b80) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
> #13 0xffffffff808209af in amd64_syscall (td=0xfffff801146019e0, traced=0) at /usr/src/sys/amd64/amd64/trap.c:943
> 
> RACCT_RSS is 4.
> 
> (kgdb) p *prr
> $5 = {
>   prr_next = {
>     le_next = 0xfffff80382fe4400,
>     le_prev = 0xfffff8017ac90600
>   },
>   prr_name = "basejail-default-job-03", '\000' <repeats 232 times>,
>   prr_refcount = 0,
>   prr_racct = 0xfffff802e3f520b0
> }
> (kgdb) p *prr->prr_racct
> $6 = {
>   r_resources = {13884177072, 0, 0, 0, 2113536, 0 <repeats 14 times>, 13611325009, 0},
>   r_rule_links = {
>     lh_first = 0x0
>   }
> }
> 
> Could it be that somehow the CONT'd process failed to deduct its resources from
> the jail's resources because the jail was already marked for destruction or
> something like that?
> 


-- 
Andriy Gapon
Received on Mon Jun 06 2016 - 10:05:19 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:05 UTC