Re: Build failed in Jenkins: FreeBSD_HEAD-tests2 #854

From: Mateusz Guzik <mjguzik_at_gmail.com> Date: Fri, 20 Mar 2015 10:52:06 +0100 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:56 UTC

On Fri, Mar 20, 2015 at 11:34:51AM +0200, Konstantin Belousov wrote:
> On Fri, Mar 20, 2015 at 03:20:26AM +0100, Mateusz Guzik wrote:
> > On Fri, Mar 20, 2015 at 02:08:23AM +0000, jenkins-admin_at_freebsd.org wrote:
> > > lib/libc/sys/setrlimit_test:setrlimit_nproc  ->  maxproc limit exceeded by uid 977 (pid 29170); see tuning(7) and login.conf(5)
> > > passed  [0.551s]
> > > lib/libc/sys/setrlimit_test:setrlimit_perm  ->  panic: mutex process lock not owned at /builds/FreeBSD_HEAD/sys/kern/kern_prot.c:1974
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009749a8e0
> > > vpanic() at vpanic+0x189/frame 0xfffffe009749a960
> > > panic() at panic+0x43/frame 0xfffffe009749a9c0
> > > __mtx_assert() at __mtx_assert+0xc2/frame 0xfffffe009749a9d0
> > > proc_set_cred() at proc_set_cred+0x36/frame 0xfffffe009749a9f0
> > > fork1() at fork1+0x27e/frame 0xfffffe009749aac0
> > > sys_fork() at sys_fork+0x1f/frame 0xfffffe009749aae0
> > > amd64_syscall() at amd64_syscall+0x27f/frame 0xfffffe009749abf0
> > > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe009749abf0
> > > --- syscall (0, FreeBSD ELF64, nosys), rip = 0x8019c216a, rsp = 0x7fffffffc2d8, rbp = 0x7fffffffc340 ---
> > > KDB: enter: panic
> > > [ thread pid 660 tid 100065 ]
> > > Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
> > 
> > Weird, I'll look at that.
> 
> This is due to p_ucred not initialized on allocation of struc proc.
> The member is not in p_startzero/p_endzero region, so it contains
> garbage at the stage of the fork where proc_set_cred() is called,
> while the function makes assertion based on the p_ucred content.

Yes I figured, but proc_init left me quite puzzled:

static int
proc_init(void *mem, int size, int flags)
{
	struct proc *p;

	p = (struct proc *)mem;
	SDT_PROBE(proc, kernel, init, entry, p, size, flags, 0, 0);
	p->p_sched = (struct p_sched *)&p[1];
	bzero(&p->p_mtx, sizeof(struct mtx));
	mtx_init(&p->p_mtx, "process lock", NULL, MTX_DEF | MTX_DUPOK);

We bzero only the first mutex to make sure mtx_init works?

	mtx_init(&p->p_slock, "process slock", NULL, MTX_SPIN);
	mtx_init(&p->p_statmtx, "pstatl", NULL, MTX_SPIN);
	mtx_init(&p->p_itimmtx, "pitiml", NULL, MTX_SPIN);
	mtx_init(&p->p_profmtx, "pprofl", NULL, MTX_SPIN);

How about these?

So the trivial fix would be to move p_ucred or explicitely NULL it here.

All these mtx_init calls would use MTX_NEW flag and bzero could be
eliminated.

I'll commit a quick fix shortly.

I'm really confused what's the purpose of checking for double
initialisation of locks. I'm not aware of any actual bug caught by this,
on the other hand I'm aware of quite a few cases where bzero or M_ZERO
were used just to make sure mtx_init passes.

So preferably I would just get rid of such a check and effectively make
the behaviour as if MTX_NEW is always used.

-- 
Mateusz Guzik <mjguzik gmail.com>