Re: Panic on boot with r351461 (AMD ThreadRipper 2990WX)

From: Mark Johnston <markj@freebsd.org>
Date: Sun, 25 Aug 2019 12:15:13 -0400
On Sun, Aug 25, 2019 at 05:30:34PM +0300, Konstantin Belousov wrote:
> On Sun, Aug 25, 2019 at 07:17:20AM -0600, Rebecca Cran wrote:
> > On 2019-08-25 00:24, Konstantin Belousov wrote:
> > > What are the panic messages ?
> > 
> > Fatal trap 18: integer divide fault while in kernel mode
> > instruction pointer = 0x20:0xffffffff80f1027c
> > stack pointer       = 0x28:0xffffffff845809f0
> > frame pointer       = 0x28:0xffffffff84580a00
> > code segment        = base 0x0, limit 0xffffff, type 0x1b
> >                     = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags    = resume, IOPL = 0
> > current process     = 0 ()
> > trap number         = 18
> > panic: integer divide fault
> > cpuid = 0
> > time = 1
> > 
> > > What is the source line ?
> > 
> > (gdb) info line *0xffffffff80f1027c
> > Line 102 of "/usr/src/sys/vm/vm_domainset.c" starts at address
> > 0xffffffff80f10267 <vm_domainset_iter_first+151>
> >    and ends at 0xffffffff80f1027f <vm_domainset_iter_first+175>.
> 
> There was one more source line I asked about.
> 
> So what happens, IMO, is that for memory-less domains ds_cnt is zero
> because ds_mask is zero, which causes the exception on divide.  You
> can try the following combined patch, but I really dislike the fact
> that I cannot safely use DOMAINSET_FIXED (if my diagnosis is correct).
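
To make the diagnosis above concrete, here is a minimal userland sketch of
round-robin domain selection. It is not the kernel code: struct
toy_domainset, toy_domainset_refresh() and toy_iter_rr() are hypothetical
stand-ins, and the "iter % ds_cnt" expression is my assumption about what
the faulting line computes. The point is only that an empty ds_mask gives
ds_cnt == 0, so any modulo by ds_cnt needs a guard.

```c
#define TOY_MAXDOM 8

/* Hypothetical, simplified stand-in for the kernel's struct domainset. */
struct toy_domainset {
	unsigned int ds_mask;		/* bit i set => domain i is a member */
	int ds_cnt;			/* number of member domains */
	int ds_order[TOY_MAXDOM];	/* member domains in ascending order */
};

/* Recompute ds_cnt and ds_order from ds_mask. */
static void
toy_domainset_refresh(struct toy_domainset *ds)
{
	int i, j;

	for (i = j = 0; i < TOY_MAXDOM; i++)
		if ((ds->ds_mask & (1u << i)) != 0)
			ds->ds_order[j++] = i;
	ds->ds_cnt = j;
}

/*
 * Pick the next domain round-robin.  Returns -1 for an empty set; an
 * unguarded "% ds->ds_cnt" would be a division by zero in that case.
 */
static int
toy_iter_rr(const struct toy_domainset *ds, unsigned int *iter)
{
	if (ds->ds_cnt == 0)
		return (-1);
	return (ds->ds_order[++(*iter) % ds->ds_cnt]);
}
```

With ds_mask = 0x5 the iterator alternates between domains 2 and 0; once
the mask is cleared and refreshed, ds_cnt drops to 0 and only the guard
keeps the modulo from faulting.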

I think this is simply a bug.  Something like the following hack should
work: we want to leave the _FIXED domainsets unmodified, but they should
be removed from the global list (to ensure that userspace cannot specify
impossible policies).

diff --git a/sys/kern/kern_cpuset.c b/sys/kern/kern_cpuset.c
index 87f9333bf43b..931fe7e157e5 100644
--- a/sys/kern/kern_cpuset.c
+++ b/sys/kern/kern_cpuset.c
@@ -503,9 +503,18 @@ domainset_empty_vm(struct domainset *domain)
 	int i, j, max;
 
 	max = DOMAINSET_FLS(&domain->ds_mask) + 1;
-	for (i = 0; i < max; i++)
-		if (DOMAINSET_ISSET(i, &domain->ds_mask) && VM_DOMAIN_EMPTY(i))
+	for (i = 0; i < max; i++) {
+		if (DOMAINSET_ISSET(i, &domain->ds_mask) &&
+		    VM_DOMAIN_EMPTY(i)) {
+			/*
+			 * Leave the domainset unmodified, in case it is a
+			 * static policy defined for use by the kernel.
+			 */
+			if (domain->ds_cnt == 1)
+				return (true);
 			DOMAINSET_CLR(i, &domain->ds_mask);
+		}
+	}
 	domain->ds_cnt = DOMAINSET_COUNT(&domain->ds_mask);
 	max = DOMAINSET_FLS(&domain->ds_mask) + 1;
 	for (i = j = 0; i < max; i++) {
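
The intent of the hunk above can be modelled in userland as follows. This
is only a sketch under assumptions: struct sketch_domainset,
sketch_empty_domains, sketch_popcount() and sketch_domainset_empty_vm()
are hypothetical names, plain bitmasks replace the DOMAINSET_* macros, and
returning "mask is now empty" on the normal path is my assumption about
the rest of the function.

```c
#include <stdbool.h>

#define SKETCH_MAXDOM 8

/* Hypothetical stand-in for struct domainset (mask and count only). */
struct sketch_domainset {
	unsigned int ds_mask;	/* bit i set => domain i is a member */
	int ds_cnt;		/* number of member domains */
};

/* Hypothetical: bit i set => VM domain i has no memory. */
static unsigned int sketch_empty_domains;

/* Count the set bits in a mask. */
static int
sketch_popcount(unsigned int m)
{
	int n;

	for (n = 0; m != 0; m &= m - 1)
		n++;
	return (n);
}

/*
 * Drop memory-less domains from a set.  A single-domain set that names
 * an empty domain is reported as unusable (true) but left unmodified,
 * so a static single-domain policy keeps its mask and count.
 */
static bool
sketch_domainset_empty_vm(struct sketch_domainset *ds)
{
	int i;

	for (i = 0; i < SKETCH_MAXDOM; i++) {
		if ((ds->ds_mask & (1u << i)) != 0 &&
		    (sketch_empty_domains & (1u << i)) != 0) {
			if (ds->ds_cnt == 1)
				return (true);
			ds->ds_mask &= ~(1u << i);
		}
	}
	ds->ds_cnt = sketch_popcount(ds->ds_mask);
	return (ds->ds_cnt == 0);
}
```

A caller would treat a true return as "do not publish this set on the
global list", which is the behaviour described above for the _FIXED
domainsets.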

> I would prefer for kmem_malloc_domainset(DOMAINSET_FIXED(unpopulated domain))
> to fail with NULL result, and then I would manually fall-back to
> DOMAINSET_PREF().
> 
> OTOH, I think the chunk for mp_realloc_cpu() is the final fix.

Looks ok to me.
Received on Sun Aug 25 2019 - 14:15:21 UTC
