Re: Call for bfe(4) testers.

From: John Baldwin <jhb_at_freebsd.org> Date: Mon, 4 Aug 2008 16:07:55 -0400

On Monday 04 August 2008 02:29:19 pm Ulrich Spoerlein wrote:
> On Mon, 04.08.2008 at 10:02:05 +0900, Pyun YongHyeon wrote:
> > On Sun, Aug 03, 2008 at 12:56:27PM +0200, Ulrich Spoerlein wrote:
> > > no toe capability on 0xc40abc00
> > > 
> > > messages, but they don't seem the culprit. The stats sysctl also works
> > 
> > I think kmacy_at_ fixed this. Please update again.
> 
> I will, as I still get the panics with your patches backed out.
> 
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address   = 0x38
> > > fault code              = supervisor read, page not present
> > > instruction pointer     = 0x20:0xc058ec16
> > > stack pointer           = 0x28:0xfb7b6ac8
> > > frame pointer           = 0x28:0xfb7b6ac8
> > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >                         = DPL 0, pres 1, def32 1, gran 1
> > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > current process         = 1327 (powerd)
> > > 
> > 
> > This and the fault address 0x38 above suggest that cpufreq(4)
> > dereferenced a NULL pointer. It seems powerd(8) tried to set the CPU
> > frequency and hit a page fault. A full backtrace would be a
> > great help.
> 
> The kdb.enter.panic script is not called when panicking due to a page
> fault. Knowing this, I do have a backtrace handy:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x38
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc058ec16
> stack pointer           = 0x28:0xfb8b8ac8
> frame pointer           = 0x28:0xfb8b8ac8
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1176 (powerd)
> db:0:kdb.enter.default>  show pcpu
> cpuid        = 0
> curthread    = 0xc4ec0aa0: pid 1176 "powerd"
> curpcb       = 0xfb8b8d90
> fpcurthread  = none
> idlethread   = 0xc3f80cc0: pid 10 "idle: cpu0"
> APIC ID      = 0
> currentldt   = 0x50
> db:0:kdb.enter.default>  bt
> Tracing pid 1176 tid 100103 td 0xc4ec0aa0
> device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
> cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
> cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
> sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
> userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
> __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
> syscall(fb8b8d38) at syscall+0x345
> Xint0x80_syscall() at Xint0x80_syscall+0x20
> --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
> db:0:kdb.enter.default>  capture off
> 
> Seems like I caught RELENG_7 during a bad time. Will update again.

What cpufreq drivers do you have loaded and attached?  This patch might work 
around the issue, but I suspect there is a bug in one of the cpufreq drivers.

Index: kern_cpu.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
retrieving revision 1.27.2.2
diff -u -r1.27.2.2 kern_cpu.c
--- kern_cpu.c  9 May 2008 19:02:10 -0000       1.27.2.2
+++ kern_cpu.c  4 Aug 2008 20:07:41 -0000
@@ -329,6 +329,8 @@
        /* Next, set any/all relative frequencies via their drivers. */
        for (i = 0; i < level->rel_count; i++) {
                set = &level->rel_set[i];
+               if (set->dev == NULL)
+                       continue;
                if (!device_is_attached(set->dev)) {
                        error = ENXIO;
                        goto out;
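
For anyone who wants to see the effect of the extra check outside the kernel,
here is a minimal userland sketch of the same loop shape. It is only an
illustration: fake_device, fake_setting, fake_is_attached() and
apply_rel_settings() are made-up stand-ins, not the real device_t,
struct cf_setting or device_is_attached(), and the counter stands in for the
call into the driver. The backtrace shows device_is_attached() called with a
first argument of 0 and faulting at 0x38, which is consistent with reading a
field at a small offset from a NULL device pointer; the sketch assumes that
is what happened.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's device state and device_t. */
enum fake_state { FAKE_NOTPRESENT, FAKE_ATTACHED };

struct fake_device {
	enum fake_state state;
};

/* Hypothetical stand-in for one relative cpufreq setting. */
struct fake_setting {
	struct fake_device *dev;	/* may be NULL, as in the reported panic */
	int freq;
};

/* Reads through dev, so a NULL dev would fault here, as in the trace. */
static int
fake_is_attached(const struct fake_device *dev)
{
	return (dev->state >= FAKE_ATTACHED);
}

/*
 * Walk the settings the way the patched loop does: skip entries without
 * a device, fail with ENXIO if a device is present but not attached.
 * *applied counts the settings that would have been handed to a driver.
 */
static int
apply_rel_settings(const struct fake_setting *sets, int count, int *applied)
{
	int i;

	*applied = 0;
	for (i = 0; i < count; i++) {
		const struct fake_setting *set = &sets[i];

		if (set->dev == NULL)
			continue;
		if (!fake_is_attached(set->dev))
			return (ENXIO);
		(*applied)++;
	}
	return (0);
}
```

Without the NULL test the first entry with no device would fault inside
fake_is_attached(); with it, such entries are silently skipped, which is why
this only works around the problem rather than explaining how a setting ended
up without a device.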

-- 
John Baldwin