Re: PMC enters debugger with NMI on ctrl-C

From: Joseph Koshy <joseph.koshy_at_gmail.com>
Date: Sun, 21 May 2006 16:46:19 +0530
> tiger-2# pmcstat -S unhalted-cycles -O /tmp/sample.out
> ^CNMI ... going to debugger
> [thread pid 898 tid 100175 ]
> Stopped at      p4_stop_pmc+0x70:       movl    $0,%eax
> db> bt
> Tracing pid 898 tid 100175 td 0xc765c360
> p4_stop_pmc(0,1,1,0,0) at p4_stop_pmc+0x70
> pmc_release_pmc_descriptor(c703ca00,c06c03f3,c1032860,c103cac8,e992fb18) at pmc_release_pmc_descriptor+0x61
> pmc_syscall_handler(c765c360,e992fd04,2,202,c09c07d8) at pmc_syscall_handler+0xf6f
> syscall(3b,3b,bfbf003b,0,8050ee0) at syscall+0x2ee
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (210, FreeBSD ELF32, pmc_syscall_handler), eip = 0x280d104d, esp = 0xbfbfe6e0, ebp = 0xbfbfe6f8 ---
> db>

> I'll leave it in the debugger overnight in case there's
> anything useful to be done with it.

There appears to be a race here.  My guess as to how it is
getting triggered:

 0. An NMI is posted by the PMC and is in the process of
    working its way through the processor's innards.

 1. In the meantime, at p4_stop_pmc+0x6e, the CPU turns off
    the interrupting PMC by turning off its enable bit with a
    WRMSR instruction.

    This also zeroes the CCCR_OVF bits in that register.

 2. The processor now takes the NMI at the boundary
    of the WRMSR instruction (at p4_stop_pmc+0x6e).  The
    PMC handler p4_intr() doesn't find any PMC with a
    CCCR_OVF bit set and so assumes that the NMI wasn't
    caused by a PMC.  It bounces the NMI to trap()
    which promptly panics.
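
To make the sequence above concrete, here is a small user-space
model of it.  This is only a sketch of my guess, not the driver
code: the bit positions, the counter count NPMCS and all of the
model_* names are made up for illustration, and the real
p4_stop_pmc() and p4_intr() do considerably more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for the model; not the real P4 CCCR layout. */
#define NPMCS        4
#define CCCR_ENABLE  (1u << 0)
#define CCCR_OVF     (1u << 1)

static unsigned cccr[NPMCS];   /* one modeled CCCR per counter */
static bool nmi_pending;       /* an NMI has been posted but not yet taken */

/* Put the model into a known state: counters enabled, nothing pending. */
void model_reset(void)
{
    for (int i = 0; i < NPMCS; i++)
        cccr[i] = CCCR_ENABLE;
    nmi_pending = false;
}

/* Step 0: the counter overflows, sets OVF and posts an NMI. */
void model_overflow(int ri)
{
    assert(ri >= 0 && ri < NPMCS);
    cccr[ri] |= CCCR_OVF;
    nmi_pending = true;
}

/* Step 1: stopping the PMC clears the enable bit and, with it, OVF. */
void model_stop_pmc(int ri)
{
    assert(ri >= 0 && ri < NPMCS);
    cccr[ri] &= ~(CCCR_ENABLE | CCCR_OVF);
}

/* The handler claims the NMI only if some counter still shows OVF. */
int model_intr(void)
{
    int did_interrupt = 0;

    for (int i = 0; i < NPMCS; i++) {
        if (cccr[i] & CCCR_OVF) {
            cccr[i] &= ~CCCR_OVF;
            did_interrupt = 1;
        }
    }
    return did_interrupt;
}

/*
 * Step 2: take the pending NMI.  Returns 1 if the handler claimed it,
 * 0 if it would be bounced to trap(), and -1 if no NMI was pending.
 */
int model_deliver_nmi(void)
{
    if (!nmi_pending)
        return -1;
    nmi_pending = false;
    return model_intr();
}
```

In this model, model_overflow(0) followed directly by
model_deliver_nmi() is claimed by the handler.  Slipping
model_stop_pmc(0) in between leaves the NMI pending with no OVF
bit to account for it, so the handler disowns it.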

I need to think about this.

Here is a work-around for now:

Index: hwpmc_piv.c
===================================================================
RCS file: /cvs/FreeBSD/src/sys/dev/hwpmc/hwpmc_piv.c,v
retrieving revision 1.13
diff -u -u -r1.13 hwpmc_piv.c
--- hwpmc_piv.c 28 Mar 2006 14:09:21 -0000      1.13
+++ hwpmc_piv.c 21 May 2006 11:02:26 -0000
@@ -1698,7 +1698,7 @@
        atomic_add_int(did_interrupt ? &pmc_stats.pm_intr_processed :
            &pmc_stats.pm_intr_ignored, 1);

-       return did_interrupt;
+       return 1;
 }



> BTW, saw some oddness.  I capture the PMC samples on one
> box, and post-process on another.  This results in the
> following oddness: I used the above pmcstat command to
> track unhalted-cycles on a Dual Xeon, then post-processed
> on an amd64 box, so pmcstat generated gmon output with
> the name p4-global-power-events.  Perhaps pmcstat should
> capture the event name in its data file so that when doing
> later post-processing, it can use the names from the
> machine the captures were on, rather than the names of the
> machine the processing is being done on?

It does this already.  'unhalted-cycles' is an alias that
is converted to the machine-specific PMC name at the time
of data collection, so the name in the data file is the one
from the capturing machine.  Intel P4 Xeons alias
'unhalted-cycles' to event 'p4-global-power-events'.  On
amd64, 'unhalted-cycles' maps to 'k8-bu-cpu-clk-unhalted'.
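
As a rough illustration of the aliasing, the lookup can be pictured
as a small table keyed by CPU family.  This is a sketch only: the
table, the "p4"/"k8" family labels and resolve_event_name() are
hypothetical and are not how libpmc is actually organized; only
the two event names come from the real mapping described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table; only the event names are real. */
struct alias_entry {
    const char *cpu;     /* made-up CPU family label */
    const char *alias;   /* portable name given on the command line */
    const char *event;   /* machine-specific event name */
};

static const struct alias_entry alias_table[] = {
    { "p4", "unhalted-cycles", "p4-global-power-events" },
    { "k8", "unhalted-cycles", "k8-bu-cpu-clk-unhalted" },
};

/*
 * Resolve a name on the collecting machine.  Anything that is not a
 * known alias for that CPU family is returned unchanged.
 */
const char *resolve_event_name(const char *cpu, const char *name)
{
    assert(cpu != NULL && name != NULL);

    for (size_t i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++) {
        if (strcmp(alias_table[i].cpu, cpu) == 0 &&
            strcmp(alias_table[i].alias, name) == 0)
            return alias_table[i].event;
    }
    return name;
}
```

Because the resolved name is what gets recorded, post-processing
on a different machine never needs to consult its own alias table.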

-- 
FreeBSD Developer,     http://people.freebsd.org/~jkoshy
Received on Sun May 21 2006 - 09:16:20 UTC
