On 28/10/2016 16:01, Andriy Gapon wrote: > I suspect that under the right conditions it's possible for wrmsr to cause a > counter overflow, such that an interrupt (if enabled) is generated after wrmsr > is executed, even if wrmsr disables the counter. > > In amd_intr() we have this code: > wrmsr(evsel, config & ~AMD_PMC_ENABLE); > wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v)); > > /* Restart the counter if logging succeeded. */ > error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, > TRAPF_USERMODE(tf)); > if (error == 0) > wrmsr(evsel, config | AMD_PMC_ENABLE); > > I suspect that in the scenario above, if it is indeed possible, the last wrmsr > would re-enable the counter that's supposed to be stopped. > > I think that writing back the original value should be more correct, that is: > wrmsr(evsel, config); > > I'll test if this change would help. So, I have tried this change: --- a/sys/dev/hwpmc/hwpmc_amd.c +++ b/sys/dev/hwpmc/hwpmc_amd.c _at__at_ -577,6 +577,7 _at__at_ amd_start_pmc(int cpu, int ri) PMCDBG1(MDP,STA,2,"amd-start config=0x%x", config); + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); wrmsr(pd->pm_evsel, config); return 0; } _at__at_ -613,6 +614,7 _at__at_ amd_stop_pmc(int cpu, int ri) /* turn off the PMC ENABLE bit */ config = pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE; + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); wrmsr(pd->pm_evsel, config); return 0; } _at__at_ -676,6 +678,7 _at__at_ amd_intr(int cpu, struct trapframe *tf) perfctr = AMD_PMC_PERFCTR_0 + i; v = pm->pm_sc.pm_reloadcount; config = rdmsr(evsel); + PMCDBG1(MDP,INT,2, "enabled=%d", config & AMD_PMC_ENABLE); KASSERT((config & ~AMD_PMC_ENABLE) == (pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE), _at__at_ -689,12 +692,13 _at__at_ amd_intr(int cpu, struct trapframe *tf) error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, TRAPF_USERMODE(tf)); if (error == 0) - wrmsr(evsel, config | AMD_PMC_ENABLE); + wrmsr(evsel, config); } atomic_add_int(retval ? &pmc_stats.pm_intr_processed : &pmc_stats.pm_intr_ignored, 1); + PMCDBG1(MDP,INT,3, "reval=%d", retval); return (retval); } And I couldn't reproduce the problem with it. Also, in the debug log I see the following, for instance: 315466 1 1068994822044 MDP:INT:3: reval=1 315465 1 1068994821176 MDP:INT:2: enabled=4194304 315464 1 1068994820930 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 315463 1 1068994796610 MDP:INT:3: reval=1 315462 1 1068994795833 MDP:INT:2: enabled=4194304 315461 1 1068994795589 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 315460 1 1068994771107 MDP:INT:3: reval=1 315459 1 1068994770176 MDP:INT:2: enabled=4194304 315458 1 1068994769933 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 315457 0 1068994766498 MDP:SWO:1: pc=0xfffff8001c6e0e00 pp=0x0 enable-msr=0 315456 0 1068994765449 CSW:SWO:1: cpu=0 proc=0xfffff80073767a50 (1655, pmcstat) pp=0x0 315455 1 1068994742201 MDP:INT:3: reval=1 315454 1 1068994739535 MDP:INT:2: enabled=4194304 315453 1 1068994739169 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 315452 1 1068994700076 MDP:SWI:1: pc=0xfffff8007318ea00 pp=0xfffff80021f56400 enable-msr=0 315451 1 1068994698957 MDP:STA:2: amd-start config=0x5300c0 315450 1 1068994698699 MDP:STA:1: amd-start cpu=1 ri=0 315449 1 1068994697513 MDP:WRI:1: amd-write cpu=1 ri=0 v=ffffffffffff0000 315448 1 1068994696676 CSW:SWI:1: cpu=1 ri=17 new=65536 315447 1 1068994694784 MDP:CFG:1: cpu=1 ri=0 pm=0xfffff8001c6e0900 315446 1 1068994691210 CSW:SWI:1: cpu=1 proc=0xfffff8017b0d3a50 (1654, burnK7) pp=0xfffff80021f56400 315445 0 1068994674368 MDP:SWO:1: pc=0xfffff8001c6e0e00 pp=0xfffff80021f56400 enable-msr=0 315444 0 1068994674033 MDP:CFG:1: cpu=0 ri=0 pm=0x0 315443 0 1068994673597 CSW:SWO:1: cpu=0 ri=17 tmp=-3205 (samp) 315442 0 1068994673412 MDP:REA:2: amd-read (post-munge) id=0 -> 65536 315441 0 1068994673247 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645120 315440 0 1068994673006 MDP:REA:1: amd-read id=0 class=2 315439 0 1068994672443 MDP:INT:3: reval=1 315438 0 1068994670981 MDP:INT:2: enabled=0 315437 0 1068994670591 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 315436 0 1068994669599 MDP:STO:1: amd-stop ri=0 315435 0 1068994668389 CSW:SWO:1: cpu=0 proc=0xfffff8017b0d3a50 (1654, burnK7) pp=0xfffff80021f56400 315434 0 1068994642304 MDP:INT:3: reval=1 315433 0 1068994641339 MDP:INT:2: enabled=4194304 315432 0 1068994640412 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 315431 0 1068994598561 MDP:INT:3: reval=1 315430 0 1068994596854 MDP:INT:2: enabled=4194304 315429 0 1068994596590 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 315428 0 1068994533426 MDP:INT:3: reval=1 315427 0 1068994532643 MDP:INT:2: enabled=4194304 315426 0 1068994532390 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 315425 0 1068994508088 MDP:INT:3: reval=1 315424 0 1068994507167 MDP:INT:2: enabled=4194304 315423 0 1068994506915 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 I think that event #315438 confirms my hypothesis and that previously we would get a stray NMI after that event. There is a dozen such events per more than a million of interrupts. -- Andriy GaponReceived on Fri Oct 28 2016 - 11:51:44 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:08 UTC