On Friday, October 28, 2016 04:50:40 PM Andriy Gapon wrote: > On 28/10/2016 16:01, Andriy Gapon wrote: > > I suspect that under the right conditions it's possible for wrmsr to cause a > > counter overflow, such that an interrupt (if enabled) is generated after wrmsr > > is executed, even if wrmsr disables the counter. > > > > In amd_intr() we have this code: > > wrmsr(evsel, config & ~AMD_PMC_ENABLE); > > wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v)); > > > > /* Restart the counter if logging succeeded. */ > > error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, > > TRAPF_USERMODE(tf)); > > if (error == 0) > > wrmsr(evsel, config | AMD_PMC_ENABLE); > > > > I suspect that in the scenario above, if it is indeed possible, the last wrmsr > > would re-enable the counter that's supposed to be stopped. > > > > I think that writing back the original value should be more correct, that is: > > wrmsr(evsel, config); > > > > I'll test if this change would help. > > So, I have tried this change: > > --- a/sys/dev/hwpmc/hwpmc_amd.c > +++ b/sys/dev/hwpmc/hwpmc_amd.c > _at__at_ -577,6 +577,7 _at__at_ amd_start_pmc(int cpu, int ri) > > PMCDBG1(MDP,STA,2,"amd-start config=0x%x", config); > > + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); > wrmsr(pd->pm_evsel, config); > return 0; > } > _at__at_ -613,6 +614,7 _at__at_ amd_stop_pmc(int cpu, int ri) > > /* turn off the PMC ENABLE bit */ > config = pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE; > + KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu")); > wrmsr(pd->pm_evsel, config); > return 0; > } > _at__at_ -676,6 +678,7 _at__at_ amd_intr(int cpu, struct trapframe *tf) > perfctr = AMD_PMC_PERFCTR_0 + i; > v = pm->pm_sc.pm_reloadcount; > config = rdmsr(evsel); > + PMCDBG1(MDP,INT,2, "enabled=%d", config & AMD_PMC_ENABLE); > > KASSERT((config & ~AMD_PMC_ENABLE) == > (pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE), > _at__at_ -689,12 +692,13 _at__at_ amd_intr(int cpu, struct trapframe *tf) > error = pmc_process_interrupt(cpu, PMC_HR, pm, tf, > TRAPF_USERMODE(tf)); > if (error == 0) > - wrmsr(evsel, config | AMD_PMC_ENABLE); > + wrmsr(evsel, config); > } > > atomic_add_int(retval ? &pmc_stats.pm_intr_processed : > &pmc_stats.pm_intr_ignored, 1); > > + PMCDBG1(MDP,INT,3, "reval=%d", retval); > return (retval); > } > > > > And I couldn't reproduce the problem with it. > Also, in the debug log I see the following, for instance: > 315466 1 1068994822044 MDP:INT:3: reval=1 > 315465 1 1068994821176 MDP:INT:2: enabled=4194304 > 315464 1 1068994820930 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315463 1 1068994796610 MDP:INT:3: reval=1 > 315462 1 1068994795833 MDP:INT:2: enabled=4194304 > 315461 1 1068994795589 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315460 1 1068994771107 MDP:INT:3: reval=1 > 315459 1 1068994770176 MDP:INT:2: enabled=4194304 > 315458 1 1068994769933 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315457 0 1068994766498 MDP:SWO:1: pc=0xfffff8001c6e0e00 pp=0x0 enable-msr=0 > 315456 0 1068994765449 CSW:SWO:1: cpu=0 proc=0xfffff80073767a50 (1655, > pmcstat) pp=0x0 > 315455 1 1068994742201 MDP:INT:3: reval=1 > 315454 1 1068994739535 MDP:INT:2: enabled=4194304 > 315453 1 1068994739169 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1 > 315452 1 1068994700076 MDP:SWI:1: pc=0xfffff8007318ea00 > pp=0xfffff80021f56400 enable-msr=0 > 315451 1 1068994698957 MDP:STA:2: amd-start config=0x5300c0 > 315450 1 1068994698699 MDP:STA:1: amd-start cpu=1 ri=0 > 315449 1 1068994697513 MDP:WRI:1: amd-write cpu=1 ri=0 v=ffffffffffff0000 > 315448 1 1068994696676 CSW:SWI:1: cpu=1 ri=17 new=65536 > 315447 1 1068994694784 MDP:CFG:1: cpu=1 ri=0 pm=0xfffff8001c6e0900 > 315446 1 1068994691210 CSW:SWI:1: cpu=1 proc=0xfffff8017b0d3a50 (1654, > burnK7) pp=0xfffff80021f56400 > 315445 0 1068994674368 MDP:SWO:1: pc=0xfffff8001c6e0e00 > pp=0xfffff80021f56400 enable-msr=0 > 315444 0 1068994674033 MDP:CFG:1: cpu=0 ri=0 pm=0x0 > 315443 0 1068994673597 CSW:SWO:1: cpu=0 ri=17 tmp=-3205 (samp) > 315442 0 1068994673412 MDP:REA:2: amd-read (post-munge) id=0 -> 65536 > 315441 0 1068994673247 MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645120 > 315440 0 1068994673006 MDP:REA:1: amd-read id=0 class=2 > 315439 0 1068994672443 MDP:INT:3: reval=1 > 315438 0 1068994670981 MDP:INT:2: enabled=0 > 315437 0 1068994670591 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 > 315436 0 1068994669599 MDP:STO:1: amd-stop ri=0 > 315435 0 1068994668389 CSW:SWO:1: cpu=0 proc=0xfffff8017b0d3a50 (1654, > burnK7) pp=0xfffff80021f56400 > 315434 0 1068994642304 MDP:INT:3: reval=1 > 315433 0 1068994641339 MDP:INT:2: enabled=4194304 > 315432 0 1068994640412 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > 315431 0 1068994598561 MDP:INT:3: reval=1 > 315430 0 1068994596854 MDP:INT:2: enabled=4194304 > 315429 0 1068994596590 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0 > 315428 0 1068994533426 MDP:INT:3: reval=1 > 315427 0 1068994532643 MDP:INT:2: enabled=4194304 > 315426 0 1068994532390 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > 315425 0 1068994508088 MDP:INT:3: reval=1 > 315424 0 1068994507167 MDP:INT:2: enabled=4194304 > 315423 0 1068994506915 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1 > > I think that event #315438 confirms my hypothesis and that previously we would > get a stray NMI after that event. There is a dozen such events per more than a > million of interrupts. I think your patch is probably correct. hwpmc_core.c seems to have the same issue. I think it could be fixed similarly though it doesn't seem to have ever triggered. -- John BaldwinReceived on Fri Oct 28 2016 - 13:15:13 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:08 UTC