some experience with a many core machine: event timer, hwpmc

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Thu, 24 Oct 2013 13:47:31 +0300
I don't think that I have seen observations like the following posted before.

I had some brief contact with a 48-core Opteron system (4 packages).

Observation #1.

The event timers subsystem picked an HPET timer as its source.  This resulted in
a lot of inter-core / inter-package traffic to redistribute timer interrupts.
With a single global timer it also caused contention on a lock used internally
by the kern_et code, because many CPUs tried to grab it concurrently.
Additionally, I saw some statistics artifacts: top reported weird and
unstable results.

I believe that there should be some logic to prefer per-CPU timers over global
timers as the number of CPUs increases.
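
As a rough illustration of what such logic could look like, here is a
userspace sketch.  It is not the kern_et code: struct timer_desc, the two
penalty constants and pick_timer() are all hypothetical names, and the
penalty values are made up.  The idea is only that a global timer loses
effective quality for every CPU beyond some small allowance, so a per-CPU
timer eventually wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an event timer description. */
struct timer_desc {
	const char *name;
	int quality;	/* higher is better, as advertised by the driver */
	int per_cpu;	/* nonzero if each CPU has its own instance */
};

/* Assumed tuning knobs; both values are illustrative only. */
#define GLOBAL_PENALTY_PER_CPU		10
#define GLOBAL_PENALTY_FREE_CPUS	8

/* Quality after charging a global timer for each CPU past the allowance. */
static int
effective_quality(const struct timer_desc *t, int ncpus)
{
	int extra;

	if (t->per_cpu || ncpus <= GLOBAL_PENALTY_FREE_CPUS)
		return (t->quality);
	extra = ncpus - GLOBAL_PENALTY_FREE_CPUS;
	return (t->quality - extra * GLOBAL_PENALTY_PER_CPU);
}

/* Return the timer with the best effective quality, or NULL if n == 0. */
static const struct timer_desc *
pick_timer(const struct timer_desc *timers, size_t n, int ncpus)
{
	const struct timer_desc *best = NULL;
	size_t i;

	assert(timers != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (best == NULL ||
		    effective_quality(&timers[i], ncpus) >
		    effective_quality(best, ncpus))
			best = &timers[i];
	}
	return (best);
}
```

With these made-up numbers, a global timer of quality 450 still beats a
per-CPU timer of quality 100 on a 4 CPU machine, but loses to it on a
48 CPU machine.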

Observation #2.

hwpmc was quite unusable on that system.  Attempts to use it resulted in lockups
or panics like waiting too long on a spinlock.  It appears that hwpmc performs
some actions on each CPU and those actions are driven by timer interrupts.  The
actions use a single global lock for arbitration.  It appears that contention on
that lock makes hwpmc unusable.  For the record, this was the case even after I
switched the timer to per-CPU LAPIC timers.  HZ was the default 1000.  So perhaps
1ms / 48 (~21us) was not enough for hwpmc to do its per-tick, per-CPU actions
before the next tick.
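
The back-of-the-envelope budget above is just the tick period divided by the
number of CPUs that have to take the lock in turn.  A minimal sketch of that
arithmetic follows; per_cpu_budget_ns() is a hypothetical helper, not
anything in hwpmc, and it assumes the worst case where every CPU serializes
on the one lock within a single tick.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds each contender may hold a shared lock if all of them
 * must get through it within one tick of a clock running at hz.
 * Integer division, so the result is rounded down.
 */
static uint64_t
per_cpu_budget_ns(unsigned hz, unsigned ncontenders)
{
	assert(hz > 0 && ncontenders > 0);
	return (UINT64_C(1000000000) / hz / ncontenders);
}
```

With HZ=1000 this gives 23809 ns for 42 contenders and 20833 ns for 48.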

The contention appeared to be in pmclog_reserve (called from
pmclog_process_callchain).


Some details about the hardware just in case:

CPU: AMD Opteron(tm) Processor 6172 (2100.07-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f91  Family = 0x10  Model = 0x9  Stepping = 1
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
  TSC: P-state invariant

FreeBSD/SMP: Multiprocessor System Detected: 48 CPUs
FreeBSD/SMP: 4 package(s) x 12 core(s)

-- 
Andriy Gapon
Received on Thu Oct 24 2013 - 08:48:29 UTC