Re: best approximation of getcpu() ?

From: John Baldwin <jhb_at_freebsd.org>
Date: Fri, 16 Dec 2016 13:27:21 -0800
On Friday, December 16, 2016 12:10:01 PM Adrian Chadd wrote:
> On 16 December 2016 at 11:45, Luigi Rizzo <rizzo_at_iet.unipi.it> wrote:
> > On Fri, Dec 16, 2016 at 09:29:15AM +0000, David Chisnall wrote:
> >> On 16 Dec 2016, at 03:10, Alan Somers <asomers_at_FreeBSD.org> wrote:
> >> >
> >> > What about pthread_setaffinity(3) and friends?  You can use it to pin
> >> > a thread to a single CPU, and know that it will never migrate.
> >>
> >> This is not a useable solution for anything that needs to live in a library and also doesn't solve the problem.
> >>
> >> The Linux get_cpu() call is used for caches that are somewhere between global and thread-local.  Accessing them still requires a lock, but it's very likely to be uncontended (contention only happens when you're context switched at exactly the wrong time, or if a thread is migrated between cores in between the get_cpu() call and usage) and so you can use the userspace fast path for the lock and not suffer from cache contention effects.
> >>
> >> On x86, you can use cpuid from userspace and get the current core ID.  I have some code that does this and re-checks every few hundred accesses, storing the current CPU ID in a thread-local variable.  Using the per-CPU caches is a lot faster than using the global cache (and reduces contention on the global cache).  It would be great if we could have a syscall to do this on FreeBSD (it would be even better if we could specify a TLS variable that the kernel automatically updates for the userspace thread when the scheduler migrates the thread between cores).
> >
> > indeed the following line seems to do the job for x86
> >         asm volatile("cpuid" : "=d"(curcpu), "=a"(tmp), "=b"(tmp), "=c"(tmp) : "a"(0xb) );
> > (there must be a better way to tell the compiler that eax, ebx, ecx, edx are
> > all clobbered).
> >
> > 0xb is the CPUID function that returns the current APIC id for the
> > core (not necessarily matching the OS core-id)
> >
> > The only problem is that this instruction is serialising and slow,
> > seems to take some 70-100ns on several of my machines so you
> > cannot afford to call it at all times but need the value cached
> > somewhere. Exposing it as thread local storage, or a VDSO syscall,
> > would be nicer because the kernel knows when it is actually changing
> > value.
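
For reference, a rough sketch of the cached-cpuid idea quoted above,
assuming GCC or Clang and their <cpuid.h> header. The names
query_apic_id(), current_apic_id_cached() and CPU_RECHECK_INTERVAL are
made up for illustration, and 256 is an arbitrary stand-in for "every
few hundred accesses". The header's __cpuid_count() macro takes care of
the four output registers and sets the sub-leaf in ECX explicitly. On
non-x86 builds, or when leaf 0xb is not reported, the sketch returns -1.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical tuning knob: re-run cpuid once per this many lookups. */
#define CPU_RECHECK_INTERVAL 256

/*
 * Return the x2APIC ID of the executing logical CPU (EDX of CPUID leaf
 * 0xb), or -1 if it cannot be determined.  This is an APIC ID, which
 * does not necessarily match the OS CPU number.
 */
static int64_t query_apic_id(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, 0) < 0xb)
        return -1;                  /* leaf 0xb not available */
    __cpuid_count(0xb, 0, eax, ebx, ecx, edx);
    (void)eax;
    (void)ecx;
    if (ebx == 0)
        return -1;                  /* leaf present but not implemented */
    return (int64_t)edx;
#else
    return -1;                      /* assumption: no fallback off x86 */
#endif
}

/* Per-thread cache, so the slow, serialising cpuid is mostly avoided. */
static _Thread_local int64_t cached_id;
static _Thread_local unsigned int lookups_left;   /* 0 forces a refresh */

static int64_t current_apic_id_cached(void)
{
    if (lookups_left == 0) {
        cached_id = query_apic_id();
        lookups_left = CPU_RECHECK_INTERVAL;
    }
    lookups_left--;
    return cached_id;
}
```

The returned value can be stale by up to one interval, and the thread
can migrate right after the call, so whatever is indexed by it still
needs its own lock.
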
> 
> The problem is your CPU ID can change in the middle of packet handling.
> 
> So if you want it to be accurate, you need to bind your worker thread to a CPU.

Well, it seems the goal is to have something much lighter weight akin to
critical_enter/exit or sched_pin/unpin in the kernel.  It's not that you
care about a specific CPU, you just want to not migrate.  (UMA uses
critical sections when accessing the per-CPU buckets for the same reasons.)
The problem with using cli/sti in userland is that you might page fault and
context switch during the fault handler (or get preempted in the fault
handler which will run with interrupts enabled) and migrate.  You could
prevent this if you are able to mlock() all of the pages holding any code
you will execute or data you will access to prevent faults, but you have to
ensure you can do this for every potential page.
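
If you did want to try the mlock() route, the bluntest form is
mlockall(2), sketched below. lock_all_memory() is a made-up helper name,
not an existing API. This only aims to avoid faults caused by pages not
being resident; it does nothing about ordinary preemption, and it can
fail under RLIMIT_MEMLOCK or without sufficient privilege.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Ask the kernel to wire every page currently mapped into the process
 * (MCL_CURRENT) and every page mapped later (MCL_FUTURE).
 * Returns 0 on success, or the errno value reported by mlockall().
 */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}
```

A caller would run this once at startup, before entering the code that
must not fault, and treat a non-zero return as "locking not available".
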

-- 
John Baldwin
Received on Fri Dec 16 2016 - 20:31:01 UTC
