Hi, I recently came across a class of errors which lead me into investigating the "kern/kern_timeout.c" and its subsystem. From what I can see new features like the SMP awareness has been "added" instead of fully "integrated". When going into the cornercases I've uncovered that the internal callout statemachine can sometimes report wrong values via its callout_active() and callout_pending() bits to its clients, which in turn can make the clients behave badly. I further did an investigation on how the safety of callout migration between CPU's is maintained. When I looked into the code and found stuff like "volatile" and "while()" loops to figure which CPU a callout belongs I understood that such logic completely undermines the cleverness found in the turnstiles of mutexes and decided to go through all of the logic inside "kern_timeout.c". Also static code analysis is harder when we don't use the basic mutexes and condition variables available in the kernel. First of all we need to make some driving rules for everyone: 1) A new feature called direct callbacks which execute the timer callbacks from the fast interrupt handler was added. All these callbacks _must_ be associated with a regular spinlocks, to maintain a safe callout_drain(). Else they should only be executed on CPU0. 2) All Giant locked callbacks should only execute on CPU0 to avoid congestion. 3) Callbacks using read-only locks for its callback should also only execute on CPU0 to avoid multiple instances pending for completion on multiple CPU's, because read-only locks can be entered multiple times. From what I can see, there are currently no consumers of this feature in the kernel. Secondly, we need a way to drain callbacks asynchronously, for example like in the TCP stack. Currently the TCP shutdown sequence for a connection looks like this: ... void pcb_teardown(pcb) { lock(pcb); callout_stop(&pcb->c1); callout_stop(&pcb->c2); callout_stop(&pcb->c3); unlock(pcb); free(pcb); } I guess some of you see what is wrong: Use after free. ... With a new function I've added, scenarios like this can be solved more elegantly and without having to fear use after free situations! static void pcb_callback(void *pcb) { lock(pcb); do_free = (--(pcb->refcount) == 0); unlock(pcb); if (do_free == 0) return; destroy_mtx(pcb); free(pcb); } void pcb_teardown(pcb) { lock(pcb); pcb->refcount = 4; if (callout_drain_async(&pcb->c1, pcb_callback, pcb) == 0) pcb->refcount--; if (callout_drain_async(&pcb->c2, pcb_callback, pcb) == 0) pcb->refcount--; if (callout_drain_async(&pcb->c3, pcb_callback, pcb) == 0) pcb->refcount--; unlock(pcb); pcb_callback(pcb); } Please find attached a patch for 11-current as of today. Comments and feedback is appreciated. BTW: Happy New Year everyone! --HPS
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:54 UTC