В Mon, 29 Dec 2014 21:03:24 +0100 Hans Petter Selasky <hps_at_selasky.org> пишет: > Hi, > > I recently came across a class of errors which lead me into > investigating the "kern/kern_timeout.c" and its subsystem. From what > I can see new features like the SMP awareness has been "added" > instead of fully "integrated". When going into the cornercases I've > uncovered that the internal callout statemachine can sometimes report > wrong values via its callout_active() and callout_pending() bits to > its clients, which in turn can make the clients behave badly. I > further did an investigation on how the safety of callout migration > between CPU's is maintained. When I looked into the code and found > stuff like "volatile" and "while()" loops to figure which CPU a > callout belongs I understood that such logic completely undermines > the cleverness found in the turnstiles of mutexes and decided to go > through all of the logic inside "kern_timeout.c". Also static code > analysis is harder when we don't use the basic mutexes and condition > variables available in the kernel. > > First of all we need to make some driving rules for everyone: > > 1) A new feature called direct callbacks which execute the timer > callbacks from the fast interrupt handler was added. All these > callbacks _must_ be associated with a regular spinlocks, to maintain > a safe callout_drain(). Else they should only be executed on CPU0. > > 2) All Giant locked callbacks should only execute on CPU0 to avoid > congestion. > > 3) Callbacks using read-only locks for its callback should also only > execute on CPU0 to avoid multiple instances pending for completion on > multiple CPU's, because read-only locks can be entered multiple > times. From what I can see, there are currently no consumers of this > feature in the kernel. > ... panic: spin lock held too long http://paste.org.ru/?acf7ioReceived on Wed Dec 31 2014 - 21:56:19 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:54 UTC