On Sun, Jan 06, 2013 at 04:23:13PM +0100, Marius Strobl wrote:
> On Wed, Dec 26, 2012 at 09:24:46PM +0200, Alexander Motin wrote:
> > On 26.12.2012 01:21, Marius Strobl wrote:
> > > On Tue, Dec 18, 2012 at 11:03:47AM +0200, Alexander Motin wrote:
> > >> Experiments with dummynet showed ineffective support for very
> > >> short tick-based callouts. The new version fixes that, allowing
> > >> as many tick-based callout events as the hz value permits, while
> > >> still being able to aggregate events and generate a minimum of
> > >> interrupts.
> > >>
> > >> This version also modifies the system load average calculation
> > >> to fix some cases existing in the HEAD and 9 branches that could
> > >> be fixed with the new direct callout functionality.
> > >>
> > >> http://people.freebsd.org/~mav/calloutng_12_17.patch
> > >>
> > >> With several important changes made last time, I am going to
> > >> delay the commit to HEAD for another week to do more testing.
> > >> Comments and new test cases are welcome. Thanks for staying
> > >> tuned and commenting.
> > >
> > > FYI, I gave both calloutng_12_15_1.patch and calloutng_12_17.patch
> > > a try on sparc64 and it at least survives a buildworld there.
> > > However, with the patched kernels, buildworld times seem to
> > > increase slightly but reproducibly by 1-2% (I only did four runs,
> > > but typically buildworld times are rather stable and don't vary
> > > by more than a minute for the same kernel and source here). Is
> > > this an expected trade-off (system time as such doesn't seem to
> > > increase)?
> >
> > I don't think the build process uses a significant number of
> > callouts to affect results directly. I think this additional time
> > could be the result of the deeper next-event lookup done by the new
> > code, which is practically useless for sparc64, which effectively
> > has no cpu_idle() routine. It wouldn't affect system time and
> > wouldn't show up in any statistics (except PMC or something alike)
> > because it is executed inside the timer hardware interrupt handler.
> > If my guess is right, that is a part that probably still could be
> > optimized. I'll look into it. Thanks.
> >
> > > Is there anything specific to test?
> >
> > Since most of the code is MI, for sparc64 I would mostly look at
> > the related MD parts (eventtimers and timecounters) to make sure
> > they are working reliably under more stressful conditions. I still
> > have some worries about a possible deadlock on hardware where IPIs
> > are used to fetch the present time from another CPU.
>
> Well, I've just learnt two things the hard way:
> a) We really need the mutex in that path.
> b) Assuming that the initial synchronization of the counters is good
>    enough and that they won't drift considerably across the CPUs, so
>    we can always use the local one, makes things go south pretty soon
>    after boot, at least with your calloutng_12_26.patch applied.
>
> I'm not really sure what to do about that. Earlier you already said
> that sched_bind(9) also isn't an option in case td_critnest > 1.
> To be honest, I don't really understand why using a spin lock in the
> timecounter path makes sparc64 the only problematic architecture for
> your changes. The x86 i8254_get_timecount() also uses a spin lock, so
> it should be in the same boat.
>
> The affected machines are equipped with an x86-style south bridge
> which exposes a power management unit (intended to be used as an
> SMBus bridge only in these machines) on the PCI bus. Actually, this
> device also includes an ACPI power management timer. However, I've
> just spent a day trying to get that one working, without success - it
> just doesn't increment. Probably its clock input isn't connected, as
> it's not intended to be used in these machines.
> That south bridge also includes 8254-compatible timers on the ISA/LPC
> side, but they are hidden from the OFW device tree. I can hack these
> devices into existence and give it a try, but even if that works,
> this would likely use the same code as the x86 i8254_get_timecount(),
> so I don't see what would be gained by that.
>
> The last thing I can think of in order to avoid using the tick
> counter as the timecounter in the MP case is that the Broadcom MACs
> in the affected machines also provide a counter driven by a 1 MHz
> clock. If that's good enough for a timecounter, I can hook these up
> (in case these work ...) and hack bge(4) to not detach in that case
> (given that we can't detach timecounters ...).
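A minimal sketch of what hooking such a counter up could look like,
using the standard timecounter registration via tc_init(9); the
register name BGE_1MHZ_TIMER, the counter width, the quality value,
and the attach hook below are assumptions for illustration, not actual
bge(4) code:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/timetc.h>

/* Read the NIC's free-running counter (register name hypothetical;
 * CSR_READ_4() stands in for the driver's register-access macro). */
static u_int
bge_get_timecount(struct timecounter *tc)
{
	struct bge_softc *sc = tc->tc_priv;

	return (CSR_READ_4(sc, BGE_1MHZ_TIMER));
}

static struct timecounter bge_timecounter = {
	.tc_get_timecount = bge_get_timecount,
	.tc_counter_mask = ~0u,		/* assuming all 32 bits count */
	.tc_frequency = 1000000,	/* the 1 MHz clock input */
	.tc_name = "bge",
	.tc_quality = 800,		/* prefer it over the tick counter */
};

/* Called once from the driver's attach path; timecounters cannot be
 * detached, hence the need to keep bge(4) attached. */
static void
bge_tc_attach(struct bge_softc *sc)
{
	bge_timecounter.tc_priv = sc;
	tc_init(&bge_timecounter);
}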
> > Here is a small tool we are using to test the correctness and
> > performance of different user-level APIs:
> > http://people.freebsd.org/~mav/testsleep.c
>
> I've run Ian's set of tests on a v215 with and without your
> calloutng_12_26.patch and on a v210 (these use the IPI approach)
> with the latter also applied.
> I'm not really sure what to make out of the numbers.
>
>                  v215 w/o            v215 w/            v210 w/
> ----------   ---------------    ---------------    ---------------
> select            1   1999.61        1     23.87        1     29.97
> poll              1   1999.70        1   1069.61        1   1075.24
> usleep            1   1999.86        1     23.43        1     28.99
> nanosleep         1    999.92        1     23.28        1     28.66
> kqueue            1   1000.12        1   1071.13        1   1076.35
> kqueueto          1    999.56        1     26.33        1     31.34
> syscall           1      1.89        1      1.92        1      2.88
> select          300   1999.72      300    326.08      300    332.24
> poll            300   1999.12      300   1069.78      300   1075.82
> usleep          300   1999.91      300    325.63      300    330.94
> nanosleep       300    999.82      300     23.25      300     26.76
> kqueue          300   1000.14      300   1071.06      300   1075.96
> kqueueto        300    999.53      300     26.32      300     31.42
> syscall         300      1.90      300      1.93      300      2.89
> select         3000   3998.18     3000   3176.51     3000   3193.86
> poll           3000   3999.29     3000   3182.21     3000   3193.12
> usleep         3000   3998.46     3000   3191.60     3000   3192.50
> nanosleep      3000   1999.79     3000     23.21     3000     27.02
> kqueue         3000   3000.12     3000   3189.13     3000   3191.96
> kqueueto       3000   1999.99     3000     26.28     3000     31.91
> syscall        3000      1.91     3000      1.91     3000      2.90
> select        30000  30990.85    30000  31489.18    30000  31548.77
> poll          30000  30995.25    30000  31518.80    30000  31487.92
> usleep        30000  30992.00    30000  31510.42    30000  31475.50
> nanosleep     30000   1999.46    30000     38.67    30000     41.95
> kqueue        30000  30006.49    30000  30991.86    30000  30996.77
> kqueueto      30000   1999.09    30000     41.67    30000     46.36
> syscall       30000      1.91    30000      1.91    30000      2.88
> select       300000 300990.65   300000 301864.98   300000 301787.01
> poll         300000 300998.09   300000 301831.36   300000 301741.62
> usleep       300000 300990.80   300000 301824.67   300000 301770.10
> nanosleep    300000   1999.15   300000    325.74   300000    331.01
> kqueue       300000 300000.87   300000 301031.11   300000 300992.28
> kqueueto     300000   1999.39   300000    328.77   300000    333.45
> syscall      300000      1.91   300000      1.91   300000      2.88

The nanosleep and kqueueto tests are probably passing the wrong
argument to the syscall (the value is meant to be microseconds, but
nanosleep takes nanoseconds, so it should probably be multiplied by
1000).
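A minimal sketch of the suspected fix, assuming the tool's interval
argument is in microseconds (the function and variable names are
illustrative, not taken from testsleep.c):

#include <time.h>

/* Sleep for 'usec' microseconds via nanosleep(2); the conversion to
 * struct timespec is the step suggested to be missing above. */
static void
sleep_us(long usec)
{
	struct timespec ts;

	ts.tv_sec = usec / 1000000;		/* whole seconds */
	ts.tv_nsec = (usec % 1000000) * 1000;	/* us -> ns */
	(void)nanosleep(&ts, NULL);
}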
I think that for the time being it would be useful to run at least one
set of tests with kern.timecounter.alloweddeviation=0, so we can tell
how close we get to the required timeouts.
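That knob can be set with "sysctl kern.timecounter.alloweddeviation=0",
or programmatically as in this sketch (the sysctl node is the one
introduced by the calloutng patches; setting it requires root):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>

/* Set the allowed timer deviation to zero for a test run, equivalent
 * to running: sysctl kern.timecounter.alloweddeviation=0 */
int
main(void)
{
	int zero = 0;

	if (sysctlbyname("kern.timecounter.alloweddeviation",
	    NULL, NULL, &zero, sizeof(zero)) != 0)
		err(1, "sysctlbyname");
	return (0);
}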
cheers
luigi

> Marius