SCHED_ULE / NetGraph interaction broken somewhere between r227874 and r229818

From: Lev Serebryakov <lev_at_FreeBSD.org> Date: Thu, 12 Jan 2012 13:31:12 +0400

Hello, Freebsd-current.

  I have a router, which connects to the upstream ISP with mpd5 from
 ports using PPPoE.

  I've used SCHED_ULE for a long time without any problems. Under heavy
 network load (the router is not the fastest one: a 500MHz Geode CPU)
 the main CPU consumer was the "intr{swi1: netisr 0}" thread. But it
 never consumed more than 75%, and even when the upstream channel was
 completely saturated the router was accessible and responsive.

  The latest revision I'm sure is "good" is about r227874 (yes, from
 November 2011; I didn't update the router's system for a long time).

  But revision r229818 behaves completely differently: under network
 load, 100% of the CPU is consumed by the "ng_queue" thread (which never
 consumed any CPU on the old system). The system is unresponsive, the
 DNS server on this system returns timeouts, and I cannot log in via
 SSH or the serial console (the pause between the login and password
 prompts is so long that it leads to timeouts). The load average jumps
 to 20+, a pre-started `top' updates the screen once every 3-4 minutes,
 etc.

  Switching to 4BSD helps. 4BSD works as usual: all CPU time goes to
 interrupts and the network thread, the system stays responsive under
 the heaviest load, and DNS, DHCP and hostapd operate normally.

  There were NO significant changes in netgraph (svn log -r
 227874:229818 sys/netgraph), and there were three changes (r229429,
 r228960, r228718) to the kern/sched_*.c files. But I'm not sure that
 these are the only changes that could affect this behavior.

  Now I'm trying to find the "bad" revision by binary search, but it is
 very hard to do: old mpd5 doesn't work on a new kernel and vice versa,
 so I need to rebuild the whole world, update my build box, rebuild
 ports against the new world, and only after that build a NanoBSD image
 for my router. It takes about 5 hours per iteration and there are more
 than 512 revisions, so it is about 10 iterations :(
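  The "about 10 iterations" figure is just the binary-search bound: with
 n candidate revisions, isolating the first bad one takes at most
 ceil(log2 n) build-and-test cycles, so 512 candidates need 9 and
 anything above 512 (up to 1024) needs 10. A small illustrative sketch,
 assuming the rough 5-hours-per-iteration estimate above (the helper
 name bisect_steps is hypothetical, not part of any FreeBSD tool):

```python
def bisect_steps(n_revisions: int) -> int:
    """Worst-case number of build/test cycles needed to isolate the first
    bad revision among n_revisions candidates by binary search,
    i.e. ceil(log2(n_revisions)), computed exactly with integers."""
    if n_revisions <= 1:
        return 0
    return (n_revisions - 1).bit_length()


# Assumption: rough per-iteration cost (world + ports + NanoBSD image).
HOURS_PER_ITERATION = 5

for n in (512, 513, 1024):
    steps = bisect_steps(n)
    print(f"{n} revisions: {steps} iterations, ~{steps * HOURS_PER_ITERATION} hours")
```

  Using bit_length() instead of math.log2 avoids any floating-point
 rounding at exact powers of two; in practice ~10 iterations at 5 hours
 each is roughly 50 hours of rebuilding.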

  I can provide any debug information from the old and new systems.

-- 
// Black Lion AKA Lev Serebryakov <lev_at_FreeBSD.org>