On 11/29/20, David Wolfskill <david_at_catwhisker.org> wrote:
> On Sat, Nov 28, 2020 at 10:47:57AM -0500, Jonathan Looney wrote:
>> FWIW, I would try running lockstat on the box. (My supposition is that the
>> delay is due to a lock. That could be incorrect. Lockstat may provide some
>> clue as to whether this is a line of inquiry worth pursuing.)
>> ....
>
> Thanks (again), Jonathan.
>
> So... I did that (during this morning's daily upgrade cycle); the
> results may be "of interest" to some.
>
> I have placed copies of the typescripts in:
>
> http://www.catwhisker.org/~david/FreeBSD/head/lockstat/
>
> I also scribbled a "README" in that same directory (though it doesn't
> seem to show up in the listing); it may be accessed via
>
> http://www.catwhisker.org/~david/FreeBSD/head/lockstat/README
>
> My prior message in this thread showed what I saw during a "ping albert"
> from the laptop while it was running head -- most RTTs were around 0.600
> ms, but some were notably longer, with at least one that was over 68
> seconds.
>
> So I did a "lockstat ping -c 64 albert" while the laptop was running
> stable/12@r368123 (as a reference point); it is probably boring. :-}
>
> Then (this morning), I tried a simple "lockstat sleep 600" on the laptop
> while it was running head@r368119 (and building head@r368143); we see
> the "lockstat" output in the "lockstat_head" file.
>
> It then occurred to me that trying a "lockstat ping albert" might be
> useful, so I fired up "lockstat ping -c 600 albert" -- which started up
> OK, and demonstrated some long RTTs about every 11 packets or so, but we
> see things come to a screeching halt with:
>
> ...
> 64 bytes from 172.16.8.13: icmp_seq=534 ttl=63 time=0.664 ms
> lockstat: dtrace_status(): Abort due to systemic unresponsiveness
> 64 bytes from 172.16.8.13: icmp_seq=535 ttl=63 time=9404.383 ms
>
> and we get no lockstat output.
> :-/
>
>
> Finally, as another "control," I ran similar commands from freebeast,
> while it was running head@r368119 (and building head@r368143). Those
> results are in the "lockstat_freebeast" file.
>

According to the data you got, the entire kernel "freezes" every 11-12
seconds, so something way off is going on there.

Given that the bug seems to be reproducible, I think it would be best if
you just bisected to the offending commit.

--
Mateusz Guzik <mjguzik gmail.com>

Received on Sun Nov 29 2020 - 13:20:19 UTC
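Bisecting to the offending commit is a binary search over the revision range between a known-good and a known-bad build. The rNNNNNN revision numbers in the thread are increasing integers, so the search can be written directly over them. A minimal Python sketch follows. It assumes a known-good and a known-bad revision, and a regression that stays present once introduced. It also assumes a hypothetical `is_bad(rev)` check that you would perform by hand (build and boot that revision, then run the ping test). Neither the function name nor any revision numbers come from the thread. In a git checkout, `git bisect` automates the same loop.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first revision in (good, bad] for which is_bad() is True.

    Hypothetical helper. Assumes is_bad(good) is False, is_bad(bad) is True,
    and that the regression is monotonic: once a revision is bad, every
    later revision is bad too.
    """
    if good >= bad:
        raise ValueError("good revision must precede bad revision")
    # Invariant: `good` is known-good and `bad` is known-bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):  # e.g. build/boot `mid`, then ping and look for stalls
            bad = mid
        else:
            good = mid
    return bad
```

Each loop iteration halves the remaining range, so a range of N revisions needs about log2(N) test builds. For example, roughly 10 builds cover 1000 revisions, which is why bisecting a reproducible bug is usually cheaper than reasoning from lockstat output alone.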