Re: [RFC] kern/kern_timeout.c rewrite in progress

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 22 Jan 2015 12:27:40 +0200

On Thu, Jan 22, 2015 at 11:16:41AM +0100, Hans Petter Selasky wrote:
> On 01/20/15 11:47, Slawa Olhovchenkov wrote:
> > On Tue, Jan 20, 2015 at 08:29:47AM +0100, Hans Petter Selasky wrote:
> >
> >> On 01/17/15 23:18, Hans Petter Selasky wrote:
> >>> On 01/17/15 20:11, Jason Wolfe wrote:
> >>>>
> >>>> HPS,
> >>>>
> >>>> Just to give a quick status update, this patch has most certainly
> >>>> resolved our spin lock held too long panics on stable/10.
> >>>>
> >>>> Thank you to JHB for spending some time digging into the issue and
> >>>> leading us to td_slpcallout as the culprit, and HPS for your rewrite.
> >>>> I had heard rumors of others being affected by similar issues, so this
> >>>> seems like a fine candidate for an MFC if possible.
> >>>>
> >>>> Jason
> >>>>
> >>>
> >>> Hi Jason,
> >>>
> >>> I'm glad to hear that my patch has resolved your issue and I'm happy we
> >>> now have a more stable system.
> >>>
> >>> It was actually a co-worker who wrote some bad code, which I
> >>> started debugging, which then led me to look at the callout subsystem.
> >>> One bug kills the other ;-)
> >>>
> >>> I'm planning an MFC to 10-stable - yes, and will possibly add the
> >>> _callout_stop_safe() function to not break binary compatibility with
> >>> existing drivers as part of the MFC.
> >>>
> >>> --HPS
> >>
> >> Hi,
> >>
> >> Here is a followup patch for the TCP stack like I mentioned in the
> >> beginning of the work done on the callout subsystem:
> >>
> >> https://reviews.freebsd.org/D1563
> >>
> >> If someone has a setup for massive TCP testing please give it a spin.
> >
> > I have on 10.1 (with applied r261906).
> 
> FYI:
> 
> r277213 is going to be pulled out of -current within a few hours at
> most, because developers need more time to review patches in
> surrounding areas, such as the TCP stack, that restore distribution of
> callouts across multiple CPUs when using MPSAFE callouts, to avoid
> congestion in the TCP stack.

No, the revert of r277213 was not requested because of TCP issues.

The main complaint is that you left an indefinite number of cases degraded,
and there is no analysis of each such case, nor even a list of the cases
that need to be fixed (or an argument for why a given consumer of the
callout KPI could be left as is).

Just providing a fix for one place is not enough.
Received on Thu Jan 22 2015 - 09:27:46 UTC