On 1/22/2015 4:27 AM, Konstantin Belousov wrote:
> On Thu, Jan 22, 2015 at 11:16:41AM +0100, Hans Petter Selasky wrote:
>> On 01/20/15 11:47, Slawa Olhovchenkov wrote:
>>> On Tue, Jan 20, 2015 at 08:29:47AM +0100, Hans Petter Selasky wrote:
>>>
>>>> On 01/17/15 23:18, Hans Petter Selasky wrote:
>>>>> On 01/17/15 20:11, Jason Wolfe wrote:
>>>>>>
>>>>>> HPS,
>>>>>>
>>>>>> Just to give a quick status update, this patch has most certainly
>>>>>> resolved our spin lock held too long panics on stable/10.
>>>>>>
>>>>>> Thank you to JHB for spending some time digging into the issue and
>>>>>> leading us to td_slpcallout as the culprit, and HPS for your rewrite.
>>>>>> I had heard rumors of other being affected by similar issues, so this
>>>>>> seems like a fine candidate for an MFC if possible.
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> I'm glad to hear that my patch has resolved your issue and I'm happy we
>>>>> now have a more stable system.
>>>>>
>>>>> It was actually a co-worker at work which wrote some bad code which I
>>>>> started debugging which then lead me to look at the callout subsystem.
>>>>> One bug kills the other ;-)
>>>>>
>>>>> I'm planning a MFC to 10-stable - yes, and will possibly add the
>>>>> _callout_stop_safe() function to not break binary compatibility with
>>>>> existing drivers as part of the MFC.
>>>>>
>>>>> --HPS
>>>>
>>>> Hi,
>>>>
>>>> Here is a followup patch for the TCP stack like I mentioned in the
>>>> beginning of the work done on the callout subsystem:
>>>>
>>>> https://reviews.freebsd.org/D1563
>>>>
>>>> If someone has a setup for massive TCP testing please give it a spin.
>>>
>>> I have on 10.1 (with applied r261906).
>>
>> FYI:
>>
>> r277213 is going to be pulled out from -current in at maximum a few
>> hours from now, because developers need more time to review patches in
>> surrounding areas like the TCP stack area to restore distribution of
>> callouts on multiple CPUs when using MPSAFE callouts to avoid congestion
>> in the TCP stack.
>
> No, r277213 was requested to be reverted not due to TCP issues.
>
> The main complain is that you left indefinite amount of cases degraded,
> and there is no analysis of each such case, nor even a list of the cases
> that need to be fixed (or argumentation why consumer of the callout KPI
> could be left as is).
>
> Just providing fix for one place is not enough.

I have a similar concern about out-of-tree work. It would be surprising
for a vendor or module developer to find their performance degrade if
they missed accounting for this change. At a minimum, an UPDATING entry
should be added explaining the change and what must be done for
consumers.

--
Regards,
Bryan Drewery