Re: [PATCH] Netdump for review and testing -- preliminary version

From: Attilio Rao <attilio_at_freebsd.org> Date: Thu, 14 Oct 2010 16:10:26 +0200

2010/10/14 Robert N. M. Watson <rwatson_at_freebsd.org>:
>
> On 13 Oct 2010, at 18:46, Ryan Stone wrote:
>
>> On Fri, Oct 8, 2010 at 9:15 PM, Robert Watson <rwatson_at_freebsd.org> wrote:
>>> +               /*
>>> +                * get and fill a header mbuf, then chain data as an extended
>>> +                * mbuf.
>>> +                */
>>> +               MGETHDR(m, M_DONTWAIT, MT_DATA);
>>>
>>> The idea of calling into the mbuf allocator in this context is just freaky,
>>> and may have some truly awful side effects.  I suppose this is the cost of
>>> trying to combine code paths in the network device driver rather than have
>>> an independent path in the netdump case, but it's quite unfortunate and will
>>> significantly reduce the robustness of netdumps in the face of, for example,
>>> mbuf starvation.
>>
>> Changing this will require very invasive changes to the network
>> drivers.  I know that the Intel drivers allocate their own mbufs for
>> their receive rings and I imagine that all other drivers have to do
>> something similar.  Plus the drivers are responsible for freeing mbufs
>> after they have been transmitted.  It seems to me that the cost of
>> making significant changes to the network drivers to support an
>> alternate lifecycle for netdump mbufs far outweighs the cost of losing
>> a couple of kernel dumps in extreme circumstances.
>
> My concern is less about occasional lost dumps than destabilising the dumping process: calls into the memory allocator can currently trigger a lot of interesting behaviours, such as further calls back into the VM system, which can then trigger calls into other subsystems. What I'm suggesting is that if we want the mbuf allocator to be useful in this context, we need to teach it about things not to do in the dumping / crash / ... context, which probably means helping uma out a bit in that regard. And a watchdog to make sure the dump is making progress.

I think this would be far too complicated just to cope with a panic
inside the VM/UMA (I'm not sure which other subsystems you expect to be
called into). Besides, if we have a panic in the VM, I'm sure that
normal dumps could be affected as well.
With netdump I'm not trying to fix all the bugs in our dumping
infrastructure because, as we already discussed, we know there are
quite a few of them. I'm only trying to be no more fragile than what
we have today.
And again, while I think the "watchdog" idea is good, it applies to
normal dumps too; it is not specific to netdump.
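
To make the watchdog idea concrete, here is a minimal userland sketch
of a progress check that a dump loop (disk or network) could poll. It
is not taken from the netdump patch: the names dump_watchdog,
dump_wd_init and dump_wd_tick are hypothetical, and it assumes the
dump code can report the total bytes written so far at each polling
tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical progress watchdog for a dump loop; all names are illustrative. */
struct dump_watchdog {
	uint64_t last_progress;	/* bytes written when progress was last seen */
	uint32_t stalled_ticks;	/* consecutive ticks with no new bytes */
	uint32_t max_stalled;	/* ticks allowed before the dump counts as stuck */
};

static void
dump_wd_init(struct dump_watchdog *wd, uint32_t max_stalled)
{
	wd->last_progress = 0;
	wd->stalled_ticks = 0;
	wd->max_stalled = max_stalled;
}

/*
 * Call once per polling tick with the total bytes written so far.
 * Returns true while the dump should continue, and false once it has
 * made no progress for max_stalled consecutive ticks.
 */
static bool
dump_wd_tick(struct dump_watchdog *wd, uint64_t bytes_written)
{
	if (bytes_written > wd->last_progress) {
		/* Forward progress: remember it and reset the stall count. */
		wd->last_progress = bytes_written;
		wd->stalled_ticks = 0;
		return (true);
	}
	wd->stalled_ticks++;
	return (wd->stalled_ticks < wd->max_stalled);
}
```

Nothing here depends on how the bytes leave the machine, so the same
check would cover normal dumps and netdump alike. What to do once it
returns false (abort the dump, reset the machine) is left open.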

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein