Re: [PATCH] Netdump for review and testing -- preliminary version

From: Attilio Rao <attilio_at_freebsd.org>
Date: Fri, 8 Oct 2010 19:56:52 +0200
2010/9/28 Attilio Rao <attilio_at_freebsd.org>:
> Over the last few weeks I have been porting the netdump infrastructure
> to FreeBSD-CURRENT on behalf of Sandvine Incorporated.
> Netdump is a framework for sending kernel coredumps over the TCP/IP
> suite, so that the dump is written on a machine other than the one
> that is dumping. That is useful in a number of cases, such as
> diskless workstations, disk driver debugging or embedded devices.
>
> GENERAL FRAMEWORK ARCHITECTURE
>
> Netdump is composed, right now, of a userland "server" and a kernel
> "client". The former runs on the target machine (where the dump will
> physically be written) and is responsible for receiving the packets
> containing coredump frames and for writing them correctly to disk.
> The latter is part of the kernel installed on the source machine
> (where the dump is initiated) and is responsible for correctly
> building the UDP packets containing the coredump frames, pushing them
> through the network interface and routing them appropriately.
>
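
A minimal userland sketch of the client/server split described above,
assuming a made-up frame layout (sequence number, file offset, payload
length). The header format, PAYLOAD_MAX and both function names are
hypothetical and are not taken from the patch; the real client is kernel
C code. It only shows how self-describing frames let the server write a
dump correctly even when frames arrive out of order.

```python
import struct

# Hypothetical frame header (NOT the netdump wire format):
# 32-bit sequence number, 64-bit file offset, 32-bit payload length,
# all in network byte order.
HEADER = struct.Struct("!IQI")
PAYLOAD_MAX = 1024  # assumed payload size, small enough for one Ethernet frame


def make_frames(dump: bytes, payload_max: int = PAYLOAD_MAX) -> list:
    """Client side: split a dump image into self-describing frames."""
    frames = []
    for seqno, offset in enumerate(range(0, len(dump), payload_max)):
        chunk = dump[offset:offset + payload_max]
        frames.append(HEADER.pack(seqno, offset, len(chunk)) + chunk)
    return frames


def write_frame(image: bytearray, frame: bytes) -> int:
    """Server side: store one frame's payload at its offset, return its seqno."""
    seqno, offset, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    end = offset + length
    if end > len(image):
        # Grow the image with zeros so later frames can land before earlier ones.
        image.extend(bytes(end - len(image)))
    image[offset:end] = payload
    return seqno
```

In a real server the bytearray would be a file opened for writing and
each payload of make_frames() would travel in one UDP datagram; the
in-memory version just keeps the sketch self-contained.
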
> While the server is pretty much a simple userland daemon dealing with
> UDP packets, the client poses more interesting problems because its
> activity is tied to kernel core dumping. More precisely, since the
> client is part of the dumping mechanism and the kernel may be in a
> general panic condition, netdump must ensure "robustness" of
> operations. That is partially achieved by reworking (and to some
> extent replicating) some UDP, ARP and IP operations locally,
> hard-setting some values (like the default gateway, the destination
> and the client address) and reducing further interactions with the
> kernel to just the network interface activities.
> More specifically, it implements a very basic UDP / IPv4 / ARP stack
> separate from the standard stack (since that may not be in a
> consistent state).
> It can dump to a server on the same LAN or via a router (provided the
> connection gateway is specified correctly).
> In order to receive packets under critical conditions, netdump polls
> the interface. Every network driver can implement hooks to be used by
> netdump independently of the DEVICE_POLLING option, even if it is
> probably a good idea to share some code among them. The reference
> set of hooks is contained in "struct netdump_methods".
> The if_lem/if_em driver modifications can be taken as the reference
> for implementing the netdump hooks.
>
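
To make the "very basic UDP / IPv4 / ARP stack" with hard-set addresses
a bit more concrete, here is a small userland sketch of two things such
a stack has to do by itself: choose the next hop (the server when it is
on the same LAN, otherwise the configured gateway) and build an IPv4
header with a valid checksum. All function names and parameters are
hypothetical; this is not the code in netdump_client.c, only the
standard IPv4 rules it has to follow.

```python
import ipaddress
import struct


def next_hop(client: str, netmask: str, server: str, gateway: str) -> str:
    """Return the address ARP must resolve: the server itself when it is
    on the client's subnet, otherwise the hard-set gateway."""
    network = ipaddress.ip_network(f"{client}/{netmask}", strict=False)
    if ipaddress.ip_address(server) in network:
        return server
    return gateway


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_udp_header(src: str, dst: str, payload_len: int,
                    ident: int = 0, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header (no options) for a UDP datagram."""
    total_len = 20 + 8 + payload_len  # IPv4 header + UDP header + payload
    src_raw = ipaddress.IPv4Address(src).packed
    dst_raw = ipaddress.IPv4Address(dst).packed

    def pack(checksum: int) -> bytes:
        # version/IHL, TOS, total length, id, flags/fragment, TTL,
        # protocol (17 = UDP), header checksum, source, destination
        return struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, ident, 0,
                           ttl, 17, checksum, src_raw, dst_raw)

    return pack(internet_checksum(pack(0)))
```

For example, next_hop("192.168.1.10", "255.255.255.0", "10.0.0.5",
"192.168.1.1") picks the gateway, because the server is outside the
client's subnet.
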
> In order to work on an "up and running" system (meaning one with all
> the devices in place), the netdump handler is hooked in as a pre-sync
> handler (unlike other dumping routines). It does, however, suffer from
> some problems typical of other dumping mechanisms. For example, when
> DDB is entered, an unlocked version of the polling handler is used in
> order to reduce the risk of deadlocks during inspections*. This is why
> the netdump methods include 2 versions of the polling hook, where the
> "unlocked" one is meant to reduce locking as much as possible.
>
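
The reason for having two polling hooks can be illustrated with a toy
model. Everything below (ToyDriver, PollHooks and their member names) is
hypothetical and does not mirror the actual members of "struct
netdump_methods"; it only shows why an unlocked variant still makes
progress when the driver lock is held by a context that will never run
again, as can happen once the debugger has been entered.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Callable


class ToyDriver:
    """Stand-in for a network driver with a lock-protected receive ring."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.rx_ring = deque()

    def _drain(self) -> int:
        count = len(self.rx_ring)
        self.rx_ring.clear()
        return count

    def poll_locked(self) -> int:
        # Normal path: take the driver lock around the ring access.
        with self.lock:
            return self._drain()

    def poll_unlocked(self) -> int:
        # Debugger path: skip the lock, since its owner may be stopped forever.
        return self._drain()


@dataclass
class PollHooks:
    """Hypothetical pair of polling hooks a driver would register."""
    poll_locked: Callable[[], int]
    poll_unlocked: Callable[[], int]

    def poll(self, in_debugger: bool) -> int:
        hook = self.poll_unlocked if in_debugger else self.poll_locked
        return hook()
```

With the lock held elsewhere, poll(in_debugger=True) still drains the
ring, whereas the locked hook would block.
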
> PATCH AND FURTHER WORK
>
> The patch is not totally complete and it is not intended to be
> committed to SVN yet. What I'm looking for now is more testing and
> review coverage (in particular in terms of architecture) by the
> community. The server should be in a relatively "committable" state,
> though I still encourage stress-testing it. A manpage is provided
> that should make it very easy to understand how to use it.
>
> Things that can be further improved in the client, as it is now, are:
> - Deciding whether the hardcoding of the kernel parameters is done
> properly. I personally don't like the sysctl usage and I would prefer
> a small userland utility for testing, and maybe adding some tunables
> for enabling netdump early in the boot. You may have several opinions
> on this though.
> - VIMAGE and IPv6 support.
> - Support for more drivers. Right now only if_em (and if_lem) are
> converted to use netdump and can be used as a template for other
> device drivers. if_ixgb should come along in the final, committed
> version too. In general I think that all drivers supporting device
> polling could easily support netdump as well.
> - dumpsys() in FreeBSD is too disk-activity oriented. Ideally it
> should be made more neutral and more flexible, to cope better with
> different interfaces. It is quite a bit of work, however, and beyond
> the scope of the netdump introduction (even if it could be beneficial
> for it).
>
> Netdump has been developed on a FreeBSD project branch located here:
> svn://svn.freebsd.org/base/projects/sv/
>
> which also provides further information about every single
> change. However, for your convenience, a patch has also been made
> public, which is located here (against FreeBSD-CURRENT@213246):
> http://www.freebsd.org/~attilio/Sandvine/STABLE_8/netdump/netdump_alpha_0.diff


This followup is to signal that the code has been further refined and
some bugs (reported mostly through pluknet_at_'s testing) have been
fixed. Support for more interfaces has been added (igb, ixgb, ixgbe).
You can fetch the source code from projects/sv/, as described in the
previous e-mail, or use this other patch:
http://www.freebsd.org/~attilio/Sandvine/STABLE_8/netdump/netdump_alpha_1.diff

I didn't try netdump with VIMAGE, but people willing to do that can
apply the following further patch on top of the projects/sv/ patchset
and report back:
http://www.freebsd.org/~attilio/Sandvine/STABLE_8/netdump/netdump_alpha_1.fix.diff

That patch is not contained in the "official" projects/sv/.

Right now I plan just to make netdump_client.c style compliant and add
further comments, and eventually commit all the infrastructure next
Friday, 15th October, if no problems are reported by reviewers and
testers.

You are encouraged to review and test it (in particular, I added
jfv_at_ and rstone_at_ on CC so that they can have a look at the
driver-specific parts).

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Fri Oct 08 2010 - 15:56:54 UTC
