On Wednesday 27 February 2008 00:24:26 Kris Kennaway wrote:
> Julian Elischer wrote:
> > Kris Kennaway wrote:
> >> Julian Elischer wrote:
> >>> Andre Oppermann wrote:
> >>>> Brooks Davis wrote:
> >>>>> On Mon, Feb 25, 2008 at 08:44:56PM -0800, Julian Elischer wrote:
> >>>>>> At some stage in the next few weeks I will be trying to commit
> >>>>>> Marco Zec's vimage code to -current. (only 'trying' not
> >>>>>> for technical reasons, but political).
> >>>>
> >>>> ...
> >>>>
> >>>>>> Why now?
> >>>>>> The code is in a shape where the compiled-out version of the
> >>>>>> system is stable. In the compiled-in version, it is functional
> >>>>>> enough to provide nearly all of what people want. It needs
> >>>>>> people with other interests to adapt it to their purposes and
> >>>>>> use it so that it can become a solid product for future
> >>>>>> releases.
> >>>>>
> >>>>> The website has a snapshot with a date over a month old and
> >>>>> many comments about unstable interfaces. I've seen zero
> >>>>> reports of substantial testing...
> >>>>
> >>>> What about locking and SMP scalability? Any new choke points?
> >>>
> >>> not that I've seen.
> >>
> >> That's a less than resounding endorsement :)
> >
> > do the 10Gb ethernet adapters have any major problems?
> > are you willing to answer "no"?
> > should we then rip them from the tree?
>
> Those are small, isolated components, so hardly the same thing as a
> major architectural change that touches every part of the protocol
> stack.
>
> But if someone came along and said "I am going to replace the 10ge
> drivers, but I dunno how well they perform" I'd say precisely the
> same thing.
>
> Presumably someone (if not you, then Marko) has enough of a grasp of
> the architectural changes being proposed to comment about what
> changes (if any) were made to synchronisation models, and whether
> there are new sources of performance overhead introduced.
>
> That person can answer Andre's question.

OK, first my apologies to everybody for being late in jumping into this
thread... I'll attempt to address a few of the questions raised so far,
in random order, but SMP scalability definitely tops the list...

I think it's safe to assume that network stack instances / vimages will
have lifetimes similar to those of jails, i.e. once instantiated, in
typical applications vimages would remain static over extended periods
of time, rather than being created and torn down thousands of times per
second like TCP sessions or sockets in general. Hence, synchronizing
access to the global vimage and vnet lists can probably be accomplished
using rmlocks, which are essentially free for read-only consumers. The
current code in p4 still uses a handcrafted shared / exclusive
refcounted locking scheme, with refcounts protected by a spinlock,
since in 7.0 we don't have rmlocks yet, but I'll try converting those
to rmlocks in the "official" p4 vimage branch which is tracking HEAD.
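To make that concrete, here is a minimal sketch of what rmlock-protected
access to the vnet list could look like. This is not the actual p4 code:
struct vnet is reduced to its list linkage, and vnet_head / vnet_le /
vnet_foreach() are placeholder names for illustration only, but rm_init(),
rm_rlock() and friends are the real rmlock(9) primitives in HEAD:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/queue.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

/* Placeholder: only the list linkage matters for this sketch. */
struct vnet {
        LIST_ENTRY(vnet) vnet_le;
        /* ... per-instance network stack state ... */
};
static LIST_HEAD(, vnet) vnet_head = LIST_HEAD_INITIALIZER(vnet_head);
static struct rmlock vnet_list_lock;

static void
vnet_list_init(void)
{
        rm_init(&vnet_list_lock, "vnet list");
}

/* Read side: cheap, no writes to shared cache lines on acquire. */
static void
vnet_foreach(void (*func)(struct vnet *))
{
        struct rm_priotracker tracker;
        struct vnet *vnet;

        rm_rlock(&vnet_list_lock, &tracker);
        LIST_FOREACH(vnet, &vnet_head, vnet_le)
                func(vnet);
        rm_runlock(&vnet_list_lock, &tracker);
}

/* Write side: rare, taken only on vimage creation / teardown. */
static void
vnet_list_insert(struct vnet *vnet)
{
        rm_wlock(&vnet_list_lock);
        LIST_INSERT_HEAD(&vnet_head, vnet, vnet_le);
        rm_wunlock(&vnet_list_lock);
}

The point of rmlocks here is exactly this read / write asymmetry: the
read path pays next to nothing, while the cost is pushed onto the rare
writers.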
Another thing to note is that the frequency of read-only iterations
over the vnet list is also quite low - mostly this needs to be done
only in the slowtimo(), fasttimo() and drain() networking handlers,
i.e. only a couple of times per second. All iteration points are easy
to fgrep for in the code, given that they are always implemented using
the VNET_ITERLOOP macros, which simply vanish when the kernel is
compiled without options VIMAGE.
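Roughly - and this is a sketch from memory rather than a verbatim copy
from p4, with the list head and linkage names simplified as above - the
iterator macros boil down to:

#ifdef VIMAGE
#define VNET_ITERLOOP_BEGIN()                                   \
        {                                                       \
                struct vnet *vnet_iter;                         \
                LIST_FOREACH(vnet_iter, &vnet_head, vnet_le) {  \
                        CURVNET_SET(vnet_iter);

#define VNET_ITERLOOP_END()                                     \
                        CURVNET_RESTORE();                      \
                }                                               \
        }
#else   /* !VIMAGE */
#define VNET_ITERLOOP_BEGIN()
#define VNET_ITERLOOP_END()
#endif

/* A typical consumer, e.g. a protocol slowtimo() handler: */
void
tcp_slowtimo(void)
{
        VNET_ITERLOOP_BEGIN();
        /* ... per-instance timer processing ... */
        VNET_ITERLOOP_END();
}

With VIMAGE compiled out both macros expand to nothing, so the handler
body compiles to exactly what it is in the stock kernel, operating on
the single global instance.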
But most importantly, on the performance-critical datapaths (i.e.
socket - TCP - IP - link layer - device drivers, and vice versa) no
additional synchronization points / bottlenecks were introduced. In
fact, the framework opens up the possibility of replicating some of the
existing contended locks over multiple vnets, potentially reducing
contention in cases where load is spread evenly over multiple vimages /
vnets.

Other people have asked about vimages and jails: yes, it is possible to
run multiple jails inside a vimage / vnet, with the original semantics
of jails completely preserved.

Non-developers accessing the code: after freebsd.org's p4-to-anoncvs
autosyncer died last summer, I've tried posting source tarballs every
few weeks on the project's somewhat obscure web site (which Julian has
advertised every now and then on this list):

http://imunes.net/virtnet/

I've just dumped a diff against -HEAD there, and will post new tarballs
in a few minutes as well.

Impact of the changes on device drivers: in general no changes were
needed at the device driver layer, as drivers do not need to be aware
that they are running on a virtualized kernel. Each NIC is logically
attached to one and only one network stack instance at a time, and it
receives data from the upper layers and feeds them mbufs in exactly the
same manner as on a standard kernel. It is the link layer that
demultiplexes the incoming traffic to the appropriate stack instance...

Overall, there's a lot of cleanup and possibly restructuring work left
to be done on the vimage code in p4, with documenting the new
interfaces probably being the top priority. I'm glad to see such a
considerable amount of (sudden) interest in pushing this code into the
main tree, so now that I've been smoked out of my rathole I'll be happy
to work with Julian and other folks to bring the vimage code closer to
CVS, and to help maintain it one way or another once it hopefully gets
there, be it weeks or months until we reach that point - the sooner the
better, of course...

Cheers,

Marko
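P.S. For those wondering what "the link layer demultiplexes" means in
practice, the idea is roughly the following - again a sketch, not the
literal p4 code; CURVNET_SET / CURVNET_RESTORE are the vimage context
macros from above, and if_vnet stands for the back-pointer each ifnet
keeps to the vnet it is attached to:

/* On input, select the stack instance this NIC is attached to. */
void
ether_input(struct ifnet *ifp, struct mbuf *m)
{
        CURVNET_SET(ifp->if_vnet);
        /* ... existing ether_input() processing, unchanged ... */
        CURVNET_RESTORE();
}

Everything above the link layer then runs in the context of that vnet
without having to know that vimage exists at all.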