HEADS UP: vimage ABI constraints on container structs [was: Re: svn commit: r186057 - head/sys/netinet]

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Wed, 17 Dec 2008 10:23:16 +0000 (UTC)
On Tue, 16 Dec 2008, Peter Wemm wrote:

Hi,

let me Cc: virtualization and current_at_ for this reply (to have the
explicit HEADS UP) for what Peter pointed out.

The first to reply may want to trim the Cc: list again; possibly to
only virtualization.


> On Sat, Dec 13, 2008 at 1:59 PM, Bjoern A. Zeeb <bz_at_freebsd.org> wrote:
>>  De-virtualize the MD5 context for TCP initial seq number generation
>>  and make it a function local variable like we do almost everywhere
>>  inside the kernel.
> [..]
>> --- head/sys/netinet/vinet.h    Sat Dec 13 21:17:46 2008        (r186056)
>> +++ head/sys/netinet/vinet.h    Sat Dec 13 21:59:18 2008        (r186057)
>> _at__at_ -142,7 +142,6 _at__at_ struct vnet_inet {
>>        int     _isn_last_reseed;
>>        u_int32_t _isn_offset;
>>        u_int32_t _isn_offset_old;
>> -       MD5_CTX _isn_ctx;
>>
>>        struct  inpcbhead _udb;
>>        struct  inpcbinfo _udbinfo;
>
> I'm bitterly unhappy with this.  Every time these structs are touched,
> either directly or indirectly, there is a guaranteed ABI breakage with
> kernel modules.
>
> There needs to be a __FreeBSD_version bump (or something similar)
> every time any of these structures change, and any kernel modules
> *must* be prevented from loading.  It can't be a >= some version, it
> has to be an exact match.
>
> With the global variable method the linker calculates the offsets at
> load time.  With this abomination, the knowledge of the structure
> layout is compiled into the generated code with no chance of a fixup.
> There are no sanity checks.  If a module that referenced _isn_ctx is
> loaded the old way, there would be a link error.  The new way will
> have it silently trash _udb instead.
>
> There is a whole world of hurt being unleashed here.  I suspect that
> we might even be possible to run out of digits in our
> __FreeBSD_version numbering scheme.
>
> I know we've talked about ABI-stable alternatives after the
> infrastructure is done, but it needs to be absolutely clear that
> touching this structure in the current form is a guaranteed ABI break,
> and is currently undetected.  If you boot kernel.old, you're hosed.
> (Again, with -current, this might be ok for a while, but this scheme
> won't wash with ports or other 3rd party modules)
...
> In the mean time, I'd like to see some compile-time asserts in there
> to make sure there are no accidental size changes of this structure.

Ok, there are multiple things here:

Size changes:
----------------------------------------

* I think catching pure size changes alone are not enough; any
   changes to the structs matter.
   Think of removing an int and adding an int (even at the same
   place). Something working on this will trash it.

* The size changes of course would guard about non-obvious, indirect
   struct changes like (just an example) a change to struct inpcbinfo
   that make me worry even more (like they seem to worry you).


More on structure changes:
----------------------------------------

* While this is on HEAD (refering to "Again, with -current, this might
   be ok for a while ..") I expected the following (major) changes
   coming with the continuing integration and testing:

   1) Final passthrough on the set of virtualized variables. That might
      happen after Step #3 when people can actually test of SVN.

   2) During Benchmarking - this would be about the same time - there
      might be shuffling.

   3) Before we turn to a STABLE branch. *fear*

* Again it's the indirect changes as said above (which are very
   worryingly).


ABI problem:
----------------------------------------

* I agree that there are unaddressed challenges here.

* Ideally we want version checking - at least at load time for modules -
   per container struct, as people do not want to change and recompile
   everything if say something in gif or ipsec changes which might not
   affect them. netgraph has done something like this but my feeling is
   that that would be the wrong way to go, especially wrt to
   vinet/vinet6.

* I am not sure if padding as we had it before will work for stable
   branches. We need to think of this problem as well. To my
   understanding MAC has another really large structure with sufficient
   padding but it's a subsystem more or less living on its own on the
   side, not as heavily changes between branches and MFCed as the
   netstack and it's (function) pointers there and less direct members.

* If you have suggestions or solutions, please share them.

* ..


Misc:
----------------------------------------

* I was aware of the problem and failed on two fronts here:
   1) Doing my commit after Marko and forgetting about it going from
      the p4 branches to SVN.
   2) Forgetting to mention this in the HEADs UP:(

* The http://wiki.freebsd.org/Image/BeginnersGuideFAQ had a (somewhat
   hidden) "A single change would require complete recompiliation."
   I factored it out but will need to be more explicit there or refer
   to this thread.

* Thanks a lot for sending this out to comitters and making them aware
   of the problem. I Cc:ed current_at_ and virtualization_at_ in the reply
   and change the subject.

* I am happy there is another pair of eyes here now:) After losing
   the initial three other reviewers ...

* I think the current mode (is?/)was "getting the infrastructre in"
   and get more hands (again) for the remaining parts when inevitable
   confronted with it and the huge pile of changes is gone and one
   might see more clear again.

* We may want to always keep in mind that, not for 8 but maybe for 9,
   people may want to further virtualize more subsystems so whatever
   we do should possibly be general enough to work for other parts
   of the kernel as well.

* As a very last resort at the moment (which would be a pitty) we
   can still do the rollback, keep the globals (as default) and
   only have virtualization optional. I would expect there to be
   some "keeping in sync" problems but it would be a lot easier to
   have both working and have people to adhere to the ``V_rules''
   than on the side. It would be kind of hard for 3rd party vendors
   to supply (binary) modules with that then though.
   I really hope we won't need to go there.

/bz

-- 
Bjoern A. Zeeb                      The greatest risk is not taking one.
Received on Wed Dec 17 2008 - 09:25:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:39 UTC