VIMAGE: vnet, epair and lots of jails on bridgeX - routing

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Thu, 8 Feb 2018 09:31:15 +0100
Hello,

I have been fighting the following problem without any success, and I need some
help and/or advice.

We are running several CURRENT and 11.1-RELENG-p6 boxes. CURRENT is at the most
recent version as of today.

VIMAGE is compiled into all kernels.
IPFW is compiled into all kernels and is the one and only firewall used.
On CURRENT, the host's ipfw is set to "OPEN" (using the rc scripts so far). By
convention, I refer to the machine running the kernel as "host".

Every jail is created/configured with its own "vnet" cloned network stack
(vnet=new).

All hosts have at least three physical NICs. The host itself is supposed to
be a member of the "friendly" network via a dedicated NIC. The two remaining
NICs are split up: one belongs to a "hostile" network, on which I'd like to
place exposed jails (for now), and the other to the "friendly" network, on
which jails will also be hosted, but via a dedicated NIC.

In between those two networks, the host will have a third, intermediate
network, call it the "service" network.

The following will be true for ALL jails created, including the host itself:

net.link.bridge.pfil_member=0
net.link.bridge.pfil_bridge=0
net.link.bridge.pfil_onlyip=0

First, I clone/create three bridge(4) devices, bridge0 (considered to be the
"glue" between the "service" jails), bridge1 (considered to be the glue between
the jails on the friendly network side) and bridge2, which is the glue between
the jails on the hostile side. bridge1 has eth1 as a member, which provides the
physical access to the friendly network, eth2 is member of bridge2, which
provides access to the hostile network.

By convention, when creating an epair(4), the a-portion belongs to the jail
itself and is assigned an IPv6 address. The b-portion of the epair(4) is a
member of the bridge for its realm (friendly, service or hostile network).
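
The bridge and epair layout described so far can be sketched roughly as
follows. This is a minimal dry-run sketch, not the actual jail.conf hooks:
the run helper, the DRYRUN variable, the epair unit number 10 and the use of
hostA as the example jail are assumptions made for illustration; eth1 and
eth2 are the NIC names used above. By default it only prints the commands.

```sh
#!/bin/sh
# Dry-run helper (an assumption for this sketch): print each command
# unless DRYRUN=0 is set, in which case the command is executed.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the three bridges; bridge1 and bridge2 get a physical member.
create_bridges() {
    run ifconfig bridge0 create          # service net, no physical NIC
    run ifconfig bridge0 up
    run ifconfig bridge1 create          # friendly net
    run ifconfig eth1 up
    run ifconfig bridge1 addm eth1 up
    run ifconfig bridge2 create          # hostile net
    run ifconfig eth2 up
    run ifconfig bridge2 addm eth2 up
}

# Create epair<unit>, move the a-side into the jail's vnet and add the
# b-side to the given bridge.
attach_jail() {
    jail=$1
    unit=$2
    bridge=$3
    run ifconfig "epair${unit}" create
    run ifconfig "epair${unit}a" vnet "$jail"
    run jexec "$jail" ifconfig "epair${unit}a" up
    run ifconfig "epair${unit}b" up
    run ifconfig "$bridge" addm "epair${unit}b"
}

create_bridges
attach_jail hostA 10 bridge0
```

Running it with DRYRUN=0 as root on a VIMAGE kernel would execute the
commands instead of printing them; the jail must already exist at that point.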

Additionally, there is a special jail, the router, which has three epair(4)
devices. The b-portion of each epair is a member of the appropriate bridge(4),
and this router jail has static routes assigned, pointing to the appropriate
epairXXXa that is supposed to be the link into the correct bridge/network.
IPFW is set to open on this jail (for now). On this special jail,
net.inet.ip.forwarding=1 is set.

I hope the topology is clear so far. All epairs (or epair endpoints) as well
as the bridges are UP! I double-checked this.
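
One way to repeat that UP check mechanically is to filter ifconfig output for
the UP flag. The following is a minimal sketch under that assumption; the
function name iface_state is made up for illustration, and it only looks at
the interface header lines, not at the bridge member lists.

```sh
#!/bin/sh
# Read ifconfig output on stdin and print "<ifname> up" or "<ifname> down"
# for every interface header line, based on the UP flag.
iface_state() {
    awk '/^[ \t]*[A-Za-z0-9_.]+: flags=/ {
        name = $1
        sub(/:$/, "", name)
        flags = $2
        sub(/^flags=[0-9a-f]+</, "", flags)   # drop "flags=8843<"
        sub(/>.*/, "", flags)                 # drop the closing ">"
        state = "down"
        n = split(flags, f, ",")
        for (i = 1; i <= n; i++)
            if (f[i] == "UP")
                state = "up"
        print name, state
    }'
}

# Usage on the host and inside a jail (hostA is the service jail used
# as an example in this post):
#   ifconfig | iface_state
#   jexec hostA ifconfig | iface_state
```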

Jails on bridge0 (service net) have IPs in the range 10.10.0.0/24, the
b-portion of the routing jail's epair is member of bridge0, as described above,
and the a-portion of the epair has IP 10.10.0.1. The default route on each jail
on bridge0 is set to 10.10.0.1 accordingly.
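
The addressing on the service net can be sketched as below. Again this is a
dry-run sketch under assumptions: the jail name "router" and the run helper
are hypothetical, hostA stands for any jail on bridge0, and epair4000a and
10.10.0.1/24 are taken from the router jail's interface listing in this post.

```sh
#!/bin/sh
# Dry-run helper (an assumption for this sketch): print each command
# unless DRYRUN=0 is set, in which case the command is executed.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Router jail: address the a-side on the service net, enable forwarding.
configure_router() {
    router=$1
    run jexec "$router" ifconfig epair4000a inet 10.10.0.1/24 up
    run jexec "$router" sysctl net.inet.ip.forwarding=1
}

# Service jail: point the default route at the router jail.
configure_service_jail() {
    jail=$1
    run jexec "$jail" route add default 10.10.0.1
}

configure_router router
configure_service_jail hostA
```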

Consider a similar setup for the jails on the friendly and hostile networks,
except that their bridges have a physical NIC through which they may have
access to a real network.

The setup might not be ideal and/or applicable for the purpose of separating
networks virtually, but that shouldn't be the subject here. More important
is that I assume I haven't understood some essentials, because the setup
doesn't work as expected. Furthermore, on FreeBSD 11.1-RELENG-p6 it sometimes
behaves completely unpredictably, but in that special case I think I ran
IPFW on the host as "WORKSTATION", and dynamic rules may play an important role
there. But focussing on the CURRENT box, the host's IPFW is set to OPEN.

With jexec -l hostA I gain access to hostA on the "service" bridge0, and I
want to ping its neighbour, hostB, on the same bridge and in the same net. It
doesn't work! From the routing jail, I CANNOT ping any host on bridge0. The
routing jail has these network settings:

[... routing jail ...]
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000 
        groups: lo
[epair to bridge0 - service net] 
epair4000a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:57:d0:00:07:0a
        inet 10.10.0.1 netmask 0xffffff00 broadcast 10.10.0.255 
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
[epair to bridge1, friendly net] 
epair4001a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:57:d0:00:09:0a
        inet 192.168.11.1 netmask 0xffffff00 broadcast 192.168.11.255 
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
[epair to bridge2, hostile net] 
epair4002a: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:57:d0:00:0b:0a
        inet 10.10.10.1 netmask 0xfffffc00 broadcast 10.10.10.255 
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair 

routing:
netstat -Warn
Routing tables

Internet:
Destination        Gateway            Flags       Use    Mtu      Netif Expire
10.10.0.0/24       link#2             U            11   1500 epair4000a
10.10.0.1          link#2             UHS           4  16384        lo0
10.10.10.0/24      link#4             U           210   1500 epair4002a
10.10.10.1         link#4             UHS          44  16384        lo0
127.0.0.1          link#1             UH            0  16384        lo0
192.168.11.0/24    link#3             U             9   1500 epair4001a
192.168.11.1       link#3             UHS           0  16384        lo0
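
As a cross-check of the listings above, the broadcast address and network
implied by an address and its hex netmask can be recomputed. The sketch below
uses plain POSIX shell arithmetic; the function names are made up for
illustration. For epair4002a the listing shows netmask 0xfffffc00 together
with broadcast 10.10.10.255 and a 10.10.10.0/24 route, which do not agree with
each other; whether that is a transcription slip or a real misconfiguration
cannot be told from the listing alone.

```sh
#!/bin/sh
# Convert a dotted quad to a 32-bit integer.
ip2int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( ($o1 << 24) | ($o2 << 16) | ($o3 << 8) | $o4 ))
}

# Convert a 32-bit integer back to a dotted quad.
int2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# expected_network ADDR HEXMASK: address with all host bits cleared.
expected_network() {
    addr=$(ip2int "$1")
    mask=$(( $2 ))
    int2ip $(( $addr & $mask ))
}

# expected_broadcast ADDR HEXMASK: address with all host bits set.
expected_broadcast() {
    addr=$(ip2int "$1")
    mask=$(( $2 ))
    int2ip $(( ($addr & $mask) | (~$mask & 0xffffffff) ))
}

expected_network   10.10.10.1 0xfffffc00   # prints 10.10.8.0
expected_broadcast 10.10.10.1 0xfffffc00   # prints 10.10.11.255
expected_broadcast 10.10.10.1 0xffffff00   # prints 10.10.10.255
```

With a 0xffffff00 mask the values match the broadcast address and the /24
route shown for epair4002a, so the interface listing and the routing table
are only consistent with each other under a /24 mask.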

Consider a jail hostCC on bridge2 in the hostile network, IP 10.10.10.128.
I can ping that jail, although it has conceptually the very same setup as the
unreachable jails on bridge0!

It is weird. On bridge0, no jail can be pinged; it looks like the Ethernet is
somehow down on that bridge. I would expect to be able to ping each host that
is a member of the very same bridge! On 11.1-RELENG-p6, there are other weird
issues: I was able to ping those jails, even ssh to them, but that vanished
after several restarts of the jails system (each bridge and epair is created by
jail.conf and destroyed after the jail has been deactivated, and doing so a
considerable number of times brings down the FreeBSD 11.1-RELENG-p6 host very
successfully: it crashes!).

So, since VIMAGE is now default in CURRENT's GENERIC, I consider its
functionality at least "predictable", but I fail somehow here.

Does someone have a deeper insight or see the mistake I'm making here?

Thanks in advance,

Oliver 
Received on Thu Feb 08 2018 - 07:31:33 UTC
