Re: iflib/bridge kernel panic

From: Alexander Leidinger <Alexander_at_leidinger.net>
Date: Mon, 28 Sep 2020 16:44:10 +0200
Quoting Kristof Provost <kp_at_freebsd.org> (from Mon, 28 Sep 2020 13:53:16 +0200):

> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>> Quoting Kristof Provost <kp_at_freebsd.org> (from Sun, 27 Sep 2020 17:51:32 +0200):
>>> Here’s an early version of a task queue based approach:  
>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>
>>> That still needs to be cleaned up, but this should resolve the  
>>> sleep issue and the LOR.
>>
>> There are some issues... it seems that from inside a jail I can't
>> ping systems outside of the host hardware.
>>
>> Bridge setup:
>>    - member jail A
>>    - member jail B
>>    - member external_if of host
>>
>> If I ping the router from the host, it works. If I ping from one  
>> jail to another, it works. If I ping from the jail to the IP of the  
>> external_if, it works. If I ping from a jail to the router, I do  
>> not get a response.
>>
> Can you check for 'failed ifpromisc' error messages in dmesg? And  
> verify that all bridge member interfaces are in promiscuous mode?

I have a panic for you...:
  - startup was still in progress (22 jails starting up); the panic
happened somewhere after the first few jails had started
  - tcpdump was running on the external interface
  - a ping to a jail IP from another system was running; the first
ping went through, then the system panicked

First, regarding your questions about promiscuous mode: there is no
error, but promiscuous mode is immediately disabled again on all
interfaces.

Data (external_if = igb0; the jail epairs are named j_X_Yif, where X
is the jail's name and Y is either h for the host side or j for the
jail side):
---snip---
Host:

# ifconfig -a
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4a520b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
         ether [...]:a4
         inet 192.168.1.x netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]a4%igb0 prefixlen 64 scopeid 0x1
         inet6 fd73:[...] prefixlen 64
         inet6 2003:[...] prefixlen 64 autoconf
         inet6 fd73:[...] prefixlen 64 autoconf
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active
         nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
igb1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
         ether [...]:a5
         media: Ethernet autoselect
         status: no carrier
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vswitch0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         ether [...]:a3
         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
         maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
         member: j_weather_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 9 priority 128 path cost 2000
         member: j_web_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 8 priority 128 path cost 2000
         member: j_commit_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 7 priority 128 path cost 2000
         member: j_video_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 6 priority 128 path cost 2000
         member: j_dns_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 5 priority 128 path cost 2000
         member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 1 priority 128 path cost 20000
         groups: bridge
         nd6 options=9<PERFORMNUD,IFDISABLED>
j_dns_hif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0a
         hwaddr [...]:0a
         inet6 fe80::[...]0a%j_dns_hif prefixlen 64 scopeid 0x5
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[... some more jail interfaces ...]

# dmesg | grep promis
igb0: promiscuous mode enabled
igb0: promiscuous mode disabled
j_dns_hif: promiscuous mode enabled
j_dns_hif: promiscuous mode disabled
[... some more like this ...]

# jexec 2 ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
j_dns_jif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0b
         hwaddr [...]:0b
         inet 192.168.1.y netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]0b%j_dns_jif prefixlen 64 scopeid 0x2
         inet6 fd73:[...]:y prefixlen 64
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
---snip---

And here is the backtrace of the panic:
---snip---
panic: if_setflag: decrement non-positive refcount 0 for flag 256
cpuid = 4
time = 1601300532
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0378ea3920
vpanic() at vpanic+0x182/frame 0xfffffe0378ea3970
panic() at panic+0x43/frame 0xfffffe0378ea39d0
if_setflag() at if_setflag+0x137/frame 0xfffffe0378ea3a30
ifpromisc() at ifpromisc+0x2a/frame 0xfffffe0378ea3a60
bpf_detachd_locked() at bpf_detachd_locked+0x280/frame 0xfffffe0378ea3ab0
bpf_dtor() at bpf_dtor+0x87/frame 0xfffffe0378ea3ad0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0xa1/frame 0xfffffe0378ea3af0
devfs_close_f() at devfs_close_f+0x6a/frame 0xfffffe0378ea3b20
_fdrop() at _fdrop+0x20/frame 0xfffffe0378ea3b40
closef() at closef+0x1ea/frame 0xfffffe0378ea3bd0
closefp() at closefp+0x90/frame 0xfffffe0378ea3c10
amd64_syscall() at amd64_syscall+0x13e/frame 0xfffffe0378ea3d30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0378ea3d30


__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:394
#2  0xffffffff8051fb46 in kern_reboot (howto=260)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:481
#3  0xffffffff8051ff8a in vpanic (fmt=<optimized out>, ap=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:913
#4  0xffffffff8051fcf3 in panic (fmt=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:839
#5  0xffffffff806321f7 in if_setflag (ifp=0xfffff800036cc000,
     flag=<unavailable>, pflag=<optimized out>, refcount=0xfffff800036cc3a8,
     onswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3135
#6  0xffffffff8063206a in ifpromisc (ifp=0xfffff800036cc000,
     pswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3196
#7  0xffffffff80626450 in bpf_detachd_locked (d=<optimized out>,
     detached_ifp=<optimized out>) at /space/system/usr_src/sys/net/bpf.c:882
#8  0xffffffff80629277 in bpf_detachd (d=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:836
#9  bpf_dtor (data=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:913
#10 0xffffffff80487531 in devfs_destroy_cdevpriv (p=0xfffff8074cf29c40)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:197
#11 0xffffffff8048b16a in devfs_fpdrop (fp=0xfffff8074cebaaf0)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:211
#12 devfs_close_f (fp=0xfffff8074cebaaf0, td=<optimized out>)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:787
#13 0xffffffff804c4d70 in fo_close (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/sys/file.h:364
#14 _fdrop (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:3120
#15 0xffffffff804c7eca in closef (fp=0xfffff8074cebaaf0, td=0xfffffe0382567500)
     at /space/system/usr_src/sys/kern/kern_descrip.c:2606
#16 0xffffffff804c51e0 in closefp (fdp=0xfffffe0307cbd950, fd=3,
     fp=0xfffff8074cebaaf0, td=0xfffffe0382567500, holdleaders=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:1263
#17 0xffffffff808000ae in syscallenter (td=<optimized out>)
     at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:162
---snip---

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander_at_Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild_at_FreeBSD.org  : PGP 0x8F31830F9F2772BF

Received on Mon Sep 28 2020 - 12:44:18 UTC
