Re: > r353680: multiuser crash due to: m_getzone: Invalid cluster size 0

From: Fedorov, Aleksandr <aleksandr.fedorov_at_vstack.com>
Date: Wed, 23 Oct 2019 10:11:44 +0000
I discovered a similar kernel panic.

To reproduce, just run CURRENT in bhyve with the e1000 network backend.

I think the problem is that debugnet_any_ifnet_update() calls iflib_debugnet_init() before the driver's private data is fully initialized.

sys/net/iflib.c, starting at line 6724:
iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
{
    if_ctx_t ctx;

    ctx = if_getsoftc(ifp);
    CTX_LOCK(ctx);
    *nrxr = NRXQSETS(ctx);
    *ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
    *clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;    /* <-- ifl_buf_size is zero here! */
    CTX_UNLOCK(ctx);
}

So it seems that the ifnet_link_event EVENTHANDLER fires too early to initialize debugnet.
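
To illustrate why a zero size is fatal: judging by the panic message, m_getzone() accepts only the fixed mbuf cluster sizes and panics on anything else. The userland sketch below models that check under this assumption. The MODEL_* constants use the amd64 values, and model_cluster_size_valid() is a hypothetical name for illustration, not a kernel function.

```c
#include <stdbool.h>

/* Cluster sizes as on FreeBSD/amd64; assumed values for this model. */
#define MODEL_MCLBYTES      2048         /* standard mbuf cluster */
#define MODEL_MJUMPAGESIZE  4096         /* page-sized jumbo cluster */
#define MODEL_MJUM9BYTES    (9 * 1024)   /* 9k jumbo cluster */
#define MODEL_MJUM16BYTES   (16 * 1024)  /* 16k jumbo cluster */

/*
 * Hypothetical userland model of the size check: returns true only for
 * sizes that map to a cluster zone. Where this model returns false, the
 * kernel panics with "invalid cluster size".
 */
static bool
model_cluster_size_valid(int size)
{
	switch (size) {
	case MODEL_MCLBYTES:
	case MODEL_MJUMPAGESIZE:
	case MODEL_MJUM9BYTES:
	case MODEL_MJUM16BYTES:
		return (true);
	default:
		return (false);
	}
}
```

With this model, a size of 0 (what ifl_buf_size holds at this point) is rejected, while 2048 would be accepted.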

Because ifl_buf_size is initialized from ctx->ifc_rx_mbuf_sz, which is in turn set by iflib_calc_rx_mbuf_sz(), I use the following patch as a workaround:

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index 73606981a492..1caf3505932a 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -6729,7 +6729,8 @@ iflib_debugnet_init(if_t ifp, int *nrxr, int *ncl, int *clsize)
        CTX_LOCK(ctx);
        *nrxr = NRXQSETS(ctx);
        *ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size;
-       *clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size;
+       iflib_calc_rx_mbuf_sz(ctx);
+       *clsize = iflib_get_rx_mbuf_sz(ctx);
        CTX_UNLOCK(ctx);
 }
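
The idea behind the workaround can be modelled in userland as follows. This is only a sketch: struct model_ctx and the model_* functions are hypothetical stand-ins, and the sizing rule (one standard cluster if the maximum frame fits, otherwise a page-sized cluster) is an assumption about what iflib_calc_rx_mbuf_sz() does, not a copy of it.

```c
#define MODEL_MCLBYTES      2048  /* standard mbuf cluster (amd64 value) */
#define MODEL_MJUMPAGESIZE  4096  /* page-sized jumbo cluster (amd64 value) */

/* Hypothetical, reduced stand-in for the iflib context. */
struct model_ctx {
	int max_frame_size;  /* known once the driver has attached */
	int fl_buf_size;     /* free-list buffer size; 0 until the interface is initialized */
};

/* Assumed sizing rule: pick the smallest cluster that holds a full frame. */
static int
model_calc_rx_mbuf_sz(const struct model_ctx *ctx)
{
	return (ctx->max_frame_size <= MODEL_MCLBYTES ?
	    MODEL_MCLBYTES : MODEL_MJUMPAGESIZE);
}

/* Behavior before the patch: report whatever the free list currently holds. */
static int
model_clsize_before(const struct model_ctx *ctx)
{
	return (ctx->fl_buf_size);
}

/* Behavior with the patch: derive the size instead of reading the free list. */
static int
model_clsize_after(const struct model_ctx *ctx)
{
	return (model_calc_rx_mbuf_sz(ctx));
}
```

For a context with max_frame_size = 1518 and fl_buf_size still 0, model_clsize_before() yields 0 (the value that triggers the panic), while model_clsize_after() yields 2048.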

em0: <Intel(R) PRO/1000 Network Connection> port 0x2000-0x2007 mem 0xc0000000-0xc001ffff,0xc0020000-0xc002ffff irq 16 at device 2.0 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Ethernet address: 00:a0:98:b9:5c:99
em0: netmap queues/slots: TX 1/1024, RX 1/1024
virtio_pci0: <VirtIO PCI Block adapter> port 0x2040-0x207f mem 0xc0030000-0xc0031fff irq 17 at device 3.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci0
vtblk0: 16384MB (33554432 512 byte sectors)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
Unhandled ps2 mouse command 0xe1
                                psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff pnpid PNP0900 on isa0
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
em0: link state changed to UP
panic: m_getzone: invalid cluster size 0
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0011b8d7f0
vpanic() at vpanic+0x17e/frame 0xfffffe0011b8d850
panic() at panic+0x43/frame 0xfffffe0011b8d8b0
debugnet_mbuf_reinit() at debugnet_mbuf_reinit+0x21b/frame 0xfffffe0011b8d8f0
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x107/frame 0xfffffe0011b8d940
do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe0011b8d990
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe0011b8d9f0
taskqueue_run() at taskqueue_run+0x4a/frame 0xfffffe0011b8da10
ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe0011b8da70
fork_exit() at fork_exit+0x80/frame 0xfffffe0011b8dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0011b8dab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100010 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1098a86(%rip)
db> 
Received on Wed Oct 23 2019 - 08:12:02 UTC
