Panic with r346530 [Re: svn commit: r346530 - in head/sys: netinet netinet6]

From: Enji Cooper <yaneurabeya_at_gmail.com>
Date: Mon, 22 Apr 2019 04:25:27 -0700

Hi Hans,

> On Apr 22, 2019, at 1:32 AM, Hans Petter Selasky <hps_at_selasky.org> wrote:
> 
> On 4/22/19 10:10 AM, Hans Petter Selasky wrote:
>> On 4/22/19 9:52 AM, Enji Cooper wrote:
>>> 
>>>> On Apr 22, 2019, at 12:27 AM, Hans Petter Selasky <hselasky_at_FreeBSD.org> wrote:
>>>> 
>>>> Author: hselasky
>>>> Date: Mon Apr 22 07:27:24 2019
>>>> New Revision: 346530
>>>> URL: https://svnweb.freebsd.org/changeset/base/346530
>>>> 
>>>> Log:
>>>>   Fix panic in network stack due to memory use after free in relation to
>>>>   fragmented packets.
>>>> 
>>>>   When sending IPv4 and IPv6 fragmented packets and a fragment is lost,
>>>>   the mbuf making up the fragment will remain in the temporary hashed
>>>>   fragment list for a while. If the network interface departs before the
>>>>   so-called slow timeout clears the packet, the fragment causes a panic
>>>>   when the timeout kicks in due to accessing a freed network interface
>>>>   structure.
>>>> 
>>>>   Make sure that when a network device is departing, all hashed IPv4 and
>>>>   IPv6 fragments belonging to it, get freed.
>>>> 
>>>>   Backtrace:
>>>>   panic()
>>>>   icmp6_reflect()
>>>> 
>>>>   hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
>>>>   ^^^^ rcvif->if_afdata[AF_INET6] is NULL.
>>>> 
>>>>   icmp6_error()
>>>>   frag6_freef()
>>>>   frag6_slowtimo()
>>>>   pfslowtimo()
>>>>   softclock_call_cc()
>>>>   softclock()
>>>>   ithread_loop()
>>>> 
>>>>   Differential Revision:    https://reviews.freebsd.org/D19622
>>>>   Reviewed by:        bz (network), adrian
>>>>   MFC after:        1 week
>>>>   Sponsored by:        Mellanox Technologies
> 
> Should be fixed by
> 
> r346535
> 
> Else I'll revert.

...

The code compiles, but unfortunately the kernel panics when running the test suite. From https://ci.freebsd.org/job/FreeBSD-head-amd64-test/10926/console:

03:05:01  1st 0xffffffff820967f0 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:966
03:05:01  2nd 0xffffffff820c47f0 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:575
03:05:01 stack backtrace:
03:05:01 #0 0xffffffff80c477f3 at witness_debugger+0x73
03:05:01 #1 0xffffffff80c4753d at witness_checkorder+0xa7d
03:05:01 #2 0xffffffff80be9088 at _sx_slock_int+0x68
03:05:01 #3 0xffffffff80d0ef97 at vnet_alloc+0x117
03:05:01 #4 0xffffffff80ba4111 at kern_jail_set+0x1bb1
03:05:01 #5 0xffffffff80ba5b70 at sys_jail_set+0x40
03:05:01 #6 0xffffffff810b2e16 at amd64_syscall+0x276
03:05:01 #7 0xffffffff8108b44d at fast_syscall_common+0x101
03:05:01 panic: mtx_lock() of destroyed mutex @ /usr/src/sys/netinet/ip_reass.c:628
03:05:01 cpuid = 1
03:05:01 time = 1555927501
03:05:01 KDB: stack backtrace:
03:05:01 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0030eec630
03:05:01 vpanic() at vpanic+0x19d/frame 0xfffffe0030eec680
03:05:01 panic() at panic+0x43/frame 0xfffffe0030eec6e0
03:05:02 __mtx_lock_flags() at __mtx_lock_flags+0x12e/frame 0xfffffe0030eec730
03:05:02 ipreass_cleanup() at ipreass_cleanup+0x86/frame 0xfffffe0030eec770
03:05:02 if_detach_internal() at if_detach_internal+0x786/frame 0xfffffe0030eec7f0
03:05:02 if_detach() at if_detach+0x3d/frame 0xfffffe0030eec810
03:05:02 lo_clone_destroy() at lo_clone_destroy+0x16/frame 0xfffffe0030eec830
03:05:02 if_clone_destroyif() at if_clone_destroyif+0x21f/frame 0xfffffe0030eec880
03:05:02 if_clone_detach() at if_clone_detach+0xb8/frame 0xfffffe0030eec8b0
03:05:02 vnet_loif_uninit() at vnet_loif_uninit+0x26/frame 0xfffffe0030eec8d0
03:05:02 vnet_destroy() at vnet_destroy+0x124/frame 0xfffffe0030eec900
03:05:02 prison_deref() at prison_deref+0x29d/frame 0xfffffe0030eec940
03:05:02 sys_jail_remove() at sys_jail_remove+0x28f/frame 0xfffffe0030eec990
03:05:02 amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0030eecab0
03:05:02 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0030eecab0
03:05:02 --- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x80031e12a, rsp = 0x7fffffffe998, rbp = 0x7fffffffea20 ---
03:05:02 KDB: enter: panic
03:05:02 [ thread pid 13109 tid 100150 ]
03:05:02 Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
03:05:02 db:0:kdb.enter.panic> show pcpu
03:05:02 cpuid        = 1
03:05:02 dynamic pcpu = 0xfffffe0080191800
03:05:02 curthread    = 0xfffff80005c1f000: pid 13109 tid 100150 "jail"
03:05:02 curpcb       = 0xfffffe0030eecb80
03:05:02 fpcurthread  = 0xfffff80005c1f000: pid 13109 "jail"
03:05:02 idlethread   = 0xfffff800032765a0: tid 100004 "idle: cpu1"
03:05:02 curpmap      = 0xfffff8013d837130
03:05:02 tssp         = 0xffffffff821cd388
03:05:02 commontssp   = 0xffffffff821cd388
03:05:02 rsp0         = 0xfffffe0030eecb80
03:05:02 gs32p        = 0xffffffff821d3fc0
03:05:02 ldt          = 0xffffffff821d4000
03:05:02 tss          = 0xffffffff821d3ff0
03:05:02 tlb gen      = 314416
03:05:02 curvnet      = 0xfffff80139320200
03:05:02 spin locks held:
03:05:02 db:0:kdb.enter.panic> alltrace

Either the sys/netinet/ or the sys/netipsec/ tests triggered the panic; I'm not sure which right now.

Cheers,
-Enji