Robert Watson wrote: > > On Mon, 3 Apr 2006, Kazuaki Oda wrote: > >>> Also, are you running with INVARIANTS and/or WITNESS? >> >> >> Sorry, I compiled kernel without INVARIANTS and WITNESS. > > > No problem at all -- the debugging information you have sent me is > enough to track down the source of the problem. It looks like we have > an inconsistency in how we handle (especially in my new world order) the > recycling of timewait state for an inpcb that is still present. I've > committed a work-around which should prevent the panic you're seeing, > but I need to investigate a bit more before I can commit a full solution. > > For those interested, the problem is how to handle sockets with attached > inpcbs that represent closed or time wait TCP connections. This can > happen if shutdown() is called on a socket, kicking the TCP state engine > into a close cycle, rather than a reset. In the current world order, > the following sets of socket, pcb, and ppcb protocol state can occur: > > fd -> socket <-> inpcb <-> tcpcb Normal TCP socket in various states. > fd -> socket <-> inpcb <-> twtcp Unclosed TCP socket in time wait. > fd -> socket <-> inpcb <-> NULL Unclosed TCP socket after tw > recycle. > socket <-> inpcb <-> tcpcb Socket closed, buffer still needed. > inpcb <-> twtcp Socket closed, time wait. > > The problem was that the middle case exists, but was not accounted for. > There's another problem that is still present in the new socket/pcb > model, in which the inpcb of an open socket with a closed TCP connection > continues to reserve the address/port combination. This is related to > the inpcb without twtcp case, where we recycle the twtcp, but can't > recylce the inpcb immediately because there's still an fd reference to > the socket, and hence a socket reference to the inpcb. > > My current leaning is to do the following: > > - Since we now keep inpcb's around for the lifetime of the socket, either > teach the inpcb lookup routines to ignore INP_DROPPED, or to move them to > another inpcb list for open but dropped inpcb's. This will avoid the > reservation hanging around. > > - Either eliminate the inp_ppcb pointer NULL case by prohibiting > recycling of > the twtcp state of a socket that is still open, or more formally > supporting > it through the previous change. The trick is to prevent the twtcp/inpcb > pairs from turning up and being used during input and allocation > collision > processing. > > The summary is that we're not quite there yet in terms of how it all > should be working, but that we should avoid the panic for now due to the > workarounds I committed (which basically are changes to handle the > inp_ppcb pointer being NULL for INP_TIMEWAIT sockets). > > Thanks! > > Robert N M Watson > > I re-cvsup'ed today and rebuilt kernel. After reboot with new kernel, I got: # kgdb kernel.debug /var/crash/vmcore.5 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x0 fault code = supervisor read, page not present instruction pointer = 0x20:0xc071886b stack pointer = 0x28:0xd4422b74 frame pointer = 0x28:0xd4422b80 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 14 (swi1: net) trap number = 12 panic: page fault cpuid = 0 Uptime: 9m26s Dumping 510 MB (2 chunks) chunk 0: 1MB (158 pages) ... ok chunk 1: 510MB (130544 pages) 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14 #0 doadump () at pcpu.h:166 166 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) bt #0 doadump () at pcpu.h:166 #1 0xc066f042 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402 #2 0xc066f369 in panic (fmt=0xc088de83 "%s") at /usr/src/sys/kern/kern_shutdown.c:558 #3 0xc083ca50 in trap_fatal (frame=0xd4422b34, eva=0) at /usr/src/sys/i386/i386/trap.c:870 #4 0xc083c78f in trap_pfault (frame=0xd4422b34, usermode=0, eva=0) at /usr/src/sys/i386/i386/trap.c:778 #5 0xc083c3a5 in trap (frame= {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -1014804444, tf_esi = 0, tf_ebp = -733860992, tf_isp = -733861024, tf_ebx = -997587984, tf_edx = -997587984, tf_ecx = -1020269616, tf_eax = -733860800, tf_trapno = 12, tf_err = 0, tf_eip = -1066301333, tf_cs = 32, tf_eflags = 66118, tf_esp = -997587984, tf_ss = 0}) at /usr/src/sys/i386/i386/trap.c:463 #6 0xc082841a in calltrap () at /usr/src/sys/i386/i386/exception.s:138 #7 0xc071886b in tcp_timewait (tw=0x0, to=0xd4422c40, th=0xc3835024, m=0xc3801b00, tlen=0) at pcpu.h:166 #8 0xc0715a44 in tcp_input (m=0xc3801b00, off0=20) at /usr/src/sys/netinet/tcp_input.c:763 #9 0xc070ee6d in ip_input (m=0xc3801b00) at /usr/src/sys/netinet/ip_input.c:656 #10 0xc06eb98b in netisr_processqueue (ni=0xc0971e38) at /usr/src/sys/net/netisr.c:236 #11 0xc06ebb8a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:349 #12 0xc0659e49 in ithread_execute_handlers (p=0xc32fd468, ie=0xc333f100) at /usr/src/sys/kern/kern_intr.c:662 #13 0xc0659f69 in ithread_loop (arg=0xc32dc840) at /usr/src/sys/kern/kern_intr.c:745 #14 0xc0658d61 in fork_exit (callout=0xc0659f14 <ithread_loop>, arg=0xc32dc840, frame=0xd4422d38) at /usr/src/sys/kern/kern_fork.c:819 #15 0xc082847c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:199 (kgdb) frame 8 #8 0xc0715a44 in tcp_input (m=0xc3801b00, off0=20) at /usr/src/sys/netinet/tcp_input.c:763 763 if (tcp_timewait(intotw(inp), &to, th, m, tlen)) (kgdb) p inp $1 = (struct inpcb *) 0xc48a03f0 (kgdb) p *inp $2 = {inp_hash = {le_next = 0xc48ad690, le_prev = 0xc4962348}, inp_list = { le_next = 0xc48cddc8, le_prev = 0xc48d2dd0}, inp_flow = 0, inp_inc = { inc_flags = 0 '\0', inc_len = 0 '\0', inc_pad = 0, inc_ie = { ie_fport = 4364, ie_lport = 20480, ie_dependfaddr = {ie46_foreign = { ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 84650176}}, ie6_foreign = {__u6_addr = { __u6_addr8 = '\0' <repeats 12 times>, "ĀĻ\v\005", __u6_addr16 = { 0, 0, 0, 0, 0, 0, 43200, 1291}, __u6_addr32 = {0, 0, 0, 84650176}}}}, ie_dependladdr = {ie46_local = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 51095744}}, ie6_local = { __u6_addr = {__u6_addr8 = '\0' <repeats 12 times>, "ĀĻ\v\003", __u6_addr16 = {0, 0, 0, 0, 0, 0, 43200, 779}, __u6_addr32 = {0, 0, 0, 51095744}}}}}}, inp_ppcb = 0x0, inp_pcbinfo = 0xc0972ba0, inp_socket = 0xc494a530, inp_label = 0x0, inp_flags = 8388608, inp_sp = 0x0, inp_vflag = 41 ')', inp_ip_ttl = 64 '_at_', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_depend4 = {inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0}, inp_depend6 = { inp6_options = 0x0, inp6_outputopts = 0x0, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_ifindex = 0, inp6_hops = 0}, inp_portlist = {le_next = 0xc48cddc8, le_prev = 0xc48d2e44}, inp_phd = 0xc341e780, inp_gencnt = 926118, inp_mtx = {mtx_object = { lo_name = 0xc08b6e09 "inp", lo_type = 0xc08b4936 "tcpinp", lo_flags = 21692416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 3274697680, mtx_recurse = 0}} (kgdb) p *inp->inp_socket $3 = {so_count = 1, so_type = 1, so_options = 12, so_linger = 0, so_state = 8192, so_qstate = 0, so_pcb = 0xc48a03f0, so_proto = 0xc093a7c8, so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = { tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0, tqe_prev = 0xc37a4bd0}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0, so_aiojobq = { tqh_first = 0x0, tqh_last = 0xc494a578}, so_rcv = {sb_sel = {si_thrlist = { tqe_next = 0x0, tqe_prev = 0xc3ecaa50}, si_thread = 0x0, si_note = { kl_list = {slh_first = 0x0}, kl_lock = 0xc0653644 <knlist_mtx_lock>, kl_unlock = 0xc065367c <knlist_mtx_unlock>, kl_locked = 0xc06536b8 <knlist_mtx_locked>, kl_lockarg = 0xc494a5a4}, si_flags = 0}, sb_mtx = {mtx_object = {lo_name = 0xc08adcf7 "so_rcv", lo_type = 0xc08adcf7 "so_rcv", lo_flags = 16973824, lo_witness_data = { lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, sb_state = 32, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_cc = 0, sb_hiwat = 66608, sb_mbcnt = 0, sb_mbmax = 262144, sb_ctl = 0, sb_lowat = 1, sb_timeo = 0, sb_flags = 0}, so_snd = {sb_sel = {si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0, si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xc0653644 <knlist_mtx_lock>, kl_unlock = 0xc065367c <knlist_mtx_unlock>, kl_locked = 0xc06536b8 <knlist_mtx_locked>, kl_lockarg = 0xc494a610}, si_flags = 0}, sb_mtx = {mtx_object = {lo_name = 0xc08adcf0 "so_snd", lo_type = 0xc08adcf0 "so_snd", lo_flags = 16973824, lo_witness_data = { lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, sb_state = 16, sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_cc = 0, sb_hiwat = 33304, sb_mbcnt = 0, sb_mbmax = 262144, sb_ctl = 0, sb_lowat = 2048, sb_timeo = 0, sb_flags = 0}, so_upcall = 0, so_upcallarg = 0x0, so_cred = 0xc3b91600, so_label = 0x0, so_peerlabel = 0x0, so_gencnt = 622463, so_emuldata = 0x0, so_accf = 0x0} (kgdb) p *inp->inp_ppcb Attempt to dereference a generic pointer. (kgdb) frame 7 #7 0xc071886b in tcp_timewait (tw=0x0, to=0xd4422c40, th=0xc3835024, m=0xc3801b00, tlen=0) at pcpu.h:166 166 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) p tw $4 = (struct tcptw *) 0x0 (kgdb) p *tw Cannot access memory at address 0x0 (kgdb) p *to $5 = {to_flags = 49, to_tsval = 97163995, to_tsecr = 0, to_mss = 1460, to_requested_s_scale = 0 '\0', to_nsacks = 0 '\0', to_sacks = 0x0} (kgdb) p *th $6 = {th_sport = 4364, th_dport = 20480, th_seq = 2749703150, th_ack = 0, th_x2 = 0, th_off = 10, th_flags = 2 '\002', th_win = 57344, th_sum = 0, th_urp = 0} (kgdb) Is more information required? -- Kazuaki OdaReceived on Wed Apr 05 2006 - 00:57:14 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:54 UTC