Re: panic: in_pcblookup_local (?)

From: Glen Barber <gjb_at_FreeBSD.org>
Date: Thu, 2 May 2013 06:42:19 -0400

On Thu, May 02, 2013 at 10:27:39AM +0100, Robert N. M. Watson wrote:
> 
> On 2 May 2013, at 01:57, Glen Barber wrote:
> 
> > So, I am admittedly not too familiar with DDB.  In fact, I just now
> > realize the kernel is built without DDB...
> 
> DDB is a very powerful tool in that it's been custom-developed
> to help debug common kernel panics. It lacks some of the flexibility,
> and especially the data-type awareness of GDB, but GDB is a less
> well-suited tool when investigating common crash patterns. I'll
> usually start out debugging in DDB, and find that 90% of my
> in-development panics can be debugged with it, resorting to GDB for
> post-mortem analyses in production or particularly hard debugging
> cases (usually where DDB's pretty printers for data types fall
> short). I've wanted, for a long time, to teach DDB how to pretty-print
> arbitrary types using DTrace's CTF meta-data, which would address
> the most significant case where I turn to GDB. Mind you, the
> limitations I see in GDB are largely made up for by John's GDB
> scripts :-).
> 

Hmm.  Perhaps it would be worthwhile for me to rebuild the current
kernel with DDB support.  It looks like the machine has panicked a few
times over the last two weeks or so but, based on the timestamps of the
crash dumps and Nagios complaints, the panics happened in the middle of
the night, when I would not have noticed or would have just blamed my
ISP.

Two of the panics are ath(4)-related.  One looks similar to the one
referenced in this thread, and was likewise triggered by a CFEngine process.

In that case, the backtrace looks like:

#4 0xffffffff808cdbb3 at calltrap+0x8
#5 0xffffffff807371d8 at in_pcb_lport+0x128
#6 0xffffffff8073745a at in_pcbbind_setup+0x16a
#7 0xffffffff80737d8e at in_pcbconnect_setup+0x71e
#8 0xffffffff80737df9 at in_pcbconnect_mbuf+0x59
#9 0xffffffff807bf29f at udp_connect+0x11f
#10 0xffffffff80680615 at kern_connectat+0x275

Regarding DDB, though, it would be rather difficult to access the machine
if it drops into a DDB session, since the machine acts as my
firewall.
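
A minimal sketch of kernel configuration additions that would address
this, assuming the stock option names documented in sys/conf/NOTES (the
config file name MYKERNEL is hypothetical, and this has not been tried on
the machine in question):

    # MYKERNEL (hypothetical name): additions to the existing config
    options     KDB             # kernel debugger framework
    options     DDB             # interactive in-kernel debugger backend
    options     KDB_UNATTENDED  # do not enter the debugger on panic

With KDB_UNATTENDED, the debug.debugger_on_panic sysctl defaults to 0, so
a panic should proceed to the crash dump and reboot rather than wait at
a db> prompt.  It can be set back to 1 at runtime when someone is at the
console.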

> >> Put those in a dir and do 'source gdb6'.  You can then run 'ps' to get a good 
> >> ps listing that includes threads.  You can also use 'thread apply all bt' to 
> >> get stacktraces of all threads in kgdb.  I believe there is an 'allpcpu' 
> >> command that is similar to 'show allpcpu' in DDB.
> > 
> > I have the outputs of 'ps', 'allpcpu', and 'thread apply all bt' saved
> > to separate script(1) files.  Is there anything in particular I can look
> > for before uploading the files somewhere public?  At a quick-ish look,
> > though, I did not see anything related to cf-agent (the current process
> > at the time of the panic).
> 
> To be honest, it's probably easiest if I just take a look at it
> and see what I see. In as much as I find interesting things, I'll
> follow up explaining what they are. We may find we can't track this
> problem down from the data we have -- but it's worth a try.
> 

Sure.  The files are available here:

    https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.ps.txt
    https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.allpcpu.txt
    https://www.glenbarber.us/stuff/in_pcblookup_local/vmcore.4.thread_apply_all_bt.txt

Thanks to both of you for looking into this.

Glen