Lousy timing (no pun intended -- it's early in the day for me), given
the recent MFC, but as I was booting my laptop to yesterday's head:

FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127 r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015 root@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY amd64

to build today's head (@r285670; still in progress as I type), I
happened to note [Oh, great -- we can no longer copy/paste from console
now??!? Fine, I'll transcribe by hand.... :-(]:

...
bound to 172.17.1.245 -- renewal in 43200 seconds.
pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
Starting Network: lo0 em0 iwn0 lagg0.
...

Trying to examine the /ntpd.core, I see:

root@g1-245:/ # gdb `which ntpd` ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
[New Thread 801c07400 (LWP 100122/<unknown>)]
[New Thread 801c06400 (LWP 100120/<unknown>)]
(gdb) bt
#0  0x00000008011cd6a0 in sbrk () from /lib/libc.so.7
#1  0x00000008ccbd4f34 in ?? ()
#2  0x0000000000000005 in ?? ()
#3  0x0000000801800448 in ?? ()
#4  0x00000008011ca888 in sbrk () from /lib/libc.so.7
#5  0x00000008018000c8 in ?? ()
#6  0x00000008018000c0 in ?? ()
#7  0x0000000000000208 in ?? ()
#8  0x0000000801c32fb0 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000801cc20c8 in ?? ()
#11 0x0000000000000030 in ?? ()
#12 0x0000000801cc20c8 in ?? ()
#13 0x00007fffffffe480 in ?? ()
#14 0x00000008011cd240 in sbrk () from /lib/libc.so.7
#15 0x0000000000000280 in ?? ()
#16 0x00000008014bbc70 in malloc_message () from /lib/libc.so.7
#17 0x00000008018000c0 in ?? ()
#18 0x0000000801800448 in ?? ()
#19 0x0000000000000032 in ?? ()
#20 0x0000000801800458 in ?? ()
#21 0x00000008014bbc68 in malloc_message () from /lib/libc.so.7
#22 0x0000000801cc2000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#23 0x00000008014bba60 in malloc_message () from /lib/libc.so.7
#24 0x0000000801cc20d8 in ?? ()
#25 0x00000000000000a0 in ?? ()
#26 0x0000000000000208 in ?? ()
#27 0x00007fffffffe4d0 in ?? ()
#28 0x00000008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
Previous frame inner to this frame (corrupt stack?)
(gdb)

which seems... well, not especially useful, as far as I can tell.

This is (as mentioned above) on my laptop; as such, it is expected to
"wander" from one network to another. Accordingly:

* Since it could be connected to a network I do not control, I use a
  packet filter (IPFW, in my case) to reduce my exposure from a
  possibly-hostile network.

* Rather than enabling ntpd in /etc/rc.conf, I use
  /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
  lease.
  (For networks I control, I also set up the DHCP server to advertise
  what NTP server the DHCP clients should use, but the code in
  dhclient-exit-hooks merely prefers that, rather than requiring it.)

* In my world-view -- at least for networks I control -- DNS zone files
  are the Source of Truth with respect to hostname <-> IP address
  correspondence, and Dynamic DNS is Evil. I populate my zone files
  with appropriate A & PTR records so that every assignable DHCP
  address has a PTR record, and the hostname to which it points has an
  A record that points back to that IP address. Accordingly, I also
  use /etc/dhclient-exit-hooks so the laptop can find out what its
  hostname is, and set it accordingly.

Mind, I've been doing the above for well over a decade, so that doesn't
qualify as "new." And most of the time, it Just Works (which is a
significant reason I keep doing it).

A couple of other things that are more recent, and possibly of
relevance:

* As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up
  using Link Aggregation in "failover" mode. In practice, I rarely use
  the em0 (wired) NIC -- I had originally done that based on a
  misperception of how I thought things were set up at work, and then
  just left the configuration alone and relied on the wireless NIC.
  (At home, I have things set up so that the failover would work, but
  doing so would be a little awkward for reasons that aren't relevant
  here.)

* I have the laptop configured to run xdm(1)... after the DHCP lease is
  acquired and the hostname is set. My ~/.xsession script is set up so
  it fires up ssh-agent, requests a passphrase, and then (among other
  things) establishes an SSH session to the "mail hub" at home and
  re-establishes a tmux session where I'm running mutt to handle my
  email.

  I've noticed that in head, these connections sometimes fail to get
  initialized, and sometimes will time out, while sessions started a
  few minutes later will have no problem. That seems peculiar, but was
  sufficiently ...
  well, "nebulous" that I didn't think it warranted a whine of its own
  here. But on the chance that it's related to ntpd giving up the
  ghost prematurely, it seemed but a reasonable exercise of "Full
  Disclosure" to mention it in this context -- even though it's also
  something I've been doing since the (late) 1990s.

So: Any suggestions for either diagnosing what the root cause is or
changing the configuration so that the failure no longer occurs?

Thanks!

Peace,
david
-- 
David H. Wolfskill david_at_catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
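
The /etc/dhclient-exit-hooks arrangement described in the message (start ntpd only once a lease is in place, and prefer a DHCP-advertised NTP server without requiring one) might look roughly like the sketch below. It is not the actual hook script from the post, which is not shown. The function names and the FALLBACK_NTP value are hypothetical, and the sketch assumes dhclient-script exports $reason and, when the ntp-servers option is requested, $new_ntp_servers. The step that would really start ntpd is left as an echo.

```shell
#!/bin/sh
# Sketch of a fragment for /etc/dhclient-exit-hooks, which dhclient-script
# sources after it has handled a DHCP event.

# Hypothetical fallback used when the DHCP server advertises no NTP server.
FALLBACK_NTP="pool.ntp.org"

# Print the NTP server to use: the first DHCP-advertised one if the list
# in $1 is non-empty, otherwise the fallback.
choose_ntp_server() {
    set -- $1                      # split the space-separated list
    echo "${1:-$FALLBACK_NTP}"
}

# Succeed only for the events that mean a lease has just been acquired.
lease_acquired() {
    case "$1" in
        BOUND|REBOOT) return 0 ;;
        *)            return 1 ;;
    esac
}

# When sourced by dhclient-script, $reason is set; otherwise do nothing.
if lease_acquired "${reason:-}"; then
    server=$(choose_ntp_server "${new_ntp_servers:-}")
    # Placeholder: a real hook would configure and start ntpd here,
    # for example with "service ntpd onestart".
    echo "lease acquired; would start ntpd using $server"
fi
```

Keeping the decisions in small functions means they can be exercised by hand, for example `choose_ntp_server "10.0.0.1 10.0.0.2"`, without touching ntpd or the network.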