Re: dhclient taking all cpu

From: Eric Anderson <anderson_at_centtech.com>
Date: Wed, 27 Jul 2005 14:35:06 -0500

Brooks Davis wrote:
> On Tue, Jul 26, 2005 at 04:39:33PM -0700, Brooks Davis wrote:
> 
>>On Tue, Jul 26, 2005 at 06:53:17PM -0400, Jung-uk Kim wrote:
>>
>>>On Tuesday 26 July 2005 04:00 pm, Wilko Bulte wrote:
>>>
>>>>On Tue, Jul 26, 2005 at 12:33:24PM -0700, Brooks Davis wrote:
>>>>
>>>>
>>>>>On Mon, Jul 25, 2005 at 10:39:09PM -0400, Mike Jakubik wrote:
>>>>>
>>>>>>On Mon, July 25, 2005 9:54 pm, Brooks Davis said:
>>>>>>
>>>>>>>>>Probably something wrong with your interface, but you
>>>>>>>>>haven't provided any useful information so who knows.  At
>>>>>>>>>the very least, I need to know what interface you are
>>>>>>>>>running on, something about its status, and if both
>>>>>>>>>dhclient processes are running.
>>>>>>>>
>>>>>>>>The interface is xl0 (3Com 3c905C-TX Fast Etherlink XL), and
>>>>>>>>it worked in this machine fine for as long as I remember.
>>>>>>>>This seems to have happened since a recent cvsup and
>>>>>>>>buildworld from ~6-BETA to 7-CURRENT. I rebooted three
>>>>>>>>times, and the problem occurred roughly a minute after bootup.
>>>>>>>>On the fourth time however, it seems to be ok so far.
>>>>>>>
>>>>>>>That sounds like a problem with the code that handles the
>>>>>>>link state notifications in the interface driver.  The
>>>>>>>notifications are a relatively new feature that we're only now
>>>>>>>starting to use heavily so there are going to be bumps in the
>>>>>>>road.  It would be interesting to know if you see link state
>>>>>>>messages promptly if you plug and unplug the network cable.
>>>>>>
>>>>>>It seems to be back at it again, this time it took longer to
>>>>>>kick in. Here is a "ps auxw|grep dhclient" :
>>>>>>
>>>>>>_dhcp      219 93.5  0.2  1484  1136  ??  Rs    8:49PM   5:06.00 dhclient: xl0 (dhclient)
>>>>>>root       193  0.0  0.2  1484  1088  d0- S     8:49PM   0:00.02 dhclient: xl0 [priv] (dhclient)
>>>>>>
>>>>>>top:
>>>>>>
>>>>>>  PID USERNAME      THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>>>>>  219 _dhcp           1 129    0  1484K  1136K RUN      9:33 94.24% dhclient
>>>>>>
>>>>>>Nothing in dmesg about link state changes on xl0. Unplugging
>>>>>>and replugging the network cable results in link state
>>>>>>notification within a couple seconds.
>>>>>
>>>>>Could you see what happens if you run dhclient in the foreground?
>>>>> Just running "dhclient -d xl0" should do it.  I'd like to know
>>>>>what sort of output it's generating.
>>>>
>>>>In my case it is not displaying anything:
>>>>
>>>>
>>>>chuck#dhclient -d ath0
>>>>DHCPREQUEST on ath0 to 255.255.255.255 port 67
>>>>DHCPACK from 192.168.5.254
>>>>bound to 192.168.5.20 -- renewal in 21600 seconds.
>>>>
>>>><nothing>
>>>>
>>>>I can tell the phenomenon occurs when my laptop fan springs to
>>>>life:
>>>>
>>>>CPU states: 96.5% user,  0.0% nice,  2.7% system,  0.8% interrupt,  0.0% idle
>>>>Mem: 48M Active, 28M Inact, 50M Wired, 680K Cache, 34M Buf, 115M Free
>>>>Swap: 257M Total, 257M Free
>>>>
>>>>  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>>>  719 _dhcp       1 129    0  1384K  1092K RUN      2:14 93.55% dhclient
>>>>  607 root        1  98    0 34584K 21212K select   0:09  1.81% Xorg
>>>>  663 wb          4  20    0 46712K 40224K kserel   0:27  0.00% mozilla-bin
>>>>  503 root        1   8    0  1184K   796K nanslp   0:07  0.00% powerd
>>>>
>>>>Took (best guess) approx 5-10 minutes for the effect to kick in.
>>>
>>>FYI, I have the same issues with bge(4) and ndis(4).
>>
>>I've seen it on ath and em interfaces now, but am not sure what's going
>>on and have no idea how to reproduce the problem.  As also reported by
>>Bakul Shah, we seem to be getting into a state where receive_packet() is
>>spinning.  I'm not seeing an obvious way for this to be possible.
> 
> 
> I think I've found it.  There was a really odd typo (= instead of +) in
> the code that handles undersized captures on the bpf socket.  Please try
> the following patch and see if it solves the problem.  I'm testing here,
> but I don't have a reliable way to trigger the bug.  The fix is fairly
> obvious so I'll commit it to head shortly.
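
For readers of the archive: the class of bug described above (an
assignment where an addition was meant, in a loop that skips undersized
captures) can be sketched as follows.  This is only an illustration under
stated assumptions, not the dhclient source and not the patch from this
thread.  The record layout (struct rec_hdr) and the helpers scan(),
put_rec() and demo_scan() are all hypothetical names made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header, loosely modelled on a capture header. */
struct rec_hdr {
	uint32_t caplen;	/* bytes actually stored in the buffer */
	uint32_t datalen;	/* bytes that were on the wire */
};
#define HDR_LEN sizeof(struct rec_hdr)

/*
 * Walk the buffer until a complete record (caplen == datalen) is found.
 * Returns the number of loop iterations used, or -1 if max_iter is
 * exceeded, which stands in for "spinning forever".
 */
static int
scan(const unsigned char *buf, size_t len, int buggy, int max_iter)
{
	size_t off = 0;
	int iter = 0;

	assert(max_iter > 0);
	while (off + HDR_LEN <= len) {
		struct rec_hdr h;

		if (++iter > max_iter)
			return (-1);
		memcpy(&h, buf + off, HDR_LEN);
		if (h.caplen != h.datalen) {
			/* Undersized capture: skip over it. */
			if (buggy)
				off = HDR_LEN + h.caplen;  /* '=': offset is reset */
			else
				off += HDR_LEN + h.caplen; /* '+': offset advances */
			continue;
		}
		return (iter);	/* found a complete record */
	}
	return (iter);		/* ran off the end of the buffer */
}

/* Append one record (header plus zeroed payload); returns the new length. */
static size_t
put_rec(unsigned char *buf, size_t off, uint32_t caplen, uint32_t datalen)
{
	struct rec_hdr h = { caplen, datalen };

	memcpy(buf + off, &h, HDR_LEN);
	memset(buf + off + HDR_LEN, 0, caplen);
	return (off + HDR_LEN + caplen);
}

/* Two undersized records followed by one complete record. */
int
demo_scan(int buggy)
{
	unsigned char buf[64];
	size_t len = 0;

	len = put_rec(buf, len, 4, 10);
	len = put_rec(buf, len, 4, 10);
	len = put_rec(buf, len, 4, 4);
	return (scan(buf, len, buggy, 1000));
}
```

In this sketch the first undersized record hides the problem, because
starting from offset 0 both forms give the same result.  Only the second
undersized record sends the buggy variant back to an offset it has
already visited, so it never reaches the complete record.  That would be
consistent with a bug that is hard to trigger on demand.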

It's been 20 minutes without any issues - I think that did it.  Thanks!

Eric




-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
Received on Wed Jul 27 2005 - 17:35:17 UTC