Re: em(4) watchdog timeout

From: Jack Vogel <jfvogel_at_gmail.com>
Date: Fri, 21 Jul 2006 09:27:31 -0700

On 7/21/06, Jeremie Le Hen <jeremie_at_le-hen.org> wrote:
> Hi,
>
> I am running a two-month-old -CURRENT (dated May 24), and I am
> experiencing watchdog timeouts with my em(4) adapter when running
> a CPU-bound workload involving a computational perl script.
> Unfortunately this bug occurs very infrequently; I can't trigger
> it each time I run this job.
>
> FWIW, the command line is something like this:
> %   gzip -dc data.gz | perlscript > chewed_data
>
> I recompiled em(4) with DEBUG_INIT, DEBUG_IOCTL and DEBUG_HW
> all set to 1, but it doesn't seem to provide any valuable information:
>
> % Jul 21 11:17:14 neuneuf kernel: em0: watchdog timeout -- resetting
> % Jul 21 11:17:14 neuneuf kernel: em_init: begin
> % Jul 21 11:17:14 neuneuf kernel: em_stop: begin
> % Jul 21 11:17:14 neuneuf kernel: free_transmit_structures: begin
> % Jul 21 11:17:14 neuneuf kernel: free_receive_structures: begin
> % Jul 21 11:17:14 neuneuf kernel: em_init: pba=48K
> % Jul 21 11:17:14 neuneuf kernel: em_hardware_init: begin
> % Jul 21 11:17:14 neuneuf kernel: em_initialize_transmit_unit: begin
> % Jul 21 11:17:14 neuneuf kernel: Base = 1ebf9000, Length = 1000
> % Jul 21 11:17:14 neuneuf kernel:
> % Jul 21 11:17:14 neuneuf kernel: em_set_multi: begin
> % Jul 21 11:17:14 neuneuf kernel: em_initialize_receive_unit: begin
> % Jul 21 11:17:14 neuneuf kernel: em0: link state changed to DOWN
> % Jul 21 11:17:16 neuneuf kernel: em0: link state changed to UP
> % Jul 21 11:17:16 neuneuf kernel: ioctl rcv'd: SIOCxIFMEDIA (Get/Set Interface Media)
> % Jul 21 11:17:16 neuneuf kernel: em_media_status: begin
>
> The chip is:
> % em0@pci3:11:0:  class=0x020000 card=0x02871014 chip=0x10138086 rev=0x00 hdr=0x00
> %     vendor   = 'Intel Corporation'
> %     device   = '82541EI Gigabit Ethernet Controller (Copper)'
> %     class    = network
> %     subclass = ethernet
>
> The interrupt is shared with uhci0:
> % neuneuf:/sys:112# vmstat -i
> % interrupt                          total       rate
> % irq1: atkbd0                       39216          0
> % irq14: ata0                      4801030          3
> % irq16: em0 uhci0++             919491852        688
> % irq19: uhci1                       35141          0
> % irq23: ehci0                           1          0
> % cpu0: timer                   2670435076       1999
> % Total                         3594802316       2692
>
> I can't try DEVICE_POLLING right now since IIRC I would have to recompile the
> whole kernel (right now I am using the if_em module so that I can tune the
> driver without rebooting).

Hitting the watchdog means you have a hang of some sort.
Try 'sysctl dev.em.0.debug_info=1' and see if that gives any clues.
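
Since the quoted vmstat -i output shows em0 sharing irq16 with uhci0, it may
also help to keep an eye on which interrupt lines are shared. Below is a
rough Python sketch that picks those lines out of vmstat -i text. The
function name shared_irqs and the SAMPLE string are made up for illustration
(SAMPLE is just the output quoted above with the "> % " prefix removed), and
the parsing assumes the three-column "name, total, rate" layout shown there.
It does not try to interpret the trailing "+" marks; they are kept as-is.

```python
import re


def shared_irqs(vmstat_output):
    """Return {irq_name: [device, ...]} for irq lines that list more than one device.

    Hypothetical helper: assumes lines shaped like
    "irq16: em0 uhci0++   919491852   688" as in the quoted output.
    """
    shared = {}
    for line in vmstat_output.splitlines():
        # irq name, device list, total count, rate
        m = re.match(r"\s*(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$", line)
        if not m:
            continue  # header, cpu timer, and Total lines are skipped
        devices = m.group(2).split()
        if len(devices) > 1:
            shared[m.group(1)] = devices
    return shared


# Sample input copied from the vmstat -i output quoted in this thread.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                       39216          0
irq14: ata0                      4801030          3
irq16: em0 uhci0++             919491852        688
irq19: uhci1                       35141          0
irq23: ehci0                           1          0
cpu0: timer                   2670435076       1999
Total                         3594802316       2692
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "is shared by:", " ".join(devices))
```

On a live box the same function could be fed the captured output of
vmstat -i instead of SAMPLE.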

Jack
Received on Fri Jul 21 2006 - 14:27:32 UTC
