Re: LOR No 9 and strange other kernel messages

From: Tai-hwa Liang <avatar_at_mmlab.cse.yzu.edu.tw>
Date: Mon, 7 Jun 2004 11:16:43 +0800 (CST)
On Sun, 6 Jun 2004, Don Lewis wrote:
[...]
> >> Maybe your sound client software (or something else on the system) is
> >> querying for the number of vchans.  Even "sysctl -a" will invoke the
> >> handler.
> >
> > I'm using the stock mpg123 0.59r to play MP3 files. Since I've never read
> > the source before, I'm not sure whether it utilised vchan related sysctls
> > or not.
>
> The code that calls pcm_inprog() and prints the "x: 2" debug message
> appears to be an attempt at implementing a reader/writer lock.  I'm
> pretty sure the failure that triggers the debug message is harmless,
> other than causing the sysctl() call to return EWOULDBLOCK.

I'm glad to know that the message is harmless. However, the "x: %d" is
a little too obscured to endusers IMHO. Shouldn't that be protected by
#ifdef DIAGNOSTIC or something like PCM_DEBUG?

> The most likely trigger for this event is if the client software is
> writing data to sound device and is blocked in the write() syscall when
> another thread calls sysctl().  You could find out which process is
> calling sysctl() by printing curproc->p_pid in place of or in addition
> to x.

Ah, that explained everything. I've added the diagnostic code in
sys/dev/sound/pcm/sound.c, and can 100% trigger this while mpg123 is
playing and having a "sysctl -a" running in another xterm at the same time.

x: 2, current pid: 39234 (sysctl)
x: 2, current pid: 39234 (sysctl)

It also looks like gcc will use sysctl() to retrieve physical memory size
before compilation. I guess the "x: 2" I ran into was just a result of
running gcc too frequently(make buildworld buildkernel).

>
[...]
> >> Do your sound hardware and em0 share the same irq?  If they do, that
> >
> > Nope.
> >
> > interrupt                          total       rate
> > irq0: clk                         356127         99
> > irq1: atkbd0                        6006          1
> > irq4: cbb0 em0++                 1378364        386
> > irq6: cbb1 pcm0                   110211         30
> > irq7: ppc0                             3          0
> > irq8: rtc                         455852        127
> > irq12: psm0                        16785          4
> > irq13: npx0                            1          0
> > irq14: ata0                       322853         90
> > irq15: ata1                           55          0
> > Total                            2646257        742
>
> In that case, the em0 problem is likely to be something else.  Does em0
> lock up even if you don't use the sound hardware?

Yes. It always locked up if the network load is high(w/ ACPI enabled).

>
> >> might be a clue that the problems are related.  Can you boot with ACPI
> >> disabled, and if so, does it change the symptoms?
> >
> > Whoops! My first trial on booting without ACPI -- kernel panic in
> > usb_get_next_event(). Apparently I have to disconnect the USB mouse
> > prior to kernel booting. See final section for the backtrace.
>
> Sounds like bug #3.

Any idea about how bug #3 can happen? I've tried to comment out the
#ifdef DIAGNOSTIC code around sys/dev/usb/usb.c:752 to see if the extra
checking for NULL ue helps. Unfortunately that didn't work for me: the
kernel always panic at usb_get_next_event+0x5e after starting usbd.

> > And, yes, the em0 didn't hang after booting to the non-ACPI environment.
> > However, there're still something weird happening at that moment:
> >
> > 	1. There're still a couple of "x: 2" dumped on console. Test case:
> > 	"make buildworld buildkernel -DNOCLEAN" in one xterm, mpg123 in
> > 	another xterm, and providing file downloading to another Windoze
> > 	box.
> >
> > 	2. While the remote box was downloading(Intel eepro 100/VE) files,
> > 	the number of interrupt on em0 was dramatically reduced to 300 ~ 500+
> > 	per second. It was 3000+ when ACPI was enabled(no device polling in
> > 	kernel).
> >
> > 	3. The system average load is surprising low(I'm using SCHED_4BSD),
> > 	about 0.06 ~ 0.30; however, the download was awfully slooowwwww.
> > 	It took my brother's Windoze XP about 25 minutes to complete the
> > 	whole downloading(3 ISO files, about 2GB in total). I'm pretty sure
> > 	that the mediaopt on em0 was 100baseTX + full-duplex.
>
> I wonder if the em0 interrupt is not getting enabled or is getting
> misrouted and the em0 interrupt service routing is getting triggered by
> another interrupt source that is happening at a lower rate.  Print out
> the irq mapping with ACPI disabled to see what em0 is sharing its irq
> with.  This is probably something that jhb and njl will need to look at.

Oops, it looks like I accidentally enabled the DEVICE_POLLING in kernel
configuration file and forgot to turn kern.polling.enable on.

After disabling DEVICE_POLLING and booting w/o ACPI, I can get rid of
the aforementioned em0 sluggish problem. The interrupt rate on em0 is
now 5700 ~ 7200 per second while downloading a couple of files from this
laptop, and the average speed is 10+ MBytes.

Hmm... it seems that I have to disable USB mouse and ACPI to get em0
back to work at this moment.

I'm not quite sure about where the "IRQ mapping" you're talking about to
retrieve from; therefore, I assume that came from dmesg:

Found $PIR table, 15 entries at 0xc00fdea0
PCI-Only Interrupts: none
Location  Bus Device Pin  Link  IRQs
embedded    0    0    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    0    0    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    0    0    C   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    0    0    D   0x63  3 4 5 6 7 9 10 11 12 14 15
embedded    0    2    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    0    2    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    0    1    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    0    1    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    1    0    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    1    0    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    0   30    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    0   30    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    0   30    C   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    0   30    D   0x63  3 4 5 6 7 9 10 11 12 14 15
embedded    2    0    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    2    0    B   0x61  3 4 5 6 7 9 10 11 12 14 15
slot 1      2    2    A   0x62  3 4 5 6 7 9 10 11 12 14 15
slot 1      2    2    B   0x63  3 4 5 6 7 9 10 11 12 14 15
embedded    2    3    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    2    3    B   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    2    3    D   0x63  3 4 5 6 7 9 10 11 12 14 15
slot 2      9    0    A   0x60  3 4 5 6 7 9 10 11 12 14 15
slot 2      9    0    B   0x61  3 4 5 6 7 9 10 11 12 14 15
slot 2      9    0    C   0x62  3 4 5 6 7 9 10 11 12 14 15
slot 2      9    0    D   0x63  3 4 5 6 7 9 10 11 12 14 15
embedded    9    1    A   0x61  3 4 5 6 7 9 10 11 12 14 15
embedded    9    2    A   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    9    2    B   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    2    1    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    2    8    A   0x68  3 4 5 6 7 9 10 11 12 14 15
embedded    0   29    A   0x60  3 4 5 6 7 9 10 11 12 14 15
embedded    0   29    B   0x63  3 4 5 6 7 9 10 11 12 14 15
embedded    0   29    C   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    0   29    D   0x6b  3 4 5 6 7 9 10 11 12 14 15
embedded    0   31    A   0x62  3 4 5 6 7 9 10 11 12 14 15
embedded    0   31    B   0x61  3 4 5 6 7 9 10 11 12 14 15
pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 15 Entries> on motherboard
$PIR: Links after initial probe:
Link  IRQ  Rtd  Ref  IRQs
0x60  255   N    10  3 4 5 6 7 9 10 11 12 14 15
0x61  255   N    10  3 4 5 6 7 9 10 11 12 14 15
0x62  255   N     8  3 4 5 6 7 9 10 11 12 14 15
0x63  255   N     6  3 4 5 6 7 9 10 11 12 14 15
0x68  255   N     1  3 4 5 6 7 9 10 11 12 14 15
0x6b  255   N     1  3 4 5 6 7 9 10 11 12 14 15
$PIR: Found matching pin for 1.0.INTA at func 0: 4
$PIR: Found matching pin for 2.0.INTA at func 0: 4
$PIR: Found matching pin for 2.0.INTB at func 1: 6
$PIR: Found matching pin for 2.2.INTA at func 0: 11
$PIR: Found matching pin for 2.1.INTA at func 0: 4
$PIR: Found matching pin for 0.29.INTA at func 0: 4
$PIR: Found matching pin for 0.29.INTB at func 1: 10
$PIR: Found matching pin for 0.29.INTC at func 2: 11
$PIR: Found matching pin for 0.29.INTD at func 7: 11
$PIR: Found matching pin for 0.31.INTA at func 1: 255
$PIR: Found matching pin for 0.31.INTB at func 3: 6
$PIR: Links after initial IRQ discovery:
Link  IRQ  Rtd  Ref  IRQs
0x60    4   Y    10  3 4 5 6 7 9 10 11 12 14 15
0x61    6   Y    10  3 4 5 6 7 9 10 11 12 14 15
0x62   11   Y     8  3 4 5 6 7 9 10 11 12 14 15
0x63   10   Y     6  3 4 5 6 7 9 10 11 12 14 15
0x68  255   N     1  3 4 5 6 7 9 10 11 12 14 15
0x6b   11   Y     1  3 4 5 6 7 9 10 11 12 14 15
$PIR: IRQs used by BIOS: 4 6 10 11
$PIR: Interrupt Weights:
[    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 ]
[    0   0   0   0  10   0  10   0   0   0   6   9   0   0   0   0 ]

For the complete dmesg, please consult:

	http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-noacpi.txt
	http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-acpi.txt
Received on Mon Jun 07 2004 - 01:16:46 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:56 UTC