RE: New interrupt stuff breaks ASUS 2 CPU system

From: Bruce Evans <bde_at_zeta.org.au>
Date: Fri, 7 Nov 2003 20:04:11 +1100 (EST)
On Thu, 6 Nov 2003, John Baldwin wrote:

> On 06-Nov-2003 Harti Brandt wrote:
> > JB>I figured out what is happening, I think.  You are getting a spurious
> > JB>interrupt from the 8259A PIC (which comes in on IRQ 7).  The IRR register
> > JB>lists pending interrupts still waiting to be serviced.  Try using
> > JB>'options NO_MIXED_MODE' to stop using the 8259A's for the clock and see if
> > JB>the spurious IRQ 7 interrupts go away.
> >
> > Ok, that seems to help. It is interesting, though, that these interrupts
> > happen only with a larger HZ and when the kernel is doing printfs (this
> > machine has a serial console). I have also not tried to disable SIO2 and
> > the parallel port.
>
> Can you also try turning mixed mode back on and using
> http://www.FreeBSD.org/~jhb/patches/spurious.patch
>
> You should get some stray IRQ 7's in the vmstat -i output as well as a few
> printf's to the kernel console.

Other changes fixed the problem with the apic case not working on my BP6,
except the apic causes many more interrupts on serial ports at 921600 bps,
almost enough to overload the system with just 2 active serial ports.
I've now gathered lots of statistics for sio interrupt performance.  The
bad effect of the apic on performance is shown in the "-current(apic)"
lines for a45 and a45b only:

%%%
Keywords:
c04 = send at 115200 bps on cuac00, receive at 115200 bps on cuac04
c04b = like c04 plus send and receive in other direction too (b = bidirectional)
  (cuac* are on a Cyclades 8yo (2 * cd1400 isa))
a01 = like c04 except use ports cuaa[01]
a01b = like a01 except bidirectional
  (cuaa[01] are standard motherboard 16550 clones)
a45 = like a01 except use speed 921600 bps and ports cuaa[45]
a45b = like a45 except bidirectional
  (cuaa[45] are on a VScom 200HV2 (2 * 16950 pci))
-current(ointr) = -current before new interrupt code
-current = plain current (2003/11/06)
-current(apic) = -current with apic configured for UP kernel on SMP hardware
-current(bde) = my version of -current (new interrupt code not merged yet)
&+iir,+stream,+intr0 = my version of -current with variants of sio
  optimizations (only UART-independent ones; optimizations for 16950 UARTs
  give factor of 2 reduction in overheads)

Overheads for doing above I/O in percent (min-max for 3 runs) on an ABIT BP6
with 366 MHz and 400 MHz Celerons:

Devices		OS		UP		SMP
-------		--		--		---
c04		RELENG_4(4.9)	6.58-6.59	Not measured (method problems)
		-current(ointr)	9.65-9.76	6.77-7.11
		-current	10.64-10.69	6.09-6.36
		-current(apic)	9.63-9.90	As above (apic standard)
		-current(bde)	6.83-6.96	3.54-3.78
c04b		RELENG_4(4.9)	12.83-12.90	Not measured (method problems)
		-current(ointr)	19.42-19.44	13.70-13.90
		-current	20.23-20.24	12.01-12.48
		-current(apic)	17.77-17.89	As above (apic standard)
		-current(bde)	12.74-13.23	6.23-6.53
a01		RELENG_4(4.9)	7.50-7.50	Not measured (method problems)
		-current(ointr)	7.67-7.69	4.44-4.77
		-current	8.09-8.13	4.72-5.60
		-current(apic)	7.75-8.02	As above (apic standard)
		-current(bde)	7.53-7.63	4.49-4.54
		&+iir		7.09-7.30	Not measured (kernel problems)
		&+stream	6.23-6.24
		&+iir+stream	5.47-5.52
		&+intr0+iir	5.24-5.26	2.75-2.91
a01b		RELENG_4(4.9)	14.64-14.84	Not measured (method problems)
		-current(ointr)	14.36-15.10	8.65-8.92
		-current	14.79-14.87	8.18-9.77
		-current(apic)	14.80-14.91	As above (apic standard)
		-current(bde)	14.19-14.24	8.13-8.46
		&+iir		14.05-14.13
		&+stream	12.12-12.17
		&+iir+stream	10.58-10.62
		&+intr0+iir	10.07-10.12	5.10-5.63
a45		RELENG_4(4.9)	21.81-21.86	Not measured (method problems)
		-current(ointr)	24.00-24.04	13.3
		-current	25.13-25.20	31.4-31.5(86)
		-current(apic)	51.02-51.05(87)	As above (apic standard)
		-current(bde)	21.83-22.02	10.71-10.89
		&+iir		21.98-22.05
		&+stream	27.78-27.81
		&+iir+stream	22.08-22.16
		&+intr0+iir	16.76-16.92	6.85-8.11
a45b		RELENG_4(4.9)	46.23-46.44(87)	Not measured (method problems)
		-current(ointr)	54.01-54.37(86)	25.2 (82/82)
		-current	56.04-56.93(85)	70.1-70.7(80)
		-current(apic)	87.35-88.22(78)	As above (apic standard)
		-current(bde)	42.06-42.12	Not measured (kernel problems)
		&+iir		44.60-44.75(91/90/90/89)
		&+stream	52.64-52.99(89/89/88/87)
		&+iir+stream	41.01-41.05(92/91/92/91)
		&+intr0+iir	32.57-32.70	17.11-17.31

Notes:
1. Measurements for RELENG_3 are missing because my kernels don't quite boot.
2. Measurements for RELENG_4 are with -current utilities (with adjustments to
   stop them using new syscalls so that they can run under RELENG_4).
3. Measurements for RELENG_4-SMP are missing because the test methodology
   doesn't work there.  It involves running 1 process that loops incrementing
   a counter and another process that does the i/o using select(), and seeing
   how far the counter gets while doing i/o compared with when not doing i/o.
   This is very accurate for UP, but I wouldn't have expected it to work at
   all for SMP.  However, it gives very believable results under SMP in
   -current although it gives the expected garbage results under SMP in
   RELENG_4.
4. Numbers in parentheses like 87 in 51.02-51.05(87) mean that the throughput
   was only about 87000 cps instead of the expected 92160 cps.  There is a
   latency problem that prevents full throughput in old versions of sio.
   Extra interrupt latency in -current increases this problem.  Extra
   interrupts for the apic case also increase it.
%%%
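
The counter-based method in note 3 can be sketched as follows.  This is
only an illustration of the arithmetic, not the actual test program: the
function name and the sample counts are made up, and it assumes the
counting process gets all the CPU time that the i/o does not use.

```python
def overhead_percent(idle_count: int, busy_count: int) -> float:
    """Estimate i/o overhead from how far a counting loop gets.

    idle_count: counter value reached in a fixed interval with no i/o.
    busy_count: counter value reached in the same interval during i/o.
    """
    if idle_count <= 0:
        raise ValueError("idle_count must be positive")
    # The fraction of counting progress lost is taken as the i/o overhead.
    return 100.0 * (idle_count - busy_count) / idle_count


# Hypothetical counts, not measurements from the table above.
print(f"{overhead_percent(1_000_000, 934_200):.2f}%")
```

On SMP the counting process can run on the other CPU, which is why the
method is not obviously meaningful there.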

The pessimization of fast interrupt handlers from changing the PIC masks
can be seen in the non-apic rows of the UP column.  It takes 694 nsec to
write the PIC mask on this machine.  c04 is on irq10 so there are 4 PIC
i/o's where there were none before.  The other UARTs are on irqs < 8, so
there are only 2 PIC i/o's where there were none before.  AUTO_EOI_1 is
enabled, but AUTO_EOI_2 is not, so c04 gets pessimized by a PIC write
for EOI in all non-apic UP versions.
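
A rough way to see how much the extra PIC i/o's can cost is sketched
below.  It assumes that every PIC i/o costs the same 694 nsec as a mask
write, and the interrupt rate passed in is a hypothetical input, not a
measured one; the function name is made up for illustration.

```python
PIC_WRITE_NSEC = 694  # cost of one PIC mask write on this machine (from the text)


def pic_overhead_percent(interrupts_per_sec: float, pic_ios_per_interrupt: int) -> float:
    """Percent of one CPU spent on PIC i/o, assuming each i/o costs PIC_WRITE_NSEC."""
    nsec_per_sec = interrupts_per_sec * pic_ios_per_interrupt * PIC_WRITE_NSEC
    return 100.0 * nsec_per_sec / 1e9


# Hypothetical rate of 5760 interrupts/sec with 2 PIC i/o's each (irq < 8 case).
print(f"{pic_overhead_percent(5760, 2):.2f}%")
# Same hypothetical rate with 4 PIC i/o's each (irq >= 8 case, like irq10).
print(f"{pic_overhead_percent(5760, 4):.2f}%")
```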

Using the apic makes this problem moot for sio and probably for most
fast interrupt handlers that do real i/o, since the apic is apparently
much faster than the i/o.  (I/O on the UARTs in this system takes
150 nsec (cuaa[45] best case), 480 nsec (cuaa[45] most cases) or 1513
nsec (cuaa[01] all cases).  I haven't measured apic access times.)

However, using the apic almost doubles the overheads for the a45 cases.
This seems to be due to extra interrupts.  The UART and/or driver already
have weird (partly benign) behaviour related to the number of interrupts.
To transmit and receive (unidirectional) at 921600 bps with a 16-byte tx
fifo and an 8-byte rx fifo threshold, I would expect about 5760 tx
interrupts and 11520 rx interrupts per second, with some (many?)
coalesced because the interrupt is shared.  The actual numbers for the
PIC case are 5760-epsilon (total) for cuaa4 -> cuaa5 and 11520-epsilon
(total) for the reverse direction.  For the apic case, at least the
first number is at least doubled.  I forget the details.  The asymmetry
is probably caused by the driver polling cuaa4 before cuaa5.  The driver
doesn't really understand the shared interrupts directed to it by puc(4)
(it only understands its internal COM_MULTIPORT support for shared
interrupts) and it often finds things to do that it wouldn't if it
were purely interrupt driven.  This reduces performance and increases
latency, especially if the apic (or whatever it is) generates extra
interrupts.  I haven't tested the apic behaviour at low speeds on
cuaa[45].  Low speeds cause another problem with extra interrupts:
there are a lot for apparently-spurious modem status changes at low
speeds only.
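
The expected interrupt rates quoted above follow from simple arithmetic,
sketched here.  It assumes 10 bits per character on the wire (8N1
framing), which matches the 92160 cps figure in note 4, and one interrupt
per 16 characters sent or per 8 characters received; the helper names are
made up for illustration.

```python
BITS_PER_CHAR = 10  # assumption: start bit + 8 data bits + stop bit (8N1)


def chars_per_sec(bps: int) -> int:
    """Characters per second at the given line speed."""
    return bps // BITS_PER_CHAR


def interrupts_per_sec(bps: int, chars_per_interrupt: int) -> float:
    """Interrupt rate if one interrupt is taken per chars_per_interrupt characters."""
    return chars_per_sec(bps) / chars_per_interrupt


cps = chars_per_sec(921600)
tx = interrupts_per_sec(921600, 16)  # 16-byte tx fifo refilled per interrupt
rx = interrupts_per_sec(921600, 8)   # 8-byte rx fifo threshold
print(cps, tx, rx)
```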

Bruce
Received on Fri Nov 07 2003 - 00:04:21 UTC
