NFS corruption on p4 machines (please test)

From: Kris Kennaway <kris_at_obsecurity.org>
Date: Thu, 2 Oct 2003 22:43:26 -0700

For some months now I have been seeing NFS corruption on the
three machines in the dosirak.kr package cluster. These are SMP
Pentium 4 machines that run -CURRENT.  Setting DISABLE_PSE and
DISABLE_PG_G does not fix the problem.  I can easily reproduce it
using /usr/src/tools/regression/fsx on a loopback NFS mount. The
failures are not deterministic, but fsx blows up within about 8000
operations (less than a minute of run time).  In fact it sometimes
even manages to make fsx segfault, which is fairly impressive :)

To test, just mount something rw via loopback NFS, and run 'fsx foo'
on the NFS filesystem for a few minutes.

e.g.:
dosirak# fsx foo
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
ftruncate1: 30cc3
dotruncate: ftruncate: Permission denied
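
For anyone who wants to see what kind of check is failing here, the
following is a minimal sketch of an fsx-style consistency test. It is
not fsx itself and covers far less (no mmap, no close/reopen); the names
mini_fsx and shadow are made up for illustration. It applies random
writes, truncates and reads to a file, mirrors every change in memory,
and fails on the first disagreement. It assumes a Unix-like system,
since it relies on os.pread and os.pwrite.

```python
import os
import random


def mini_fsx(path, ops=2000, max_size=0x40000, seed=0):
    """Apply random write/truncate/read operations to `path`, mirroring
    each change in an in-memory bytearray. Returns the number of
    operations attempted; raises AssertionError on the first mismatch."""
    rng = random.Random(seed)      # seeded so a failing run can be replayed
    shadow = bytearray()           # what the file is expected to contain
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for n in range(ops):
            op = rng.choice(("write", "truncate", "read"))
            if op == "write":
                off = rng.randrange(max_size)
                length = min(rng.randrange(1, 4096), max_size - off)
                data = rng.getrandbits(8 * length).to_bytes(length, "little")
                written = os.pwrite(fd, data, off)
                assert written == length, f"op {n}: short write at {off:#x}"
                if off > len(shadow):
                    # Writing past EOF leaves a hole that must read as zeros.
                    shadow.extend(b"\0" * (off - len(shadow)))
                shadow[off:off + length] = data
            elif op == "truncate":
                size = rng.randrange(max_size)
                os.ftruncate(fd, size)
                if size < len(shadow):
                    del shadow[size:]
                else:
                    shadow.extend(b"\0" * (size - len(shadow)))
            else:
                if not shadow:
                    continue       # nothing to read back yet
                off = rng.randrange(len(shadow))
                length = rng.randrange(1, 4096)
                got = os.pread(fd, length, off)
                want = bytes(shadow[off:off + length])
                assert got == want, f"op {n}: data mismatch at {off:#x}"
        # The final file size must also agree with the in-memory copy.
        assert os.fstat(fd).st_size == len(shadow), "size mismatch"
    finally:
        os.close(fd)
    return ops
```

Pointing it at a file on the loopback NFS mount, e.g.
mini_fsx("/mnt/nfs/foo") where /mnt/nfs is a hypothetical mount point,
should complete silently on a healthy system. The real fsx exercises
many more code paths, so a clean run of this sketch does not rule the
problem out.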

Is anyone else able to test this?  The three machines I see this on
have the same hardware specs, so it may be an interaction with certain
hardware.

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #0: Fri Sep 26 20:23:51 KST 2003
    root_at_dalki.kr.freebsd.org:/usr/obj/d/src/sys/DALKI
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0588000.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.20GHz (2199.94-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
  Hyperthreading: 2 logical CPUs
real memory  = 2147418112 (2047 MB)
avail memory = 2084302848 (1987 MB)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
Programming 16 pins in IOAPIC #3
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00050014, at 0xfee00000
 cpu2 (AP):  apic id:  2, version: 0x00050014, at 0xfee00000
 cpu3 (AP):  apic id:  3, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id:  8, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  9, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
 io3 (APIC): apic id: 11, version: 0x000f0011, at 0xfec03000
Pentium Pro MTRR support enabled
    ACPI-0660: *** Warning: Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope)
acpi0: <RCC    GCHE    > on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
pcibios: BIOS version 2.10
Using $PIR table, 7 entries at 0xc00f4a70
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
acpi_cpu0: <CPU> on acpi0
acpi_cpu1: <CPU> on acpi0
acpi_cpu2: <CPU> on acpi0
acpi_cpu3: <CPU> on acpi0
acpi_cpu4: <CPU> on acpi0
acpi_cpu5: <CPU> on acpi0
acpi_cpu6: <CPU> on acpi0
acpi_cpu7: <CPU> on acpi0
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
IOAPIC #1 intpin 2 -> irq 2
IOAPIC #1 intpin 1 -> irq 3
IOAPIC #1 intpin 3 -> irq 5
pci0: <display, VGA> at device 2.0 (no driver attached)
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xce80-0xcebf mem 0xfe980000-0xfe99ffff,0xfe9fd000-0xfe9fdfff irq 3 at device 4.0 on pci0
fxp0: Ethernet address 00:30:48:12:59:16
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: <Intel 82550 Pro/100 Ethernet> port 0xcf00-0xcf3f mem 0xfe9a0000-0xfe9bffff,0xfe9fe000-0xfe9fefff irq 5 at device 5.0 on pci0
fxp1: Ethernet address 00:30:48:12:49:d8
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <PCI-ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 UDMA100 controller> port 0xffa0-0xffaf,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
IOAPIC #1 intpin 14 -> irq 9
IOAPIC #1 intpin 15 -> irq 10
ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xe400-0xe4ff mem 0xfebfe000-0xfebfefff irq 9 at device 2.0 on pci2
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 10 at device 2.1 on pci2
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
fdc0: ready for input in output
fdc0: cmd 3 failed at out byte 1 of 3
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
ppc0 port 0x778-0x77f,0x378-0x37f irq 7 drq 1 on acpi0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
fdc0: ready for input in output
fdc0: cmd 3 failed at out byte 1 of 3
npx0: <math processor> on motherboard
npx0: INT 16 interface
orm0: <Option ROMs> at iomem 0xc9000-0xc9fff,0xc8000-0xc8fff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 16 steps (100% to 6.2%), currently 100.0%
acd0: CDROM <MATSHITA CR-177> at ata1-master PIO4
Waiting 15 seconds for SCSI devices to settle
GEOM: create disk da0 dp=0xc7d90450
ses0 at ahc0 bus 0 target 6 lun 0
ses0: <SUPER GEM318 0> Fixed Processor SCSI-2 device
ses0: 3.300MB/s transfers
ses0: SAF-TE Compliant Device
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST336607LC 0004> Fixed Direct Access SCSI-3 device
da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!

Kris

Received on Thu Oct 02 2003 - 20:43:39 UTC
