Running 8.0-RC1 on an amd64 workstation, I have run into what appears to be a memory corruption issue when doing (UFS2) filesystem I/O on an attached SCSI disk. It happens when more than 4GB of RAM is installed, or when 4GB is installed and "memory hole remapping" is enabled in the BIOS.

The memory modules all pass memtest86+ (e820 map), both individually and in combination. A Linux rescue disc runs fine, and the memtester program included on it doesn't complain. I can scan the disk (using dd) from that rescue disc without errors. SCSI verify commands run from the SCSI card complete successfully. The disk reports no grown defects and no SMART failures. I tried two different U320 cables/terminators and two (consumer-class) motherboards. Underclocking the RAM doesn't resolve the issue.

The disk works fine with 2GB (1x2) or 3GB (3x1) of RAM installed, and with 8GB (4x2) installed and hw.physmem="2GB" set in the loader (I tested that case with remapping off). "Works" means the following command completes successfully with root, /usr, and /var mounted read-only:

    # find / -type f -exec md5 -q {} \; > /dev/null

When it doesn't work, the following cases tend to occur:

1. (can't run command): "ROOT MOUNT ERROR" during boot (following "GEOM: da0s1: invalid disklabel.").
2. (can't run command): Hang during "Trying to mount root" (not following a disklabel error).
3. (booting from livefs cd, can't run command): invalid disklabel error.
4. (command is running, both from livefs cd and after a successful da0 root mount): g_vfs_done() kernel error messages with very large positive and negative offset values; input/output errors and invalid file descriptor errors on specific files; eventually a panic (the most recent was something equivalent to "GPF while in kernel mode; trap 9 while interrupts disabled").
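The find/md5 test above only shows whether every file can be read without an error. A stricter variant, offered here as a minimal sketch and not something run for this report, hashes the tree twice and compares the results, so a read that succeeds but returns different data would also show up. The function name verify_reads and the HASH variable are hypothetical; it assumes FreeBSD's md5(1) with -r, which prints "digest filename" per file. Digests are kept in shell variables so nothing is written to the read-only file systems. One caveat: if the data fits in RAM, the second pass may be served from the buffer cache rather than re-read from the disk.

```shell
# Hypothetical helper, not part of FreeBSD: hash every regular file under a
# directory twice and report whether the two passes agree. Digests are kept
# in shell variables so nothing is written to the (read-only) file systems
# under test. HASH defaults to FreeBSD's "md5 -r" ("digest filename" output);
# set HASH=md5sum on systems without md5(1). Read errors still go to stderr.
verify_reads() {
    root=${1:-/}
    hash_cmd="${HASH:-md5 -r}"
    # First full read pass, sorted by file name so the two passes line up.
    # $hash_cmd is left unquoted on purpose so "md5 -r" splits into two words.
    pass1=$(find "$root" -type f -exec $hash_cmd {} + | sort -k 2)
    # Second pass over the same tree. If the data set fits in RAM it may be
    # served from the buffer cache rather than re-read from the disk.
    pass2=$(find "$root" -type f -exec $hash_cmd {} + | sort -k 2)
    if [ "$pass1" = "$pass2" ]; then
        echo "both passes produced identical digests"
        return 0
    fi
    echo "digests differ between passes" >&2
    return 1
}
```

After loading the function into an sh session, running `verify_reads /` covers the same tree as the original find command.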
For cases 3 and 4, I checked the first 16k of the slice with the following command:

    # dd if=/dev/da0s1 count=32 | md5 -q

The same digest was produced for (3), for (4), and in the cases that worked without error (2GB, and 4GB with remapping off; running from the livefs cd).

In some of the failing cases, "camcontrol inquiry" would intermittently return an empty result, e.g. (retyped):

    # camcontrol inquiry da0
    pass0: < > Fixed Direct Access SCSI-0 device
    pass0: Serial Number

(I can't remember the third output line.) By "intermittently" I mean that repeating the command during the same session could yield the proper result after a number of attempts.

The SCSI card is an LSI20160 (sym(4), PCI U160). The SCSI disk is a Seagate ST373455LW (U320). The current motherboard is an ASUS M3A76-CM (AMI BIOS, AM2+ socket). I have a boot -v dmesg (34kb) available from a livefs cd boot with 8GB installed and memory hole remapping turned on (case 4 result). The source used to build the CD was cvsup'ed a week or two ago.

Hardware common to the failing cases:

1. Power supply.
2. CPU.
3. SCSI card.
4. SCSI disk.
5. RAM.

The system appears to be stable when not using the SCSI disk.

I would appreciate any suggestions, or confirmation that someone has the same type of setup working under 8.0-RC1. I haven't yet tried the setup under 7-stable-amd64.

Mike

Received on Tue Oct 20 2009 - 22:19:58 UTC