Has anyone else seen any form of in-memory or on-disk corruption?

From: <gnn_at_freebsd.org>
Date: Fri, 04 Jul 2008 12:58:07 -0400
Hi,

I've been working on the following brain teasing (breaking?) problem
for about a week now.  What I'm seeing is that on large memory
machines, those with more than 4G of RAM, the ungzipping/untarring of
files fails due to gzip thinking the file is corrupt.  The way to
reproduce this is:

1) Create a bunch of gzip/tar balls in the 1-20MB range.
2) Reboot into FreeBSD 7.0-RELEASE.
3) Run gzip -t over all the files.

I have hundreds of these files to run this over, and a full check
takes about 3 hours, but I usually see some form of corruption within
the first 20 minutes.
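
A minimal sketch of step 3, for anyone who wants to script the check rather than run gzip by hand. It is an illustration, not the script used above: the helper names gzip_ok and find_corrupt and the .gz/.tgz suffix filter are assumptions. It reads each file through Python's gzip module to the end of the stream, which verifies the CRC much as `gzip -t` does.

```python
import gzip
import zlib
from pathlib import Path


def gzip_ok(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` decompresses cleanly to EOF."""
    try:
        with gzip.open(path, "rb") as f:
            # Reading to the end forces the CRC32 and length trailer check.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC mismatch, truncated stream, or invalid deflate data.
        return False


def find_corrupt(directory, suffixes=(".gz", ".tgz")):
    """Return the sorted paths under `directory` that fail the gzip check."""
    candidates = sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    return [p for p in candidates if not gzip_ok(p)]
```

Calling find_corrupt on the directory holding the tarballs right after a reboot, and again later without rebooting, would show whether the same files fail both times.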

Other important factors:

1) This is on very modern, 2P/4Core (8 cores total) hardware
2) The disks are 1TB SATA set up in JBOD.
3) The machines have 16G of RAM.
4) Corruption is seen only after a reboot.  If the machines keep
running, corruption is never seen again until the next reboot.
5) The systems are all Xeon running amd64
6) The disk controller is an AMCC 9650, but we do see this, very
rarely, with the onboard controller.
7) All boards are the Supermicro X7DWU:

http://www.supermicro.com/products/motherboard/Xeon1333/5400/X7DWU.cfm

8) All machines have 3 1TB drives.
9) The corruption is in 4K chunks.  That is N x 4K.
10) Files are not normally corrupted on disk, but this can happen.
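
The N x 4K pattern in point 9 can be checked mechanically when a known-good copy of a file is available. The sketch below is only an illustration under that assumption; the function names differing_blocks and runs are hypothetical. It compares two files in 4096-byte blocks and collapses the mismatching block numbers into contiguous runs.

```python
BLOCK = 4096  # 4K, the chunk size reported in point 9


def differing_blocks(good_path, suspect_path, block_size=BLOCK):
    """Return the indices of block_size-byte blocks that differ between two files."""
    bad = []
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        index = 0
        while True:
            a = good.read(block_size)
            b = suspect.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                bad.append(index)
            index += 1
    return bad


def runs(indices):
    """Collapse sorted block indices into (first_block, block_count) runs."""
    out = []
    for i in indices:
        if out and out[-1][0] + out[-1][1] == i:
            # Extends the previous run by one block.
            out[-1] = (out[-1][0], out[-1][1] + 1)
        else:
            out.append((i, 1))
    return out
```

A result such as [(1, 2), (4, 1)] would mean two damaged blocks starting at byte offset 4096 and one at offset 16384. If the damage really is block-aligned, every difference should fall neatly into such runs.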

I have already tried a few of the obvious things, such as making sure
that we sync pages before we shut down the twa driver.

Given what I have seen I believe this is something that happens from
startup, and not at shutdown.

Thoughts?

Best,
George
Received on Fri Jul 04 2008 - 15:23:06 UTC
