poul: sorry, was: Re: still: Re: gbde data corruption?

From: Heiko Schaefer <hschaefer_at_fto.de>
Date: Thu, 1 May 2003 11:07:48 +0200 (CEST)

Hi Poul,

sorry for all the fuss, you can now reattach your pulled-out hair.
and i need to find someone or something new to blame this issue on...

i have just reproduced the data-corruption on non-gbde filesystems (what
a way to start a day). so gbde is very very likely not to blame for any of
the stuff that i've been complaining about (apart from the slowness
while writing to disc, which will bug me again after i've sorted out
the corruption :) i'll get back to you in a few weeks *g*).

now i have the questionable pleasure of ruling out pieces of hardware as
suspects, i guess. damn. sometimes cheap pc hardware really sucks. even
though i don't think it's clear if this is a hardware or a software issue.

if anyone has suggestions on how to rule out specific causes or what
pieces of hard- and software are particularly suspect, i'd be glad to hear
about it. first of all i am planning to rule out the sis nic in the server
(onboard nics have always been suspect to me).
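
one way to narrow things down might be to look at where exactly a bad copy
differs from the original: whether the damaged bytes are all zero, and
whether the damaged runs line up with 512-byte or 4096-byte boundaries.
below is a rough python sketch of that idea, not a tested diagnostic tool.
the function name diff_runs and the file names in the usage note are made
up for illustration.

```python
def diff_runs(orig_path, copy_path, chunk=65536):
    """Return a list of (offset, length, all_zero) tuples, one per run of
    consecutive bytes where the copy differs from the original.

    all_zero is True when every differing byte in the copy is 0.
    Assumption: both files have the same length; if they do not, only the
    common prefix is compared and the sizes should be checked separately.
    """
    runs = []
    start = None      # offset where the current run of differences began
    zeros = True      # whether all bytes of the current run are 0 in the copy
    offset = 0        # file offset of the start of the current chunk
    with open(orig_path, "rb") as a, open(copy_path, "rb") as b:
        while True:
            ca, cb = a.read(chunk), b.read(chunk)
            if not ca and not cb:
                break
            n = min(len(ca), len(cb))
            for i in range(n):
                if ca[i] != cb[i]:
                    if start is None:
                        start, zeros = offset + i, True
                    zeros = zeros and cb[i] == 0
                elif start is not None:
                    # the run of differences ended just before this byte
                    runs.append((start, offset + i - start, zeros))
                    start = None
            offset += n
            if len(ca) != len(cb):
                break  # one file is shorter; stop at the common prefix
    if start is not None:
        runs.append((start, offset - start, zeros))
    return runs
```

for each (offset, length, all_zero) returned by diff_runs("oldfile",
"newfile"), offset % 512 and length % 512 show whether the damage covers
whole sectors. what that says about the cause is still guesswork, but it
gives something concrete to compare between test runs.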

sorry again, poul, for blaming this on you right away.

regards,

Heiko

> [corruption of data]
> > That is really strange, the problems I've seen until now have all
> > resulted in data coming back scrambled beyond recognition, and therefore
> > practically incompressible, this sounds like they're filled with identical
> > bytes or sectors of some kind.
> >
> > Can you try to run "cmp -l oldfile newfile" and study the output
> > for a bit ?  Any observations you can make will be helpful.
>
> the broken version of the file contains lots of 0-bytes (instead of high
> entropy values in the original file). it seems from the output of cmp that
> every damaged value is replaced by 0.
>
> > >(potentially this could also be an nfs-issue, as i am copying onto the
> > >gbde partition via nfs from a 4.6-rc machine. but i can't really imagine
> > >that, never had anything like that in all of my non-gbde freebsd nfs
> > >experience. if it is an nfs issue, then it would probably be fbsd-5
> > >specific - is there any such known issue ?!)
> >
> > I doubt it is NFS, but it would be nice if you could verify the checksum
> > on both the client and server side, just to see that they agree.
>
> to clarify: i mount the (remote) gbde partition to a box which wishes to
> get rid of a lot of data - then i move stuff onto the gbde mount via nfs.
>
> the checking of the checksums was then done on the server (i.e. locally).
>
> > >the partition in question now looks like this:
> > >e: 117231392       16    4.2BSD     4096 16384    64  # (Cyl.    0*- 7297*)
> >
> > What does diskinfo(8) say about the encrypted (ad0e.bde) and unencrypted
> > (ad0e) devices (for some value of "ad0") ?
>
> zoidberg# diskinfo /dev/ad0s1e
> /dev/ad0s1e     512     29051207680     56740640        56290   16      63
> zoidberg# diskinfo /dev/ad0s1e.bde
> /dev/ad0s1e.bde 4096    28937551872     7064832
>
> another thing i just noticed: /var/log/messages contains lots of
>
> [...]
> Apr 30 15:24:55 zoidberg kernel: ENOMEM 0xc4c62100 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:19 zoidberg kernel: ENOMEM 0xc3fa5000 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4b46100 on 0xc45c6c80(ad2s1e.bde)
> Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4364500 on 0xc45c6c80(ad2s1e.bde)
> [...]
>
> i haven't yet checked the data on ad2s1e.bde, it might be partially
> corrupt or not.
>
> > >this time i inited gbde's sectorsize to "4096". last time i reported
> > >corruption, gbde's sectorsize was at its default (i presume 512). the
> > >corruption then 'felt' just the same. very sporadic - and somewhat
> > >non-deterministic from my point of view.
> >
> > The sectorsize is mainly a performance issue, it should not affect operation
>
> i feel that the issue i see is outside the realm of 'should' - so i try to
> give any information i can think of. even useless information :)
>
> also, i have the unpleasant feeling that i might be making some stupid
> mistake, and waste your time by looking entirely in the wrong direction.
>
> ...for all i know the hardware i use on the server-side (or the drivers
> for it ... for some reason the sis-based onboard nic comes to my mind,
> just now) could be subtly broken :/
>
> if you have no other things i could report or try, i might just throw away
> the gbde volumes and try the same copying with non-gbde partitions, just
> to be sure.
>
> regards,
>
> Heiko
Received on Thu May 01 2003 - 00:07:55 UTC
