Re: still: Re: gbde data corruption?

From: Heiko Schaefer <hschaefer_at_fto.de>
Date: Wed, 30 Apr 2003 15:29:28 +0200 (CEST)
Hi Poul,

[corruption of data]
> That is really strange, the problems I've seen until now have all
> resulted in data coming back scrambled beyond recognition, and therefore
> practically incompressible, this sounds like they're filled with identical
> bytes or sectors of some kind.
>
> Can you try to run "cmp -l oldfile newfile" and study the output
> for a bit ?  Any observations you can make will be helpful.

the broken version of the file contains lots of 0-bytes where the original
file has high-entropy values. judging by the output of cmp, every damaged
byte has been replaced by 0.
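
The `cmp -l` observation can be quantified with a small script. This is a sketch with hypothetical helper names (`parse_cmp_l`, `summarize`). It assumes the POSIX `cmp -l` format: one line per differing byte, with the 1-based offset in decimal followed by the byte from each file in octal. It counts how many differing bytes are 0 in the new file and which sectors the differences fall into, so you can see whether the damage clusters in 512- or 4096-byte units.

```python
from collections import Counter


def parse_cmp_l(lines):
    """Yield (offset, old_byte, new_byte) for each line of `cmp -l` output."""
    for line in lines:
        fields = line.split()
        # skip anything that is not "<decimal offset> <octal> <octal>",
        # e.g. an "EOF on ..." message when one file is shorter
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        offset, old, new = fields
        yield int(offset), int(old, 8), int(new, 8)


def summarize(diffs, sector_size=512):
    """Count differing bytes, how many became 0, and which sectors they hit."""
    total = 0
    zeroed = 0
    sectors = Counter()
    for offset, _old, new in diffs:
        total += 1
        if new == 0:
            zeroed += 1
        # cmp offsets are 1-based; convert to 0-based before dividing
        sectors[(offset - 1) // sector_size] += 1
    return {
        "differing_bytes": total,
        "zeroed_in_new": zeroed,
        "sectors_touched": sorted(sectors),
    }
```

For example, run `cmp -l oldfile newfile > diffs.txt` and then call `summarize(parse_cmp_l(open("diffs.txt")), 4096)`. Bytes that were already 0 in the original never show up as differences, so the byte count is a lower bound on the damaged area.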

> >(potentially this could also be an nfs-issue, as i am copying onto the
> >gbde partition via nfs from a 4.6-rc machine. but i can't really imagine
> >that, never had anything like that in all of my non-gbde freebsd nfs
> >experience. if it is an nfs issue, then it would probably be fbsd-5
> >specific - is there any such known issue ?!)
>
> I doubt it is NFS, but it would be nice if you could verify the checksum
> on both the client and server side, just to see that they agree.

to clarify: i mount the (remote) gbde partition via nfs on a box that needs
to get rid of a lot of data, then i move stuff from that box onto the gbde
mount.

the checksums were then checked on the server (i.e. locally).
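
For the client/server comparison Poul suggests, any tool that hashes the whole file will do (md5(1) on both FreeBSD boxes, for instance). As a sketch, here is a chunked SHA-256 in Python. `sha256_of_file` is a hypothetical name, and reading in 1 MiB chunks is an assumption meant to keep memory use flat on large files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # read fixed-size chunks until read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Run it on the client against the source file and on the server against the copy on the gbde mount. If the two digests differ, the copy was damaged somewhere between the client's disk and the gbde volume.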

> >the partition in question now looks like this:
> >e: 117231392       16    4.2BSD     4096 16384    64  # (Cyl.    0*- 7297*)
>
> What does diskinfo(8) say about the encrypted (ad0e.bde) and unencrypted
> (ad0e) devices (for some value of "ad0") ?

zoidberg# diskinfo /dev/ad0s1e
/dev/ad0s1e     512     29051207680     56740640        56290   16      63
zoidberg# diskinfo /dev/ad0s1e.bde
/dev/ad0s1e.bde 4096    28937551872     7064832
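
The diskinfo numbers can be sanity-checked with a little arithmetic. This sketch assumes the first four fields are the device name, the sector size, the media size in bytes and the media size in sectors. The trailing `56290 16 63` on the raw device looks like firmware geometry and is ignored. Both devices should satisfy mediasize == sectorsize * sectors. The 4096-byte sector size on the .bde device should also match the sectorsize gbde was initialized with.

```python
def parse_diskinfo(line):
    """Split a diskinfo(8) line into (name, sectorsize, mediasize, sectors).

    Assumption: the first four whitespace-separated fields are the device
    name, sector size (bytes), media size (bytes) and media size (sectors);
    any further fields are ignored.
    """
    name, sectorsize, mediasize, sectors = line.split()[:4]
    return name, int(sectorsize), int(mediasize), int(sectors)


raw = parse_diskinfo("/dev/ad0s1e     512     29051207680     56740640        56290   16      63")
bde = parse_diskinfo("/dev/ad0s1e.bde 4096    28937551872     7064832")

# each device should be self-consistent: mediasize == sectorsize * sectors
for name, sectorsize, mediasize, sectors in (raw, bde):
    assert mediasize == sectorsize * sectors, name

# bytes of the raw partition that are not visible through the .bde device
overhead = raw[2] - bde[2]
print(f"space not exposed by {bde[0]}: {overhead} bytes "
      f"({overhead / raw[2]:.2%} of {raw[0]})")
```

With the numbers above it prints `space not exposed by /dev/ad0s1e.bde: 113655808 bytes (0.39% of /dev/ad0s1e)`. In other words, the .bde device is about 108 MiB smaller than the raw partition, and both devices are internally consistent.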

another thing i just noticed: /var/log/messages contains lots of

[...]
Apr 30 15:24:55 zoidberg kernel: ENOMEM 0xc4c62100 on 0xc45c6c80(ad2s1e.bde)
Apr 30 15:25:19 zoidberg kernel: ENOMEM 0xc3fa5000 on 0xc45c6c80(ad2s1e.bde)
Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4b46100 on 0xc45c6c80(ad2s1e.bde)
Apr 30 15:25:57 zoidberg kernel: ENOMEM 0xc4364500 on 0xc45c6c80(ad2s1e.bde)
[...]

i haven't checked the data on ad2s1e.bde yet; it may or may not be
partially corrupt.
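
To see which devices the ENOMEM messages come from (for example, whether ad0s1e.bde shows up at all, or only ad2s1e.bde), a sketch like the following tallies them per provider. `count_enomem` is a hypothetical helper. Its regular expression is modeled only on the lines quoted above. The two hex values look like kernel addresses and are ignored here.

```python
import re
from collections import Counter

# modeled on lines such as
#   "... kernel: ENOMEM 0xc4c62100 on 0xc45c6c80(ad2s1e.bde)"
# group 1 captures the provider name inside the parentheses
ENOMEM_RE = re.compile(r"kernel: ENOMEM 0x[0-9a-f]+ on 0x[0-9a-f]+\(([^)]+)\)")


def count_enomem(lines):
    """Count ENOMEM kernel messages per provider name."""
    counts = Counter()
    for line in lines:
        m = ENOMEM_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `count_enomem(open("/var/log/messages"))` returns a Counter keyed by provider name, such as `ad2s1e.bde`.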

> >this time i inited gbde's sectorsize to "4096". last time i reported
> >corruption, gbde's sectorsize was at its default (i presume 512). the
> >corruption then 'felt' just the same. very sporadic - and somewhat
> >non-deterministic from my point of view.
>
> The sectorsize is mainly a performance issue, it should not affect operation

i feel that the issue i see is outside the realm of 'should', so i'm trying
to give any information i can think of, even useless information :)

also, i have the unpleasant feeling that i might be making some stupid
mistake and wasting your time by looking in entirely the wrong direction.

...for all i know, the hardware i use on the server side (or its drivers;
for some reason the sis-based onboard nic just came to mind) could be
subtly broken :/

if you have no other things i could report or try, i might just throw away
the gbde volumes and try the same copying with non-gbde partitions, just
to be sure.

regards,

Heiko
Received on Wed Apr 30 2003 - 04:29:32 UTC
