[Something strange happened to the automatic CC: fill-in for my
original reply. Also, I should have mentioned that, for my test
program, if a variant is made that does not fork, the swapping works
fine.]

On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>
>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>> <markmi at dsl-only.net> wrote:
>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ticso at cicely7.cicely.de> wrote:
>>>
>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>> failure from occurring. Even a small access to the memory at
>>>>> that stage prevents the failure. Details follow.]
>>>>
>>>> Maybe a stupid question, since you might have written it somewhere.
>>>> What medium do you swap to?
>>>> I've seen broken firmware on microSD cards doing silent data
>>>> corruption for some access patterns.
>>>
>>> The root filesystem is on a USB SSD on a powered hub.
>>>
>>> Only the kernel is from the microSD card.
>>>
>>> I have several examples of the USB SSD model and have
>>> never observed such problems in any other context.
>>>
>>> [remainder of irrelevant material deleted --SB]
>>
>> You gave a very long-winded non-answer to Bernd's question, so I'll
>> repeat it here. What medium do you swap to?
>
> My wording of:
>
> The root filesystem is on a USB SSD on a powered hub.
>
> was definitely poor. It should have explicitly mentioned the
> swap partition too:
>
> The root filesystem and swap partition are both on the same
> USB SSD on a powered hub.
>
> More detail from dmesg -a for usb:
>
> usbus0: 12Mbps Full Speed USB v1.0
> usbus1: 480Mbps High Speed USB v2.0
> usbus2: 12Mbps Full Speed USB v1.0
> usbus3: 480Mbps High Speed USB v2.0
> ugen0.1: <Generic OHCI root HUB> at usbus0
> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> ugen1.1: <Allwinner EHCI root HUB> at usbus1
> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> ugen2.1: <Generic OHCI root HUB> at usbus2
> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> ugen3.1: <Allwinner EHCI root HUB> at usbus3
> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> . . .
> uhub0: 1 port with 1 removable, self powered
> uhub2: 1 port with 1 removable, self powered
> uhub1: 1 port with 1 removable, self powered
> uhub3: 1 port with 1 removable, self powered
> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
> uhub4 on uhub3
> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
> uhub4: MTT enabled
> uhub4: 4 ports with 4 removable, self powered
> ugen3.3: <OWC Envoy Pro mini> at usbus3
> umass0 on uhub4
> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
> umass0: SCSI over Bulk-Only; quirks = 0x0100
> umass0:0:0: Attached to scbus0
> . . .
> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number <REPLACED>
> da0: 40.000MB/s transfers
>
> (Edited a bit because there is other material interlaced, even
> internal to some lines. Also: I removed the serial number of the
> specific example device.)
>
>> I will further note that any kind of USB device cannot automatically
>> be trusted to behave properly. USB devices are notorious, for example,
>> for momentarily dropping off-line and then immediately reconnecting. (ZFS
>> reacts very poorly to such events, BTW.) This misbehavior can be caused
>> by either processor involved, i.e., the one controlling either the
>> upstream or the downstream device. Hubs are really bad about this, but
>> any USB device can be guilty. You may have a defective storage device,
>> its controller may be defective, or any controller in the chain all the
>> way back to the motherboard may be defective or, not defective, but
>> corrupted by having been connected to another device with corrupted
>> (infected) firmware that tries to flash itself into the firmware flash
>> chips in its potential victim.
>> Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be
>> defective. Without parity bits, the devices may return bad data and lie
>> about the presence of corrupted data. That, for example, is where ZFS
>> is better than any kind of classical RAID, because ZFS keeps checksums on
>> everything, so it has a reasonable chance of detecting corruption even
>> without parity support and, if there is any redundancy in the pool or the
>> data set, of fixing the bad data. Even having parity generally
>> allows only the detection of single-bit errors, not their repair.
>> You should identify where you page/swap to and then try substituting
>> a different device for that function as a test to eliminate the
>> possibility of a bad storage device/controller. If the problem still
>> occurs, the possibility remains that another controller or its
>> firmware is defective instead. It could be a kernel bug, it is true, but
>> making sure there is no hardware or firmware error occurring is
>> important, and, as I say, USB devices should always be considered
>> suspect unless and until proven innocent.
>
> [FYI: This is a ufs context, not a zfs one.]
>
> I'm aware of such things. There is no evidence so far suggesting that
> the USB devices I can replace are a problem. Otherwise I'd not be
> going down this path. I only have access to the one arm64 device (a
> Pine64+ 2GB), so I've no ability to substitution-test what is on that
> board.
>
> It would be neat if some folks used my code to test other arm64
> contexts and reported the results. I'd be very interested.
> (This is easier to do on devices that do not have massive
> amounts of RAM, which may limit the range of devices or
> device configurations that are reasonable to test.)
>
> There is also the fact that other people, using other devices, have
> reported the behavior that started this investigation. I can produce
> the behavior that they reported, although I've not seen anyone else
> listing specific steps that lead to the problem or ways to tell
> whether the symptom is going to happen before it actually does. Nor
> have I seen any other core dump analysis. (I have bugzilla
> submittals 217138 and 217239 tied to symptoms others have
> reported, as well as this test program material.)
>
> Also, for my test program, I can control which pages get the
> zeroed-page problem: read-accessing even one byte of any 4K Byte
> page that I want to make work normally, doing so in the child
> process of the fork, between the fork and the sleep/swap-out, is
> enough. That does not suggest USB-device-specific behavior. As far
> as I can tell, the read-access is changing the status of the page
> in some way.
>
> (Such read-accesses in the parent process make no difference to the
> behavior.)
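For concreteness, here is a minimal sketch of the overall shape of
such a test. This is not my actual test program: the region size,
the fill pattern, and the 60s sleep below are just placeholders for
illustration.

/*
 * Minimal sketch (placeholder sizes and names): fill a region with
 * a known pattern, fork, read-access one byte per 4K Byte page in
 * the child, sleep long enough for the swap-out to be forced, then
 * check the pattern after the swap-in.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define REGION_SIZE (1UL << 30)  /* 1 GiByte: placeholder size */
#define PAGE_BYTES 4096UL

int
main(void)
{
        unsigned char *region;
        size_t i;
        pid_t pid;

        region = malloc(REGION_SIZE);
        if (region == NULL)
                return (1);
        memset(region, 0xA5, REGION_SIZE); /* known non-zero pattern */

        pid = fork();
        if (pid == -1)
                return (1);
        if (pid == 0) {
                /*
                 * Child: read-accessing even one byte of a page here
                 * is what makes that page work normally; pages not
                 * touched here are the ones that can come back zeroed.
                 */
                volatile unsigned char sink;
                for (i = 0; i < REGION_SIZE; i += PAGE_BYTES)
                        sink = region[i];
                (void)sink;

                sleep(60); /* window in which to force the swap-out */

                for (i = 0; i < REGION_SIZE; i++)
                        if (region[i] != 0xA5) {
                                printf("mismatch at byte %zu: 0x%02x\n",
                                    i, region[i]);
                                return (1);
                        }
                printf("no corruption observed\n");
                return (0);
        }
        wait(NULL);
        return (0);
}

With the child's read loop present, all pages survive in my testing;
with it removed, or only partially covering the region, the untouched
pages are the ones that can come back zeroed.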
I should have noted another comparison/contrast between having the
memory corruption and not in my context: I've tried variants of my
test program that do not fork but just sleep for 60s to allow me to
force the swap-out. I did this before adding fork and before using
partial_test_check, for example. I gradually added things apparently
involved in the reports others had made until I found a combination
that produced a memory corruption test failure.

The tests that did not involve fork found no problems with the memory
content after the swap-in. For my test program, it appears that
fork-before-swap-out or the like is essential to having the problem
occur.

===
Mark Millard
markmi at dsl-only.net