[Summary: I've now tested on a rpi3 in addition to a
pine64+ 2GB. Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett <bennett at sdf.org> wrote:
>
>> Mark Millard <markmi ta dsl-only.net> wrote:
>>
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>>
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>
>>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>>>>
>>>>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>>> <markmi at dsl-only.net> wrote:
>>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ticso_at_cicely7.cicely.de> wrote:
>>>>>>
>>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>>
>>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>>> What medium do you swap to?
>>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>>> corruption for some access patterns.
>>>>>>
>>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>>
>>>>>> Only the kernel is from the microSD card.
>>>>>>
>>>>>> I have several examples of the USB SSD model and have
>>>>>> never observed such problems in any other context.
>>>>>>
>>>>>> [remainder of irrelevant material deleted --SB]
>>>>>
>>>>> You gave a very long-winded non-answer to Bernd's question, so I'll
>>>>> repeat it here. What medium do you swap to?
>>>>
>>>> My wording of:
>>>>
>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>
>>>> was definitely poor.
>>>> It should have explicitly mentioned the
>>>> swap partition too:
>>>>
>>>> The root filesystem and swap partition are both on the same
>>>> USB SSD on a powered hub.
>>>>
>>>> More detail from dmesg -a for usb:
>>>>
>>>> usbus0: 12Mbps Full Speed USB v1.0
>>>> usbus1: 480Mbps High Speed USB v2.0
>>>> usbus2: 12Mbps Full Speed USB v1.0
>>>> usbus3: 480Mbps High Speed USB v2.0
>>>> ugen0.1: <Generic OHCI root HUB> at usbus0
>>>> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
>>>> ugen1.1: <Allwinner EHCI root HUB> at usbus1
>>>> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
>>>> ugen2.1: <Generic OHCI root HUB> at usbus2
>>>> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
>>>> ugen3.1: <Allwinner EHCI root HUB> at usbus3
>>>> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
>>>> . . .
>>>> uhub0: 1 port with 1 removable, self powered
>>>> uhub2: 1 port with 1 removable, self powered
>>>> uhub1: 1 port with 1 removable, self powered
>>>> uhub3: 1 port with 1 removable, self powered
>>>> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
>>>> uhub4 on uhub3
>>>> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
>>>> uhub4: MTT enabled
>>>> uhub4: 4 ports with 4 removable, self powered
>>>> ugen3.3: <OWC Envoy Pro mini> at usbus3
>>>> umass0 on uhub4
>>>> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
>>>> umass0: SCSI over Bulk-Only; quirks = 0x0100
>>>> umass0:0:0: Attached to scbus0
>>>> . . .
>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>>> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
>>>> da0: Serial Number <REPLACED>
>>>> da0: 40.000MB/s transfers
>>>>
>>>> (Edited a bit because there is other material interlaced, even
>>>> internal to some lines. Also: I removed the serial number of the
>>>> specific example device.)
>>
>> Thank you. That presents a much clearer picture.
>>>>
>>>>> I will further note that any kind of USB device cannot automatically
>>>>> be trusted to behave properly. USB devices are notorious, for example,
>>>>>
>>>>> [reasons why deleted --SB]
>>>>>
>>>>> You should identify where you page/swap to and then try substituting
>>>>> a different device for that function as a test to eliminate the possibility
>>>>> of a bad storage device/controller. If the problem still occurs, that
>>>>> means there still remains the possibility that another controller or its
>>>>> firmware is defective instead. It could be a kernel bug, it is true, but
>>>>> making sure there is no hardware or firmware error occurring is important,
>>>>> and as I say, USB devices should always be considered suspect unless and
>>>>> until proven innocent.
>>>>
>>>> [FYI: This is a ufs context, not a zfs one.]
>>
>> Right. It's only a Pi, after all. :-)
>
> It is a Pine64+ 2GB, not an rpi3.
>
>>>>
>>>> I'm aware of such things. There is no evidence that has resulted in
>>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>>> I'd not be going down this path. I only have access to the one arm64
>>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>>> is on that board.
>>
>> There isn't even one open port on that hub that you could plug a
>> flash drive into temporarily to be the paging device?
>
> Why do you think that I've never tried alternative devices? It
> is just that the result was no evidence that my usually-in-use
> SSD is having a special/local problem: the behavior continues
> across all such contexts when the Pine64+ 2GB is involved. (Again
> I have not had access to an alternate to the one arm64 board.
> That limits my substitution testing possibilities.)
>
> Why would you expect a Flash drive to be better than another SSD
> for such testing? (The SSD that I usually use even happens to be
> a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts.
> So is the hub that I usually use for that matter.)

FYI: I now have access to a rpi3 in addition to a
pine64+ 2GB. I've tested on the rpi3 using a different
USB hub and a different SSD: no hardware device in common
with the recent Pine64+ 2GB tests (other than console
cabling and what handles the serial console).

The fork-then-swap-out-then-swap-in failure happens in the
rpi3 context as well.

Because the rpi3 has only 1 GiByte of RAM the stress commands
that I used were more like:

stress -m 1 --vm-bytes 1000M

to get zero RES(ident memory) for the two processes from
my test program after it forks while they are waiting/sleeping.

>> You could then
>> try your tests before returning to the normal configuration. If there
>> isn't an open port, then how about plugging a second hub into one of
>> the first hub's ports and moving the displaced device to the second
>> hub? A flash drive could then be plugged in. That kind of configuration
>> is obviously a bad idea for the long run, but just to try your tests it
>> ought to work well enough.
>
> I have access to more SSDs that I can use than I do to Flash drives. I
> see no reason to specifically use a Flash drive.
>
>> (BTW, if a USB storage device containing a
>> paging area drops off=line even momentarily and the system needs to use
>> it, that is the beginning of the end, even though it may take up to a few
>> minutes for everything to lock up.
>
> The system does not lock up, even days or weeks later, with having done
> dozens of experiments that show memory corruption failures over those
> days. The only processes showing memory corruption so far are those
> that were the parent or child for a fork that were later swapped out
> to have zero RES(ident memory) and then even later swapped back in.
>
> The context has no such issues. You are inventing problems that do
> not exist in my context. That is why none of my list submittals
> mention such problems: they did not occur.
>
>> You probably won't be able to do an
>> orderly shutdown, but will instead have to crash it with the power switch.
>> In the case of something like a Pi, this is an unpleasant fact of life,
>> to be sure.)
>
> Such things did not occur and has nothing to do with my context so far.
>
>> I think I buy your arguments, given the evidence you've collected
>> thus far, including what you've added below. I just like to eliminate
>> possibilities that are much simpler to deal with before facing nastinesses
>> like bugs in the VM subsystem. :-)
>
> When I started this I found no evidence of device-specific problems.
> My investigation activity goes back to long before my list submittals.
>
> And I repeat: Other people have reported the symptoms that started
> this investigation. They did so before I ever started my activities.
> They were using none of the specific devices that I have access to.
> Likely the types of devices were frequently even different, such as
> a rpi3 instead of a Pine64+ 2GB or a different USB drive. I was able
> to get the symptoms that they reported.
>
>>>> It would be neat if some folks used my code to test other arm64
>>>> contexts and reported the results. I'd be very interested.
>>>> (This is easier to do on devices that do not have massive
>>>> amounts of RAM, which may limit the range of devices or
>>>> device configurations that are reasonable to test.)
>>>>
>>>> There is that other people using other devices have reported
>>>> the behavior that started this investigation. I can produce the
>>>> behavior that they reported, although I've not seen anyone else
>>>> listing specific steps that lead to the problem or ways to tell
>>>> if the symptom is going to happen before it actually does. Nor
>>>> have I seen any other core dump analysis. (I have bugzilla
>>>> submittals 217138 and 217239 tied to symptoms others have
>>>> reported as well as this test program material.)
>>>>
>>>> Also, considering that for my test program I can control which pages
>>>> get the zeroed-problem by read-accessing even one byte of any 4K
>>>> Byte page that I want to make work normally, doing so in the child
>>>> process of the fork, between the fork and the sleep/swap-out, it does
>>>> not suggest USB-device-specific behavior. The read-access is changing
>>>> the status of the page in some way as far as I can tell.
>>>>
>>>> (Such read-accesses in the parent process make no difference to the
>>>> behavior.)
>>>
>>> I should have noted another comparison/contrast between
>>> having memory corruption and not in my context:
>>>
>>> I've tried variants of my test program that do not fork but
>>> just sleep for 60s to allow me to force the swap-out. I
>>> did this before adding fork and before using
>>> parital_test_check, for example. I gradually added things
>>> apparently involved in the reports others had made
>>> until I found a combination that produced a memory
>>> corruption test failure.
>>>
>>> These tests without fork involved find no problems with
>>> the memory content after the swap-in.
>>>
>>> For my test program it appears that fork-before-swap-out
>>> or the like is essential to having the problem occur.
>>>
>> A comment about terminology seems in order here. It bothers
>> me considerably to see you writing "swap out" or "swapping" where
>> it seems like you mean to write "page out" or "paging". A BSD
>> system whose swapping mechanism gets activated has already waded
>> very deeply into the quicksand and frequently cannot be gotten out
>> in a reasonable amount of time even with manual assistance. It is
>> often quicker to crash it, reboot, and wait for the fsck(8) cleanups
>> to complete. Orderly shutdowns, even of the kind that results from
>> a quick poke to the power button, typically get mired in the same
>> mess that already has the system in knots.
>> Also, BSD systems since
>> 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in,
>> just out. A swapped out process, once the system determines that it
>> has adequate resources again to attempt to run the process, will have
>> the interrupted text page paged in and the rest will be paged in by
>> the normal mechanism of page faults and page-in operations. I assume
>> you must already know all this, which is a large part of why it grates
>> on me that you appear to be using the wrong terms.
>
> You apparently did not read any of the material about how the test
> is done or are unfamiliar with what "stress -m 1 --vm-bytes 1800M"
> does when there is only 2GB of RAM. I am deliberately inducing
> swapping in other processes, including the 2 from my test program
> (after the fork), not just paging. (stress is a port, not part of
> the base system.)
>
> When I say swap-out and swap-in I mean it.
>
> From the source code of my test program:
>
>     sleep(60);
>
>     // During this manually force this process to
>     // swap out. I use something like:
>
>     // stress -m 1 --vm-bytes 1800M
>
>     // in another shell and ^C'ing it after top
>     // shows the swapped status desired. 1800M
>     // just happened to work on the Pine64+ 2GB
>     // that I was using. I watch with top -PCwaopid .
>
> That type of stress run uses about 1.8 GiBytes after a bit,
> which is enough to cause the swapping of other processes,
> including the two that I am testing (post-fork). (Some RAM
> is in use already before the stress run, which explains not
> needing 2 GiBytes to be in use by stress.)
>
> Look at a "top -PCwaopid" display: there are columns for
> RES(ident memory) and SWAP. I cause my 2 test processes to
> show zero RES and everything under SWAP, starting sometime
> during the 60s sleep/wait.
>
> Why would I cause swapping?
> Because buildworld causes such
> swap-outs at times when there is only 2GBytes of RAM,
> including processes that forked earlier, and as a result
> the corrupted memory problems show up later in some processes
> that were swapped out at the time. The build eventually
> stops for process failures tied to the corruptions of memory
> in the failing processes. (At least that is what my testing
> strongly suggests.)
>
> But that is a very complicated context to use for analysis or
> testing of the problem. My test program is vastly simpler
> and easier/quicker to set up and test when used with stress
> as well. Such was the kind of thing I was trying to find.
>
> I want the Pine64+ 2GB to work well enough to be able to have
> buildworld (-j 4) complete correctly without having to restart
> the build --even when everything has to be rebuilt. So I'm
> trying to find and provide enough evidence to help someone fix
> the problems that are observed to block such buildworld
> activity.
>
> Again: others have reported such arm64 problems on the lists
> before I ever got into this activity. The evidence is that
> the issues are not a local property of my environment.
>
> Swapping is supposed to work. I can do buildworld (-j 4)
> on armv6 (really -mcpu=cortex-a7 so armv7-a) and the
> swapping it causes works fine. This is true for both a
> bpim3 (2 GiBytes of RAM) and a rpi2 (1 GiByte of RAM
> so even more swapping). On a powerpc64 with 16 GiBytes
> I've built things that caused 26 GiBytes of swap to be
> in use some of the time (during 4 ld's running in
> parallel), with lots of processes having zero for
> RES(ident memory) and all their space listed under SWAP
> in a "top -PCwaopid" display. This too has no problems
> with swapping of previously forked processes (or of any
> other processes).
>
> For the likes of a Pine64+ 2GB to be "self hosted"
> for source-code based updates, swapping of previously
> forked processes must work and currently such
> swapping is unreliable.
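
For anyone wanting to try the same kind of test, here is a minimal
sketch of the procedure described above: fill a region with a known
pattern, fork, leave the memory untouched in both processes during the
wait (the window in which stress is run to force the swap-out), then
check the content in both. This is not the actual test program. The
names fork_swap_check, pattern_at, fill_region and count_mismatches are
made up for illustration, and the malloc'd region and the byte pattern
are assumptions.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical pattern: the expected byte at offset i. It is never
   zero, so a page that comes back zeroed always shows up as bad. */
static unsigned char pattern_at(size_t i)
{
    return (unsigned char)(i % 251u + 1u);
}

static void fill_region(unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = pattern_at(i);
}

/* Count the bytes that no longer match the pattern. */
static size_t count_mismatches(const unsigned char *p, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] != pattern_at(i))
            bad++;
    return bad;
}

/* Returns 0 if parent and child both still see the pattern after the
   wait, 1 if either sees corruption, -1 on a setup error. */
int fork_swap_check(size_t nbytes, unsigned int wait_seconds)
{
    unsigned char *region = malloc(nbytes);
    if (region == NULL)
        return -1;
    fill_region(region, nbytes);

    pid_t pid = fork();
    if (pid < 0) {
        free(region);
        return -1;
    }

    /* Neither process touches the region here. This is the window
       for forcing both to zero RES, for example with stress. */
    sleep(wait_seconds);

    size_t bad = count_mismatches(region, nbytes);
    if (pid == 0)
        _exit(bad != 0 ? 1 : 0);    /* child reports via exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(region);
        return -1;
    }
    free(region);

    int child_bad = !WIFEXITED(status) || WEXITSTATUS(status) != 0;
    return (bad != 0 || child_bad) ? 1 : 0;
}
```

Calling something like fork_swap_check(256u * 1024u * 1024u, 60) from
main() and running stress in another shell during the 60s would follow
the steps above; the region size here is only a guess. The sketch
deliberately makes no read access between the fork and the sleep,
since such an access in the child is what was observed to prevent the
failure.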
===
Mark Millard
markmi at dsl-only.net

Received on Sat Mar 18 2017 - 12:26:58 UTC