A single Byte access to a 4K Byte aligned region between the fork and wait/sleep/swap-out prevents that specific 4K Byte region from having the (bad) zeros. Sounds like a page sized unit of behavior to me. Details follow. On 2017-Mar-14, at 3:28 PM, Mark Millard <markmi_at_dsl-only.net> wrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] > > On 2017-Mar-14, at 11:07 AM, Mark Millard <markmi_at_dsl-only.net> wrote: > >> [This is just a correction to the subject-line text to say arm64 >> instead of amd64.] >> >> On 2017-Mar-14, at 12:58 AM, Mark Millard <markmi_at_dsl-only.net> wrote: >> >> [Another correction I'm afraid --about alternative program variations >> this time.] >> >> On 2017-Mar-13, at 11:52 PM, Mark Millard <markmi_at_dsl-only.net> wrote: >> >>> I'm still at a loss about how to figure out what stages are messed >>> up. (Memory coherency? Some memory not swapped out? Bad data swapped >>> out? Wrong data swapped in?) >>> >>> But at least I've found a much smaller/simpler example to demonstrate >>> some problem with in my Pine64+_ 2GB context. >>> >>> The Pine64+ 2GB is the only amd64 context that I have access to. >> >> Someday I'll learn to type arm64 the first time instead of amd64. >> >>> The following program fails its check for data >>> having its expected byte pattern in dynamically >>> allocated memory after a fork/swap-out/swap-in >>> sequence. >>> >>> I'll note that the program sleeps for 60s after >>> forking to give time to do something else to >>> cause the parent and child processes to swap >>> out (RES=0 as seen in top). >> >> The following about the extra test_check() was >> wrong. >> >>> Note the source code line: >>> >>> // test_check(); // Adding this line prevents failure. >>> >>> It seem that accessing the region contents before forking >>> and swapping avoids the problem. But there is a problem >>> if the region was only written-to before the fork/swap. > > There is a place that if a test_check call is put then the > problem does not happen at any stage: I tried putting a > call between the fork and the later wait/sleep code: I changed the byte sequence patterns to avoid zero values since the bad values are zeros: static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); } // value now avoids the zero value since the failures // are zeros. With that I can then test accurately what bytes have bad values vs. do not. I also changed to: void partial_test_check(void) { if (value(0u)!=gbl_region.array[0]) raise(SIGABRT); if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT); } since previously [0] had a zero value and so I'd used [1]. On this basis I'm now using the below. See the comments tied to partial_test_check() calls: extern void test_setup(void); // Sets up the memory byte patterns. extern void test_check(void); // Tests the memory byte patterns. extern void partial_test_check(void); // Tests just [0] of each region // (gbl_region and dyn_region). int main(void) { test_setup(); test_check(); // Before fork() [passes] pid_t pid = fork(); int wait_status = 0;; // After fork; before waitsleep/swap-out. if (0==pid) partial_test_check(); // Even the above is sufficient by // itself to prevent failure for // region_size 1u through // 4u*1024u! // But 4u*1024u+1u and above fail // with this access to memory. // The failing test is of // (*dyn_region).array[4096u]. // This test never fails here. if (0<pid) partial_test_check(); // This never prevents // later failures (and // never fails here). if (0<pid) { wait(&wait_status); } if (-1!=wait_status && 0<=pid) { if (0==pid) { sleep(60); // During this manually force this process to // swap out. I use something like: // stress -m 1 --vm-bytes 1800M // in another shell and ^C'ing it after top // shows the swapped status desired. 1800M // just happened to work on the Pine64+ 2GB // that I was using. I watch with top -PCwaopid . } test_check(); // After wait/sleep [fails for small-enough region_sizes] } } > This suggests to me that the small access is forcing one or more things to > be initialized for memory access that fork is not establishing of itself. > It appears that if established correctly then the swap-out/swap-in > sequence would work okay without needing the manual access to the memory. > > > So far via this test I've not seen any evidence of problems with the global > region but only the dynamically allocated region. > > However, the symptoms that started this investigation in a much more > complicated context had an area of global memory from a .so that ended > up being zero. > > I think that things should be fixed for this simpler context first and > that further investigation of the sh/su related should wait to see what > things are like after this test case works. === Mark Millard markmi at dsl-only.netReceived on Wed Mar 15 2017 - 03:33:12 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC