Hey Terry,

> > > walt wrote:
> > > > Do I recall from some months ago that this bug would not
> > > > affect machines with less than a gig of RAM?
> > >
> > > The amount of memory at which you see it depends on the processor
> > > features. Now that autotuning is in, there's a stair-step for
> > > how much the system uses for each resource pool, based on how
> > > much RAM is in the system. It's quite unpredictable where it will
> > > show up in -current, because of this (and the new memory allocator).
> > >
> > > Basically, the problem will show wherever the memory size vs.
> > > memory utilization tickles it (that's why upping maxfiles was
> > > enough to scare it off, before the tuning/allocator changes
> > > went in).

> > - i still have an issue with the system because of which i started this
> > thread:
> >
> > originally, i bought a 512mb ddr ram for it (not the cheapest kind, but
> > also nothing fancy - the chips say infineon). with that ram i still
> > experience data corruption.
> >
> > while i reported that the problem disappeared, i was running off a sdr pc
> > 133 ram which is only 256mb.
> >
> > what i wonder now: is the physical 512mb ram possibly damaged (or not
> > interacting well with the board or bios), or could that yet again be a
> > general (software-solvable) issue (which i would likely experience
> > whenever i have 512mb of ram in that machine, regardless of make) ?
>
> It's possible that the RAM was damaged, but unlikely.
>
> If you revert to a DP2 kernel (or any kernel before Jeff's
> allocator changes AND Matt's autotuning changes), you should
> be able to trigger this problem fairly easily with anything
> that causes a lot of page thrashing right after system boot,
> as long as you pick the right amount of RAM to install for
> the CPU features of the CPU you are using.
>
> > if the problem is likely to go away with another 512mb ram, i will go to
> > get the ram changed on monday - otherwise, i'd like to spare myself and
> > the vendor the trouble :) ... especially myself *g*
>
> It might. It might not. When I first saw the problem, it
> didn't occur on 512M, and it didn't occur on 2G, but it did
> occur on 1G. This was a SuperMicro running a PIII. The
> behaviour's going to be different for different CPU features,
> unfortunately.

i'm sorry, my mail was probably a bit confusing. since it has been
pointed out to me, i am running -current kernels with

options DISABLE_PSE
options DISABLE_PG_G

enabled.

what i am asking myself: is there any chance that i still get any data
corruption because of the issues that you write about in some
configuration ?!

because with the 512mb (ddr) ram (which might or might not be defective)
i get data corruption, while with another 256mb (sdr) ram, i apparently
don't. so far i had the impression that my test (copying >30gb of
checksummed data between disks) shows these problems rather reliably.

> Alternately, disable auto-tuning by setting MAXUSERS to some
> value (preferably equal to or larger than the pre-auto-tune
> value), and then set maxfiles to 50000 or more. This should
> also mask the problem (though I don't know this for sure,
> given Jeff's allocator changes not preallocating the page
> maps for things which used to be allocated via zalloci()).

masking sounds scary to me - i don't really want to make the problem
less likely by, say 1 : 10^3 or so :) i would much rather not have any
data corrupted at all.

> > does it make sense for me to try bosko's patch ?
>
> Yes. It fixes the problem, according to his testing. He
> posted the URL for it a while back, or you can contact him
> directly.

ok, i'll find it - what i wanted to ask is, if that patch is likely to
make _more_ problems go away than those two kernel options.
> > can i hope for any better results (i don't really care about
> > performance, only data integrity) with it than with those
> > two kernel options ?!
>
> Yes, if that's the source of your problems. As you pointed
> out, there's a small but finite chance it's bad RAM, or a
> problem with the motherboard, etc. The way to find out is
> to try the offending RAM again, with a kernel with those
> options, and see if it happens (this assumes that you were
> able to trigger it fairly reliably before; negative evidence
> is really only anecdotal, without a regression test case, so
> if it only happened once in a great while, it not happening in
> a week or a month would prove nothing).

i guess i can manage to get another 256mb sdr ram into that box
temporarily by next week, if nothing better comes up - just to check.

thanks,

regards,
Heiko

--
Free Software. Why put up with inferior code and antisocial corporations?
http://www.gnu.org/philosophy/why-free.html

Received on Sat May 10 2003 - 09:44:28 UTC
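
The message describes its corruption test only as "copying >30gb of checksummed data between disks" and does not say which tools were used. As a rough sketch of that kind of test, the Python below checksums a source tree, copies it, and reports files whose digests differ afterwards. The function names (`sha256_of`, `checksum_tree`, `copy_and_verify`) and the choice of SHA-256 are assumptions made for illustration, not something taken from the thread.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def copy_and_verify(src, dst):
    """Copy src to dst, then return the relative paths whose digests differ."""
    expected = checksum_tree(src)   # checksums taken before the copy
    shutil.copytree(src, dst)       # dst must not exist yet
    actual = checksum_tree(dst)     # re-read the copy from the target
    return sorted(
        name for name in expected if actual.get(name) != expected[name]
    )
```

An empty list from `copy_and_verify("/disk1/data", "/disk2/data")` (hypothetical paths) means no mismatch was detected on that run. For the result to say anything about RAM or kernel behaviour, the data set should be much larger than installed memory, as the >30gb in the message is; otherwise the verification pass may simply be served from cache. As noted in the thread, a clean run is only weak evidence unless the failure was reliably reproducible beforehand.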