Re: [CURRENT]: weird memory/linker problem?

From: Willem Jan Withagen <wjw_at_digiware.nl>
Date: Tue, 01 Jul 2014 17:23:14 +0200
On 2014-07-01 16:48, Rang, Anton wrote:
> DOT => DOD
>
> 444F54 => 444F44
>
> That's a single-bit flip.  Bad memory, perhaps?

Very likely, especially if the system does not have ECC.
On rare occasions an alpha particle, a power cycle, or anything else
disruptive damages a memory cell. And it may take a particular pattern
of accesses for the error to actually show up.
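
Anton's DOT => DOD comparison can be checked mechanically. The following
is only a minimal sketch in Python; the function name flipped_bits is
made up for illustration and is not part of any tool mentioned in this
thread. It XORs the two strings byte by byte and lists every bit that
differs.

```python
def flipped_bits(a, b):
    """Return (byte_index, bit_index) for every bit that differs between a and b."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    flips = []
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y  # bits set in diff are the bits that changed
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((i, bit))
    return flips


# 'T' is 0x54 and 'D' is 0x44, so only bit 4 of the third byte differs.
print(flipped_bits(b"DOT", b"DOD"))  # prints [(2, 4)]
```

A single entry in the result means exactly one bit changed, which is
what one would expect from a faulty memory cell rather than from a
software bug.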

In the past (the 1990s) 'make buildworld' used to be a rather good memory 
tester. Nowadays, have a look at
	http://www.memtest.org/

This tool has found all of the bad memory in every system I have used 
and/or built for others.
Note that it might take a few runs and some more heat to actually 
trigger the faulty cell, but memtest86 will usually find it.

Note that on big systems with lots of memory it can take a very long 
time to run just one full test set to completion.
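
To illustrate the general idea of a pattern test (write a known bit
pattern, read it back, report what changed), here is a rough sketch in
Python. It is not how memtest86 is implemented, and it is no substitute
for it: a userland program only sees the virtual memory the OS hands
it, goes through the CPU caches, and cannot reach memory in use by the
kernel. The names PATTERNS, find_mismatches and pattern_pass are
hypothetical.

```python
# Byte patterns: all zeros, all ones, and the two alternating-bit patterns.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def find_mismatches(buf, pattern):
    """Return (offset, xor_mask) for each byte of buf that differs from pattern."""
    return [(i, b ^ pattern) for i, b in enumerate(buf) if b != pattern]


def pattern_pass(size):
    """Fill a buffer of `size` bytes with each pattern in turn and read it back."""
    errors = []
    buf = bytearray(size)
    for pattern in PATTERNS:
        buf[:] = bytes([pattern]) * size  # write the pattern everywhere
        errors.extend(find_mismatches(buf, pattern))  # read back and compare
    return errors


# Simulate a faulty cell: flip bit 4 of one byte, as in the DOT => DOD case.
sample = bytearray([0x55]) * 16
sample[3] ^= 0x10
print(find_mismatches(sample, 0x55))  # prints [(3, 16)]
```

The xor_mask in each result shows which bits flipped, so a value with a
single bit set (16 is 0x10) points at a single bad cell.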

--WjW


>
> Anton
>
> -----Original Message-----
> From: owner-freebsd-current_at_freebsd.org [mailto:owner-freebsd-current_at_freebsd.org] On Behalf Of O. Hartmann
> Sent: Tuesday, July 01, 2014 8:08 AM
> To: Dimitry Andric
> Cc: Adrian Chadd; FreeBSD CURRENT
> Subject: Re: [CURRENT]: weird memory/linker problem?
>
> On Mon, 23 Jun 2014 17:22:25 +0200
> Dimitry Andric <dim_at_FreeBSD.org> wrote:
>
>> On 23 Jun 2014, at 16:31, O. Hartmann <ohartman_at_zedat.fu-berlin.de> wrote:
>>> On Sun, 22 Jun 2014 10:10:04 -0700
>>> Adrian Chadd <adrian_at_freebsd.org> wrote:
>>>> When they segfault, where do they segfault?
>> ...
>>> GIMP, LaTeX work, nothing special, but a bit memory consuming
>>> regarding GIMP) I tried updating the ports tree and surprisingly the
>>> tree is left over in an unclean condition while /usr/bin/svn segfaults
>>> (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)).
>>>
>>> Using /usr/local/bin/svn, which is from the devel/subversion port,
>>> performs well, while FreeBSD 11's svn contribution dies as described. It did not do so a few hours ago!
>>
>> I think what Adrian meant was: can you run svn (or another crashing
>> program) in gdb, and post a backtrace?  Or maybe run ktrace, and see
>> where it dies?
>>
>> Alternatively, put a core dump and the executable (with debug info) in
>> a tarball, and upload it somewhere, so somebody else can analyze it.
>>
>> -Dimitry
>>
>
> It's me again, with the same weird story.
>
> After a couple of days of silence, the mysterious entity in my computer is back. This time it is again a weird compiler error (while trying to buildworld):
>
> [...]
> c++  -O2 -pipe -O3 -O3
> c++ -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include
> -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include
> -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I.
> -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include
> -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.0\"
> -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DDEFAULT_SYSROOT=\"\"
> -Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -Wno-c++11-extensions -c /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp -o Host.o
> --- GraphWriter.o ---
> In file included from /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14:
> /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10: error: use of undeclared identifier 'DOD'; did you mean 'DOT'?
>     O << DOD::EscapeString(Label);
>          ^~~
>          DOT
> /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11: note: 'DOT' declared here
> namespace DOT {  // Private functions...
>           ^
> 1 error generated.
> *** [GraphWriter.o] Error code 1
>
>
> Well, in the past I saw many such messages, especially symbols of routines not being found in shared objects/libraries, or even "funny" misspellings like the one shown above.
>
> I cannot reproduce them after a reboot, but once the error has occurred it is sticky for as long as the system keeps running. So in order to compile the OS successfully, I reboot.
>
> Does anyone have an idea what this could be? Since at the moment it affects only one machine (the other CoreDuo has been retired in the meantime), it feels a bit like a miscompilation on a certain type of CPU.
>
> Thanks for your patience,
>
> Oliver
Received on Tue Jul 01 2014 - 13:23:26 UTC
