Re: ldd leaves the machine unresponsive

From: Garrett Cooper <yanefbsd_at_gmail.com>
Date: Thu, 18 Mar 2010 11:59:34 -0700
On Thu, Mar 18, 2010 at 8:51 AM, Anton Shterenlikht <mexas_at_bristol.ac.uk> wrote:
> On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
>> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
>> In Message-Id: <20100317163230.GJ87732_at_mech-cluster241.men.bris.ac.uk>
>>
>> > Just updated to ia64 r205248
>> >
>> > If my problem is due to my mis-configuration,
>> > I apologise in advance.
>> >
>> > I run this shell script after each upgrade
>> > and 'make delete-old-libs' to check
>> > if any shared objects need to be rebuilt:
>> >
>> > <start script>
>> >
>> > #!/bin/sh
>> >
>> > for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
>> > do
>> >        echo $file
>> >        ldd $file >> /root/ldd_results 2> /dev/zero
>> > done
>> >
>> > <end script>
>> >
>>
>> This will probably come closer to what you actually want to look for.
>>
>> Writing to /dev/zero... I don't know, I've never tried it, since /dev/null
>> is usually the standard place to throw trash.
>>
>> #!/bin/sh
>> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
>>       echo "$file"
>>       ldd "$file" >>/root/ldd_results 2>/dev/null
>> done
>>
>> The problem with your script is that it finds many files that ldd cannot be
>> run on, or that are not useful to run ldd on, and leaves you junk in return.
>>
>> It might be more useful if you searched for dynamically linked ELF
>> binaries to run ldd against like the following.
>>
>> === Script starts here ===
>> #!/bin/sh
>>
>> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
>>
>> trap 'exit 1' 2
>>
>> check_libs() {
>> for spath in $SEARCHPATH; do
>>          for ifelf in `find $spath -type f`; do
>>                  ldd `file $ifelf | grep dynamically | cut -f1 -d:`
>>          done
>> done
>> }
>>
>> check_libs 2>/dev/null
>> === Script ends here ===
>>
>> The above will find all ELF files that are dynamically linked within the
>> SEARCHPATH directories, run ldd on them, and print the results to stdout.
>>
>> Obviously, since thousands of files are going to be examined, plain stdout
>> is not going to be useful.
>>
>> So, with the above stated:
>> save the script to: checklibs.sh
>> run with: "sh checklibs.sh >/root/checklibs_output"
>> or: "script /root/checklibs_output checklibs.sh"
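In the script above, when file(1) does not report a file as dynamically linked, grep prints nothing and ldd ends up being run with no argument at all. The following is a minimal sketch of one way to avoid that, assuming the directories to scan are passed on the command line instead of being hard-coded in SEARCHPATH; `is_dynamic` is a hypothetical helper name, not something from the thread.

```sh
#!/bin/sh
# Sketch: only hand ldd(1) files that file(1) reports as dynamically linked,
# so ldd is never invoked with an empty argument list.
# Assumption: directories come from the command line (not SEARCHPATH);
# is_dynamic is a hypothetical helper name.

trap 'exit 1' INT

# is_dynamic FILE: succeed if file(1) describes FILE as dynamically linked.
# -b (brief) omits the "path:" prefix, so no cut -d: step is needed.
is_dynamic() {
    case $(file -b "$1" 2>/dev/null) in
        *"dynamically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

check_libs() {
    for spath in "$@"; do
        # Read one path per line so names containing spaces survive.
        find "$spath" -type f | while IFS= read -r f; do
            if is_dynamic "$f"; then
                ldd "$f"
            fi
        done
    done
}

check_libs "$@"
```

Run it the same way, letting the shell expand the globs: `sh checklibs.sh /*bin /usr/*bin /usr/lib* /usr/local/*bin >/root/checklibs_output 2>/dev/null`. It still starts one file(1) process per path, like the original.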
>>
>> > After the upgrade to r205248, the script
>> > freezes at seemingly random points.
>> >
>>
>> Unneeded disk usage & execution.
>>
>> > I can still ssh to the machine (using keys), i.e.
>> > I see the welcome message, but cannot get to the console prompt.
>>
>> Of course... too many open files or processes waiting. SSH already has the
>> information it needs loaded into memory; that's why you can sort of get in.
>>
>> ZFS file system, perhaps?
>>
>> >
>> > On the serial console I cannot get the prompt
>> > after entering the root password.
>> >
>>
>> See above.
>>
>> > I have top(1) running interactively in another window.
>> > The sh process is in "getblk" state, and ignores kill -9.
>> > But there's no ldd process.
>> >
>> > And shutdown requests are also ignored:
>> >
>> > # shutdown -r now
>> > Shutdown NOW!
>> > shutdown: [pid 8019]
>> > #
>> > and nothing happens after that
>> >
>> > So I have to do a cold reset via MP.
>> >
>> > On ia64 r204322, this script causes no problems.
>> >
>> > Please advise
>> >
>>
>> The edited script above should help limit the disk usage and the number of
>> open processes that cause your machine to bog down like that. This script
>> does have its limitations, and there is one bug in it... I'll let you figure
>> out how to get rid of that bug, but it really does not affect the intended
>> output, so I left it alone and sent the error output (fd 2) to /dev/null.
>>
>> The limitation you'll find is how many files ldd(1) or file(1) can handle
>> at one time. But if you use specific paths, as already done in SEARCHPATH,
>> you will most likely never see it unless the number of files in /*bin grows
>> beyond the maximum that file(1) or ldd(1) can handle at one time. In short...
>> use direct paths or short globs like the ones above.
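On the question of how many files are handed to a command at once: find(1)'s `-exec ... {} +` form groups paths into batches that stay under the system's argument-length limit, so file(1) runs once per batch rather than once per path. This is a minimal sketch of that approach, assuming directories are given on the command line and that path names contain no colon or newline (file(1) output is split at the first colon, the same limitation as `cut -f1 -d:` in the original); `dynamic_paths` and `check_libs_batched` are hypothetical names.

```sh
#!/bin/sh
# Sketch: let find(1) batch paths into each file(1) run with "-exec ... {} +",
# keeping every command line under the argument-length limit.
# Assumptions: directories come from the command line; path names contain no
# colon or newline. dynamic_paths and check_libs_batched are hypothetical.

# dynamic_paths: read "path: description" lines on stdin and print the paths
# whose description mentions "dynamically linked".
dynamic_paths() {
    while IFS= read -r line; do
        case ${line#*:} in
            *"dynamically linked"*) printf '%s\n' "${line%%:*}" ;;
        esac
    done
}

check_libs_batched() {
    for spath in "$@"; do
        find "$spath" -type f -exec file {} + | dynamic_paths |
            while IFS= read -r f; do
                ldd "$f"
            done
    done
}

check_libs_batched "$@"
```

Batching only reduces the number of file(1) processes; ldd is still run once per dynamically linked binary.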
>>
>> > many thanks
>> > anton
>> >
>>
>> A final note: you might want to just install sysutils/libchk and run that.
>>
>> Standard Disclaimer: NONE OF THIS CONTAINED HEREIN "THIS MESSAGE" EXCUSES
>> ANY OF THE UNEXPLAINED DISK LOCKING THAT IS GOING ON AND THE INFORMATION
>> FOR WHICH IT MAY CONTAIN BECOMING UNAVAILABLE AT ANY POINT IN TIME DURING
>> THE ORIGINAL RUN OF THE FIRST SCRIPT OR THE SECOND SCRIPT THAT WAS POSTED
>> EITHER AS AN ATTACHMENT OR IN-LINE.
>>
>> ;) JK!
>>
>> Good Luck.
>
> many thanks, this is very helpful
>
> I don't seem to have this lockup anymore.
> Don't know what was happening. I've now run
> it several times on three different ia64
> -CURRENT boxes (at different revisions), with
> disks of different speeds, and can't reproduce it.
> My script was very crude, of course.
> I'll try sysutils/libchk

FWIW, I've been seeing some performance issues with iir(4)- and mfi(4)-backed
UFS2 filesystems with soft updates on my new machine, with some other drivers
loaded on my system [a PCI-based em(4) card and an nvidia-driver-enabled card,
which still uses Giant locking].

The machine is a Core i7 on an ASUS W6T Professional motherboard with 12GB
RAM, debug symbols, ddb, kgdb, the anti-reslock contention manager (no
witness), etc.

I don't have much other than that to provide at this time, but it
might help to see if and when there's an overlap in the drivers noted
here.

Thanks,
-Garrett
Received on Thu Mar 18 2010 - 17:59:35 UTC
