Re: ldd leaves the machine unresponsive

From: Anton Shterenlikht <mexas_at_bristol.ac.uk>
Date: Thu, 18 Mar 2010 15:51:13 +0000
On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
> 
> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
> In Message-Id: <20100317163230.GJ87732_at_mech-cluster241.men.bris.ac.uk>
> 
> > Just updated to ia64 r205248
> >
> > If my problem is due to my mis-configuration,
> > I apologise in advance.
> >
> > I run this shell script after each upgrade
> > and 'make delete-old-libs' to check
> > if any shared objects need to be rebuilt:
> >
> > <start script>
> >
> > #!/bin/sh
> >
> > for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
> > do
> >        echo $file
> >        ldd $file >> /root/ldd_results 2> /dev/zero
> > done
> >
> > <end script>
> >
> 
> This will probably come closer to what you actually want to look for.
> 
> Writing to /dev/zero... I don't know, I've never tried it, since /dev/null
> is usually the standard place to throw output away.
> 
> #!/bin/sh
> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
>  	echo $file
>  	ldd $file >>/root/ldd_results 2>/dev/null
> done
> 
> The problem with your script is that it mostly finds files that ldd cannot
> be run on, or that it is not useful to run ldd on, and leaves you junk in return.
> 
> It might be more useful to search for dynamically linked ELF binaries
> and run ldd only against those, like the following.
> 
> === Script starts here ===
> #!/bin/sh
> 
> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
> 
> trap 'exit 1' 2
> 
> check_libs() {
> for spath in $SEARCHPATH; do
>          for ifelf in `find $spath -type f`; do
>                  ldd `file $ifelf | grep dynamically | cut -f1 -d:`
>          done
> done
> }
> 
> check_libs 2>/dev/null
> === Script ends here ===
> 
> The above will find all dynamically linked ELF files within the
> SEARCHPATH directories, run ldd on them and print the results to stdout.
> 
> Obviously, since thousands of files are going to be examined, leaving
> the results on stdout (the terminal) is not going to be useful.
> 
> So, with the above in mind:
> save the script to: checklibs.sh
> run with: "sh checklibs.sh >/root/checklibs_output"
> or: "script /root/checklibs_output sh checklibs.sh"
> 
> > After the upgrade to r205248, the script
> > freezes at seemingly random points.
> >
> 
> Unneeded disk usage & execution.
> 
> > I can still ssh to the machine (using keys), i.e.
> > I see the welcome message, but cannot get to the console prompt.
> 
> Of course... too many open files or processes in wait. SSH already has the
> information it needs loaded into memory; that's why you can sort of get in.
> 
> ZFS file system, perhaps?
> 
> >
> > On the serial console I cannot get the prompt
> > after entering the root password.
> >
> 
> See above.
> 
> > I have top(1) running interactively in another window.
> > The sh process is in "getblk" state, and ignores kill -9.
> > But there's no ldd process.
> >
> > And shutdown requests are also ignored:
> >
> > # shutdown -r now
> > Shutdown NOW!
> > shutdown: [pid 8019]
> > #
> > and nothing happens after that
> >
> > So I have to do a cold reset via MP.
> >
> > On ia64 r204322, this script causes no problems.
> >
> > Please advise
> >
> 
> The above edited script should help to limit the disk usage and the number
> of open processes that cause your machine to bog down like that. This
> script does have its limitations, and there is one bug in it... I'll let
> you figure out how to get rid of that bug, but it really does not affect
> the intended output, so I left it alone and redirected error output (fd 2)
> to /dev/null.
> 
> The limitations you'll find are in how many files ldd(1) or file(1) can
> handle at one time. But if you give specific paths, as SEARCHPATH already
> does, you will most likely never hit this, unless the number of files in
> /*bin grows beyond the maximum that file(1) or ldd(1) can handle at one
> time. In short... use direct paths or short globs like the ones above.
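For reference, a minimal sketch of one way around both the bug and the
file-count limit discussed above. It is untested on ia64 and assumes a
POSIX sh, a find(1) that supports "-exec ... {} +", and a file(1) that
supports -b and prints "dynamically linked" for dynamic ELF objects.
ldd is only run on files that file(1) reports as dynamically linked, so
it is never called with an empty argument list. Taking the directories
as arguments instead of a SEARCHPATH variable is a choice made for this
sketch only.

```sh
#!/bin/sh
# checklibs.sh (sketch): run ldd(1) only on files that file(1) reports
# as dynamically linked.  The directories to scan are the arguments.

trap 'exit 1' INT	# stop the whole scan on ^C

check_libs() {
	for dir in "$@"; do
		# Skip missing directories and globs that matched nothing.
		[ -d "$dir" ] || continue
		# "-exec ... {} +" makes find(1) pass the names in batches that
		# fit the system argument limit.  "file -b" omits the file name,
		# so a name containing "dynamically linked" cannot match falsely.
		find "$dir" -type f -exec sh -c '
			for f do
				case $(file -b "$f") in
				*"dynamically linked"*) ldd "$f" ;;
				esac
			done
		' sh {} +
	done
}

check_libs "$@" 2>/dev/null
```

Usage, mirroring the invocation above (the globs are expanded by your
interactive shell):

sh checklibs.sh /*bin /usr/*bin /usr/lib* /usr/local/*bin >/root/checklibs_output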
> 
> > many thanks
> > anton
> >
> 
> A final note: you might want to just install sysutils/libchk and run that.
> 
> Standard Disclaimer: NONE OF THIS CONTAINED HEREIN "THIS MESSAGE" EXCUSES 
> ANY OF THE UNEXPLAINED DISK LOCKING THAT IS GOING ON AND THE INFORMATION 
> FOR WHICH IT MAY CONTAIN BECOMING UNAVAILABLE AT ANY POINT IN TIME DURING 
> THE ORIGINAL RUN OF THE FIRST SCRIPT OR THE SECOND SCRIPT THAT WAS POSTED 
> EITHER AS AN ATTACHMENT OR IN-LINE.
> 
> ;) JK!
> 
> Good Luck.

many thanks, this is very helpful

I don't seem to have this lockup anymore.
Don't know what was happening. I've now run
it several times on 3 different ia64
CURRENT boxes (different revisions), with
disks of different speeds, and can't reproduce it.
My script was very crude, of course.
I'll try sysutils/libchk

thanks again
anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
Received on Thu Mar 18 2010 - 14:51:17 UTC
