Re: ldd leaves the machine unresponsive

From: Anton Shterenlikht <mexas_at_bristol.ac.uk>
Date: Fri, 19 Mar 2010 21:15:35 +0000
On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
> 
> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
> In Message-Id: <20100317163230.GJ87732_at_mech-cluster241.men.bris.ac.uk>
> 
> > Just updated to ia64 r205248
> >
> > If my problem is due to my mis-configuration,
> > I apologise in advance.
> >
> > I run this shell script after each upgrade
> > and 'make delete-old-libs' to check
> > if any shared objects need to be rebuilt:
> >
> > <start script>
> >
> > #!/bin/sh
> >
> > for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
> > do
> >        echo $file
> >        ldd $file >> /root/ldd_results 2> /dev/zero
> > done
> >
> > <end script>
> >
> 
> This will probably come closer to what you actually want to look for.
> 
> As for writing to /dev/zero: I don't know, I've never tried it, since
> /dev/null is the usual place to throw trash.
> 
> #!/bin/sh
> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
>  	echo $file
>  	ldd $file >>/root/ldd_results 2>/dev/null
> done
> 
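On the /dev/zero aside, a minimal sketch of how the two devices behave, assuming a system where writes to /dev/zero are accepted and discarded (true of FreeBSD and Linux as far as I know). The variable names are only illustrative:

```shell
#!/bin/sh
# Sketch: compare /dev/null and /dev/zero as write and read targets.

# Writes to either device are accepted and the data is thrown away.
echo junk > /dev/null && echo "write to /dev/null ok"
echo junk > /dev/zero && echo "write to /dev/zero ok"

# Reads differ: /dev/null returns EOF at once, /dev/zero returns NUL bytes
# for as long as you keep reading, so limit the read with dd.
null_bytes=$(wc -c < /dev/null | tr -d ' ')
zero_bytes=$(dd if=/dev/zero bs=1 count=4 2>/dev/null | wc -c | tr -d ' ')

echo "bytes read from /dev/null: $null_bytes"
echo "bytes read from /dev/zero (4 requested): $zero_bytes"
```

So 2>/dev/zero should discard stderr just as 2>/dev/null does, although /dev/null remains the conventional choice.
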
> The problem with your script is that it finds mostly files that ldd cannot
> be run on, or that are not useful to run ldd on, and leaves you with junk
> in return.
> 
> It might be more useful if you searched for dynamically linked ELF 
> binaries to run ldd against like the following.
> 
> === Script starts here ===
> #!/bin/sh
> 
> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
> 
> trap 'exit 1' 2
> 
> check_libs() {
> for spath in $SEARCHPATH; do
>          for ifelf in `find $spath -type f`; do
>                  # Run ldd only on files that file(1) reports as dynamically linked
>                  if file "$ifelf" | grep -q dynamically; then
>                          ldd "$ifelf"
>                  fi
>          done
> done
> }
> 
> check_libs 2>/dev/null
> === Script ends here ===
> 
> The above will find all dynamically linked ELF files under the paths in
> the SEARCHPATH variable, run ldd on them and print the results to stdout.
> 
> Obviously since you are going to have thousands of files being questioned, 
> stdout is not going to be useful.
> 
> So with the above stated:
> save the script to: checklibs.sh
> run with: "sh checklibs.sh >/root/checklibs_output"
> or: "script /root/checklibs_output sh checklibs.sh"
> 
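For comparison, a sketch of the same check that streams file names from find one at a time instead of expanding the whole list into a for loop, so names containing spaces survive and missing search paths are skipped. It assumes file(1) prints "dynamically linked" for the objects of interest; check_dynamic is a hypothetical helper name, not something from the scripts above:

```shell
#!/bin/sh
# Sketch: run ldd on every dynamically linked file under the given directories.
check_dynamic() {
    for dir in "$@"; do
        # Skip search paths that do not exist on this system
        [ -d "$dir" ] || continue
        find "$dir" -type f -print | while IFS= read -r f; do
            # Only files that file(1) reports as dynamically linked go to ldd
            if file "$f" | grep -q 'dynamically linked'; then
                echo "$f:"
                ldd "$f" 2>/dev/null
            fi
        done
    done
}
```

It could be called as "check_dynamic /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local >/root/checklibs_output" after sourcing the file.
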
> > After the upgrade to r205248, the script
> > freezes at seemingly random points.
> >
> 
> Unneeded disk usage & execution.
> 
> > I can still ssh to the machine (using keys), i.e.
> > I see the welcome message, but cannot get to the console prompt.
> 
> Of course... too many open files or processes in wait. SSH already has the
> information it needs loaded into memory, which is why you can get sort of in.
> 
> ZFS file system, perhaps?

I've no ZFS.

I'm seeing very similar behaviour now with csup:

( I do csup -L2 /root/ports-supfile, where

# cat /root/ports-supfile
*default host=cvsup.uk.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=. delete use-rel-suffix compress

ports-all
# )

top(1) shows:

last pid:  1160;  load averages:  0.00,  0.06,  0.07    up 0+00:10:53  15:05:52
81 processes:  3 running, 61 sleeping, 17 waiting
CPU 0:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 23M Active, 19M Inact, 75M Wired, 136K Cache, 34M Buf, 5900M Free
Swap: 2780M Total, 2780M Free

  PID    UID    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   10      0      2 171 ki31     0K    64K RUN     0  20:18 198.00% idle
   11      0     17 -48    -     0K   544K WAIT    0   0:01  0.00% intr
 1118   1001      1  96    0 12800K  3920K CPU0    0   0:00  0.00% top
    4      0      1  -8    -     0K    32K -       1   0:00  0.00% g_down
 1158      0      4  -8    0 43672K  6296K biowr   0   0:00  0.00% csup


which stays in biowr state indefinitely.

I can issue kill -9 or kill -HUP from top(1),
which makes csup change state to STOP, but
nothing else happens.

As before, I can't log in from other terminals
and have to do a cold reset. I've reinstalled
on another disk, so not sure what's going on.

I think rm(1) is also extremely slow, but
maybe I'm imagining things.

many thanks
anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
Received on Fri Mar 19 2010 - 20:15:39 UTC
