csup/svn/ldd make host unresponsive [WAS: Re: ldd leaves the machine unresponsive]

From: Anton Shterenlikht <mexas_at_bristol.ac.uk>
Date: Sun, 21 Mar 2010 18:22:14 +0000
On Sat, Mar 20, 2010 at 08:53:37PM +0000, Anton Shterenlikht wrote:
> On Sat, Mar 20, 2010 at 03:44:46PM +0000, Anton Shterenlikht wrote:
> > On Sat, Mar 20, 2010 at 07:27:43AM -0400, jhell wrote:
> > > 
> > > On Fri, 19 Mar 2010 17:15, Anton Shterenlikht wrote:
> > > In Message-Id: <20100319211535.GA76683_at_mech-cluster241.men.bris.ac.uk>
> > > 
> > > > On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
> > > >> -----BEGIN PGP SIGNED MESSAGE-----
> > > >> Hash: SHA1
> > > >>
> > > >>
> > > >>
> > > >> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
> > > >> In Message-Id: <20100317163230.GJ87732_at_mech-cluster241.men.bris.ac.uk>
> > > >>
> > > >>> Just updated to ia64 r205248
> > > >>>
> > > >>> If my problem is due to my mis-configuration,
> > > >>> I apologise in advance.
> > > >>>
> > > >>> I run this shell script after each upgrade
> > > >>> and 'make delete-old-libs' to check
> > > >>> if any shared objects need to be rebuilt:
> > > >>>
> > > >>> <start script>
> > > >>>
> > > >>> #!/bin/sh
> > > >>>
> > > >>> for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
> > > >>> do
> > > >>>        echo $file
> > > >>>        ldd $file >> /root/ldd_results 2> /dev/zero
> > > >>> done
> > > >>>
> > > >>> <end script>
> > > >>>
> > > >>
> > > >> This will probably be closer to what you actually want to look for.
> > > >>
> > > >> Writing to /dev/zero ... I don't know, I've never tried it, since /dev/null is
> > > >> usually the standard place to throw trash.
> > > >>
> > > >> #!/bin/sh
> > > >> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
> > > >>  	echo $file
> > > >>  	ldd $file >>/root/ldd_results 2>/dev/null
> > > >> done
> > > >>
> > > >> The problem with your script is that it mostly finds files on which ldd
> > > >> cannot run, or is not useful, and leaves you junk in return.
> > > >>
> > > >> It might be more useful if you searched for dynamically linked ELF
> > > >> binaries to run ldd against like the following.
> > > >>
> > > >> === Script starts here ===
> > > >> #!/bin/sh
> > > >>
> > > >> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
> > > >>
> > > >> trap 'exit 1' 2
> > > >>
> > > >> check_libs() {
> > > >> for spath in $SEARCHPATH; do
> > > >>          for ifelf in `find $spath -type f`; do
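> > > >>                  # Only run ldd on files that file(1) reports as dynamically linked.
> > > >>                  if file $ifelf | grep -q dynamically; then
> > > >>                          ldd $ifelf
> > > >>                  fi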
> > > >>          done
> > > >> done
> > > >> }
> > > >>
> > > >> check_libs 2>/dev/null
> > > >> === Script ends here ===
> > > >>
> > > >> The above will find all dynamically linked ELF files within the paths in the
> > > >> SEARCHPATH variable, run ldd on them and print the results to stdout.
> > > >>
> > > >> Obviously since you are going to have thousands of files being questioned,
> > > >> stdout is not going to be useful.
> > > >>
> > > >> So with the above stated:
> > > >> save the script to: checklibs.sh
> > > >> run with: "sh checklibs.sh >/root/checklibs_output"
> > > >> or: "script /root/checklibs_output checklibs.sh"
> > > >>
> > > >>> After the upgrade to r205248, the script
> > > >>> freezes at seemingly random points.
> > > >>>
> > > >>
> > > >> Unneeded disk usage & execution.
> > > >>
> > > >>> I can still ssh to the machine (using keys), i.e.
> > > >>> I see the welcome message, but cannot get to the console prompt.
> > > >>
> > > >> Of course... too many open files or processes in wait. SSH already has the
> > > >> information it needs loaded into memory, that's why you can get sort-of-in
> > > >>
> > > >> ZFS file-system perhaps ?
> > > >
> > > > I've no ZFS.
> > > >
> > > > I'm seeing very similar behaviour now with csup:
> > > >
> > > > ( I do csup -L2 /root/ports-supfile, where
> > > >
> > > > # cat /root/ports-supfile
> > > > *default host=cvsup.uk.FreeBSD.org
> > > > *default base=/var/db
> > > > *default prefix=/usr
> > > > *default release=cvs tag=. delete use-rel-suffix compress
> > > >
> > > > ports-all
> > > > # )
> > > >
> > > > top(1) shows:
> > > >
> > > > last pid:  1160;  load averages:  0.00,  0.06,  0.07                                                                           up 0+00:10:53  15:05:52
> > > > 81 processes:  3 running, 61 sleeping, 17 waiting
> > > > CPU 0:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
> > > > CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > > Mem: 23M Active, 19M Inact, 75M Wired, 136K Cache, 34M Buf, 5900M Free
> > > > Swap: 2780M Total, 2780M Free
> > > >
> > > >  PID    UID    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> > > >   10      0      2 171 ki31     0K    64K RUN     0  20:18 198.00% idle
> > > >   11      0     17 -48    -     0K   544K WAIT    0   0:01  0.00% intr
> > > > 1118   1001      1  96    0 12800K  3920K CPU0    0   0:00  0.00% top
> > > >    4      0      1  -8    -     0K    32K -       1   0:00  0.00% g_down
> > > > 1158      0      4  -8    0 43672K  6296K biowr   0   0:00  0.00% csup
> > > >
> > > >
> > > > which stays in biowr state indefinitely.
> > > >
> > > > I can issue kill -9 or kill -HUP from top(1),
> > > > which makes csup change state to STOP, but
> > > > nothing else happens.
> > > >
> > > > As before, I can't log in from other terminals
> > > > and have to do a cold reset. I've reinstalled
> > > > on another disk, so not sure what's going on.
> > > >
> > > > I think rm(1) is also extremely slow, but
> > > > maybe I'm imagining things.
> > > >
> > > > many thanks
> > > > anton
> > > >
> > > >
> > > 
> > > 
> > > I would post up the contents of your make.conf & your kernel config & your 
> > > dmesg somewhere so it can be evaluated.
> > 
> > When I reinstalled 8.0 from a CD,
> > I updated source with csup, that worked.
> > However, after upgrading to current, I can't get
> > any luck with csup. The important bit is that
> > I don't really know what revision this is.
> > 
> > I've no /etc/make.conf
> > 
> > kernel config:
> > 	http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/uzi/UZI
> > 
> > dmesg:
> > 	http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/uzi/dmesg.boot
> > 
> 
> Marcel, this might be of some interest.
> 
> I managed to csup /usr/src, probably because
> there were not too many updates from 3 days ago.
> I proceeded with updating the system, but
> had a freeze again in single user at the very
> beginning of 'make installworld'.
> 
> Now I've reinstalled 8.0-CURRENT-200906
> snapshot and have no issues with csup,
> just completed downloading the ports tree.
> It seems something is wrong with csup(1),
> or perhaps disk i/o, in the recent ia64 updates.
> 
> I'll try building svn from ports and update
> via svn, will report the results.

An update:

1. reinstalled from 8.0-CURRENT-200906

2. installed the ports tree via csup(1)

3. installed svn(1) from ports

4. updated src with svn.
	Both svn and csup worked fine here.

5. rebuilt and reinstalled kernel and world as
   usual to r205403.

6. rebooted.

The kernel config file:
	http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/uzi/UZI 

dmesg:
	http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/uzi/dmesg.boot

ifconfig -a:
	http://seis.bris.ac.uk/~mexas/freebsd/ia64/rx2600/uzi/ifconfig-a


7. tried to update the src again with svn and got stuck.
	All I can issue is CTRL/T, which shows for svn:

mech-as221# svn co svn://svn.freebsd.org/base/head/ /usr/src/

load: 0.00  cmd: svn 888 [biord] 8008.53r 0.09u 0.30s 0% 13992k
load: 0.00  cmd: svn 888 [biord] 8009.53r 0.09u 0.30s 0% 13992k
load: 0.00  cmd: svn 888 [biord] 8015.07r 0.09u 0.30s 0% 13992k

in another ssh session I was running gstat(8) which showed
zero activity in the disk.

and in yet another ssh session I tried to launch top:

mech-as221# top
load: 0.00  cmd: csh 915 [ufs] 6146.33r 0.00u 0.00s 0% 5008k
load: 0.00  cmd: csh 915 [ufs] 6147.15r 0.00u 0.00s 0% 5008k

and on the serial console:

load: 0.00  cmd: getty 828 [ufs] 8129.90r 0.00u 0.00s 0% 2560k
load: 0.00  cmd: getty 828 [ufs] 8130.70r 0.00u 0.00s 0% 2560k

but the shell prompt never appears.
I've waited maybe 2-3 hours.


Can't do much else but a cold reboot.
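
For what it's worth, the CTRL/T samples above can be condensed with
a small awk filter. This is only a sketch: the function name
siginfo_summary is made up here, and it assumes the field layout seen
in the lines above (load, "cmd:", command name, pid, [wait channel],
real, user and system time) and a command name without spaces.

```shell
#!/bin/sh
# Summarise CTRL/T (SIGINFO) status lines read from stdin.
# Assumed layout, taken from the samples in this message:
#   load: L  cmd: NAME PID [WCHAN] REALr USERu SYSs CPU% RSSk
# siginfo_summary is a hypothetical helper name, not a system tool.
siginfo_summary() {
    awk '
    $1 == "load:" && $3 == "cmd:" {
        key = $4 " " $5              # command name and pid
        wchan[key] = $6              # wait channel, e.g. [biord]
        cpu = $8 " " $9              # user and system CPU time
        if (!(key in first)) {       # remember the first sample
            first[key] = $7 + 0      # "8008.53r" converts to 8008.53
            cpu0[key] = cpu
        }
        last[key] = $7 + 0           # real time of the latest sample
        cpu1[key] = cpu
        n[key]++
    }
    END {
        for (k in n) {
            state = (cpu0[k] == cpu1[k]) ? "unchanged" : "changed"
            printf "%s %s samples=%d real=+%.2fs cpu=%s\n",
                k, wchan[k], n[k], last[k] - first[k], state
        }
    }'
}
```

Feeding it the three svn lines above should report that svn 888 sat
in [biord] for the whole interval with its user and system times
unchanged, which matches gstat(8) showing no disk activity.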

many thanks
anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
Received on Sun Mar 21 2010 - 17:22:19 UTC
