Re: Running linux ldconfig on tmpfs results in unkillable process

From: Beat Gätzi <beat_at_chruetertee.ch>
Date: Thu, 20 Jan 2011 01:35:09 +0100
On 19.01.2011 13:24, Kostik Belousov wrote:
> On Tue, Jan 18, 2011 at 05:40:14PM +0100, Beat Gätzi wrote:
>> On 18.01.2011 17:13, Kostik Belousov wrote:
>>> On Tue, Jan 18, 2011 at 04:34:10PM +0100, Beat Gätzi wrote:
>>>> On 18.01.2011 15:46, Kostik Belousov wrote:
>>>>> On Tue, Jan 18, 2011 at 03:16:27PM +0100, Beat Gätzi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a tinderbox that uses tmpfs to build ports. Every time I build a
>>>>>> port that executes Linux ldconfig, it results in an unkillable process
>>>>>> that uses 100% CPU. The problem is reproducible without tinderbox:
>>>>>>
>>>>>> # uname -a
>>>>>> FreeBSD daedalus.network.local 9.0-CURRENT FreeBSD 9.0-CURRENT #3
>>>>>> r216761: Tue Dec 28 15:32:26 CET 2010
>>>>>> root_at_daedalus.network.local:/usr/obj/usr/src/sys/GENERIC  i386
>>>>>> # mkdir /compat/test
>>>>>> # mount -t tmpfs tmpfs /compat/test
>>>>>> # cp -Rp /compat/linux/* /compat/test/
>>>>>> # mount -t linprocfs linprocfs /compat/test/proc
>>>>>> # /compat/linux/sbin/ldconfig -r /compat/test/
>>>>>> # pgrep ldconfig
>>>>>> 3449
>>>>>> # procstat -i 3449 | grep KILL
>>>>>>  3449 ldconfig         KILL     ---
>>>>>> # kill -9 3449
>>>>>> # procstat -i 3449 | grep KILL
>>>>>>  3449 ldconfig         KILL     P--
>>>>>>
>>>>>> From top(1):
>>>>>> PID USERNAME THR PRI NICE  SIZE   RES STATE    C  TIME   WCPU COMMAND
>>>>>> 3449 root     1  44    0   992K   712K CPU1    1  10:06 100.00% ldconfig
>>>>>>
>>>>>> When I reboot the machine it hangs after "All buffers synced.".
>>>>>>
>>>>>> I've uploaded some additional output of procstat and ktrace here:
>>>>>> http://people.freebsd.org/~beat/logs/linux-ldconfig-tmpfs.txt
>>>>>>
>>>>>> Does anyone know how to fix this?
>>>>> kdump output for a trace of a Linux binary is garbage. You need to
>>>>> use linux_kdump (from ports).
>>>>>
>>>>> I think that your process is looping in the kernel; you can confirm this
>>>>> by dropping into ddb and running "bt <pid>".
>>>>
>>>> I've uploaded a screenshot of the output of bt <pid> in ddb:
>>>> http://people.freebsd.org/~beat/logs/linux-ldconfig-tmpfs-bt.jpg
>>>
>>> Please try this.
>>>
>>> diff --git a/sys/compat/linux/linux_file.c b/sys/compat/linux/linux_file.c
>>> index 9ff1cf0..44ad193 100644
>>> --- a/sys/compat/linux/linux_file.c
>>> +++ b/sys/compat/linux/linux_file.c
>>> @@ -369,7 +369,6 @@ getdents_common(struct thread *td, struct linux_getdents64_args *args,
>>>  	lbuf = malloc(LINUX_MAXRECLEN, M_TEMP, M_WAITOK | M_ZERO);
>>>  	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>  
>>> -again:
>>>  	aiov.iov_base = buf;
>>>  	aiov.iov_len = buflen;
>>>  	auio.uio_iov = &aiov;
>>> @@ -506,8 +505,10 @@ again:
>>>  			break;
>>>  	}
>>>  
>>> -	if (outp == (caddr_t)args->dirent)
>>> -		goto again;
>>> +	if (outp == (caddr_t)args->dirent) {
>>> +		nbytes = resid;
>>> +		goto eof;
>>> +	}
>>>  
>>>  	fp->f_offset = off;
>>>  	if (justone)
>>> diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
>>> index 84a2038..62dd0bf 100644
>>> --- a/sys/fs/tmpfs/tmpfs_subr.c
>>> +++ b/sys/fs/tmpfs/tmpfs_subr.c
>>> @@ -827,9 +827,10 @@ tmpfs_dir_getdents(struct tmpfs_node *node, struct uio *uio, off_t *cntp)
>>>  		/* Copy the new dirent structure into the output buffer and
>>>  		 * advance pointers. */
>>>  		error = uiomove(&d, d.d_reclen, uio);
>>> -
>>> -		(*cntp)++;
>>> -		de = TAILQ_NEXT(de, td_entries);
>>> +		if (error == 0) {
>>> +			(*cntp)++;
>>> +			de = TAILQ_NEXT(de, td_entries);
>>> +		}
>>>  	} while (error == 0 && uio->uio_resid > 0 && de != NULL);
>>>  
>>>  	/* Update the offset and cache. */
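A toy userland model of why the removed "goto again" could spin forever. This is not the linuxolator code: the array of record sizes, readdir_pass(), and the max_passes cap are illustrative assumptions. The point it shows is that a "retry until something was copied" loop is only safe if every pass advances the directory offset. When the next record is larger than the caller's buffer, nothing is copied, the offset stays where it is, and each retry repeats the same pass.

```c
#include <stddef.h>

/*
 * Toy model (NOT kernel code): reclen[] holds hypothetical record sizes.
 * One pass copies whole records starting at *off into a buffer of
 * `count` bytes and returns the bytes written; *off only advances past
 * records that were actually copied.
 */
static size_t
readdir_pass(const size_t *reclen, size_t nrec, size_t *off, size_t count)
{
	size_t written = 0;

	while (*off < nrec && written + reclen[*off] <= count) {
		written += reclen[*off];
		(*off)++;
	}
	return (written);
}

/*
 * Old shape: retry while nothing was written. If the next record does
 * not fit, no pass makes progress. max_passes exists only so the model
 * can report "stuck" (-1) instead of hanging like the real process did.
 */
static long
getdents_retry(const size_t *reclen, size_t nrec, size_t *off, size_t count,
    int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++) {
		size_t n = readdir_pass(reclen, nrec, off, count);

		if (n != 0 || *off >= nrec)
			return ((long)n);
		/* nothing fit and not at the end: old code did "goto again" */
	}
	return (-1);
}

/* Fixed shape, modelled on the change above: never retry an empty pass. */
static long
getdents_fixed(const size_t *reclen, size_t nrec, size_t *off, size_t count)
{
	return ((long)readdir_pass(reclen, nrec, off, count));
}
```

With records {24, 24, 300} and a 64-byte buffer, the first call returns 48 bytes and leaves the offset at the 300-byte record; every later pass with that buffer copies nothing, so the retry variant only terminates because of the artificial cap.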
>>
>> This patch solves the problem.
>>
> Thank you, but apparently this is not the end of the story.
> 
> I committed the linuxolator part of the change, but I think that the tmpfs
> change is still incomplete. Strictly following getdirentries(2), tmpfs
> must return EINVAL when not even a single record can be returned.
> Currently, it indicates EOF instead. I think this could be a complete
> solution, but it might break e.g. Linux ldconfig(8), since that is what
> exposed the linuxolator problem.
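On the caller side, the getdirentries(2) rule means EINVAL signals "buffer too small", not end of directory. A minimal sketch of a reader that honours this, assuming a hypothetical stub_getdirentries() that models the EINVAL-on-too-small behaviour over an array of record sizes (neither function below is a real API): it doubles its buffer on EINVAL and treats only a 0-byte return as EOF. A caller that instead treated every non-positive return as EOF would stop at the first oversized record.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stub (NOT a real syscall): returns bytes copied, 0 at end
 * of directory, or -EINVAL if the next record does not fit in bufsize.
 */
static long
stub_getdirentries(const size_t *reclen, size_t nrec, size_t *pos,
    size_t bufsize)
{
	size_t used = 0;

	if (*pos < nrec && reclen[*pos] > bufsize)
		return (-EINVAL);
	while (*pos < nrec && used + reclen[*pos] <= bufsize) {
		used += reclen[*pos];
		(*pos)++;
	}
	return ((long)used);
}

/*
 * Read the whole (modelled) directory, growing the buffer on EINVAL.
 * Returns the total bytes read, or -1 if the buffer would exceed maxbuf.
 */
static long
read_whole_dir(const size_t *reclen, size_t nrec, size_t bufsize,
    size_t maxbuf)
{
	size_t pos = 0;
	long total = 0;

	for (;;) {
		long n = stub_getdirentries(reclen, nrec, &pos, bufsize);

		if (n == -EINVAL) {
			/* too small for one record: grow and retry */
			if (bufsize * 2 > maxbuf)
				return (-1);
			bufsize *= 2;
			continue;
		}
		if (n == 0)
			return (total);	/* genuine end of directory */
		total += n;
	}
}
```

Unlike the looping case, each retry here changes an input (the buffer size), so the loop either makes progress or hits the explicit limit.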
> 
> Can you apply the patch below on top of the latest HEAD with r217578 included
> and retest? Thanks.
> 
> diff --git a/sys/fs/tmpfs/tmpfs_subr.c b/sys/fs/tmpfs/tmpfs_subr.c
> index 84a2038..62dd0bf 100644
> --- a/sys/fs/tmpfs/tmpfs_subr.c
> +++ b/sys/fs/tmpfs/tmpfs_subr.c
> @@ -827,9 +827,10 @@ tmpfs_dir_getdents(struct tmpfs_node *node, struct uio *uio, off_t *cntp)
>  		/* Copy the new dirent structure into the output buffer and
>  		 * advance pointers. */
>  		error = uiomove(&d, d.d_reclen, uio);
> -
> -		(*cntp)++;
> -		de = TAILQ_NEXT(de, td_entries);
> +		if (error == 0) {
> +			(*cntp)++;
> +			de = TAILQ_NEXT(de, td_entries);
> +		}
>  	} while (error == 0 && uio->uio_resid > 0 && de != NULL);
>  
>  	/* Update the offset and cache. */
> diff --git a/sys/fs/tmpfs/tmpfs_vnops.c b/sys/fs/tmpfs/tmpfs_vnops.c
> index 059a790..a57c1f2 100644
> --- a/sys/fs/tmpfs/tmpfs_vnops.c
> +++ b/sys/fs/tmpfs/tmpfs_vnops.c
> @@ -1349,7 +1349,7 @@ outok:
>  	MPASS(error >= -1);
>  
>  	if (error == -1)
> -		error = 0;
> +		error = (cnt != 0) ? 0 : EINVAL;
>  
>  	if (eofflag != NULL)
>  		*eofflag =
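The two tmpfs hunks can likewise be modelled in userland. Again this is a sketch, not the tmpfs code: the array of record sizes, model_readdir(), and its parameters are hypothetical. The cursor and entry count advance only after a record has been copied, and a stop caused by a full buffer (the -1 sentinel that the vnops hunk maps) becomes EINVAL when nothing was copied, so a too-small buffer is no longer reported as a 0-byte EOF.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model of the patched tmpfs readdir path (NOT kernel code).
 * reclen[] holds hypothetical record sizes; *cursor is the next entry to
 * return and resid is the space left in the caller's buffer.
 * Returns 0 on success or at end of directory, EINVAL if not even one
 * record fits. *cnt and *nbytes report what was copied.
 */
static int
model_readdir(const size_t *reclen, size_t nrec, size_t *cursor,
    size_t resid, size_t *cnt, size_t *nbytes)
{
	int error = 0;

	*cnt = 0;
	*nbytes = 0;
	while (*cursor < nrec) {
		if (reclen[*cursor] > resid) {
			error = -1;	/* buffer full: stop, cursor stays put */
			break;
		}
		/*
		 * Stand-in for a successful uiomove(): only after the copy
		 * succeeded do we advance the cursor and the count.
		 */
		resid -= reclen[*cursor];
		*nbytes += reclen[*cursor];
		(*cnt)++;
		(*cursor)++;
	}
	/* Same mapping as the tmpfs_vnops.c hunk: no records copied -> EINVAL. */
	if (error == -1)
		error = (*cnt != 0) ? 0 : EINVAL;
	return (error);
}
```

With records {16, 16, 600} and 40 bytes of space, the first call copies two entries; the next call returns EINVAL with the cursor unchanged, and only a buffer of at least 600 bytes makes progress. A real end of directory still returns 0 with nothing copied.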

I've applied the new patch on top of r217615 and was not able to
reproduce the problem.

Thanks again,
Beat
Received on Wed Jan 19 2011 - 23:35:11 UTC
