Re: ls eats high CPU time when LANG=zh_CN.UTF-8 and LC_ALL=zh_CN.UTF-8

From: Huang Wen Hui <huanghwh_at_gmail.com>
Date: Tue, 5 Jul 2016 12:16:42 +0800
These two files are enough to make ls spin:

touch 火灾1
touch 火灾2

Both file names start with the same two Chinese characters (火灾, "fire").


% lldb /bin/ls
(lldb) target create "/bin/ls"
Current executable set to '/bin/ls' (x86_64).
(lldb) run
Process 2185 launching
Process 2185 launched: '/bin/ls' (x86_64)

Then press Control+C:

libc.so.7 was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 2185 stopped
* thread #1: tid = 100261, 0x0000000800ff5aa7 libc.so.7`_collate_lookup [inlined] largesearch(table=<unavailable>) + 38 at collate.c:276, stop reason = signal SIGSTOP
    frame #0: 0x0000000800ff5aa7 libc.so.7`_collate_lookup [inlined] largesearch(table=<unavailable>) + 38 at collate.c:276 [opt]
   273  next = (low + high) / 2;
   274  p = tab + next;
   275  compar = key - p->val;
-> 276  if (compar == 0)
   277  return (p);
   278  if (compar > 0)
   279  low = next + 1;
(lldb) bt
* thread #1: tid = 100261, 0x0000000800ff5aa7 libc.so.7`_collate_lookup [inlined] largesearch(table=<unavailable>) + 38 at collate.c:276, stop reason = signal SIGSTOP
  * frame #0: 0x0000000800ff5aa7 libc.so.7`_collate_lookup [inlined] largesearch(table=<unavailable>) + 38 at collate.c:276 [opt]
    frame #1: 0x0000000800ff5a81 libc.so.7`_collate_lookup(table=<unavailable>, t=<unavailable>, len=<unavailable>, pri=<unavailable>, which=<unavailable>, state=<unavailable>) + 465 at collate.c:343 [opt]
    frame #2: 0x0000000800fd80a9 libc.so.7`wcscoll_l(ws1=<unavailable>, ws2=<unavailable>, locale=<unavailable>) + 985 at wcscoll.c:171 [opt]
    frame #3: 0x0000000800fd4d19 libc.so.7`strcoll_l(s="火灾1", s2="火灾2", locale=0x000000080124a338) + 393 at strcoll.c:101 [opt]
    frame #4: 0x0000000800fe9313 libc.so.7`qsort(a=<unavailable>, n=<unavailable>, es=<unavailable>, cmp=(libc.so.7`fts_compar at fts.c:966)) + 13763 at qsort.c:130 [opt]
    frame #5: 0x0000000800f25297 libc.so.7`fts_sort(sp=<unavailable>, head=<unavailable>, nitems=<unavailable>) + 135 at fts.c:995 [opt]
    frame #6: 0x0000000800f2638e libc.so.7`fts_children(sp=<unavailable>, instr=2) + 254 at fts.c:570 [opt]
    frame #7: 0x00000000004030df ls`traverse(argc=<unavailable>, argv=<unavailable>, options=<unavailable>) + 463 at ls.c:576 [opt]
    frame #8: 0x0000000000402eeb ls`main(argc=<unavailable>, argv=<unavailable>) + 2299 at ls.c:498 [opt]
    frame #9: 0x00000000004020cf ls`_start + 383

2016-07-04 15:04 GMT+08:00 Baptiste Daroussin <bapt_at_freebsd.org>:

> On Mon, Jul 04, 2016 at 02:51:46PM +0800, Huang Wen Hui wrote:
> > 2016-07-04 14:41 GMT+08:00 Baptiste Daroussin <bapt_at_freebsd.org>:
> >
> > > On Mon, Jul 04, 2016 at 02:36:11PM +0800, Huang Wen Hui wrote:
> > > > 2016-07-04 14:20 GMT+08:00 Baptiste Daroussin <bapt_at_freebsd.org>:
> > > >
> > > > > On Mon, Jul 04, 2016 at 11:56:36AM +0800, Huang Wen Hui wrote:
> > > > > > Hi,
> > > > > > On very recent CURRENT, ls can use a lot of CPU time when
> > > > > > LANG=zh_CN.UTF-8 and LC_ALL=zh_CN.UTF-8:
> > > > > >
> > > > > > % uname -a
> > > > > > FreeBSD mbp.gddsn.org.cn 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #121 r302331M: Mon Jul  4 10:47:27 CST 2016     root_at_mbp.gddsn.org.cn:/usr/obj/usr/src/sys/MACBOOK  amd64
> > > > > >
> > > > > > top shows:
> > > > > > 4457 hwh           1 100    0 16784K  4416K CPU4    4   0:22  98.86% ls
> > > > > >
> > > > > > any ideas?
> > > > > >
> > > > > Is it in all directories, or only in directories with files named in
> > > > > Chinese characters?
> > > > >
> > > > Yes, the directory contains files with Chinese names.
> > > >
> > > > >
> > > > > Is it only happening when you run ls with some arguments (in particular
> > > > > -l), or with any arguments?
> > > > >
> > > > I use ls -wGl.
> > > >
> > > > >
> > > > > Do you see the same if you force any other locale like en_US.UTF-8?
> > > > >
> > > > There is no problem if I set en_US.UTF-8.
> > > >
> > > >
> > > > > Best regards,
> > > > > Bapt
> > > > >
> > >
> > > Can you try:
> > > env -i LANG=zh_CN.UTF-8 LC_COLLATE=C ls -l
> > >
> > > and tell me whether it still happens?
> > >
> > No problem with this command.
> >
>
> OK, so there might be some very inefficient code in the new Chinese
> collation code. I will look into it; thanks a lot for reporting.
>
> Best regards,
> Bapt
>
Received on Tue Jul 05 2016 - 02:16:45 UTC
