Re: __tls_get_addr problem with recent current

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Mon, 1 Sep 2008 17:53:15 +0300

On Mon, Sep 01, 2008 at 05:33:37PM +0300, Vyacheslav Bocharov wrote:
> I have a similar problem on 7-STABLE (from 1 Sep):
> a 32-bit application execs a 64-bit application and we get a core dump:
> 
> # gdb fw.sh fw.sh.core
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Core was generated by `fw.sh'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libstdc++.so.6...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.5...done.
> Loaded symbols for /lib/libm.so.5
> Reading symbols from /lib/libgcc_s.so.1...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.7...done.
> Loaded symbols for /lib/libc.so.7
> Reading symbols from /libexec/ld-elf.so.1...done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0  0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
> (gdb) bt
> #0  0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
> #1  0x0000000800ad8892 in _pthread_mutex_init_calloc_cb () from /lib/libc.so.7
> #2  0x0000000800ada35f in malloc () from /lib/libc.so.7
> #3  0x00000008007050ad in operator new () from /usr/lib/libstdc++.so.6
> #4  0x00000008006b5f21 in std::string::_Rep::_S_create ()
>    from /usr/lib/libstdc++.so.6
> #5  0x00000008006b6ca5 in std::string::_S_copy_chars ()
>    from /usr/lib/libstdc++.so.6
> #6  0x00000008006b6dc2 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
> #7  0x00000000004021ec in __static_initialization_and_destruction_0 (
>     __initialize_p=1, __priority=65535) at CCmdLine.cpp:16
> #8  0x00000000004026c3 in global constructors keyed to cmdlist ()
>     at CCmdLine.cpp:177
> #9  0x00000000004033a2 in __do_global_ctors_aux ()
> #10 0x000000000040113e in _init ()
> #11 0x0000000800b2b0c0 in __cxa_atexit () from /lib/libc.so.7
> #12 0x00000000004014e8 in _start ()
> #13 0x000000080052c000 in ?? ()
> 
> I tried your patch but nothing changed.
Which patch exactly? There were three, one of which caused an immediate
panic. I put the patches at
http://people.freebsd.org/~kib/misc/fsbase.1.patch
http://people.freebsd.org/~kib/misc/fsbase.2.patch

Could you please try both and report the results?
An isolated test case, either as several C files or as a recipe to
reproduce this with the base system, would be ideal.

> 
> 2008/8/31 Kostik Belousov <kostikbel_at_gmail.com>
> 
> > On Sun, Aug 31, 2008 at 10:16:18AM +0300, Kostik Belousov wrote:
> > > On Sat, Aug 30, 2008 at 02:03:00PM -0700, Artem Belevich wrote:
> > > > With the new patch kernel has crashed as soon as I ran i386 app,
> > > > though the crash happened within in-kernel thread g_up:
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 2; apic id = 02
> > > > fault virtual address   = 0x20
> > > > fault code              = supervisor read data, page not present
> > > > instruction pointer     = 0x8:0xffffffff804a821f
> > > > stack pointer           = 0x10:0xffffffffac280b60
> > > > frame pointer           = 0x10:0x0
> > > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > > >                        = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags        = resume, IOPL = 0
> > > > current process         = 3 (g_up)
> > > > trap number             = 12
> > > > panic: page fault
> > > > cpuid = 2
> > > > Uptime: 37s
> > > > Physical memory: 8169 MB
> > > > Dumping 380 MB: 365 349 333 317 301 285 269 253 237 221 205 189 173
> > > > 157 141 125 109 93 77 61 45 29 13
> > > Could you please show me the disassembled code around the faulting
> > > %rip?
> >
> > No need, it seems I found the problem. I trashed %rdx, which holds
> > the third cpu_switch argument. Please try the updated patch.
> >
> > Thanks for testing!
> >
> > diff --git a/sys/amd64/amd64/cpu_switch.S b/sys/amd64/amd64/cpu_switch.S
> > index f34b0cc..03f0eca 100644
> > --- a/sys/amd64/amd64/cpu_switch.S
> > +++ b/sys/amd64/amd64/cpu_switch.S
> > @@ -249,6 +249,12 @@ store_seg:
> >  1:     movl    %ds,PCB_DS(%r8)
> >        movl    %es,PCB_ES(%r8)
> >        movl    %fs,PCB_FS(%r8)
> > +       movq    %rdx,%r11
> > +       movl    $MSR_FSBASE,%ecx
> > +       rdmsr
> > +       shlq    $32,%rdx
> > +       leaq    (%rax,%rdx),%r9
> > +       movq    %r11,%rdx
> >         jmp     done_store_seg
> >  2:     movq    PCB_GS32P(%r8),%rax
> >        movq    (%rax),%rax
> >
> 
> 
> 
> -- 
> Vyacheslav Bocharov
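
For readers following the quoted cpu_switch.S hunk: rdmsr returns the
selected MSR in two halves, the low 32 bits in %eax and the high 32 bits
in %edx. That is why the patch saves %rdx in %r11 first, then shifts and
adds the halves to form the 64-bit MSR_FSBASE value. The C below is only
a sketch of that arithmetic; the function name msr_combine is
hypothetical and is not part of the FreeBSD source.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): rebuild a 64-bit MSR
 * value from the two 32-bit halves that rdmsr leaves in %eax (lo) and
 * %edx (hi). This mirrors "shlq $32,%rdx; leaq (%rax,%rdx),%r9" from
 * the quoted patch. Addition is safe here because the two halves occupy
 * disjoint bit ranges, so it gives the same result as a bitwise OR.
 */
static uint64_t
msr_combine(uint32_t lo, uint32_t hi)
{
	return (((uint64_t)hi << 32) + lo);
}
```

For example, msr_combine(0x00507483, 0x8) yields 0x800507483.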