Re: FBSD 5.2.CURRENT-p4 and mysqld problems

From: Ganbold <ganbold_at_micom.mng.net>
Date: Tue, 13 Apr 2004 09:47:48 +0900
Ken,

Thank you very much for reply. I'm having this problem since March 13th, 
and having big trouble.
I compiled mysql with linuxthreads-2.2.3-15.

At 04:17 AM 13.04.2004, you wrote:
>Are you able to run show process list?

Sometimes I can't run show processlist. It hangs.

>What is the status of your
>query?
>What date was your freeBSD 5-current built on?  Please send
>output of uname -a, so that we know exactly which version on which
>date because -current changes every day.

I did cvsup recently. uname -an shows:

backend2# uname -an
FreeBSD backend2.ub.mng.net 5.2-CURRENT FreeBSD 5.2-CURRENT #4: Sun Apr 11 
17:23:09 ULAT 
2004     tsgan_at_backend2.ub.mng.net:/usr/obj/usr/src/sys/BACKEND2  i386

Somehow on my another 5.2-CURRENT machine built on March 12 mysql-4.0.18 
runs without any problem.
Probably there was some change in FreeBSD kernel source and this might 
cause mysqld with linuxthreads
lock and hang. But it is just my guess.

>Also are you using libmap.conf?  If yes, what are it's contents.

I think I'm not using libmap.conf, since there is no /etc/libmap.conf file.

>What
>does top and ps show when this query is "locked up", can you include
>output of this?

When mysql hangs output of the ps axlwww | grep mysql shows following:

backend2# ps axlwww | grep mysql
     0 12544 12536   0   8  0  1564  988 wait   S     ??    0:00.01 sh -c 
/usr/local/bin/mysqldump --all-databases --complete-insert --user=backupman 
--password=xxx > all_db_backend2_2004-04-13.sql
     0 12545 12544   0   4  0  2692 2016 sbwait S     ??    7:25.74 
/usr/local/bin/mysqldump --all-databases --complete-insert --user=backupman 
--password=xxx
     0 19239 18683   0   8  0  1564 1004 wait   S     ??    0:00.00 sh -c 
/bin/sh /tmp/logrot0AWE3J /var/db/mysql/general.log
     0 19240 19239   0   8  0  1572 1008 wait   S     ??    0:00.00 /bin/sh 
/tmp/logrot0AWE3J /var/db/mysql/general.log
     0 19244 19240   0   4  0  1708  972 sbwait S     ??    0:00.01 
/usr/local/bin/mysqladmin --password=xxx flush-logs
     0 14377     1   0   8  0  1584 1032 wait   S     p0    0:00.04 /bin/sh 
/usr/local/bin/mysqld_safe --user=mysql --datadir=/var/db/mysql 
--pid-file=/var/db/mysql/backend2.pid --log-slow-queries=slow.log 
--log=general.log --log-update=update.log --default-character-set=latin1
    88 15081 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 15082 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 16118 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 16119 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 16120 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 16386 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 17143 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 17647 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 17648 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 18669 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 18670 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 18671 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 19245 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 19763 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 19764 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 20268 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 20269 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 20270 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 20271 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 20790 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 22329 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 22330 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 22332 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 23352 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 23858 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 23859 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 24376 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 25397 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 25901 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 25902 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 26406 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 26422 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 26926 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 26927 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 27948 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 27949 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 28469 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 28973 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 29477 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 29996 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 29997 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 30501 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 31018 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 31019 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 31523 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 32040 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 32544 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 32545 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 32547 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49060 14377   0  87 11 472736 445900 select 
SN    p0    0:02.98  (mysqld)
    88 49061 49060   0  87 11 472736 445900 select 
SN    p0    0:00.70  (mysqld)
    88 49062 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49063 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49064 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49065 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49066 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49067 49061   0  90 14 472736 445900 select 
SN    p0    0:08.76  (mysqld)
    88 49068 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49070 49061   0  20 11 472736 445900 
pause  SN    p0    0:06.38  (mysqld)
    88 49071 49061   0  20 12 472736 445900 
pause  SN    p0    0:00.00  (mysqld)
    88 49072 49061   0  20 14 472736 445900 
pause  SN    p0    4:57.18  (mysqld)
    88 49073 49061   0  20 14 472736 445900 
pause  SN    p0    0:10.60  (mysqld)
    88 49099 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.55  (mysqld)
    88 49103 49061   0  20 14 472736 445900 
pause  SN    p0    0:55.50  (mysqld)
    88 49104 49061   0  20 14 472736 445900 
pause  SN    p0    0:58.64  (mysqld)
    88 49105 49061   0  20 14 472736 445900 
pause  SN    p0   20:30.31  (mysqld)

         ^^^^^^ When kills this thread, everything becomes OK

    88 51249 49061   0  20 14 472736 445900 
pause  SN    p0    0:02.53  (mysqld)
    88 98607 49061   0  20 14 472736 445900 
pause  SN    p0    0:00.09  (mysqld)
    88 98608 49061   0  20 14 472736 445900 
pause  SN    p0    0:05.68  (mysqld)
    88 99112 49061   0  20 14 472736 445900 
pause  SN    p0    0:01.20  (mysqld)
    88 99629 49061   0  20 14 472736 445900 
pause  SN    p0    2:18.97  (mysqld)
     0 32549 12045   0  -8  0  1376  784 piperd S+    p3    0:00.00 grep mysql
backend2# kill -9 49105

>Why do you feel you MUST use freebsd5 (I am just
>curious)?  Have you tried a GDB backtrace (please use debug version of
>mysql!)?

I know that I should use FreeBSD-4.x stable rather than 5.x. But I wanted 
to see new features such
as ULE scheduling, SMPng etc. I didn't try GDB backtrace yet.


>I know you already know this, but you shouldn't be using 5-current on
>a production server and even the releases such as 5.2.1 can have
>issues that you need to be aware of.

Yes. But mostly Current works fine without any problem. I have 17 FreeBSD 
machines
and 13 of them is FreeBSD 5.2-CURRENT.

Also today I received one email saying some problems with linux kernel. I 
know this
problem is related to Linux kernel, but there is also mentioned mysql and 
linuxthreads.

Please read following:
--------------------------------------------------------------------------------------------------------------------------------------
Hello.

We faced a bug (?) in Linux kernel causing different misbehaviours on our
server. After exploration, it seems that we found some security
implications of this issue.


When a process exits, it's parent is notified by SIGCHLD, and finished
child is kept in process table in "zombie" state until parent process (or
init, if parent is already ended) handles child exit.

Similary, with linuxthreads, when a thread exits, another thread in the
same process is notified by signal 33 (SIGRT_1), and exitted thread exists
in the process table in "zombie" state until the exit is handled.

When a signal that notifies about exit is generated by the kernel, kernel
code allocates a "struct sigqueue" object. This object keeps information
about the signal until the signal is delivered.

Only a limited number of such objects may be allocated at a time.
There is some code in the kernel that still allows signals with numbers
less than 32 to be delivered when "struct sigqueue" object can't be
allocated. However, for signal 33 signal generation routine just returns
-EAGAIN in this case.
As the result, process is not notified about thread exits, and ended thread
is left in "zombie" state.
Details are at
http://www.ussg.iu.edu/hypermail/linux/kernel/0404.0/0208.html

For long-living processes that create short-living threads (such as
mysqld), this causes process table overflow in several minutes.

"struct sigqueue" overflow may be easily caused from userspace, if a
process blocks a signal and then receives a large number of such signals.
The following sample code does that:

#include <signal.h>
#include <unistd.h>
#include <stdlib.h>

int main()
{
         sigset_t set;
         int i;
         pid_t pid;

         sigemptyset(&set);
         sigaddset(&set, 40);
         sigprocmask(SIG_BLOCK, &set, 0);

         pid = getpid();
         for (i = 0; i < 1024; i++)
                 kill(pid, 40);

         while (1)
                 sleep(1);
}

So if a user runs such code (or just runs a buggy program that blocks a
signal and then receives 1000 such signals - which happens here), this
will cause a DoS againt anything running on the same system that uses
linuxthreads, including daemons running as root.

On systems that use NPTL (such as Linux 2.6 kernel) there is no 'thread
zombie' problem, because in NPTL another notification mechanism is used.
However, DoS is still possible (and really happens - in form of daemon
crashes), because when it is not possible to allocatre a "struct sigqueue"
object, kernel behaviour in signal-passing changes, causing random hangs
and segfaults in different programs.

--------------------------------------------------------------------------------------------------------------------------------------

Please let me know your opinion and let me know if there is any solution 
for this.

thank you very much

Ganbold




>Also, one final recommendation,  if you are using current and have not
>updated in the last 30 to 60 days I would recommend that you update
>again as many thing have been changed and fixed.
>
>Hope this helps,
>Ken



After kill:

backend2# ps axlwww | grep mysql
     0 14377     1   0   8  0  1584 1032 wait   S     p0    0:00.05 /bin/sh 
/usr/local/bin/mysqld_safe --user=mysql --datadir=/var/db/mysql 
--pid-file=/var/db/mysql/backend2.pid --log-slow-queries=slow.log 
--log=general.log --log-update=update.log --default-character-set=latin1
    88 32560 14377   0 139 11 438256 308456 
Giant  LNL   p0    0:01.17  (mysqld)
    88 32561 32560   0 121 11 438256 308456 select 
SN    p0    0:00.00  (mysqld)
    88 32562 32561   0  20 14 438256 308456 
pause  SN    p0    0:00.00  (mysqld)
    88 32563 32561   0  20 14 438256 308456 
pause  SN    p0    0:00.00  (mysqld)
    88 32564 32561   0  20 14 438256 308456 
pause  SN    p0    0:00.00  (mysqld)
    88 32565 32561   0  20 14 438256 308456 
pause  SN    p0    0:00.00  (mysqld)
    88 32566 32561   0 123 14 438256 308456 select 
SN    p0    0:00.00  (mysqld)
    88 32567 32561   0 124 14 438256 308456 select 
SN    p0    0:00.00  (mysqld)
    88 32568 32561   0 121 14 438256 308456 select 
SN    p0    0:00.00  (mysqld)
     0 32570 12045   0  -8  0  1376  784 piperd S+    p3    0:00.01 grep mysql
backend2# ps axlwww | grep mysql
     0 14377     1   0   8  0  1584 1032 wait   S     p0    0:00.05 /bin/sh 
/usr/local/bin/mysqld_safe --user=mysql --datadir=/var/db/mysql 
--pid-file=/var/db/mysql/backend2.pid --log-slow-queries=slow.log 
--log=general.log --log-update=update.log --default-character-set=latin1
    88 32560 14377   0 139 11 457024 441232 select 
SN    p0    0:01.78  (mysqld)
    88 32561 32560   0  96 11 457024 441232 select 
SN    p0    0:00.00  (mysqld)
    88 32562 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32563 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32564 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32565 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32566 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32567 32561   0 124 14 457024 441232 select 
SN    p0    0:00.00  (mysqld)
    88 32568 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32571 32561   0  20 11 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32572 32561   0  20 12 457024 441232 
pause  SN    p0    0:00.00  (mysqld)
    88 32573 32561   0  20 14 457024 441232 
pause  SN    p0    0:00.04  (mysqld)
     0 32575 12045   0  76  0  1332  740 Giant  L+    p3    0:00.00 grep mysql



>----- Original Message -----
>From: "Ganbold" <ganbold_at_micom.mng.net>
>To: <mysql_at_lists.mysql.com>
>Sent: Sunday, April 11, 2004 11:35 PM
>Subject: FBSD 5.2.CURRENT-p4 and mysqld problems
>
>
> > Hi,
> >
> > I found some problematic queries which locks mysql server any
>further
> > processing. I'm using mysql-4.0.18
> > from FreeBSD ports collection. I'm using FreeBSD-5.2-CURRENT.
> > Some of the queries like :
> >
> > select sum(total_amount) pbc from customers.payment where
>contract_id>=0
> > and contract_id<=4999 and
>from_unixtime(time_stamp,'%Y-%m')='2004-03';
> >
> > Above query is not using any index. It locks mysql server and all
>other
> > following queries go into queue until I kill locked mysqld thread.
> > After killing locked mysqld thread everything goes back to function
>normally.
> > I made composite index on contract_id,time_stamp,total_amount and
>let's see
> > what will happen.
> >
> > In the meantime, can somebody explain me what the problem is?
> > I already sent email to freebsd-current and freebsd-ports mailing
>list
> > about this problem.
> > Does somebody have this problem before? Let me know if there is any
>solution.
> > I know I should use FreeBSD 4.9 stable since it doesn't have any
>problem
> > with this issue. However I need to solve
> > this problem on FreeBSD-5.2-CURRENT.
> >
> > thanks in advance,
> >
> > Ganbold
> >
> >
> >
> > At 01:36 PM 05.04.2004, you wrote:
> > >yes,  we resolved the issue it looks like.  I wanted
> > >to hold off on announcing this until i was sure, but
> > >it's been running for about 24 hours now without a
> > >lockup.
> > >
> > >options MAXDSIZ="(1024*1024*1024)" # change max from
> > >512M to 1G
> > >add that to your kernel config.
> > >
> > >basically whats happening ( and i'm probably wrong
> > >with the technicals on this, but bear with me :D ) is
> > >that mysqld would aquire a lock (on a table more then
> > >likely) and then try to allocate above the FBSD 5.2
> > >max default memory size (512M).  when it did this, it
> > >crashed and left the lock on the table.  from there it
> > >just froze.
> > >
> > >making that change seems to have fixed it.  it was
> > >never evident before because we had a slower webserver
> > >that wasn't loading the sql server all that much.  now
> > >it is, and we started developing problems. :).
> > >
> > >as you can tell with this link:
> > >http://sql.tribalwar.com/before-ps.txt the mysqld proc
> > >wen't above 512M.
> > >
> > >check your sql.err log, if you're getting a malloc
> > >error, then this is more then likely your problem.
> > >
> > >Thanks to everyone for the help in tracking this down.
> > >
> > >Let me know if this fixes the problem for you.  Also,
> > >are you running -CURRENT in production? or testing?  I
> > >had given some thought to upgrading the box to
> > >-CURRENT after the commit for the network stack
> > >settled.  But I first need to get serial console
> > >enabled on it :).
> > >
> > >Daryl
> > >
> > >--- Ganbold <ganbold_at_micom.mng.net> wrote:
> > > > Hi Daryl,
> > > >
> > > > I have exactly same problem as you. I have FreeBSD
> > > > 5.2-CURRENT (did cvsup
> > > > on March 23) with mysql-4.0.18 from ports
> > > > collection.
> > > > It is compiled with linuxthreads. Mysql is working
> > > > fine, except it
> > > > sometimes freezes, sometimes one or two tables get
> > > > corrupted.
> > > > Usually freezes once per day. Just freezes and I had
> > > > to kill mysql process
> > > > and start.
> > > >
> > > > Did you solve your problem? I also downloaded latest
> > > > snapshot from
> > > > mysql.com web site and installed, but nothing
> > > > changes.
> > > > Let me know if you find something.
> > > >
> > > > TIA,
> > > >
> > > > Ganbold
> > > >
> > > >
> > > > At 12:48 PM 03.04.2004, you wrote:
> > > > >ah! thanks :)
> > > > >
> > > > >It just happened again and I was able to collect
> > > > the
> > > > >data again.
> > > > >
> > > > >Before I did mysqladmin shutdown/killall -11 mysqld
> > > > >http://sql.tribalwar.com/before-ps.txt
> > > > >
> > > > >a few times the 3 giant processes would show as -,
> > > > but
> > > > >then it got to the point where they weren't
> > > > changing
> > > > >one bit (staying at Giant)
> > > > >
> > > > >after I did mysqladmin shutdown/killall -11 mysqld
> > > > >http://sql.tribalwar.com/after-ps.txt
> > > > >
> > > > >this eventually cleared up after a minute or so.
> > > > >
> > > > >I've given some serious thought to upgrading to
> > > > >-CURRENT, but with the network stack commit comming
> > > > >soon (or happening now), I don't want to get caught
> > > > in
> > > > >the middle of it and have severe problems. Perhaps
> > > > I
> > > > >should use a tag for the time right around the
> > > > sysctl
> > > > >change for the mp stack?
> > > > >
> > > > >The guy running the site just wants me to (have
> > > > >someone at the colo) format the machine and go back
> > > > to
> > > > >4.9.
> > > > >
> > > > >Oh, I now have witness in the kernel...no deadlocks
> > > > >from what I've seen.
> > > > >
> > > > >HTH,
> > > > >Daryl
> > > > >
> > > > >--- Doug White <dwhite_at_gumbysoft.com> wrote:
> > > > > > On Fri, 2 Apr 2004, Daryl Chance wrote:
> > > > > >
> > > > > > > Thanks for your response.  It just happened a
> > > > few
> > > > > > > minutes ago and i was able to capture all the
> > > > > > output
> > > > > > > into a text file.
> > > > > > >
> > > > > > > http://sql.tribalwar.com/ps.txt
> > > > > >
> > > > > > oops, sorry, I get those confused. You want 'ps
> > > > > > axlwww'.  WCHAN will be a
> > > > > > string like "select".
> > > > > >
> > > > > > > I have WITNESS* options compiled into the
> > > > kernel
> > > > > > and
> > > > > > > am going to reboot soon.  Do you think it
> > > > would
> > > > > > help
> > > > > > > in debugging this to see if there are any
> > > > > > deadlocks?
> > > > > >
> > > > > > Potentially, although I get 100% cpu loops with
> > > > kde
> > > > > > that won't trigger any
> > > > > > witness or invariants checks.
> > > > > >
> > > > > > --
> > > > > > Doug White                    |  FreeBSD: The
> > > > Power
> > > > > > to Serve
> > > > > > dwhite_at_gumbysoft.com          |  www.FreeBSD.org
> > > > >
> > > > >
> > > > >__________________________________
> > > > >Do you Yahoo!?
> > > > >Yahoo! Small Business $15K Web Design Giveaway
> > > > >http://promotions.yahoo.com/design_giveaway/
> > > > >_______________________________________________
> > > > >freebsd-current_at_freebsd.org mailing list
> > > >
> > > >http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > > >To unsubscribe, send any mail to
> > > > "freebsd-current-unsubscribe_at_freebsd.org"
> > > >
> > >
> > >
> > >__________________________________
> > >Do you Yahoo!?
> > >Yahoo! Small Business $15K Web Design Giveaway
> > >http://promotions.yahoo.com/design_giveaway/
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:
>http://lists.mysql.com/mysql?unsub=kenmysql_at_icarz.com
> >
> >
Received on Mon Apr 12 2004 - 15:42:56 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:50 UTC