Tracking down em problem

From: Sven Willenberger <sven_at_dmv.com>
Date: Wed, 02 Nov 2005 10:36:33 -0500
FreeBSD 6.0-RC1 (Wed Oct 26 13:31:21 EDT 2005)

I seem to have an issue with losing connections to an em interface
during periods of heavy I/O load. There are several variables here, so I
am hoping for some guidelines to help troubleshoot this.

I have a postgresql server (8.0.4) set up on an i386 system. The data
directory is on its own partition (which is actually a gstripe/gmirror
setup -- see the footnote after my problem description).

I have enabled a replication system from another server. When I started
replication, a large amount of data had to be fed to this server via
the em0 interface. During this process, while I was ssh'ed into the box,
my connection would hang for a few moments and then recover. However,
if I cd to the data directory (on the stripe/mirror) and run ls -alrt
several times, the connections are actually broken: not only my ssh
connection but also the replication connection from the master server.

I have tried setting debug.mpsafenet=0 in /boot/loader.conf to no avail
-- the same issue happens. Preemption is enabled in the kernel, as is
sched_4bsd. I don't really know how to proceed at this point to
troubleshoot this issue; as it stands now, it is most definitely a
showstopper for the purposes of this server.

Thanks,

Sven

*footnote: here is the gstripe/gmirror config:

a) the mirrors:
Geom name: pg1
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 1606567834
Providers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da1
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2976581887
2. Name: da2
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3738898587

Geom name: pg2
State: COMPLETE
Components: 2
Balance: split
Slice: 8192
Flags: NONE
GenID: 0
SyncID: 1
ID: 2419201320
Providers:
1. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: da3
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 4053765902
2. Name: da4
   Mediasize: 36703949824 (34G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 2784554060

b) the stripes (using the mirrors):
Geom name: pgdata
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 65536
ID: 2329725949
Providers:
1. Name: stripe/pgdata
   Mediasize: 73407791104 (68G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/pg1
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: mirror/pg2
   Mediasize: 36703949312 (34G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1
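
As a sanity check on the sizes listed above, the following Python sketch
reproduces the reported Mediasize values from the raw disk size. It rests
on two assumptions that are not stated in the listing itself: that gmirror
keeps its metadata in the last sector of each consumer, and that gstripe
(Type: AUTOMATIC) also reserves the last sector and then rounds each
consumer down to a whole number of stripes. The helper names are
hypothetical and exist only for this illustration.

```python
SECTOR = 512      # Sectorsize reported for every provider in the listing
STRIPE = 65536    # Stripesize reported for stripe/pgdata


def mirror_size(consumer_bytes):
    # Assumption: gmirror stores its metadata in the consumer's last sector.
    return consumer_bytes - SECTOR


def stripe_size(consumer_bytes, count):
    # Assumption: gstripe reserves the last sector for metadata, then
    # rounds the remainder down to a multiple of the stripe size.
    usable = consumer_bytes - SECTOR
    usable -= usable % STRIPE
    return usable * count


disk = 36703949824               # da1..da4 Mediasize from the listing
mirror = mirror_size(disk)       # expected to match mirror/pg1 and mirror/pg2
stripe = stripe_size(mirror, 2)  # expected to match stripe/pgdata

print(mirror, stripe)
```

Under those assumptions the computed values agree with the listing
(36703949312 for each mirror and 73407791104 for the stripe), which
suggests the geom layers are sized consistently.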

This is then mounted as:
/dev/stripe/pgdata      /usr/local/pgsql        ufs     rw,noatime      2       2
Received on Wed Nov 02 2005 - 14:36:35 UTC
