Re: Heavy I/O blocks FreeBSD box for several seconds

From: Hartmann, O. <ohartman_at_zedat.fu-berlin.de>
Date: Tue, 12 Jul 2011 20:44:19 +0200
On 07/12/11 20:10, Matt wrote:
>
>> Sic... If you allow me the comparison, FreeBSD development is as open
>> as the borders of the US (and, to some extent, of most western
>> countries) are nowadays to aliens, and believe me, this is not a
>> compliment.
>>
>>   - Arnaud
>>

I like the comment, although I disagree. In some cases, 'too open' is 
worse. Look at Linux: there are distributions that are too open, and the 
rate of system malfunctions is relatively high compared to the *BSDs. 
This is my experience over the last few years,
especially with RedHat ...
> This is getting off-topic fast. Can we just EGODWIN here? It doesn't 
> fulfill all the requirements, but it's getting close...
>
> Is it possible the CPU is the wrong place to look for the cause of 
> the blocking? Is there perhaps memory contention between ZFS/UFS in 
> OP's setup? Could filesystem/disk performance be the cause rather 
> than obscure technicalities of the ULE scheduler? Dodgy hardware 
> causing interrupt storms? Ethernet not in polling mode?
>
> Matt
Dodgy hardware: a Dell PowerEdge 1950 with 16 GB RAM, two 4-core XEONs 
at 2.5 GHz, two Broadcom GBit NICs, a SAS controller (mpt) and two SATA 
drives. Another box: a self-built LGA775 system with 8 GB RAM, a Q6600 
CPU, a Realtek NIC with polling enabled (though I do not know whether 
it is actually in use - see the quick check below), and 5 SATA drives 
connected to an ICH10R. The dodgy notebook is a Dell Latitude E6520. 
More dodgy hardware is our Dell Blade system with 24 GB RAM, two 
LGA1366 sockets and two 6-core Westmere Intel XEONs (X56XX-something), 
plus a SAS 2.0 controller with a 500 GB SAS 2.0 hard drive and a 2 TB 
SATA drive. That box now runs Linux because of its nVidia TESLA M2050 
board; the nVidia GPU card cannot be used under FreeBSD as a server OS. 
Especially on that machine, which runs headless, I experienced those 
'locks' - when starting a compiler, or when running some n-body 
simulation code that is not parallelised, so I start an ensemble of 5, 
6 or 12 instances.
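
Regarding the polling question: a quick way to see whether polling is 
actually active is to look at the interface options (assuming the 
Realtek attaches as re0 here - adjust the device name - and that 
DEVICE_POLLING is compiled into the kernel):

    ifconfig re0 polling   # request polling mode on the interface
    ifconfig re0           # POLLING appears in the options= line when active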

I would like to quantify the problems if someone could give me some 
advice on how to measure them. Within this thread I read that top isn't 
well suited for this, so what is?
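
Lacking a better idea, here is a minimal sketch of a userland probe I 
could run in the background (the 100 ms interval and 500 ms threshold 
are arbitrary choices of mine): it repeatedly sleeps for a fixed 
interval and reports whenever the wakeup comes back far too late, which 
would at least put numbers on those multi-second stalls:

    /* latprobe.c - report wakeups that return much later than requested.
     * Build with: cc -o latprobe latprobe.c
     */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        const long interval_ms = 100;    /* nominal sleep per iteration */
        const long threshold_ms = 500;   /* report anything this much late */
        struct timespec req = { 0, interval_ms * 1000000L };

        for (;;) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            nanosleep(&req, NULL);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            long elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000L
                            + (t1.tv_nsec - t0.tv_nsec) / 1000000L;
            if (elapsed_ms - interval_ms > threshold_ms) {
                printf("stall: slept %ld ms instead of %ld ms\n",
                       elapsed_ms, interval_ms);
                fflush(stdout);
            }
        }
        return 0;
    }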
Well, just for fun, I compiled the 4BSD scheduler into the older 
16-core XEON box's kernel and tried to reproduce the problems. I wasn't 
able to do so in most cases, except when doing massive disk I/O AND 
copying lots of data over the network at the same time. That 
combination seems to bring down even a simple SSH session.
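
For the record, the scheduler swap itself is just a one-line change in 
the kernel config plus a rebuild (MYKERNEL stands in for whatever 
config name is in use):

    # /usr/src/sys/amd64/conf/MYKERNEL
    #options        SCHED_ULE        # the default scheduler
    options         SCHED_4BSD       # traditional 4BSD scheduler

    # then, in /usr/src:
    # make buildkernel KERNCONF=MYKERNEL && make installkernel KERNCONF=MYKERNEL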

I'm confused ...
