Re: 7.0-Beta 3: zfs makes system reboot

From: Johan Ström <johan_at_stromnet.se>
Date: Sun, 2 Dec 2007 13:33:09 +0100
On Nov 30, 2007, at 17:27, Michael Rebele wrote:

> Hello,
>
> I've been testing ZFS since 7.0-Beta 1.
> At first I only had access to a 32-bit machine (P4/3GHz with 2GB
> RAM, 2xHD for a RAID1 and 2xHD for a ZFS RAID 0).
>
> While running iozone with the following call:
> iozone -R -a -z -b file.wks -g 4G -f testile
>
> (This is inspired by Dominic Kay from Sun; see
> http://blogs.sun.com/dom/entry/zfs_v_vxfs_iozone for details.)
>
> the well-known "kmem_malloc" error occurred and stopped the system.
> (panic: kmem_malloc (131072): kmem_map too small: 398491648 total  
> allocated cpuid=1)
>
> I tested several optimizations as suggested in the ZFS Tuning Guide
> and in several postings on this list.
> The problem stayed essentially the same: depending on the
> configuration (whether I raised the vm.kmem_* sizes, only KVA_PAGES,
> or both), it either stopped with a "kmem_malloc" panic or rebooted
> without warning. It never completed the benchmark. With more memory
> in vm.kmem_size and vm.kmem_size_max, the problem just showed up
> later.
>
>
>
> But OK, the main target for ZFS is amd64, not i386. I now have
> access to an Intel Woodcrest system, a Xeon 5160 with 4GB RAM and
> 1xHD. It uses UFS for the system and home, and one ZFS filesystem
> just for data (for the iozone benchmark).
> It runs a vanilla kernel that I haven't touched. I've tested both
> the default settings from Beta 3 and the tuning tips from the
> Tuning Guide.
> It shows the same behaviour as the 32-bit machine, with one major
> difference: it always reboots. There's no kmem_malloc error message
> (which on i386 made the system hang).
>
> The problem is the "-z" option of the iozone benchmark. Without it,
> the benchmark works (on both the i386 and the amd64 machine). This
> option makes iozone also test small record sizes for large files.
> On a UFS filesystem, iozone works fine with the "-z" option, so it
> looks to me like this is a ZFS problem.
>
> Here is some more information (from the amd64 system):
>
> 1. The captured iozone output
>
> [root_at_zfs /tank/iozone]# iozone -R -a -z -b filez-512M.wks -g 4G -f testile
> ...


For the record, I can reproduce the same thing on amd64 FreeBSD
RELENG_7 (installed from Beta 3 two days ago). It's a C2D box with
2 GB of memory and two SATA drives in a zpool mirror. No special
tweaking whatsoever yet.
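
For reference, the pool is just a plain two-way mirror, created
roughly like this (the pool and device names are from memory, so
treat them as illustrative):

         zpool create tank mirror ad4 ad6
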
The panic was "Page fault, supervisor read instruction, page not
present", so not the (apparently) regular kmem_malloc one? So I doubt
the other patch that Alexandre linked to would help?

iozone got to:
         Run began: Sun Dec  2 13:11:53 2007

         Excel chart generation enabled
         Auto Mode
         Cross over of record size disabled.
         Using maximum file size of 4194304 kilobytes.
         Command line used: iozone -R -a -z -b file.wks -g 4G -f testile
         Output is in Kbytes/sec
         Time Resolution = 0.000001 seconds.
         Processor cache size set to 1024 Kbytes.
         Processor cache line size set to 32 bytes.
         File stride size set to 17 * record size.
                                                               random   random     bkwd   record   stride
              KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
              64       4   122584   489126   969761  1210227  1033216   503814   769584   516414   877797   291206   460591   703068   735831
              64       8   204474   735831  1452528  1518251  1279447   799377  1255511   752329  1460430   372410   727850  1087638  1279447
......
          131072       4    65734    71698  1011780   970967   755928     5479  1008858   494172   931232    65869    68155   906746   910950
          131072       8    79507    74422  1699148  1710185  1350184    10907  1612344   929991  1372725    34699    74782  1407638  1429434
          131072      16    82479    74279  2411000  2426173  2095714    25327  2299061  1608974  2038950    71102    69200  1887231  1893067
          131072      32    75268    73077  3276650  3326454  2954789    70573  3195793  2697621  2987611
then it died

No cores were dumped, however. I'm running swap on a gmirror, and if
I recall correctly at least 6.x couldn't dump to a gmirror, so I
guess 7.x can't either. Although the dump messages DID say it dumped
memory (and it did say "Dump complete"), savecore didn't find any
dumps at boot.
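
For reference, the dump setup is just the stock rc.conf style one,
something along these lines (the gmirror provider name below is only
illustrative):

         # /etc/rc.conf
         dumpdev="AUTO"           # ends up on the gmirror swap provider
         dumpdir="/var/crash"

so in theory savecore(8) should pick the dump up at boot, or manually
with something like "savecore /var/crash /dev/mirror/gm0s1b", but as
said, nothing shows up.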

The box didn't do anything else during this test and isn't running
any apps yet. I haven't encountered the problem before, but then
again I've only been playing with it for two days without any real
hard test (just scp'ed about 50 GB of data to it, but that's it).

--
Johan Ström
Stromnet
johan_at_stromnet.se
http://www.stromnet.se/
Received on Sun Dec 02 2007 - 11:33:39 UTC
