Re: ZFS command can block the whole ZFS subsystem!

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Fri, 3 Jan 2014 20:25:35 +0100

On Fri, 3 Jan 2014 12:16:22 -0600
Dan Nelson <dnelson_at_allantgroup.com> wrote:

> In the last episode (Jan 03), O. Hartmann said:
> > On Fri, 3 Jan 2014 14:38:03 -0000 "Steven Hartland"
> > <killing_at_multiplay.co.uk> wrote:
> > > From: "O. Hartmann" <ohartman_at_zedat.fu-berlin.de>
> > > > For some security reasons, I dumped via "dd" a large file onto
> > > > a 3TB disk.  The system is 11.0-CURRENT #1 r259667: Fri Dec 20
> > > > 22:43:56 CET 2013 amd64.  Filesystem in question is a single
> > > > ZFS pool.
> > > > 
> > > > Issuing the command
> > > > 
> > > > "rm dumpfile.txt"
> > > > 
> > > > and then hitting Ctrl-Z to bring the rm command into background
> > > > via "fg" (I use FreeBSD's csh in that console) locks up the
> > > > entire command and even worse - it seems to wind up the pool in
> > > > question for being exported!
> > >
> > > You can check that gstat -d
> > 
> > command report 100% activity on the drive. I exported the pool in
> > question in single user mode and now try to import it back while in
> > multiuser mode.
> 
> Did you happen to have enabled deduplication on the filesystem in
> question? That's the only thing I can think of that would make file
> deletions run slow.  I have deleted files up to 10GB on regular
> filesystems with no noticeable delay at the command line.  If you have
> deduplication enabled, however, each block's hash has to be looked up
> in the dedupe table, and if you don't have enough RAM for it to be
> loaded completely into memory, that will be very very slow :)
> 
> There are varying recommendations on how much RAM you need for a
> given pool size, since the DDT has to hold an entry for each block
> written, and blocksize depends on whether you wrote your files
> sequentially (128K blocks) or randomly (8k or smaller).  Each DDT
> entry takes 320 bytes of RAM, so a full 3TB ZFS pool would need at
> minimum 320*(3TB/128K) ~= 7GB of RAM to hold the DDT, and much more
> than that if your average blocksize is less than 128K.
> 
> So, if your system has less than 8GB of RAM in it, there's no way the
> DDT will be able to stay in memory, so you're probably going to have
> to do at least one disk seek (probably more, since you're writing to
> the DDT as well) per block in the file you're deleting.  You should
> probably have 16GB or more RAM, and use an SSD as a L2ARC device as
> well.
> 
Thanks for the explanation.

The box in question has 32GB RAM. 
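
For reference, the DDT estimate quoted above can be reproduced with plain
shell arithmetic. This is only a sketch: the 320 bytes per entry and the
128K/8K block sizes are the figures from the quoted mail, the function
name ddt_mib is made up for illustration, and it assumes a /bin/sh with
64-bit integer arithmetic (not csh).

```shell
#!/bin/sh
# Rough in-core DDT size, assuming one entry per block written and
# 320 bytes per entry (the estimate quoted above, not a measured value).

# ddt_mib POOL_BYTES BLOCK_BYTES [ENTRY_BYTES]
# Prints the estimated DDT size in MiB (integer division, rounded down).
ddt_mib() {
    pool_bytes=$1
    block_bytes=$2
    entry_bytes=${3:-320}
    echo $(( $pool_bytes / $block_bytes * $entry_bytes / 1048576 ))
}

TIB=$(( 1024 * 1024 * 1024 * 1024 ))

# Full 3TB pool written sequentially (128K blocks)
ddt_mib $(( 3 * TIB )) $(( 128 * 1024 ))   # prints 7680

# Full 3TB pool written randomly (8K blocks)
ddt_mib $(( 3 * TIB )) $(( 8 * 1024 ))     # prints 122880
```

With 128K blocks this comes to about 7.5 GiB, in line with the ~7GB
figure above; with 8K blocks it grows to about 120 GiB.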

I wrote a single file, 2.72 GB in size, to the pool, which I then tried
to remove via "rm".

Dedup seems to be off according to this information:

[~] zfs get all BACKUP00
NAME      PROPERTY              VALUE                 SOURCE
BACKUP00  type                  filesystem            -
BACKUP00  creation              Fr Dez 20 23:14 2013  -
BACKUP00  used                  2.53T                 -
BACKUP00  available             147G                  -
BACKUP00  referenced            144K                  -
BACKUP00  compressratio         1.00x                 -
BACKUP00  mounted               yes                   -
BACKUP00  quota                 none                  default
BACKUP00  reservation           none                  default
BACKUP00  recordsize            128K                  default
BACKUP00  mountpoint            /BACKUP00             default
BACKUP00  sharenfs              off                   default
BACKUP00  checksum              on                    default
BACKUP00  compression           off                   default
BACKUP00  atime                 on                    default
BACKUP00  devices               on                    default
BACKUP00  exec                  on                    default
BACKUP00  setuid                on                    default
BACKUP00  readonly              off                   default
BACKUP00  jailed                off                   default
BACKUP00  snapdir               hidden                default
BACKUP00  aclmode               discard               default
BACKUP00  aclinherit            restricted            default
BACKUP00  canmount              on                    default
BACKUP00  xattr                 off                   temporary
BACKUP00  copies                1                     default
BACKUP00  version               5                     -
BACKUP00  utf8only              off                   -
BACKUP00  normalization         none                  -
BACKUP00  casesensitivity       sensitive             -
BACKUP00  vscan                 off                   default
BACKUP00  nbmand                off                   default
BACKUP00  sharesmb              off                   default
BACKUP00  refquota              none                  default
BACKUP00  refreservation        none                  default
BACKUP00  primarycache          all                   default
BACKUP00  secondarycache        all                   default
BACKUP00  usedbysnapshots       0                     -
BACKUP00  usedbydataset         144K                  -
BACKUP00  usedbychildren        2.53T                 -
BACKUP00  usedbyrefreservation  0                     -
BACKUP00  logbias               latency               default
BACKUP00  dedup                 off                   default
BACKUP00  mlslabel                                    -
BACKUP00  sync                  standard              default
BACKUP00  refcompressratio      1.00x                 -
BACKUP00  written               144K                  -
BACKUP00  logicalused           2.52T                 -
BACKUP00  logicalreferenced     43.5K                 -

Funny, the disk is supposed to be "empty" ... but is marked as used by
2.5 TB ...
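
The listing itself gives a hint: usedbydataset is only 144K while
usedbychildren is 2.53T, so the space appears to be accounted to child
datasets rather than to BACKUP00 itself. Below is a minimal sketch for
pulling single values out of pasted "zfs get" output. prop_value is a
hypothetical helper, not part of ZFS, and it only handles values without
embedded spaces (the creation date, for example, would be cut short). On
a live system "zfs get -H -o value dedup BACKUP00" or
"zfs list -r -o space BACKUP00" should give the same information
directly.

```shell
#!/bin/sh
# prop_value PROPERTY
# Reads "zfs get"-style rows (NAME PROPERTY VALUE SOURCE) on stdin and
# prints the VALUE column of the first row whose PROPERTY matches.
prop_value() {
    awk -v p="$1" '$2 == p { print $3; exit }'
}

# Rows copied from the listing above.
listing='BACKUP00  used                  2.53T                 -
BACKUP00  usedbysnapshots       0                     -
BACKUP00  usedbydataset         144K                  -
BACKUP00  usedbychildren        2.53T                 -
BACKUP00  dedup                 off                   default'

# Print the properties relevant to the dedup and space questions.
for p in dedup used usedbydataset usedbychildren usedbysnapshots; do
    printf '%-16s %s\n' "$p" "$(printf '%s\n' "$listing" | prop_value "$p")"
done
```

For the rows above this reports dedup as off and shows nearly all of the
2.53T under usedbychildren.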