Re: access to hard drives is "blocked" by writes to a flash drive

From: Ian Lepore <ian_at_FreeBSD.org>
Date: Mon, 04 Mar 2013 08:54:47 -0700
On Sun, 2013-03-03 at 20:28 +0000, Steven Hartland wrote:
> ----- Original Message ----- 
> From: "Ian Lepore" <ian_at_FreeBSD.org>
> To: "Poul-Henning Kamp" <phk_at_phk.freebsd.dk>
> Cc: "deeptech71" <deeptech71_at_gmail.com>; <freebsd-current_at_FreeBSD.org>; "Peter Jeremy" <peter_at_rulingia.com>
> Sent: Sunday, March 03, 2013 1:54 PM
> Subject: Re: access to hard drives is "blocked" by writes to a flash drive
> 
> 
> > On Sun, 2013-03-03 at 13:35 +0000, Poul-Henning Kamp wrote:
> >> In message <1362317291.1195.216.camel_at_revolution.hippie.lan>, Ian Lepore writes
> >> :
> >> 
> >> >I run into this behavior all the time too, mostly on arm systems that
> >> >have an sd card or usb thumb drive as their main/only drive.
> >> 
> >> This is really a FAQ and I believe I have answered it N times already:
> >> 
> >> There are, broadly speaking, two classes of flash-storage: "Camera-grade"
> >> and "the real thing".
> >> 
> >> "Camera-grade" devices have a very limited "Flash adaptation layer" which
> >> typically can hold only one flash block open for writing at a time, and
> >> they are typically found in CF and SD cards, USB sticks, etc.
> >> 
> >> Some of them get further upset if the filesystem is not the FAT they
> >> expect, because they implement M-Systems' (patented) trick of monitoring
> >> block deletes in the FAT to simulate a TRIM facility.
> >> 
> >> A number of products exist with such designs; typically a CF-style device
> >> is put behind a SATA-PATA bridge and sold as a 2.5" SATA SSD.
> >> "Transcend" has done this, for instance.
> >> 
> >> If you use this class of devices for anything real, gstat will show
> >> you I/O write-times of several seconds in periodic pile-ups, even
> >> 100 seconds if you are doing something heavy.
> >> 
> >> For various reasons (see: Lemming-syncer) FreeBSD will block all I/O
> >> traffic to other disks too, when these pileups get too bad.
> > 
> > Hmmm, so the problem has been known and unfixed for 10 years.  That's
> > not encouraging.  One of the messages in the lemming-syncer mail thread
> > might explain why I've been seeing this a lot lately in hobbyist work,
> > but not so much at $work where we use sd cards heavily... we use very
> > short syncer timeouts on SD and CF storage at $work:
> > 
> > kern.metadelay: 3
> > kern.dirdelay: 4
> > kern.filedelay: 5
> > 
> > I might play with similar settings on some of my arm boards here.
> 
> Interesting, are these relevant for all filesystems e.g. ZFS?
> 
>     Regards
>     Steve

I'm not sure; I know almost nothing about ZFS.  I do know we used those
tunings for a specific reason, and I sure wouldn't recommend them for
general use.  There's a comment block at the top of kern/vfs_subr.c with
some information on those delay values and how they're used that you
might find useful.  In general, I think such small numbers on a system
doing lots of IO would be counter-productive.
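For reference, a minimal sketch of how those tunables could be set persistently on a FreeBSD system, assuming /etc/sysctl.conf is in use (the stock defaults are 28/29/30 seconds; the values below are the ones from the quoted message, and are only appropriate for workloads like the one described):

```
# Hypothetical /etc/sysctl.conf fragment: shorten the syncer delay
# windows so dirty metadata/dirs/files are flushed sooner.
# Defaults are kern.metadelay=28, kern.dirdelay=29, kern.filedelay=30.
kern.metadelay=3
kern.dirdelay=4
kern.filedelay=5
```

The same values can be set at runtime with sysctl(8), e.g. `sysctl kern.filedelay=5`, to experiment before committing them to the config file.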

In our case we arrived at those tunings this way...  We have embedded
systems with CF and SD cards as their only mass storage, and we do a
relatively small amount of writing to the cards (occasional config
changes, low-volume routine logging via syslog, not much else).
Occasionally when newsyslog kicks in there'll be a short burst of IO to
compress and rotate, then back to a trickle again.  Once upon a time we
mounted the filesystem without softupdates and with the sync option.
That was fairly robust against users pulling the plug right after making
a config change, but very very slow during syslog rotation, sometimes to
the point of perturbing our apps (the CF cards run in PIO mode, and a
burst of PIO activity is hard on time-critical apps).

So we switched to using softupdates and turned off the sync option.  That
was nicer to our apps, but left a long window during which updates
didn't get flushed to the card.  So we lowered those 3 tuning values to
the lowest numbers supported by the code (as I vaguely remember it, this
was years ago), not to get any sort of change in performance, but to
reduce the window during which data just sat around in memory waiting
for potential further updates before being flushed.  The further updates
would never happen for us, so the long delays had no benefit.
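To make the two configurations concrete, here is a hypothetical sketch of each (the device name and mount point are made up; on UFS, soft updates are a filesystem flag toggled with tunefs(8) rather than a mount option):

```
# Original setup: synchronous mount, robust against power loss but
# slow under bursty writes (hypothetical /etc/fstab line):
/dev/ada0s1a   /data   ufs   rw,sync   2   2

# Later setup: drop "sync" from the fstab options and enable soft
# updates on the (unmounted) filesystem instead:
#   tunefs -n enable /dev/ada0s1a
/dev/ada0s1a   /data   ufs   rw        2   2
```

With soft updates plus the shortened syncer delays, writes stay asynchronous but the window during which data sits unflushed in memory is kept small.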

-- Ian
Received on Mon Mar 04 2013 - 14:54:53 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:35 UTC