Re: Sub-optimal libc's read-ahead buffering behaviour

From: Peter Jeremy <PeterJeremy_at_optushome.com.au>
Date: Thu, 4 Aug 2005 20:42:36 +1000
On Thu, 2005-Aug-04 12:25:28 +0400, Andrey Chernov wrote:
>On Thu, Aug 04, 2005 at 05:57:11PM +1000, Peter Jeremy wrote:
>> >In case SEEK_CUR still uses 
>> >the buffer, it probably should not for character device.
>
>As I look at the fseek code now, _any_ non-regular file seek is not 
>optimized, which is the right thing.

Maybe - see below.

>> I can't see any reason for the current stdio behaviour:
>> - If you're accessing a device with "magic" behaviour then it's not safe
>>   to read(2) 4KB (or whatever) when userland asks to fread(3) 512 bytes.
>
>It is safe to read more. You may hit EOF, but that is handled by stdio 
>internally. It is not safe to read again from the buffer. In that case, 
>an fseek to the needed position lets you re-read.

Consider /dev/mem (since that is a favourite in this thread).  You are
unlikely to hit EOF but reading more than required is likely to cause
unwanted I/O errors or unexpected device behaviour by accidentally reading
"magic" device addresses.

That said, most other devices will either reject seeks (e.g. tapes)
or will correctly (if inefficiently) handle reading too much.  And
anyone who uses stdio to read /dev/mem probably deserves the hole
in their foot.

I can see two reasonable interpretations of stdio on devices:
1) The process issues a setbuf(3) family call to define the buffer size
   that it wants to use for physical reads/writes.  The process then uses
   stdio calls to read/write arbitrary sized data which is re-blocked by
   stdio to suit the device.
2) stdio should be transparent - fread/fwrite/fseek are expected to
   map directly onto read/write/lseek.
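
As a rough sketch of what a process could ask for under each
interpretation, using only the standard setvbuf(3) call: the helper
names use_blocked_io() and use_transparent_io() are hypothetical, and
whether an unbuffered stream really maps one-to-one onto read(2) and
write(2) is up to the implementation, so the second helper is only an
approximation of (2).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Interpretation 1 (hypothetical helper): the caller names the size it
 * wants used for physical I/O and lets stdio re-block arbitrary-sized
 * fread/fwrite requests.  setvbuf() must be called before any other
 * operation on the stream; with a NULL buffer, stdio allocates one.
 */
int use_blocked_io(FILE *fp, size_t blksize)
{
    assert(fp != NULL && blksize > 0);
    return setvbuf(fp, NULL, _IOFBF, blksize);
}

/*
 * Interpretation 2 (hypothetical helper): ask for an unbuffered stream
 * so that stdio holds no data of its own between calls.  This is an
 * assumption about how close to "transparent" a given libc gets.
 */
int use_transparent_io(FILE *fp)
{
    assert(fp != NULL);
    return setvbuf(fp, NULL, _IONBF, 0);
}
```

A caller would open the device with fopen(3) and call one of these
before the first fread/fwrite; both return 0 on success, as setvbuf
does.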

The current implementation falls somewhere in between:  read and write
are buffered but seeks are transparent.  This would seem to be the worst
of both worlds - the user has to ensure that seeks are multiples of
the device block size and that any writes wind up on a block boundary
before a seek.

In both cases above, seek really needs to be intelligent - more so
than for regular files.  It needs to lseek() in multiples of the
device block size and then adjust the buffer offset to handle any
remainder.
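
The arithmetic I have in mind can be sketched as below.  This is not
the libc code: struct aligned_seek, split_offset() and seek_and_fill()
are names invented for illustration, the block size is assumed to be
known to the caller, and a real fseek would also have to flush pending
writes first.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Result of splitting a byte offset at a device block boundary. */
struct aligned_seek {
    off_t block_start;  /* multiple of the block size, passed to lseek(2) */
    off_t remainder;    /* bytes to skip inside the refilled buffer */
};

/* Split an absolute offset into an aligned part and a remainder. */
struct aligned_seek split_offset(off_t target, off_t blksize)
{
    struct aligned_seek s;

    assert(blksize > 0 && target >= 0);
    s.remainder = target % blksize;
    s.block_start = target - s.remainder;
    return s;
}

/*
 * Seek to the block containing 'target', read one block into 'buf'
 * and report in '*skip' how far into the buffer the target byte lies.
 * Returns the read(2) result, or -1 if the lseek(2) fails.
 */
ssize_t seek_and_fill(int fd, off_t target, char *buf, size_t blksize,
    size_t *skip)
{
    struct aligned_seek s = split_offset(target, (off_t)blksize);
    ssize_t n;

    if (lseek(fd, s.block_start, SEEK_SET) == (off_t)-1)
        return -1;
    n = read(fd, buf, blksize);
    if (n >= 0)
        *skip = (size_t)s.remainder;
    return n;
}
```

For example, with 512-byte blocks a seek to offset 1000 becomes an
lseek() to 512 followed by skipping 488 bytes in the buffer.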

-- 
Peter Jeremy
Received on Thu Aug 04 2005 - 08:42:49 UTC
