Re: Question about genassym, locore.s and 0-sized arrays (showstopper for an icc compiled kernel)

From: Bruce Evans <bde_at_zeta.org.au> Date: Fri, 5 Sep 2003 18:39:40 +1000 (EST)

On Thu, 4 Sep 2003, Marcel Moolenaar wrote:

> On Fri, Sep 05, 2003 at 02:59:22AM +0200, Marius Strobl wrote:
> > >
> > > We use the size of the symbol (ie the size of the object identified
> > > by the symbol) to pass around values. This we do by creating arrays.
> > > If we want to export a C constant 'FOOBAR' to assembly and the constant
> > > is defined to be 6, then we create an array for the sign, of which the
> > > size is 1 for negative numbers and 0 otherwise. In this case the array
> > > will be named FOOBARsign and its size is 0. We also create 4 arrays (*w0,
> > > *w1, *w2 and *w3), each with a maximum of 64K and corresponding to the
> > > 4 16-bit words that constitute a single 64-bit entity.
> > > In this case
> > > 	00000006 C FOOBARw0
> > > 	00000000 C FOOBARw1
> > > 	00000000 C FOOBARw2
> > > 	00000000 C FOOBARw3
> > >
> > > If the compiler creates arrays of size 1 for arrays we define as a
> > > zero-sized array, you get exactly what you've observed.
> >
> > Is this rather complex approach really necessary?
>
> In theory, yes. In practice, maybe not. If I remember correctly,
> the problem we're trying to solve is twofold:

More like fourfold:

-1: A cross-compiler must be used for the first stage.  The old method
    of printf()'ing the results of sizeof(), etc., only works if the
    host machine is the same as the target machine, since sizeof()
    must be evaluated by a compiler for the target machine and printf()
    can only be run on host machines.
0:  Compiler output is too unportable to parse easily, so arrange to use
    a small portable subset of it after passing it through some standard
    filters.  nm output was the most portable binutils-related output
    that I could think of.
> 1.  64-bit constants given the limitations of the object format,
>     which included widths of 32-bit and a.out.

After choosing to use nm output, there are some minor problems representing
all relevant numbers using it.  Numbers larger than 2^32 need to be
represented but cannot be represented directly as symbol values or sizes
in nm output on 32-bit machines.  Numbers nearly as large as 2^32 can be
represented as absolute symbols, but there is no way to generate absolute
symbols in semi-portable C AFAIK.  asm() statements might work but would
be very unportable (apart from not being standard C, different ones might
be needed for the aout and elf cases).  The technique of using array sizes
for all numbers works for most sizes, but the compiler might object to
creating arrays almost as large as the address space, and as we have just
found, to creating arrays of size 0.
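
As an illustration of the array-size encoding discussed above, here is a
minimal C sketch of the arithmetic only: splitting a 64-bit constant into a
sign flag plus four 16-bit words, and joining them again as a consumer of the
nm output would have to.  The names struct split64, split64() and join64()
are made up for this example and are not taken from genassym or assym.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container: one sign flag and four 16-bit words, w[0] lowest. */
struct split64 {
	unsigned sign;		/* 1 for negative values, 0 otherwise */
	uint16_t w[4];
};

/* Split a signed 64-bit value into a sign flag and magnitude words. */
static struct split64
split64(int64_t v)
{
	struct split64 s;
	/* Unsigned negation yields the magnitude, including for INT64_MIN. */
	uint64_t mag = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
	int i;

	s.sign = v < 0;
	for (i = 0; i < 4; i++)
		s.w[i] = (uint16_t)(mag >> (16 * i));
	return (s);
}

/* Rebuild the value from the sign flag and the four words. */
static int64_t
join64(struct split64 s)
{
	uint64_t mag = 0;
	int i;

	assert(s.sign <= 1);
	for (i = 3; i >= 0; i--)
		mag = (mag << 16) | s.w[i];
	/* The final conversion assumes a two's complement target. */
	return ((int64_t)(s.sign ? (uint64_t)0 - mag : mag));
}
```

For the FOOBAR example (value 6), split64() gives sign 0 and words
6, 0, 0, 0, matching the nm listing quoted above.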

> 2.  Sign extension or datatype limitations in awk(1)? I'm not
>     sure about this point. Bruce?

Yes: one-true-awk uses "typedef double Awkfloat;", and gawk uses something
similar by default IIRC, so awk can't handle numbers larger than 2^53
without losing precision.  "typedef long double Awkfloat;" would be no
better because of my restriction of the precision of long doubles on
i386's and (when genassym.sh was written) the incomplete library support
for long doubles, and the nonexistent support for more than 53 bits of
precision on some supported (?) hardware.  Long doubles with 64 bits
of precision wouldn't work for representing 128-bit integers anyway.
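
The 2^53 limit can be checked with a small C sketch, assuming an IEEE 754
double with a 53-bit significand (which is what "typedef double Awkfloat;"
gives on the usual hardware).  The function name fits_in_double() is
invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if v survives a round trip through double unchanged.
 * Only meant for v < 2^63, so that converting back is well defined.
 */
static int
fits_in_double(uint64_t v)
{
	/* volatile forces a stored double, avoiding x87 excess precision. */
	volatile double d;

	assert(v < (UINT64_C(1) << 63));
	d = (double)v;
	return ((uint64_t)d == v);
}
```

Every 16-bit word passes this check trivially, which is one reason for
passing values around in small pieces instead of as one 64-bit number.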

> > ... The genassym.sh(8) of NetBSD kind
> > of directly exports the C-constants so it just needs one symbol per
> > constant and doesn't require zero sized arrays. Given that it's from
> > NetBSD their approach also should be very MI.
>
> I wouldn't have a problem using NetBSD's genassym implementation,
> provided we understand completely how it differs from ours and to
> what extent and how it affects us and provided of course we can
> live with whatever it is that's worse or just different from what
> we have now.

The main difference seems to be that it parses the assembler output.
This is less portable and not as easy -- it takes about 5 times as
much code.

If some values are unrepresentable then they need to be represented
using other values.  E.g., add 1 to avoid 0, or multiply by the alignment
size if some element of the tool chain insists on rounding up things
for alignment like a broken aout version used to do.  16-bit values
would need 17 bits to represent after adding 1.
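
A minimal sketch of the "add 1 to avoid 0" idea, in C.  The names EX_BIAS,
EX_WORD() and ex_decode() are hypothetical and only illustrate the
suggestion; this is not the existing ASSYM() macro, and it handles just
one 16-bit word.

```c
#include <assert.h>
#include <stddef.h>

#define	EX_BIAS	1	/* hypothetical: added to every size to avoid 0 */

/* Hypothetical macro: export the low 16 bits of a constant as an array size. */
#define	EX_WORD(name, value)	\
	char name[((value) & 0xFFFFU) + EX_BIAS]

EX_WORD(FOOBARw0, 6);	/* size 7 instead of 6 */
EX_WORD(FOOBARw1, 0);	/* size 1 instead of the troublesome 0 */

/* What the consumer of the sizes would do: remove the bias again. */
static unsigned
ex_decode(size_t size)
{
	assert(size >= EX_BIAS && size <= 0xFFFFU + EX_BIAS);
	return ((unsigned)(size - EX_BIAS));
}
```

With a bias of 1 the largest array has 65536 elements, which is where the
17th bit comes from.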

Bruce