CPU_EMPTY() optimization for _NCPUWORDS == 1 on clang

From: Svatopluk Kraus <skra_at_freebsd.org>
Date: Thu, 17 Dec 2015 12:22:28 +0100
Hi,

I got weird disassembled code for CPU_EMPTY() on arm, where the CPU bit
array is just one int (_NCPUWORDS == 1). The C code was compiled by clang
with the -O2 flag. I expanded the macro in dummy1() to show it more
clearly; dummy2() is a variant that gives the disassembly I expected.

void
dummy1(uint32_t* bits)
{
    size_t i;

    for (i = 0; i < 1; i++)
        if (bits[i])
            break;
    if (i == 1)
        tlb_flush(0);
}

void
dummy2(uint32_t* bits)
{
    size_t i;

    for (i = 0; i < 1; i++)
        if (bits[i])
            return;
    tlb_flush(0);
}

The disassembled code is the following:

c0556b50 <dummy1>:
c0556b50:   e3a01000    mov r1, #0
c0556b54:   e3510000    cmp r1, #0
c0556b58:   1a000004    bne c0556b70 <dummy1+0x20>
c0556b5c:   e5902000    ldr r2, [r0]
c0556b60:   e3a01001    mov r1, #1
c0556b64:   e3520000    cmp r2, #0
c0556b68:   0afffff9    beq c0556b54 <dummy1+0x4>
c0556b6c:   ea000005    b   c0556b88 <dummy1+0x38>
c0556b70:   e3510001    cmp r1, #1
c0556b74:   112fff1e    bxne    lr
c0556b78:   e3a00000    mov r0, #0
c0556b7c:   f57ff04f    dsb sy
c0556b80:   ee080f73    mcr 15, 0, r0, cr8, cr3, {3}
c0556b84:   f57ff04f    dsb sy
c0556b88:   e12fff1e    bx  lr

c0556b8c <dummy2>:
c0556b8c:   e5900000    ldr r0, [r0]
c0556b90:   e3500000    cmp r0, #0
c0556b94:   112fff1e    bxne    lr
c0556b98:   e3a00000    mov r0, #0
c0556b9c:   f57ff04f    dsb sy
c0556ba0:   ee080f73    mcr 15, 0, r0, cr8, cr3, {3}
c0556ba4:   f57ff04f    dsb sy
c0556ba8:   e12fff1e    bx  lr
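
For anyone who wants to check this outside the kernel, here is a minimal
standalone sketch. It assumes a stub tlb_flush() that only counts calls
(the real one is the kernel's), and count_flushes() is a helper made up
for this example. It only shows that dummy1() and dummy2() should behave
the same for a one-word array, so the compiler is free to emit the
dummy2() code for both.

```c
#include <stddef.h>
#include <stdint.h>

/* Stub standing in for the kernel's tlb_flush(); it only counts calls. */
static int flush_calls;

static void
tlb_flush(uint32_t arg)
{
    (void)arg;
    flush_calls++;
}

/* Expanded form of the macro, as in the post. */
static void
dummy1(uint32_t *bits)
{
    size_t i;

    for (i = 0; i < 1; i++)
        if (bits[i])
            break;
    if (i == 1)
        tlb_flush(0);
}

/* Early-return variant, as in the post. */
static void
dummy2(uint32_t *bits)
{
    size_t i;

    for (i = 0; i < 1; i++)
        if (bits[i])
            return;
    tlb_flush(0);
}

/*
 * Hypothetical helper: run fn on a one-word bit array holding 'word'
 * and return how many times it called tlb_flush().
 */
static int
count_flushes(void (*fn)(uint32_t *), uint32_t word)
{
    uint32_t bits[1] = { word };

    flush_calls = 0;
    fn(bits);
    return (flush_calls);
}
```

Both functions should flush exactly once for an empty word and not at
all for a non-empty one.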


I tried another thing:

void
dummy3(uint32_t* bits)
{
    size_t i;

    for (i = 0; i < 4; i++)
        if (bits[i])
            break;
    if (i == 1)
        tlb_flush(0);
}

and got

c0556bac <dummy3>:
c0556bac:   e5901000    ldr r1, [r0]
c0556bb0:   e3510000    cmp r1, #0
c0556bb4:   1a000006    bne c0556bd4 <dummy3+0x28>
c0556bb8:   e5900004    ldr r0, [r0, #4]
c0556bbc:   e3500000    cmp r0, #0
c0556bc0:   012fff1e    bxeq    lr
c0556bc4:   e3a00000    mov r0, #0
c0556bc8:   f57ff04f    dsb sy
c0556bcc:   ee080f73    mcr 15, 0, r0, cr8, cr3, {3}
c0556bd0:   f57ff04f    dsb sy
c0556bd4:   e12fff1e    bx  lr
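
Read back into C, the dummy3 output above amounts to two loads and two
tests. The sketch below is my hand translation of that listing, not
anything clang produced; dummy3_asm() and flushes() are names made up
for this example, and tlb_flush() is again a counting stub.

```c
#include <stddef.h>
#include <stdint.h>

/* Stub standing in for the kernel's tlb_flush(); it only counts calls. */
static int flush_calls;

static void
tlb_flush(uint32_t arg)
{
    (void)arg;
    flush_calls++;
}

/* dummy3() as in the post. */
static void
dummy3(uint32_t *bits)
{
    size_t i;

    for (i = 0; i < 4; i++)
        if (bits[i])
            break;
    if (i == 1)
        tlb_flush(0);
}

/* Hand translation of the dummy3 disassembly (hypothetical name). */
static void
dummy3_asm(uint32_t *bits)
{
    if (bits[0] != 0)   /* ldr r1, [r0]; cmp; bne: loop stops at i == 0 */
        return;
    if (bits[1] == 0)   /* ldr r0, [r0, #4]; cmp; bxeq: i ends up >= 2 */
        return;
    tlb_flush(0);       /* only reached when i == 1 */
}

/* Hypothetical helper: run fn on { w0, w1, 0, 0 } and count flushes. */
static int
flushes(void (*fn)(uint32_t *), uint32_t w0, uint32_t w1)
{
    uint32_t bits[4] = { w0, w1, 0, 0 };

    flush_calls = 0;
    fn(bits);
    return (flush_calls);
}
```

The loop is gone entirely and bits[2] and bits[3] are never read, since
only i == 1 matters for the result.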

This looks fine. So, IMO, there is a small bug in clang's optimization:
the one-iteration loop in dummy1() is not collapsed the way it is in
dummy2().

Svatopluk Kraus
Received on Thu Dec 17 2015 - 10:27:46 UTC