Branch timing

Branches are among the instruction types that do not always complete in one cycle. Only a conditional branch that is not taken completes in one cycle. A taken branch takes two cycles, and a branch to an unaligned wide (32-bit) instruction takes three. The two samples below each start a loop on an unaligned instruction, and the table shows the performance impact. Both functions take the loop branch twice, so a CPI count of two is expected. Sample a nevertheless records a CPI count of five: three additional cycles caused by starting the loop with an unaligned wide instruction.
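The three cases above can be summarised as a small cost model. The following is a minimal Python sketch, not a measurement tool: the function name `branch_cycles` and its parameters are hypothetical, and it encodes only the rules stated in the text.

```python
def branch_cycles(taken: bool,
                  target_is_wide: bool = False,
                  target_is_aligned: bool = True) -> int:
    """Cycle cost of a single branch under the rules stated in the text.

    Hypothetical helper for illustration only.
    """
    if not taken:
        # Conditional branch that falls through.
        return 1
    if target_is_wide and not target_is_aligned:
        # Taken branch to an unaligned wide (32-bit) instruction.
        return 3
    # Any other taken branch.
    return 2


print(branch_cycles(taken=False))
print(branch_cycles(taken=True))
print(branch_cycles(taken=True, target_is_wide=True, target_is_aligned=False))
```

The model describes a single branch only. It is not intended to reproduce the counter values in the table, which also depend on how the measurement was taken.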

Examples of unaligned branching to wide and narrow instructions.

sample_a.s
power:
    mov r2, r0
    mov r0, #1     
    loop:
        mul r0, r2 //02fb00f0
        sub r1, #1
        cmp r1, #0
        bne loop
    bx lr

Unaligned loop starting with a wide (32-bit) instruction, computing $r0 = r0^{r1}$.

sample_b.s
multiply:
    mov r2, r0
    mov r0, #0
    loop:
        add r0, r2 //1044
        sub r1, #1
        cmp r1, #0
        bne loop
    bx lr

Unaligned loop starting with a narrow (16-bit) instruction, computing $r0 = r0 \cdot r1$.

Performance comparison of unaligned loops. Both functions were called with the arguments 2 and 3 (r0 = 2, r1 = 3).

Example                  sample a   sample b
Instructions executed    14         14
LSU count                0          0
CPI count                5          2
Fold count               (-) 0      (-) 0
Cycle count              19         16
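
The "(-)" marker on the fold count suggests how the rows relate: the cycle count appears to be the number of instructions executed plus the additional LSU and CPI cycles, minus the folded instructions. The sketch below checks that arithmetic against the table values. The function name `cycle_count` is hypothetical, and the relation is an assumption that happens to match both columns.

```python
def cycle_count(instructions: int, lsu: int, cpi: int, fold: int) -> int:
    """Assumed relation between the counters in the table."""
    return instructions + lsu + cpi - fold


# Values taken from the table above.
sample_a = cycle_count(instructions=14, lsu=0, cpi=5, fold=0)
sample_b = cycle_count(instructions=14, lsu=0, cpi=2, fold=0)

print(sample_a, sample_b)   # 19 16
print(sample_a - sample_b)  # 3
```

Under this assumption, the three-cycle difference between the samples comes entirely from the CPI count, since every other counter is identical.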