Visitor II
September 25, 2007
Question

prefetch queue, branch cache question

  • September 25, 2007
  • 2 replies
  • 683 views
Posted on September 25, 2007 at 23:39


    2 replies

    alandras (Author)
    Visitor II
    May 17, 2011
    Posted on May 17, 2011 at 09:47

    Hi,

    I was trying to optimize the interlocks of a tight loop and ended up with this basic experiment:

    With PFQBC disabled:

    1:  subs r0,#4
        bpl 1b

    gives 12 cycles per iteration, and

    1:  nop
        subs r0,#4
        bpl 1b

    gives 13 cycles.

    With PFQBC enabled:

    1:  subs r0,#4
        bpl 1b

    gives 5 cycles per iteration, but surprise,

    1:  nop
        subs r0,#4
        bpl 1b

    gives 10 cycles.

    Each additional nop adds 1 cycle, so I guess my cycle counting is OK.

    Can somebody explain to me what is happening?

    Thanks a lot,

    Andras
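
To make the comparison in the post above easier to read, here is a minimal Python sketch that tabulates the four reported figures and derives two numbers from them: the extra cycles the single nop costs in each mode, and the speedup from enabling PFQBC. The cycle counts are copied from the post; the names `MEASURED`, `nop_cost` and `speedup` are hypothetical helpers for illustration only, not part of any ST tool.

```python
# Measured cycles per iteration, copied from the post above.
# Key: (PFQBC enabled?, loop starts with a nop?) -> cycles per iteration
MEASURED = {
    (False, False): 12,
    (False, True): 13,
    (True, False): 5,
    (True, True): 10,
}


def nop_cost(pfqbc_enabled):
    """Extra cycles per iteration caused by adding the single nop."""
    return MEASURED[(pfqbc_enabled, True)] - MEASURED[(pfqbc_enabled, False)]


def speedup(with_nop):
    """Ratio of cycles with PFQBC disabled to cycles with PFQBC enabled."""
    return MEASURED[(False, with_nop)] / MEASURED[(True, with_nop)]


for enabled in (False, True):
    mode = "enabled" if enabled else "disabled"
    print(f"PFQBC {mode}: the nop costs {nop_cost(enabled)} cycle(s)")

for with_nop in (False, True):
    loop = "nop,subs,bpl" if with_nop else "subs,bpl"
    print(f"{loop}: speedup from PFQBC is {speedup(with_nop):.1f}x")
```

Put this way, the puzzle is that the nop costs 1 cycle with PFQBC disabled but 5 cycles with it enabled, so the gain from PFQBC drops from 2.4x on the two-instruction loop to 1.3x on the three-instruction loop.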

    alandras (Author)
    Visitor II
    May 17, 2011
    Posted on May 17, 2011 at 09:47

    Could this be the issue?

    http://www.embeddedrelated.com/groups/lpc2000/show/20486.php

    Quote:

    The STR9 is faster when it can execute at 96MHz in a straight line as the flash can feed the CPU on each cycle. However, when a branch is taken you need the ARM9 core to refill its pipeline, but to do that it needs instructions. The *first* instruction is provided by the branch cache but the PFQ is empty--it burst fills at the full 96MHz rate but the core is left waiting whilst it does so. We did some extensive benchmarking to characterize this because, as I said, we were astounded by the reports coming back from potential customers of the STR9 which asked us why it was so slow on their applications. The information was fed back to ST who acknowledged that there is an issue and that it will be fixed in the next rev, but all that will happen is that the BTC is increased in size IIRC.

    --

    Paul Curtis, Rowley Associates Ltd

    http://www.rowley.co.uk

    Noticed the *first*?

    This could explain the 5 cycles instead of the theoretical 4, but I still don't understand the 10 cycles for the nop, subs, branch loop!

    Is this fixed in the FA devices?

    thanks for any input,

    Andras
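
The "theoretical 4 cycles" mentioned above can be made explicit with a short Python sketch that subtracts an ideal core cycle count from each measurement. The ideal timings are an assumption (1 cycle for a data-processing instruction or nop, 3 cycles for a taken branch, zero-wait-state fetch), chosen to match the poster's figure of 4 cycles for `subs` plus `bpl`; the names `IDEAL_CYCLES`, `LOOPS`, `MEASURED_CYCLES`, `ideal` and `stall` are hypothetical.

```python
# ASSUMPTION: ideal ARM9-class timings with zero-wait-state instruction fetch.
# Chosen to reproduce the "theoretical 4 cycles" quoted in the post.
IDEAL_CYCLES = {"nop": 1, "subs": 1, "bpl_taken": 3}

# The two loop bodies from the first post.
LOOPS = {
    "subs,bpl": ["subs", "bpl_taken"],
    "nop,subs,bpl": ["nop", "subs", "bpl_taken"],
}

# Measured cycles per iteration, copied from the first post.
MEASURED_CYCLES = {
    ("disabled", "subs,bpl"): 12,
    ("disabled", "nop,subs,bpl"): 13,
    ("enabled", "subs,bpl"): 5,
    ("enabled", "nop,subs,bpl"): 10,
}


def ideal(loop):
    """Ideal cycles per iteration under the assumed core timings."""
    return sum(IDEAL_CYCLES[insn] for insn in LOOPS[loop])


def stall(pfqbc, loop):
    """Measured cycles minus ideal cycles: time not accounted for by the core."""
    return MEASURED_CYCLES[(pfqbc, loop)] - ideal(loop)


for (pfqbc, loop), measured in MEASURED_CYCLES.items():
    print(f"PFQBC {pfqbc:8s} {loop:13s} measured={measured:2d} "
          f"ideal={ideal(loop)} stall={stall(pfqbc, loop)}")
```

Under these assumed timings, the unexplained overhead is a constant 8 cycles per iteration with PFQBC disabled, but 1 cycle for the two-instruction loop and 5 cycles for the three-instruction loop with PFQBC enabled. The sketch only restates the measurements; it does not explain where the extra cycles go.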