02-22-2012, 12:40 PM
LCD Wrote: Works and is very very fast!!!
Simple but great!
It's a classic case where unrolling the loop wins, though. As it stands, the main part of this function (not including the call setup or the return at the end, which ZX Basic handles for us) takes:
15 (intro) + (7*25) loop + 20 (last loop that doesn't jump back) = 210 T states.
An unrolled version needs neither the LD B,8 (7 T states) nor the DJNZ, so it would take
4 (LD C,A) + 4 (XOR A) + 8*12 = 104 T states, or roughly twice as fast. Those 13 T state DJNZ jumps are quite a big proportion.
If I were using this function, and using a mirror function instead of storing the graphics in both left and right configurations, I'd want it to be fast, and I'd probably be willing to give up 17 bytes to do this: the loop version is 9 bytes and the unrolled version is 26 (RR C is CB-prefixed, so it takes two bytes). [Edit - just realised that XOR A probably doesn't matter, since the rotates slide the old contents of A out anyway. So 8 bytes versus 25 bytes, and 206 T states vs 100]
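For anyone who wants to check the arithmetic, here is a rough Python tally of bytes and T states. It assumes the loop version is LD B,8 / LD C,A / XOR A followed by RR C / RLA / DJNZ (that matches the 15 + 7*25 + 20 breakdown above, but it is a reconstruction, not the posted code), and it counts only the instructions actually present in the unrolled listing. The helper names are made up for illustration; the per-instruction costs are the standard Z80 figures.

```python
# Z80 cost table: mnemonic -> (bytes, T states)
COST = {
    "LD B,n": (2, 7),
    "LD C,A": (1, 4),
    "XOR A":  (1, 4),
    "RR C":   (2, 8),   # CB-prefixed, hence two bytes
    "RLA":    (1, 4),
}
# DJNZ costs 13 T states when it jumps back and 8 when it falls through.
DJNZ_BYTES, DJNZ_TAKEN, DJNZ_FALLTHROUGH = 2, 13, 8


def looped(with_xor=True, passes=8):
    """(bytes, T states) for the assumed DJNZ loop version."""
    intro = ["LD B,n", "LD C,A"] + (["XOR A"] if with_xor else [])
    body = ["RR C", "RLA"]
    size = sum(COST[i][0] for i in intro + body) + DJNZ_BYTES
    body_t = sum(COST[i][1] for i in body)
    t = sum(COST[i][1] for i in intro)
    t += (passes - 1) * (body_t + DJNZ_TAKEN) + body_t + DJNZ_FALLTHROUGH
    return size, t


def unrolled(with_xor=False, passes=8):
    """(bytes, T states) for the unrolled version (no LD B,8, no DJNZ)."""
    seq = ["LD C,A"] + (["XOR A"] if with_xor else []) + ["RR C", "RLA"] * passes
    return sum(COST[i][0] for i in seq), sum(COST[i][1] for i in seq)


if __name__ == "__main__":
    print("loop, with XOR A:       ", looped(True))
    print("loop, without XOR A:    ", looped(False))
    print("unrolled, with XOR A:   ", unrolled(True))
    print("unrolled, without XOR A:", unrolled(False))
```

The 7 T states for LD B,8 disappear from the unrolled totals because that instruction is commented out in the listing.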
Code:
function fastcall mirror (number as uByte) as uByte
asm
; ld b,8 ; We don't need this either. We're not looping.
ld c,a
; XOR A ; Edit - I bet it works without this. Saving 4 T states
RR C
RLA
RR C
RLA
RR C
RLA
RR C
RLA
RR C
RLA
RR C
RLA
RR C
RLA
RR C
RLA
end asm
END FUNCTION
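To back up the guess that XOR A isn't needed, here is a small Python model of the two rotate instructions and the unrolled sequence. It is only a sketch: it assumes the fastcall argument arrives in A and the result is returned in A, and the function names are mine, not anything from ZX Basic. The carry flag on entry is treated as unknown, so both values are tried.

```python
def rr(reg, carry):
    """RR r: rotate right through carry. Returns (new_reg, new_carry)."""
    return ((carry << 7) | (reg >> 1)) & 0xFF, reg & 1


def rla(a, carry):
    """RLA: rotate A left through carry. Returns (new_a, new_carry)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def mirror(number, carry_in=0):
    """Model of the unrolled listing, without XOR A."""
    a = number          # argument assumed to arrive in A
    c = a               # LD C,A
    carry = carry_in    # carry flag is whatever it happened to be on entry
    for _ in range(8):  # eight RR C / RLA pairs
        c, carry = rr(c, carry)
        a, carry = rla(a, carry)
    return a            # result assumed to be returned in A


def reference(number):
    """Bit reversal done the slow, obvious way."""
    return int(f"{number:08b}"[::-1], 2)


if __name__ == "__main__":
    ok = all(mirror(n, cy) == reference(n)
             for n in range(256) for cy in (0, 1))
    print("matches for every input and both carry states:", ok)
```

Each RLA pushes one old bit of A out through carry and into the top of C, but C is only rotated eight times, so those bits never make it back into the result.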