03-06-2010, 06:44 PM
boriel Wrote:
Even more optimized:
The fspUpdateAsm routine can be optimized to this:
Code:
SUB fastcall fspUpdate : REM This name will be used from BASIC
asm
fspUpdateAsm: ; This name will be used from ASM
update_coordinates:
ld hl, datap+4 ; Points to data address
ld d, h
ld e, l ; DE = HL
ld b, 4 ; 4 iterations
i4chars4:
;; For each Sprite:
;; *(datap + 6) = *(datap + 4)
;; *(datap + 7) = *(datap + 5)
inc de
inc de
ldi
inc bc ; Restores BC (LDI decrements it, and B is the DJNZ counter)
ldi
inc bc ; Restores BC
;; HL = DE + 44 = datap + 4 of the next sprite (sprites are 48 bytes apart)
ld hl, 44
add hl, de
ld d, h ; DE = HL
ld e, l
djnz i4chars4
END ASM
END SUB : REM There is an implicit RET here

Aha, another place where you find LD <reg>, Number followed by several INC statements. Actually, I didn't look too closely at update, it's true: the speed-critical bit is erase-bufferBackground-redraw. Update happens after that, so I was a little lazy about finding ways to clean up that part. If you look in the commented-out parts, you'll see several places where I did the same thing. Nice use of LDI there, though; it's quite a bit faster than the method the original tends to use.
I think the fastest, if you're willing to use 40 bytes instead of 23, would be:
Code:
; For each Sprite:
;; *(datap + 6) = *(datap + 4)
;; *(datap + 7) = *(datap + 5)
ld hl, datap+4 ; Points to sprite 1
ld de, datap+6
ldi
ldi
ld hl, datap+4+48 ; Points to sprite 2
ld de, datap+6+48
ldi
ldi
ld hl, datap+4+48+48 ; Points to sprite 3
ld de, datap+6+48+48
ldi
ldi
ld hl, datap+4+48+48+48 ; Points to sprite 4
ld de, datap+6+48+48+48
ldi
ldi
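As a sanity check that the loop and the unrolled version do exactly the same job, here is a small Python model of both, instruction by instruction. It is only a sketch: DATAP is a hypothetical stand-in for the real `datap` label, and the 48-byte sprite stride is inferred from the +48 offsets above. It also shows why the `inc bc` lines matter: with C = 0, each LDI would otherwise borrow from B and upset the DJNZ count.

```python
# Hypothetical values: DATAP stands in for the real 'datap' label (not from
# the post); STRIDE = 48 matches the +48 offsets in the unrolled version.
DATAP = 0x8000
STRIDE = 48
SPRITES = 4


def ldi(mem, hl, de, bc):
    """Model Z80 LDI: (DE) <- (HL), then HL += 1, DE += 1, BC -= 1."""
    mem[de] = mem[hl]
    return (hl + 1) & 0xFFFF, (de + 1) & 0xFFFF, (bc - 1) & 0xFFFF


def update_loop(mem, c=0):
    """Model of the loop version; returns how many times the body ran."""
    hl = DATAP + 4                  # ld hl, datap+4
    de = hl                         # ld d, h / ld e, l
    bc = (4 << 8) | c               # ld b, 4 (C keeps whatever it held)
    iterations = 0
    while True:
        iterations += 1
        de = (de + 2) & 0xFFFF      # inc de / inc de
        hl, de, bc = ldi(mem, hl, de, bc)
        bc = (bc + 1) & 0xFFFF      # inc bc: undo LDI's decrement
        hl, de, bc = ldi(mem, hl, de, bc)
        bc = (bc + 1) & 0xFFFF      # inc bc
        hl = (de + 44) & 0xFFFF     # ld hl, 44 / add hl, de
        de = hl                     # ld d, h / ld e, l
        b = ((bc >> 8) - 1) & 0xFF  # djnz: decrement B...
        bc = (b << 8) | (bc & 0xFF)
        if b == 0:                  # ...and loop while B != 0
            return iterations


def update_unrolled(mem):
    """Model of the unrolled version: four straight-line LD/LD/LDI/LDI blocks."""
    for sprite in range(SPRITES):
        hl = DATAP + 4 + STRIDE * sprite
        de = DATAP + 6 + STRIDE * sprite
        hl, de, _ = ldi(mem, hl, de, 0)
        ldi(mem, hl, de, 0)


if __name__ == "__main__":
    original = bytearray((i * 7) & 0xFF for i in range(0x10000))
    a, b = bytearray(original), bytearray(original)
    print(update_loop(a))   # 4
    update_unrolled(b)
    print(a == b)           # True
```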
Though since this isn't quite as speed-critical, we might think about size instead. Still, it's worth remembering that unrolled loops and brute force are sometimes quite a bit faster.
By the standard (uncontended) Z80 timings, this runs in 208 T states, while the loop version takes about 412: a big saving for such a small routine!
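Figures like these are easy to double-check by tallying the standard Z80 timings per instruction. The Python sketch below does that; the tables are typed in by hand from the usual uncontended timings, leave out the implicit RET, and ignore the extra contention cycles the Spectrum adds when code or data sit in the lower 16K of RAM, so on real hardware the totals can come out higher.

```python
# Uncontended Z80 timings: (mnemonic, T-states, bytes).
# DJNZ is kept separate: 13 T when it jumps back, 8 T when it falls through.
SETUP = [
    ("ld hl, nn", 10, 3),
    ("ld d, h", 4, 1),
    ("ld e, l", 4, 1),
    ("ld b, n", 7, 2),
]
BODY = [
    ("inc de", 6, 1),
    ("inc de", 6, 1),
    ("ldi", 16, 2),
    ("inc bc", 6, 1),
    ("ldi", 16, 2),
    ("inc bc", 6, 1),
    ("ld hl, nn", 10, 3),
    ("add hl, de", 11, 1),
    ("ld d, h", 4, 1),
    ("ld e, l", 4, 1),
]
DJNZ_TAKEN_T, DJNZ_EXIT_T, DJNZ_BYTES = 13, 8, 2
UNROLLED_BLOCK = [
    ("ld hl, nn", 10, 3),
    ("ld de, nn", 10, 3),
    ("ldi", 16, 2),
    ("ldi", 16, 2),
]
SPRITES = 4


def t_states(ops):
    return sum(t for _, t, _ in ops)


def size(ops):
    return sum(n for _, _, n in ops)


def loop_cost(iterations=SPRITES):
    """(T-states, bytes) of the DJNZ loop version, excluding RET."""
    t = t_states(SETUP) + iterations * t_states(BODY)
    t += (iterations - 1) * DJNZ_TAKEN_T + DJNZ_EXIT_T
    return t, size(SETUP) + size(BODY) + DJNZ_BYTES


def unrolled_cost(sprites=SPRITES):
    """(T-states, bytes) of the fully unrolled version, excluding RET."""
    return sprites * t_states(UNROLLED_BLOCK), sprites * size(UNROLLED_BLOCK)


if __name__ == "__main__":
    print("loop: %d T, %d bytes" % loop_cost())          # loop: 412 T, 23 bytes
    print("unrolled: %d T, %d bytes" % unrolled_cost())  # unrolled: 208 T, 40 bytes
```

The trade-off is the usual one: the loop pays for INC DE, INC BC, the pointer arithmetic and DJNZ on every pass, while the unrolled code spends those cycles as extra bytes of immediate addresses instead.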