britlion · 03-06-2010, 06:44 PM

boriel Wrote:
Even more optimized:

The fspUpdateAsm routine can be optimized to this:

Code:
SUB fastcall fspUpdate : REM This name will be used from BASIC
    asm
fspUpdateAsm:                   ; This name will be used from ASM
update_coordinates:
         ld     hl, datap+4     ; Points to data address
         ld     d, h
         ld     e, l            ; idem
         ld     b, 4            ; 4 iterations
i4chars4:
         ;; For each Sprite:
         ;; *(datap + 6) = *(datap + 4)
         ;; *(datap + 7) = *(datap + 5)
         inc    de
         inc    de
         ldi
         inc    bc              ; Restores BC
         ldi
         inc    bc              ; Restores BC

         ;; hl = de + 44 (the next sprite's datap + 4)
         ld     hl, 44
         add    hl, de
         ld     d, h            ; DE = HL
         ld     e, l
         djnz   i4chars4
    END ASM
END SUB : REM There is an implicit RET here

Aha, another place where you find LD <reg>, Number followed by several INC statements. It's true that I didn't look too closely at update: the speed-critical bit is erase-bufferBackground-redraw, and update happens after that, so I was a little lazy about finding ways to clean that part up. If you look in the commented-out parts, you'll see several places where I did the same thing. Nice use of LDI there, though. It's quite a bit faster than the method the original tends to use.
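
For anyone following along, the pattern in question looks roughly like the sketch below. It is an illustrative before/after rather than a quote from the original routine, and the byte and T-state figures in the comments are the nominal documented Z80 values, ignoring any memory contention:

```
; Before: load the base address, then step to the field at run time
         ld     hl, datap       ; 3 bytes, 10 T
         inc    hl              ; 1 byte, 6 T each
         inc    hl
         inc    hl
         inc    hl              ; HL = datap + 4

; After: let the assembler fold the offset into the constant
         ld     hl, datap+4     ; 3 bytes, 10 T, same value in HL
```

Since datap is a label known at assembly time, the addition costs nothing at run time, so the second form saves 4 bytes and 24 T states here.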

I think the fastest, if you're willing to use roughly 40 bytes (four copies of a 10-byte sequence) in place of the loop's 23 or so, would be:

Code:
         ; For each Sprite:
         ;; *(datap + 6) = *(datap + 4)
         ;; *(datap + 7) = *(datap + 5)
         ld     hl, datap+4             ; Points to sprite 1
         ld     de, datap+6
         ldi
         ldi
         ld     hl, datap+4+48          ; Points to sprite 2
         ld     de, datap+6+48
         ldi
         ldi
         ld     hl, datap+4+48+48       ; Points to sprite 3
         ld     de, datap+6+48+48
         ldi
         ldi
         ld     hl, datap+4+48+48+48    ; Points to sprite 4
         ld     de, datap+6+48+48+48
         ldi
         ldi

Though since this part isn't as speed critical, we might care more about size here. Still, it's worth remembering that unrolled loops and brute force are sometimes quite a bit faster.

The unrolled version runs in 256 T states. The loop version runs in 535 T states, so that is quite a big saving for such a small routine!
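
If size matters more than speed here, one possible middle ground is to keep the loop but preload C so that LDI's decrement of BC never borrows from B, which removes the need for the two INC BC restores. This is only a sketch and untested on real hardware: the label name fspUpdLoop is made up for illustration, and it assumes, as in the code above, that sprites are 48 bytes apart and that there are 4 of them:

```
         ld     hl, datap+4     ; HL -> source field of sprite 1
         ld     bc, 1279        ; B = 4 sprites, C = 255 (4*256 + 255), so the
                                ; 8 LDIs only count C down and never touch B
fspUpdLoop:
         ld     d, h
         ld     e, l
         inc    de
         inc    de              ; DE = HL + 2 -> destination field
         ldi                    ; *(datap + 6) = *(datap + 4)
         ldi                    ; *(datap + 7) = *(datap + 5)
         ld     hl, 44
         add    hl, de          ; DE is base + 8 here, so HL = next sprite's datap + 4
         djnz   fspUpdLoop      ; B counts the 4 sprites
```

By my count that comes to 20 bytes, and it saves the 12 T states per pass that the INC BC pair costs (nominally 6 T each), though it will still be slower than the unrolled version.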
