Fourspriter: Alternate Version
#8
boriel Wrote:Even more optimized:

The fspUpdateAsm routine can be optimized to this:
Code:
SUB fastcall fspUpdate : REM This name will be used from BASIC
    asm
fspUpdateAsm:          ; This name will be used from ASM
update_coordinates:
         ld     hl, datap+4     ; HL points to the source bytes (datap+4)
         ld     d, h
         ld     e, l            ; DE = HL
         ld     b, 4            ; 4 iterations
i4chars4:
         ;; For each Sprite:
         ;; *(datap + 6) = *(datap + 4)
         ;; *(datap + 7) = *(datap + 5)
         inc    de
         inc    de
         ldi                    ; (DE) = (HL); HL++, DE++, BC--
         inc    bc              ; Undo LDI's BC-- so B stays the DJNZ counter
         ldi
         inc    bc              ; Restores BC again
         ;; DE is now 8 bytes into this sprite's record, so
         ;; hl = de + 44 is byte 4 of the next one (48-byte records)
         ld     hl, 44
         add    hl, de
         ld     d, h            ; DE = HL
         ld     e, l
         djnz    i4chars4
END ASM
END SUB : REM There is an implicit RET here
Aha, another place where you find LD <reg>, number followed by several INC instructions. It's true, I didn't look too closely at update: the speed-critical bit is erase, buffer background, redraw. Update happens after that, so I was a little lazy about finding ways to clean that part up. If you look in the commented-out parts, you'll see several places where I did the same thing. Nice use of LDI there, though. It's quite a bit faster than the method the original tends to use.
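
For anyone who wants to check the pointer arithmetic in the quoted loop without firing up an emulator, here's a rough Python model of it. This is only a sketch: the address given to DATAP is made up, the 48-byte record size is taken from the datap+4+48 offsets in the unrolled version, and only the register effects of LDI, INC, ADD and DJNZ are modelled (no flags, no timing).

```python
# Rough model of the quoted loop. Only the register effects that matter
# here are modelled; flags and timing are ignored.
DATAP = 0x8000   # hypothetical address for the datap label
STRIDE = 48      # bytes per sprite record, from the datap+4+48 offsets

def ldi(mem, hl, de, bc):
    """LDI: (DE) <- (HL), then HL += 1, DE += 1, BC -= 1 (16-bit wrap)."""
    mem[de] = mem[hl]
    return (hl + 1) & 0xFFFF, (de + 1) & 0xFFFF, (bc - 1) & 0xFFFF

def fsp_update_loop(mem, c=0):
    """Run the loop; return the offset HL had at the top of each pass."""
    hl = de = DATAP + 4                     # ld hl,datap+4 / ld d,h / ld e,l
    bc = (4 << 8) | (c & 0xFF)              # ld b,4 (C is whatever it was)
    seen = []
    while True:
        seen.append(hl - DATAP)
        de = (de + 2) & 0xFFFF              # inc de / inc de
        hl, de, bc = ldi(mem, hl, de, bc)   # ldi
        bc = (bc + 1) & 0xFFFF              # inc bc
        hl, de, bc = ldi(mem, hl, de, bc)   # ldi
        bc = (bc + 1) & 0xFFFF              # inc bc
        hl = de = (de + 44) & 0xFFFF        # ld hl,44 / add hl,de / DE = HL
        b = ((bc >> 8) - 1) & 0xFF          # djnz: decrement B, loop if non-zero
        bc = (b << 8) | (bc & 0xFF)
        if b == 0:
            return seen

# Fill bytes 4 and 5 of each of the four sprite records, then run the loop.
mem = bytearray(0x10000)
for n in range(4):
    mem[DATAP + n * STRIDE + 4] = 10 + n
    mem[DATAP + n * STRIDE + 5] = 20 + n

offsets = fsp_update_loop(mem)
print(offsets)  # [4, 52, 100, 148]
```

Each pass starts 48 bytes after the previous one, and bytes 6 and 7 of every record end up as copies of bytes 4 and 5. The INC BC after each LDI matters because LDI decrements BC, which would otherwise eat into the B counter whenever C happens to be zero.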

I think the fastest, if you're willing to use 40 bytes in place of 23, would be:
Code:
; For each Sprite:
         ;; *(datap + 6) = *(datap + 4)
         ;; *(datap + 7) = *(datap + 5)

         ld     hl, datap+4     ; Points to sprite 1
         ld     de, datap+6                
         ldi
         ldi

         ld     hl, datap+4+48     ; Points to sprite 2
         ld     de, datap+6+48                      
         ldi
         ldi

         ld     hl, datap+4+48+48     ; Points to sprite 3
         ld     de, datap+6+48+48                    
         ldi
         ldi

         ld     hl, datap+4+48+48+48     ; Points to sprite 4
         ld     de, datap+6+48+48+48                        
         ldi
         ldi

Though since this part isn't quite as speed critical, we might prefer to think about size. It's still worth remembering that unrolled loops and brute force are sometimes quite a bit faster.

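By the documented instruction timings (ignoring memory contention), this runs in 208 T-states. The loop version runs in 412 T-states, so unrolling roughly halves the time, which is quite a big saving for such a small routine!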
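
If you want to redo the sums yourself, here's a rough tally in Python. It's a sketch under a few assumptions: the per-instruction costs are the standard documented Z80 figures for uncontended memory, only the instructions shown in the two listings are counted (no RET), and the table and function names are just mine for illustration.

```python
# (T-states, bytes) per instruction form, per the standard Z80 timing tables.
COST = {
    "ld rr,nn":  (10, 3),
    "ld r,r":    (4, 1),
    "ld r,n":    (7, 2),
    "inc rr":    (6, 1),
    "add hl,rr": (11, 1),
    "ldi":       (16, 2),
}
# DJNZ takes 13 T-states when it jumps and 8 when it falls through.
DJNZ_TAKEN, DJNZ_FALL, DJNZ_BYTES = 13, 8, 2

def tally(ops):
    """Sum T-states and bytes for a straight-line list of instructions."""
    return (sum(COST[op][0] for op in ops), sum(COST[op][1] for op in ops))

def unrolled(sprites=4):
    """ld hl,nn / ld de,nn / ldi / ldi, repeated once per sprite."""
    return tally(["ld rr,nn", "ld rr,nn", "ldi", "ldi"] * sprites)

def looped(sprites=4):
    """The DJNZ version: setup once, body per sprite, one copy in memory."""
    setup_t, setup_b = tally(["ld rr,nn", "ld r,r", "ld r,r", "ld r,n"])
    body_t, body_b = tally([
        "inc rr", "inc rr",              # inc de / inc de
        "ldi", "inc rr",                 # ldi / inc bc
        "ldi", "inc rr",                 # ldi / inc bc
        "ld rr,nn", "add hl,rr",         # ld hl,44 / add hl,de
        "ld r,r", "ld r,r",              # ld d,h / ld e,l
    ])
    t = setup_t + sprites * body_t + (sprites - 1) * DJNZ_TAKEN + DJNZ_FALL
    b = setup_b + body_b + DJNZ_BYTES
    return (t, b)

print(unrolled())  # (208, 40)
print(looped())    # (412, 23)
```

Treat the totals as a sanity check rather than a measurement: on a real Spectrum, contended memory can add extra delays to both versions.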