06-04-2010, 08:13 PM
Okay, today I've managed to find a little time for 8-bit multiplication.
Note that your routine returns its result in L, not in A. Perhaps it could be rearranged?
This is the current (new) code:
Code:
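    ld b, 8         ; 8 bits to process
    ld l, a         ; L = first factor
    xor a           ; A = 0 (accumulator)
__MUL8LOOP:
    add a, a        ; a *= 2
    sla l           ; carry = next bit of L, most significant first
    jp nc, __MUL8B  ; bit was 0: skip the add
    add a, h        ; a += second factor
__MUL8B:
    djnz __MUL8LOOP
    ret             ; result in A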
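If you want to sanity-check that loop without an emulator, here is a small Python model of it. It's only a sketch: the name `mul8_msb_first` and the variable names are mine, not anything in library-asm, and it models the 8-bit arithmetic only, not the timing.

```python
def mul8_msb_first(a, h):
    """Model of the loop above: returns (a * h) mod 256, as left in A."""
    l = a & 0xFF                      # ld l, a
    acc = 0                           # xor a
    for _ in range(8):                # ld b, 8 ... djnz
        acc = (acc << 1) & 0xFF       # add a, a
        carry = (l >> 7) & 1          # sla l: bit 7 goes to carry
        l = (l << 1) & 0xFF
        if carry:                     # jp nc skips the add when carry is clear
            acc = (acc + h) & 0xFF    # add a, h
    return acc


# Exhaustive check against plain multiplication truncated to 8 bits.
assert all(mul8_msb_first(a, h) == (a * h) & 0xFF
           for a in range(256) for h in range(256))
```

The loop always runs 8 times, whatever the operands are.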
And this is yours, with JR replaced by JP (2 T-states faster each). I've also commented it heavily, because some instructions have been removed as unneeded.
Code:
    LD E, H         ; H is the 2nd factor
    LD HL, 0        ; HL = 0 (accumulator)
    LD D, L         ; D = 0, so DE = 2nd factor
    ;LD A, (NUM1)   ; Not needed, A already holds the 1st factor
LOOP:
    SRL A           ; Divide A by 2: bit 0 goes to carry, Z is set when A reaches 0 (2 bytes, 8 T-states)
    ; NOTE: RRA (1 byte, 4 T-states) and RR A (2 bytes, 8 T-states) are not drop-in replacements here:
    ; both rotate the old carry into bit 7, and RRA leaves Z untouched, so RET Z below would not test A.
    ; NOTE: JR is 3 T-states faster than JP when the condition is not met, but 2 slower when it is.
    ; Small numbers contain more 0s than 1s, so the jump is usually taken: hence JP.
    JP NC, JP1      ; Jump over the add if the bit was 0
    ADD HL, DE      ; 11 T-states; does not change Z
JP1:
    RET Z           ; Leave when A has gone to zero; 5 T-states if condition not met, 11 if met
    SLA E           ; 8 T-states
    ; RL D          ; Not needed for an 8-bit result: only E matters
    JP LOOP         ; 2 T-states faster than JR
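Here is the same kind of Python model for this routine, following what the comments say it should do: shift A right, add when the bit was 1, and leave as soon as A is zero. Again it's just a sketch, with names of my own (`mul8_lsb_first`, `passes`), and it only keeps the low byte of HL. It also counts the loop passes, so you can see how much the early exit saves.

```python
def mul8_lsb_first(a, h):
    """Return (low byte of a * h, number of loop passes)."""
    a &= 0xFF
    e = h & 0xFF                  # E holds the 2nd factor
    l = 0                         # low byte of HL, the accumulator
    passes = 0
    while True:
        passes += 1
        carry = a & 1             # bit 0 of A goes to carry
        a >>= 1                   # A is divided by 2
        if carry:
            l = (l + e) & 0xFF    # ADD HL,DE, keeping only L
        if a == 0:                # RET Z
            return l, passes
        e = (e << 1) & 0xFF       # SLA E


# Exhaustive check against plain multiplication truncated to 8 bits.
assert all(mul8_lsb_first(a, h)[0] == (a * h) & 0xFF
           for a in range(256) for h in range(256))

print(mul8_lsb_first(8, 165))
```

With A = 8 and H = 165 the loop leaves after 4 passes instead of a fixed 8. Which factor the compiler actually puts in A is an assumption on my part.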
Here's the benchmark I've used to time MUL8:
Code:
DIM a as Ubyte = 8
DIM t as Uinteger AT 23672 ' REM t = Frames
DIM q as UByte
DIM tmp as UInteger
t = 0 : ' Resets the frame counter with a single 16-bit store
FOR tmp = 0 to 65534
q = a * 165
NEXT
Print CAST(Fixed, t) / 50
END ' End the program with OK instead of a STOP error (STOP is reported as an "error")
PRINT q ' Avoid -O3 variable removal
I haven't timed your routine. To do so, edit mul8.asm in library-asm, replacing the mul8 code with yours.
Also remember that you have the old mul8, not the new one. With the new one, this benchmark takes 8.11 seconds.
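To put that figure in perspective, it can be turned into a rough per-iteration cost. This little Python calculation assumes the 3.5 MHz clock of a 48K Spectrum and ignores memory contention and interrupt time, so treat the result as a ballpark only.

```python
CPU_HZ = 3_500_000      # assumed Z80 clock of a 48K Spectrum
ITERATIONS = 65535      # FOR tmp = 0 TO 65534
elapsed_s = 8.11        # time reported above for the new mul8

per_iter_s = elapsed_s / ITERATIONS
per_iter_t = per_iter_s * CPU_HZ
print(f"{per_iter_s * 1e6:.1f} us per iteration, about {per_iter_t:.0f} T-states")
```

That comes to roughly 433 T-states per iteration, and it covers the FOR loop overhead and the store to q as well as the multiplication. The frame counter only ticks every 20 ms, which is why the loop has to run so many times.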