Faster multiply?
#5
Okay, today I finally found a little time to work on 8-bit multiplication. This is the current (new) code:
Code:
    ld b, 8
    ld l, a
    xor a

__MUL8LOOP:
    add a, a ; a *= 2
    sla l
    jp nc, __MUL8B
    add a, h

__MUL8B:
    djnz __MUL8LOOP
    ret
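
To make the loop easier to follow, here is a rough Python model of it. This is only a sketch: the function name `mul8_msb_first` is made up for illustration, and it assumes the routine is entered with one factor in A and the other in H. It walks the multiplier from the top bit down, doubling the accumulator each step, just as the `add a, a` / `sla l` pair does.

```python
def mul8_msb_first(a: int, h: int) -> int:
    """Hypothetical model of the loop above: low 8 bits of a * h."""
    l = a & 0xFF                     # ld l, a   (multiplier, read MSB first)
    acc = 0                          # xor a
    for _ in range(8):               # ld b, 8 ... djnz __MUL8LOOP
        acc = (acc << 1) & 0xFF      # add a, a
        carry = (l >> 7) & 1         # sla l: bit 7 drops into carry
        l = (l << 1) & 0xFF
        if carry:                    # jp nc, __MUL8B skips the add
            acc = (acc + h) & 0xFF   # add a, h
    return acc


if __name__ == "__main__":
    # Same product the benchmark computes: 8 * 165, truncated to a byte.
    print(mul8_msb_first(8, 165))
```

The result is truncated to 8 bits, so it only matches the full product when that product fits in a byte.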

And this is yours, with each JR replaced by JP (2 T-states faster per jump taken). I've also commented it heavily, because some unneeded instructions have been removed.
Code:
    LD E, H  ; H is the 2nd factor
    LD HL,0
    LD D, L  ; DE => H

    ;LD A, (NUM1) ;; Not needed, already done
LOOP:
    ;; RR A ; (Divide A by 2 - copying the 1's column bit into the carry flag.)
    RRA ; 1 byte, 4 T-states; RR A is 2 bytes, 8 T-states. Same rotation, but RRA leaves Z unchanged (RR A updates it)

    ; NOTE: JR is 3 T-states faster than JP when the condition is not met,
    ; but 2 T-states slower when the jump is taken. Most numbers will likely be
    ; small ones (more 0s than 1s), so the jump is usually taken and JP wins.
    JP NC, JP1 ; Jump over the add when the bit was 0
    ADD HL,DE ; 11 T-states

JP1:
    RET Z ; (Leave when we finish - A has gone to zero) ; 5 T-states if condition not met, 11 if met
    SLA E  ; 8 T-states
    ; RL D   ;; Multiply DE*2 ; Needless in 8 bit, only E is needed!
    JP LOOP ; 2 T-States faster
Note that your routine returns the result in L, not in A. Perhaps it could be rearranged?
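
The JR versus JP trade-off mentioned in the comments can be put into numbers. The sketch below is only a cost model, not a timing of either routine: it assumes a fixed 8-iteration loop like my first listing (one conditional jump per bit, taken when the bit is 0) and uses the standard Z80 timings of 10 T-states for `JP cc` and 12/7 T-states for `JR cc` taken/not taken. The function name `branch_cost` is made up for illustration.

```python
JP_CC = 10            # JP cc,nn: 10 T-states whether taken or not
JR_CC_TAKEN = 12      # JR cc,e when the jump is taken
JR_CC_NOT_TAKEN = 7   # JR cc,e when the jump is not taken


def branch_cost(multiplier: int, use_jr: bool) -> int:
    """T-states spent on the conditional jump alone over 8 bits.

    The jump skips the add, so it is taken whenever the bit is 0.
    """
    total = 0
    for bit in range(8):
        bit_set = (multiplier >> bit) & 1
        if use_jr:
            total += JR_CC_NOT_TAKEN if bit_set else JR_CC_TAKEN
        else:
            total += JP_CC
    return total


if __name__ == "__main__":
    for value in (8, 165, 255):
        print(value, branch_cost(value, use_jr=True), branch_cost(value, use_jr=False))
```

Under these assumptions JP comes out ahead as soon as the multiplier has five or more zero bits, which fits the "small numbers" argument. With four or fewer zero bits JR would be cheaper.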

Here's the benchmark I've used to time MUL8:
Code:
DIM a as Ubyte = 8
DIM t as Uinteger AT 23672 ' REM t = Frames
DIM q as UByte
DIM tmp as UInteger

POKE t, 0 : ' Sets the clock to 0 in a single instruction

FOR tmp = 0 to 65534
    q = a * 165
NEXT

Print CAST(Fixed, t) / 50

END ' End the program with OK instead of a STOP error (STOP is an "error")
PRINT q ' Avoid -O3 variable removal

I haven't timed your routine. To do so, edit mul8.asm in library-asm, replacing the mul8 code with yours.
Also remember that you have the old mul8, not the new one. With the new one, this benchmark takes 8.11 seconds.
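
For a rough idea of what that figure means per multiplication, the arithmetic can be sketched as below. The 3.5 MHz clock is an assumption (the usual 48K Spectrum figure), the helper names are made up, and the result covers the whole loop body, so it includes the FOR/NEXT overhead and interrupt time, not just mul8.

```python
FRAMES_PER_SECOND = 50    # the benchmark divides the frame counter by 50
CPU_HZ = 3_500_000        # assumption: 48K Spectrum clock speed
ITERATIONS = 65535        # FOR tmp = 0 TO 65534


def frames_to_seconds(frames: int) -> float:
    """Convert a frame count to seconds, as the PRINT line does."""
    return frames / FRAMES_PER_SECOND


def tstates_per_iteration(seconds: float,
                          iterations: int = ITERATIONS,
                          cpu_hz: int = CPU_HZ) -> float:
    """Average T-states per loop iteration, loop overhead included."""
    return seconds * cpu_hz / iterations


if __name__ == "__main__":
    print(round(tstates_per_iteration(8.11)))
```

With those assumptions, 8.11 seconds works out to roughly 433 T-states per iteration, so the numbers are mainly useful for comparing one mul8 against another under the same benchmark.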