02-13-2010, 01:16 AM
I've got a version that does LONG as well as uInteger later in this thread.
I note that the compiler uses the Spectrum ROM square root routine. This routine is hideously slow. It actually calculates x^(0.5) instead, and takes ages about it. The Newton-Raphson method would be a lot faster, and pretty easy to put in.
If you are willing to sacrifice accuracy, an integer square root would be faster still. For a lot of situations, an integer root would be just fine - for example, if I had to calculate the nearest of two objects on screen, I'm going to have to use pythagoras' theorum to calculate distances. [ A^2 = B^2 + C^2 ] that needs square roots to make a distance. But probably the nearest whole pixel would be a perfectly good enough result!
So, here are two functions, and some code to demonstrate them. One is a perfect replacement for sqr.asm in the library, actually - it's full floating point compatible, 100% accurate, and about 3-6 times faster. It actually uses the FP-Calculator in the rom. It just doesn't use the SQR command. [Note: It comes back with something instead of an error in case of a negative square root request. Boriel - you might want to change that behavor. Not sure.] The integer version... well - look for yourself! I reckon it's about 40 times faster than the fast version.
Also: Should be able to do a something similar for a 32 bit LONG integer.
Copy and compile this program. I hope you like it:
I note that the compiler uses the Spectrum ROM square root routine. This routine is hideously slow. It actually calculates x^(0.5) instead, and takes ages about it. The Newton-Raphson method would be a lot faster, and pretty easy to put in.
If you are willing to sacrifice accuracy, an integer square root would be faster still. For a lot of situations, an integer root would be just fine - for example, if I had to calculate the nearest of two objects on screen, I'm going to have to use pythagoras' theorum to calculate distances. [ A^2 = B^2 + C^2 ] that needs square roots to make a distance. But probably the nearest whole pixel would be a perfectly good enough result!
So, here are two functions, and some code to demonstrate them. One is a perfect replacement for sqr.asm in the library, actually - it's full floating point compatible, 100% accurate, and about 3-6 times faster. It actually uses the FP-Calculator in the rom. It just doesn't use the SQR command. [Note: It comes back with something instead of an error in case of a negative square root request. Boriel - you might want to change that behavor. Not sure.] The integer version... well - look for yourself! I reckon it's about 40 times faster than the fast version.
Also: Should be able to do a something similar for a 32 bit LONG integer.
Copy and compile this program. I hope you like it:
Code:
FUNCTION FASTCALL SQRT (radicand as FLOAT) as FLOAT
ASM
; FLOATS arrive in A ED CB
;A is the exponent.
AND A ; Test for zero argument
RET Z ; Return with zero.
;Strictly we should test the number for being negative and quit if it is.
;But let's assume we like imaginary numbers, hmm?
; If you'd rather break it change to a jump to an error below.
;BIT 7,(HL) ; Test the bit.
;JR NZ,REPORT ; back to REPORT_A
; 'Invalid argument'
RES 7,E ; Now it's a positive number, no matter what.
call __FPSTACK_PUSH ; Okay, We put it on the calc stack. Stack contains ABS(x)
; Halve the exponent to achieve a good guess.(accurate with .25 16 64 etc.)
; Remember, A is the exponent.
XOR $80 ; toggle sign of exponent
SRA A ; shift right, bit 7 unchanged.
INC A ;
JR Z,ASIS ; forward with say .25 -> .5
JP P,ASIS ; leave increment if value > .5
DEC A ; restore to shift only.
ASIS: XOR $80 ; restore sign.
call __FPSTACK_PUSH ; Okay, NOW we put the guess on the stack
rst 28h ; ROM CALC ;;guess,x
DEFB $C3 ;;st-mem-3 x,guess
DEFB $02 ;;delete x
SLOOP: DEFB $31 ;;duplicate x,x.
DEFB $E3 ;;get-mem-3 x,x,guess
DEFB $C4 ;;st-mem-4 x,x,guess
DEFB $05 ;;div x,x/guess.
DEFB $E3 ;;get-mem-3 x,x/guess,guess
DEFB $0F ;;addition x,x/guess+guess
DEFB $A2 ;;stk-half x,x/guess+guess,.5
DEFB $04 ;;multiply x,(x/guess+guess)*.5
DEFB $C3 ;;st-mem-3 x,newguess
DEFB $E4 ;;get-mem-4 x,newguess,oldguess
DEFB $03 ;;subtract x,newguess-oldguess
DEFB $2A ;;abs x,difference.
DEFB $37 ;;greater-0 x,(0/1).
DEFB $00 ;;jump-true x.
DEFB SLOOP - $ ;;to sloop x.
DEFB $02 ;;delete .
DEFB $E3 ;;get-mem-3 retrieve final guess.
DEFB $38 ;;end-calc sqr x.
jp __FPSTACK_POP
END ASM
END FUNCTION
FUNCTION FASTCALL SQRT16(radicand as uInteger) as uByte
asm
XOR A
AND A
ld a,l
ld l,h
ld de,0040h ; 40h appends "01" to D
ld h,d
ld b,7
sqrt16loop:
sbc hl,de ; IF speed is critical, and you don't mind spending the extra bytes, you could unroll this loop 7 times instead of DJNZ.
jr nc,$+3
add hl,de
ccf
rl d
rla
adc hl,hl
rla
adc hl,hl
DJNZ sqrt16loop
sbc hl,de ; optimised last iteration
ccf
rl d
ld a,d
end asm
END FUNCTION
FUNCTION t AS ULONG
RETURN INT((65536 * PEEK (23674) + 256 * PEEK(23673) + PEEK (23672)))
END FUNCTION
CLS
DIM a,b as float
DIM i as uInteger
DIM time as long
PRINT "ROM","FAST"
REM show it's as accurate
for i=1 to 15
LET a=rnd * 32768
PRINT SQR(a),SQRT(a)
next i
PRINT
PRINT "Over 500 Cycles:"
PRINT
REM ROM version 500 times.
LET time=t()
for i=1 to 500
b=SQR(a)
next i
PRINT "Rom routine: ";t()-time;" Frames."
REM MY version 500 times.
LET time=t()
for i=1 to 500
b=SQRT(a)
next i
PRINT "Fast routine: ";t()-time;" Frames."
PRINT
PRINT "PRESS A KEY"
PAUSE 1: PAUSE 0
CLS
PRINT "NUM FAST INTEGER"
REM show it's as accurate
for i=1 to 15
LET a=INT(rnd * 32768)
PRINT a;TAB 6;SQRT(a);TAB 16;SQRT16(a)
next i
PRINT
PRINT "Over 500 Cycles:"
PRINT
REM MY version 500 times.
LET time=t()
for i=1 to 500
b=SQRT(a)
next i
PRINT "Fast routine: ";t()-time;" Frames."
REM MY Integer version 500 times.
LET time=t()
for i=1 to 500
b=SQRT16(a)
next i
PRINT "Integer routine: ";t()-time;" Frames."