02-13-2010, 01:16 AM

I've got a version that does LONG as well as uInteger later in this thread.

I note that the compiler uses the Spectrum ROM square root routine. This routine is hideously slow. It actually calculates x^(0.5) instead, and takes ages about it. The Newton-Raphson method would be a lot faster, and pretty easy to put in.

If you are willing to sacrifice accuracy, an integer square root would be faster still. For a lot of situations, an integer root would be just fine - for example, if I had to calculate the nearest of two objects on screen, I'm going to have to use pythagoras' theorum to calculate distances. [ A^2 = B^2 + C^2 ] that needs square roots to make a distance. But probably the nearest whole pixel would be a perfectly good enough result!

So, here are two functions, and some code to demonstrate them. One is a perfect replacement for sqr.asm in the library, actually - it's full floating point compatible, 100% accurate, and about 3-6 times faster. It actually uses the FP-Calculator in the rom. It just doesn't use the SQR command. [Note: It comes back with something instead of an error in case of a negative square root request. Boriel - you might want to change that behavor. Not sure.] The integer version... well - look for yourself! I reckon it's about 40 times faster than the fast version.

Also: Should be able to do a something similar for a 32 bit LONG integer.

Copy and compile this program. I hope you like it:

I note that the compiler uses the Spectrum ROM square root routine. This routine is hideously slow. It actually calculates x^(0.5) instead, and takes ages about it. The Newton-Raphson method would be a lot faster, and pretty easy to put in.

If you are willing to sacrifice accuracy, an integer square root would be faster still. For a lot of situations, an integer root would be just fine - for example, if I had to calculate the nearest of two objects on screen, I'm going to have to use pythagoras' theorum to calculate distances. [ A^2 = B^2 + C^2 ] that needs square roots to make a distance. But probably the nearest whole pixel would be a perfectly good enough result!

So, here are two functions, and some code to demonstrate them. One is a perfect replacement for sqr.asm in the library, actually - it's full floating point compatible, 100% accurate, and about 3-6 times faster. It actually uses the FP-Calculator in the rom. It just doesn't use the SQR command. [Note: It comes back with something instead of an error in case of a negative square root request. Boriel - you might want to change that behavor. Not sure.] The integer version... well - look for yourself! I reckon it's about 40 times faster than the fast version.

Also: Should be able to do a something similar for a 32 bit LONG integer.

Copy and compile this program. I hope you like it:

Code:

`FUNCTION FASTCALL SQRT (radicand as FLOAT) as FLOAT`

ASM

; FLOATS arrive in A ED CB

;A is the exponent.

AND A ; Test for zero argument

RET Z ; Return with zero.

;Strictly we should test the number for being negative and quit if it is.

;But let's assume we like imaginary numbers, hmm?

; If you'd rather break it change to a jump to an error below.

;BIT 7,(HL) ; Test the bit.

;JR NZ,REPORT ; back to REPORT_A

; 'Invalid argument'

RES 7,E ; Now it's a positive number, no matter what.

call __FPSTACK_PUSH ; Okay, We put it on the calc stack. Stack contains ABS(x)

; Halve the exponent to achieve a good guess.(accurate with .25 16 64 etc.)

; Remember, A is the exponent.

XOR $80 ; toggle sign of exponent

SRA A ; shift right, bit 7 unchanged.

INC A ;

JR Z,ASIS ; forward with say .25 -> .5

JP P,ASIS ; leave increment if value > .5

DEC A ; restore to shift only.

ASIS: XOR $80 ; restore sign.

call __FPSTACK_PUSH ; Okay, NOW we put the guess on the stack

rst 28h ; ROM CALC ;;guess,x

DEFB $C3 ;;st-mem-3 x,guess

DEFB $02 ;;delete x

SLOOP: DEFB $31 ;;duplicate x,x.

DEFB $E3 ;;get-mem-3 x,x,guess

DEFB $C4 ;;st-mem-4 x,x,guess

DEFB $05 ;;div x,x/guess.

DEFB $E3 ;;get-mem-3 x,x/guess,guess

DEFB $0F ;;addition x,x/guess+guess

DEFB $A2 ;;stk-half x,x/guess+guess,.5

DEFB $04 ;;multiply x,(x/guess+guess)*.5

DEFB $C3 ;;st-mem-3 x,newguess

DEFB $E4 ;;get-mem-4 x,newguess,oldguess

DEFB $03 ;;subtract x,newguess-oldguess

DEFB $2A ;;abs x,difference.

DEFB $37 ;;greater-0 x,(0/1).

DEFB $00 ;;jump-true x.

DEFB SLOOP - $ ;;to sloop x.

DEFB $02 ;;delete .

DEFB $E3 ;;get-mem-3 retrieve final guess.

DEFB $38 ;;end-calc sqr x.

jp __FPSTACK_POP

END ASM

END FUNCTION

FUNCTION FASTCALL SQRT16(radicand as uInteger) as uByte

asm

XOR A

AND A

ld a,l

ld l,h

ld de,0040h ; 40h appends "01" to D

ld h,d

ld b,7

sqrt16loop:

sbc hl,de ; IF speed is critical, and you don't mind spending the extra bytes, you could unroll this loop 7 times instead of DJNZ.

jr nc,$+3

add hl,de

ccf

rl d

rla

adc hl,hl

rla

adc hl,hl

DJNZ sqrt16loop

sbc hl,de ; optimised last iteration

ccf

rl d

ld a,d

end asm

END FUNCTION

FUNCTION t AS ULONG

RETURN INT((65536 * PEEK (23674) + 256 * PEEK(23673) + PEEK (23672)))

END FUNCTION

CLS

DIM a,b as float

DIM i as uInteger

DIM time as long

PRINT "ROM","FAST"

REM show it's as accurate

for i=1 to 15

LET a=rnd * 32768

PRINT SQR(a),SQRT(a)

next i

PRINT

PRINT "Over 500 Cycles:"

PRINT

REM ROM version 500 times.

LET time=t()

for i=1 to 500

b=SQR(a)

next i

PRINT "Rom routine: ";t()-time;" Frames."

REM MY version 500 times.

LET time=t()

for i=1 to 500

b=SQRT(a)

next i

PRINT "Fast routine: ";t()-time;" Frames."

PRINT

PRINT "PRESS A KEY"

PAUSE 1: PAUSE 0

CLS

PRINT "NUM FAST INTEGER"

REM show it's as accurate

for i=1 to 15

LET a=INT(rnd * 32768)

PRINT a;TAB 6;SQRT(a);TAB 16;SQRT16(a)

next i

PRINT

PRINT "Over 500 Cycles:"

PRINT

REM MY version 500 times.

LET time=t()

for i=1 to 500

b=SQRT(a)

next i

PRINT "Fast routine: ";t()-time;" Frames."

REM MY Integer version 500 times.

LET time=t()

for i=1 to 500

b=SQRT16(a)

next i

PRINT "Integer routine: ";t()-time;" Frames."