02-24-2010, 07:33 PM
Okay, integer square roots - this is the big version. It can deal with LONG integer input.
And right now, I never want to see a four-register 32 bit number again *lol* hooboy are they a pain. I found out the hard way, for example, that "inc HL" doesn't flag overflows. It all boils down to a lot of tests and jumps to allow for lower 16 bits flowing into upper and vice versa.
This version is 148 bytes of code. It incorporates the 16 bit version as an alternative super-fast routine. This version is about 50 times faster than the ROM routine for numbers > 65535 and over 100 times faster for numbers <= 65535. It only returns integer square roots (see http://en.wikipedia.org/wiki/Integer_square_root for details) - but for game purposes, these are generally fast enough. If you want floating point numbers, I've also published a routine about 6 times faster than the usual rom one; though, ironically, that uses the rom calculator; but uses a far better algorithm than the actual rom call.
If you want to reduce the size by about 1/3, you can cut the jump to 16 bit off the start, and cut out the 16 bit section at the end. It's still going to be 50 times faster than the ROM.
I called it sqrtLF - for Square Root Long Fast. But you can rename it if you want. Just sqrt isn't a bad name. You can't call it sqr though, as that is a reserved word.
Here you go:
And right now, I never want to see a four-register 32 bit number again *lol* hooboy are they a pain. I found out the hard way, for example, that "inc HL" doesn't flag overflows. It all boils down to a lot of tests and jumps to allow for lower 16 bits flowing into upper and vice versa.
This version is 148 bytes of code. It incorporates the 16 bit version as an alternative super-fast routine. This version is about 50 times faster than the ROM routine for numbers > 65535 and over 100 times faster for numbers <= 65535. It only returns integer square roots (see http://en.wikipedia.org/wiki/Integer_square_root for details) - but for game purposes, these are generally fast enough. If you want floating point numbers, I've also published a routine about 6 times faster than the usual rom one; though, ironically, that uses the rom calculator; but uses a far better algorithm than the actual rom call.
If you want to reduce the size by about 1/3, you can cut the jump to 16 bit off the start, and cut out the 16 bit section at the end. It's still going to be 50 times faster than the ROM.
I called it sqrtLF - for Square Root Long Fast. But you can rename it if you want. Just sqrt isn't a bad name. You can't call it sqr though, as that is a reserved word.
Here you go:
Code:
FUNCTION FASTCALL sqrtLF (num as uLong) as uInteger
REM incoming is DEHL
REM outbound is HL
asm
LD A,D
OR E
JP Z, sqrtLF16bit ; we're inside a 16 bit number. We can use the faster version.
LD b,16 ; b times round
EXX ; Out to root and rem - we're doing most of this in alternate registers.
LD DE,0
LD HL,0 ; DEHL = remainder
LD BC,0 ; BC = root
EXX ;back to num and loop
sqrtLFasmloop:
EXX ; out to root and rem
SLA C ; root <<= 1
RL B ;
SLA L ; rem=rem<<1
RL H ;
RL E ;
RL D ;
NOP
SLA L ; rem=rem<<1
RL H ;
RL E ;
RL D ;
EXX ; back to Num and loop
LD a,d ; A = inputnum>>30
AND 192
RLCA
RLCA
SLA L ; num <<= 1
RL H
RL E
RL D
SLA L ; num <<= 1
RL H
RL E
RL D
EXX ; out to root and rem
ADD A,L ; a=a+L ; REM=REM+num>>30
LD L,A ; a-> L ;
JR NC, sqrtLFasmloophop1 ;
INC H
JR NC, sqrtLFasmloophop1
INC DE ; ;
sqrtLFasmloophop1:
INC BC ; root=root+1
sqrtLFasmloophop2:
; DEHL = Remainder
; BC = root
; if rem >= root then
LD A,D
OR E
JR NZ, sqrtLFasmthen ; if rem > 65535 then rem is definitely > root and we go to true
LD A, H
CP B
JR C, sqrtLFasmelse ; H<B - that is rem<root so rem>=root is false and we go to else
JR NZ, sqrtLFasmthen ; H isn't zero though, so we could do a carry from it, so we're good to say HL is larger.
; if h is out, then it's down to L and C
LD A,L
CP C
JR C, sqrtLFasmelse ; L<C - that is rem<root so rem>=root is false and we go to else
; must be true - go to true.
sqrtLFasmthen:
;remainder=remainder-root
AND A ; clear carry flag
SBC HL,BC ; take root away from the lower half of rem.
JP NC, sqrtLFasmhop3 ; we didn't take away too much, so we're okay to loop round.
; if we're here, we did take away too much. We need to borrow from DE
DEC DE ; borrow off DE
sqrtLFasmhop3:
INC BC ;root=root+1
JP sqrtLFasmloopend
;else
sqrtLFasmelse:
DEC BC ;root=root-1
;end if
sqrtLFasmloopend:
EXX ; back to num
DJNZ sqrtLFasmloop
EXX ; out to root and rem
PUSH BC
EXX ; back to normal
POP HL
SRA H
RES 7,H
RR L ; Hl=HL/2 - root/2 is the answer.
jr sqrtLFexitFunction
sqrtLF16bit:
ld a,l
ld l,h
ld de,0040h ; 40h appends "01" to D
ld h,d
ld b,7
sqrtLFsqrt16loop:
sbc hl,de ; IF speed is critical, and you don't mind spending the extra bytes, you could unroll this loop 7 times instead of DJNZ.
jr nc,$+3
add hl,de
ccf
rl d
rla
adc hl,hl
rla
adc hl,hl
DJNZ sqrtLFsqrt16loop
sbc hl,de ; optimised last iteration
ccf
rl d
ld h,0
ld l,d
ld de,0
sqrtLFexitFunction:
end asm
END FUNCTION