How to make your code faster
One of the reasons you are probably looking at this is that you have some idea how to program in Sinclair BASIC, and no idea how to code in machine code (or Z80 assembler, as it's sometimes called). You want to go play with the old Spectrum stuff, and want faster programs - and it must be easier these days, right?

Well, with Boriel's compiler, it is. Most programs can be put into the compiler in a form almost identical to the original Sinclair BASIC program, and they will work. They will be faster. But you want to make them as fast as you can, right?

First thing then: variable types. (See the compiler documentation for details on what variable types the compiler supports.)

Nothing you can do to your program will make as big a speed increase as making sure you use the smallest variable type possible in every case. A byte is better than an integer, which is better than a long, and all of those are better than floating point numbers if you can avoid them.

Have a look at this program:

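FUNCTION t() AS uLong
    REM Read the 3-byte FRAMES system variable (addresses 23672 to 23674)
    RETURN INT((65536 * PEEK (23674) + 256 * PEEK(23673) + PEEK (23672)))
END FUNCTION

DIM i,j,k,fake as <insert type here>
DIM time as uLong
LET fake=0

LET time=t(): REM Note the frame count before the loops start
PRINT "Loop Start"
FOR k=1 TO 20
    FOR j=1 TO 125
        FOR i=1 TO 125
            LET fake=fake+1-(fake/2)
        NEXT i
    NEXT j
NEXT k

PRINT "Loop End"

PRINT t()-time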

If we set the type of variable for i,j,k,fake as FLOAT up there at the top, this program will disappear for ages before it reports that it took 119,551 frames to come back. That's almost 40 minutes! If you change the type of variable there to UBYTE it comes back in 839 frames. That's under 17 seconds. To put it another way, the code runs over 142 times faster. Variable types make a BIG difference!

NOTE: The nearest Sinclair BASIC equivalent of this program runs in 235,726 frames, or just over 78 minutes, to do the same thing. Even using the same variable type as Sinclair BASIC (which always uses five-byte FLOAT types), a compiled program is quite a lot faster!

For the above program, here are the times, in frames. (A frame is 1/50th of a second. Divide by 50 to get a time in seconds if you want - I left it this way to make a speed comparison.)
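The frame arithmetic can be checked on a modern machine. This is a minimal sketch in Python rather than ZX Basic; the function names are made up for illustration, and it assumes a 50 Hz machine as stated above.

```python
# Model of the timer used in the benchmark: the three PEEKs read a 3-byte
# frame counter at addresses 23672 (low), 23673 (middle) and 23674 (high).
FRAMES_PER_SECOND = 50  # from the text: one frame is 1/50th of a second


def frames_from_bytes(low, mid, high):
    """Combine the three bytes the same way the BASIC expression does."""
    return 65536 * high + 256 * mid + low


def frames_to_seconds(frames):
    """Convert a frame count to seconds."""
    return frames / FRAMES_PER_SECOND


print(frames_from_bytes(71, 3, 0))        # 256*3 + 71 = 839 frames
print(frames_to_seconds(839))             # the uByte run, in seconds
print(frames_to_seconds(119551) / 60)     # the Float run, in minutes
```

Subtracting one reading from another, as the benchmark does, gives the elapsed frames regardless of when the counter was last reset.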

uByte    =    839
Byte     =    861
uInteger =   1126
Integer  =   1178
uLong    =  31792
Long     =  32895
Fixed    =  36711
Float    = 119551
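To see the gaps as ratios rather than raw frame counts, here is a small Python sketch (not ZX Basic) that divides every timing by the uByte one. The numbers are copied from the table; the helper name is hypothetical.

```python
# Frame counts from the table above, fastest first.
timings = {
    "uByte": 839,
    "Byte": 861,
    "uInteger": 1126,
    "Integer": 1178,
    "uLong": 31792,
    "Long": 32895,
    "Fixed": 36711,
    "Float": 119551,
}


def slowdown_vs(baseline, frames_by_type):
    """Return how many times slower each type is than the baseline type."""
    base = frames_by_type[baseline]
    return {name: frames / base for name, frames in frames_by_type.items()}


ratios = slowdown_vs("uByte", timings)
for name, ratio in ratios.items():
    print(f"{name:<9}{timings[name]:>7} frames  {ratio:6.1f}x")
```

On these figures the biggest single step is from the 16-bit types to the 32-bit ones, which is a good reason to check whether a long is really needed before reaching for one.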

The rule is: use the smallest type you can, every time, especially in loops! If you're only going round a FOR/NEXT loop about 10 times, use uByte.

If you can get away with positive numbers, unsigned types (uByte, uInteger and uLong) are a little bit faster than signed ones.
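One way to apply the rule is to work out the lowest and highest value a variable will ever hold and pick the first type that fits. The sketch below is in Python, not ZX Basic, and the helper is hypothetical; the ranges are the usual ones for 8, 16 and 32 bit signed and unsigned integers.

```python
# Integer types from smallest to largest, unsigned first at each size
# because the text notes unsigned types are slightly faster.
TYPE_RANGES = [
    ("uByte", 0, 255),
    ("Byte", -128, 127),
    ("uInteger", 0, 65535),
    ("Integer", -32768, 32767),
    ("uLong", 0, 4294967295),
    ("Long", -2147483648, 2147483647),
]


def smallest_type(lowest, highest):
    """Return the first listed type whose range covers lowest..highest."""
    for name, type_min, type_max in TYPE_RANGES:
        if type_min <= lowest and highest <= type_max:
            return name
    return None  # nothing fits: the value needs more than 32 bits


print(smallest_type(1, 125))      # the loop counters in the benchmark
print(smallest_type(-5, 100))
print(smallest_type(0, 2500))
print(smallest_type(-1, 40000))
```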

You may also be able to eliminate floating point numbers by multiplying up - for example store $3.02 as 302 pennies.

[Note: In computing generally (not just on the Spectrum) there are good reasons not to store money in floating point numbers anyway - floating point numbers are NOT perfectly accurate, and you may get rounding errors that cause problems later on. Just as you can't write 1/3 in decimal without an infinitely long 0.33333333..., you can't store something like 0.1 in binary without an infinitely long binary number. So it is far better to store currency as the smaller unit in an integer or long type. A long would allow you to keep track of up to +/- 2,147,483,647 pennies - or about 21 million currency units. If you want to track more than that, you can definitely afford a more powerful computer than a Spectrum!]
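The rounding problem is easy to reproduce. This Python sketch uses 64-bit floats rather than the Spectrum's five-byte format, so the exact error differs, but the cause is the same: 0.1 has no exact binary representation. The helper name and constants are made up for illustration.

```python
PENNIES_PER_UNIT = 100
LONG_MAX = 2_147_483_647  # largest value a signed 32-bit long can hold


def format_pennies(pennies):
    """Show an integer penny count as units and pennies, e.g. 302 -> 3.02."""
    sign = "-" if pennies < 0 else ""
    units, rest = divmod(abs(pennies), PENNIES_PER_UNIT)
    return f"{sign}{units}.{rest:02d}"


# Ten payments of 0.10, first as floats and then as whole pennies.
float_total = sum([0.10] * 10)
penny_total = sum([10] * 10)

print(float_total == 1.0)            # False: the float sum is slightly off
print(penny_total == 100)            # True: integer arithmetic is exact
print(format_pennies(302))           # 3.02
print(LONG_MAX // PENNIES_PER_UNIT)  # whole currency units a long can track
```

The decimal point only appears when the value is displayed; every calculation stays in whole pennies.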
Yes, FP is horribly slow and needs a boost. HiSoft Basic is much faster and the Tobos FP compiler is faster still (the fastest FP compiler for the Speccy I know), so maybe Boriel will rip the FP runtimes from them?
I did not know it was THAT slow, because I rarely use FP arithmetic.
And if you DO know some assembler, you can VERY easily use it in the compiler.

The important thing here, if you want to optimize routines is that you break the code into smaller pieces. If you ever were to write something in pure assembler, trust me, having the program made of small routines is going to be the ONLY way you can make it work. Probably.

Here's a small routine (stolen from Boriel):
FUNCTION get() AS uByte
    DIM lastK AS uByte AT 23560: REM LAST_K system variable
    LET lastK=0
    DO LOOP UNTIL lastK <> 0: REM Wait for a keypress
    RETURN lastK
END FUNCTION

What this does is map a variable onto the Spectrum's system variable LAST_K (address 23560), set it to zero, and wait for it to change. This routine requires interrupts to be enabled, of course, because LAST_K is only updated during the interrupt!

If all your program is broken down into small chunks like this (and it should be), then someone who knows assembler - even a little bit! - can start to replace little routines like this with assembler equivalents.
Don't worry if you can't - the compiler does make for fast code; but it's designed to work with all cases, and will probably never produce code quite as small and fast as an experienced human.

Anyway, here's machine code that does exactly the same thing in exactly the same way (this version stolen from Dr. Beep):
FUNCTION FASTCALL get() AS uByte
asm
         ld   hl,23560   ; sysvar last key pressed
         xor  a
         ld   (hl),a     ; reset last key pressed
readkey: or   (hl)       ; test on change (this happens during interrupt)
         jr   z,readkey  ; if no change, then read again
end asm
END FUNCTION

First of all it points HL at the variable LAST_K, and sets A to be zero (that's what XOR A does).
Then it copies A over to LAST_K - just like the first version had LET lastK=0
Then it uses the OR function to combine LAST_K and A (in this case, it's the same as adding them together, since A=0) and if A is still zero, it goes back to "readkey".
Otherwise, A has a copy of LAST_K in it - and that's what the function returns. (Fastcall AS byte functions return what's in the A register).
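If the register juggling is hard to follow, the walk-through above can be modelled step by step. This is only a Python sketch of the logic, not an emulator: memory is a plain dictionary, and the interrupt is a hypothetical callback invoked once per poll, whereas on the real machine it fires on its own schedule.

```python
LAST_K = 23560  # address of the LAST_K system variable


def wait_for_key(memory, interrupt):
    """Mirror the assembler routine one instruction at a time."""
    hl = LAST_K                # ld hl,23560
    a = 0                      # xor a      : A xor A is always zero
    memory[hl] = a             # ld (hl),a  : clear the last key pressed
    while True:
        interrupt(memory)      # stand-in for the keyboard interrupt
        a |= memory[hl]        # or (hl)    : with A = 0 this copies (hl) into A
        if a != 0:             # jr z,readkey loops only while A is zero
            return a           # the key code is left in A


def key_after(polls, code):
    """Build a fake interrupt that stores `code` on the given poll."""
    remaining = [polls]

    def interrupt(memory):
        remaining[0] -= 1
        if remaining[0] == 0:
            memory[LAST_K] = code

    return interrupt


memory = {LAST_K: 13}          # a stale key code left over from earlier
print(wait_for_key(memory, key_after(3, 65)))  # 65
```

The stale value 13 never comes back, because the routine clears LAST_K before it starts polling.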

Okay, this routine didn't save much because it wasn't speed critical [No routine that spends its time waiting can be called speed critical!] - but it does save several bytes over the compiled variation.

The key thing to note here is that this routine is a plug-in replacement for the first thing. It does the same thing in the same way. So, that's one routine replaced with assembler - if you replaced them ALL you'd have a 100% hand coded machine code program! :)

Perhaps if you learn a little assembler, you can start mixing in assembly language with compiled code - and the ZX Basic compiler makes this really easy to do!
LCD wrote: Yes, FP is horribly slow and needs a boost. HiSoft Basic is much faster and the Tobos FP compiler is faster still (the fastest FP compiler for the Speccy I know), so maybe Boriel will rip the FP runtimes from them?
I did not know it was THAT slow, because I rarely use FP arithmetic.

Actually, code from Boriel's compiler is quite a lot faster than Sinclair BASIC. Around 2-3 times faster. That's about how much faster the HiSoft Basic compiler was for floating point, if I remember rightly. I haven't played with the Tobos one, but I hear good things about it. I used the HiSoft compiler back in the 80s on a real Spectrum in my bedroom as a kid. In fact, my one biggest project from back then was compiled in it.

The HiSoft compiler DID make very fast and small code. It more or less had -O3 optimization built in :) It only included the runtime routines that were actually called by your program, which I always thought was rather clever.
LCD: Does Tobos have any instructions?

I have the tape file...but no clue how to use the compiler, what it allows...or ...well...anything
I think I got it working. Very odd - the compiler seems to overwrite itself.

I wrote up that BASIC equivalent program, and it ran with the Tobos compiler in 22,448 frames. That is 10 times faster than Sinclair BASIC. If all the numbers it held were floating point numbers... wow. That's VERY impressive.

Sinclair BASIC:        235,726 frames
Tobos compiler:         22,448 frames
HiSoft compiler
  All floating point:   75,059 frames
  All uInteger:         30,646 frames

What this shows is that Boriel's compiler is clearly far and away better than the tools we had on the Spectrum in the 80s, at least once you use integer types (1,126 frames for uInteger, against 30,646 for HiSoft). It also shows that Tobos was a pretty amazing piece of work - though I think the HiSoft one is more polished. The 128K version only uses 500 bytes of real memory - most of it gets tucked away in the extra memory the 128 has. As a result, it could actually compile a 40K BASIC program. For its day, that was amazing!

Still, all credit to the guys that wrote Tobos. Right now, with quick tests, it seems to be faster at floating point work than HiSoft is at integer. That's mind blowing. It's also very tight on runtimes. The HiSoft code was about 450 bytes long. The Tobos code was about 315. That's... incredible. That includes the runtimes it needs? Wow.
