Compiler Speed Trials
#46
boriel Wrote: A question: are you using Float or Fixed? Or both?

BM7 is Integer only.
BM8 uses the sine and ln functions, so is Float.
fSin moves the sin function to Fixed. Still uses ln (float) though.
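
To show why the Integer/Float distinction matters for BM7, here is a rough Python sketch (not ZX Basic) of how the benchmark expression k/2*3+4-5 evaluates with k=5 under truncating integer arithmetic versus floating point. It assumes that integer-typed operands use truncating division, and both function names are made up for the illustration.

```python
def bm7_expr_integer(k: int) -> int:
    # Integer-only evaluation: k/2 truncates (assumed behaviour for integer operands).
    return k // 2 * 3 + 4 - 5


def bm7_expr_float(k: float) -> float:
    # Floating-point evaluation, as an interpreter working in floats would do it.
    return k / 2 * 3 + 4 - 5


print(bm7_expr_integer(5))  # 5
print(bm7_expr_float(5))    # 6.5
```

So the two variants do different work and can even produce different values, which is worth keeping in mind when comparing timings across compilers.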
#47
Added in some Draw Benchmarks:
I also changed the code very slightly to make it easier to switch between the three benchmarks. I think the extra GOTO adds a fixed 0.02 second (1 frame) overhead to each benchmark, which I decided was small enough not to worry about.

Clearly the draw code improved in 1.27, though the curved-line code is still hampered by the ROM code. It might not be fair to test that part, since it's so rarely used!

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.26-r1812 -O3                                                    1.36    29.00 (24.22 with fSin)   38.02
       ZX Basic 1.27- r2114 -O3                                                                                     30.42
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
#48
britlion Wrote:New benchmarks with 1.2.8.s644. 1.26-r1603 is still the speed king for ZX Basic. Hisoft fastest overall.

Just for curiosity, I downloaded the "legacy" version on the website. It's version r1812. This also shows the slowdown, if you are trying to track when this happened.

Boriel, do you still have the code available to wind it back to 1.26-r1603? We don't have all versions for download any more!

Most versions are "hidden" at http://www.boriel.com/files/zxb/archive, and all code history is now stored at https://code.boriel.net/hg/zxbasic/ (the Mercurial repo), just in case. :oops:
#49
Hi, I've rerun your BM7 and BM8 tests against 1.2.8-s682 with -O3: BM7 gives 1.05 secs, whilst BM8 gives 19.05 (the best ever).

This is the benchmark code I used:
Code:
''' BM7 Benchmark by britlion.
''' Compiles on: ZX Basic, Hisoft, Sinclair BASIC
''' Expected running time: 2.1 secs (ZX BASIC)

7 REM :INT +a,k,v,i,m()
  DIM a, k, v, i as UInteger

8 REM : OPEN#
9 CLS
10 POKE 23672,0: POKE 23673,0
90 POKE 23672,0
100 LET a=0: LET k=5: LET v=0
110 LET a=a+1
120 LET v=k/2*3+4-5
130 GO SUB 1000
140 DIM m(5) as UInteger
150 FOR i=1 TO 5
160 LET m(i)=a
170 NEXT i
200 IF a<1000 THEN GO TO 110: END IF
210 PRINT CAST(FIXED, PEEK(Uinteger, 23672))/50.0
999 STOP
1000 RETURN
1001 PRINT v, m(0)

Code:
REM BM8
POKE Uinteger 23672, 0

DIM i,j as ubyte
j=2
FOR i=1 to 100
result=j^2
result=ln(j)
result=sin(j)
next i

t = Peek(Uinteger, 23672)
print CAST(Float, t) / 50
PRINT result : REM use result to avoid it being optimized
Can you check it? Maybe you're using a different benchmark??
#50
The complete code I was using is:

Code:
#include "fSin.bas"
FUNCTION t() as uLong
asm
    DI
    LD DE,(23674)
    LD D,0
    LD HL,(23672)
    EI
end asm
end function
cls
DIM i as uInteger
DIM k,var,j as uByte
DIM time,endtime as uLong

LET k=5
LET i=2
let time =t()
goto start

subroutine:
return



'start:
label:
LET i=i+1
LET var=k/2*3+4-5
gosub subroutine
DIM M(5) as uInteger
FOR j=0 to 4
LET M(j)=i
NEXT j

IF i<1000 then GOTO label: END IF
goto finish

REM BM8
'start:
FOR i=1 to 100
result=i^2
result=ln(i)
result=Sin(i)
next i
goto finish


REM BM DRAW
start:
for i=1 to 127
draw i,160
plot i,0
next i

OVER 1

for i=0 to 80
plot 0,i
draw 250,i
next i

OVER 2

for i = 1 to 80
circle 127,87,i
next i

OVER 3

for i = 10 to 18
plot 127,87
draw i,i,i
next i

finish:
endtime=t()

OVER 0
print "Start:";time;" End:";endtime
print (CAST (FLOAT,t())-time)/50; " Seconds"
print "Done!"
print at 23,0;M(1);k;i;var;result

I simply change which start label is active before compiling.
Let me run yours for comparison and come back.


Incidentally, boriel - my loop counter (i) is an integer. Yours is a byte. I think that's the difference.
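
For anyone following the timing arithmetic: the t() function above reads the Spectrum's FRAMES counter (three bytes at 23672-23674, incremented once per frame) and the difference is divided by 50. A minimal Python sketch of that arithmetic follows; the helper names are hypothetical and the byte values are made-up sample readings, not measured data.

```python
FRAMES_PER_SECOND = 50  # frame rate implied by the /50 in the listings


def frames_from_bytes(low: int, mid: int, high: int) -> int:
    """Combine the three FRAMES bytes (23672, 23673, 23674), least significant first."""
    return low + 256 * mid + 65536 * high


def elapsed_seconds(start_frames: int, end_frames: int) -> float:
    """Convert a difference in frame counts to seconds."""
    return (end_frames - start_frames) / FRAMES_PER_SECOND


start = frames_from_bytes(3, 0, 0)   # sample reading before the benchmark
end = frames_from_bytes(47, 0, 0)    # sample reading after the benchmark
print(elapsed_seconds(start, end))   # 0.88
```

The resolution is one frame, i.e. 0.02 seconds, so a 0.02 difference between two results is a single tick of the counter.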
#51
Here's the latest with s682:
(using my benchmark suite listed above)

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.26-r1812 -O3                                                    1.36    29.00 (24.22 with fSin)   38.02
       ZX Basic 1.27- r2114 -O3                                                                                     30.42
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
       ZX Basic 1.2.8-s682 -O3                                                    0.88    20.56 (16.94 with fSin)   21.14

It's clearly changed! I also note that the start frame has dramatically changed as well, implying the compiled code gets off the ground quite a bit faster. Wow!
#52
Here's the latest with s696:
(using my benchmark suite listed above)

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.26-r1812 -O3                                                    1.36    29.00 (24.22 with fSin)   38.02
       ZX Basic 1.27- r2114 -O3                                                                                     30.42
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
       ZX Basic 1.2.8-s682 -O3                                                    0.88    20.56 (16.94 with fSin)   21.14
       ZX Basic 1.2.8-s696 -O3                                                    0.90    20.60 (16.98 with fSin)   21.18

Still holding together. It's hard to put in new features and not have it slow down on the way. This version looks like it has a couple of frames' worth of startup overhead, but that's just fine. Good work, Boriel.
#53
Forgot to mention that sometimes a bugfix means a little overhead: sometimes a previous version is faster because it was generating wrong (shorter) code! :roll:
#54
Here's the latest with 1.2.8 s758:
(using my benchmark suite listed above)

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.26-r1812 -O3                                                    1.36    29.00 (24.22 with fSin)   38.02
       ZX Basic 1.27- r2114 -O3                                                                                     30.42
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
       ZX Basic 1.2.8-s682 -O3                                                    0.88    20.56 (16.94 with fSin)   21.14
       ZX Basic 1.2.8-s696 -O3                                                    0.90    20.60 (16.98 with fSin)   21.18
       ZX Basic 1.2.8-s758 -O3                                                    0.90    20.76 (17.10 with fSin)   21.32

Still better than the very fast r1603, but very slightly behind the last version I tested (s682). And of course, still beating the crap out of Sinclair Basic for speed.
#55
Thanks, britlion, for this very valuable info. :!:
A spike in any of the latest values would have been suspicious and worth investigating. :wink:
#56
Yes, certainly - that's one reason I run these from time to time.

I'm also aware that the zip 1.5 compiler manages to create code that completes BM7 in well under 0.5 seconds. That's still there as a challenge. :P

Perhaps static for loops will make a difference, there. It's going to be intriguing to see what difference that makes.
#57
Here's the latest with 1.2.9 s815:
(using my benchmark suite listed above)

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.26-r1812 -O3                                                    1.36    29.00 (24.22 with fSin)   38.02
       ZX Basic 1.27- r2114 -O3                                                                                     30.42
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
       ZX Basic 1.2.8-s682 -O3                                                    0.88    20.56 (16.94 with fSin)   21.14
       ZX Basic 1.2.8-s696 -O3                                                    0.90    20.60 (16.98 with fSin)   21.18
       ZX Basic 1.2.8-s758 -O3                                                    0.90    20.76 (17.10 with fSin)   21.32
       ZX Basic 1.2.9-s815 -O3                                                    0.90    20.54 (16.92 with fSin)   21.08

:shock:

Wow. It's faster. Fastest ever, actually. I really hope there aren't any bugs causing this. Boriel - what did you do? THAT IS AWESOME!

We still have zip2's 0.46s target and Hisoft Basic's 0.5s target on BM7 though! :)

But yes - it seems the compiler's internals are running quite a lot faster, which has shaved a not insignificant amount off the benchmarks. I've noticed other programs running faster with this version, as well. Awesome work.
#58
Here's the latest with 1.3.0 s971:
(using my benchmark suite listed above)

Code:
                           BM1      BM2      BM3      BM4      BM5      BM6      BM7      BM8                       BMDRAW
        Sinclair           4.46     8.46     21.56    19.82    25.34    60.82    87.44    23.30                     80.18
       ZX Basic 1.26 -O3                                                          2.12    20.78
       ZX Basic 1.26-r1603 -O3                                                    0.94    20.78 (17.14 with fSin)
       ZX Basic 1.2.8-r2153 -O3                                                   1.36    29.06 (24.18 with fSin)
       ZX Basic 1.2.8-s644 -O3                                                    1.34    29.02 (24.22 with fSin)   30.42
       ZX Basic 1.2.8-s682 -O3                                                    0.88    20.56 (16.94 with fSin)   21.14
       ZX Basic 1.2.8-s696 -O3                                                    0.90    20.60 (16.98 with fSin)   21.18
       ZX Basic 1.2.8-s758 -O3                                                    0.90    20.76 (17.10 with fSin)   21.32
       ZX Basic 1.2.9-s815 -O3                                                    0.90    20.54 (16.92 with fSin)   21.08
       ZX Basic 1.3.0-s971 -O3                                                    0.90    20.80 (17.16 with fSin)   21.40

We were chasing zip2's 0.46s target and Hisoft Basic's 0.5s target on BM7, but we seem to have lost the speed gain from the last test. :(

Technically the slowest iteration for a few cycles. Bugfixes are unfortunately making the code a little bit more convoluted, I think.
#59
britlion Wrote:Here's the latest with 1.3.0 s971:
...
We were chasing zip2's 0.46s target and Hisoft Basic's 0.5s target on BM7, but we seem to have lost the speed gain from the last test. :(
Technically the slowest iteration for a few cycles. Bugfixes are unfortunately making the code a little bit more convoluted, I think.
Could be. I've changed the FP scheme to consume less (much less) memory, but it's slower. SHL is also a few T-states slower (the bugfix you pointed out). My bet is that it's related to the FP routines. Did you use Float or Fixed? I'm still working on an alternative Fast Floating Point (yes, based on your ideas, and an FP routine I'm programming).
#60
Hi Boriel,

It's not a bad thing. Had it changed a lot, it might show up some bug or faulty code that might be completing with the correct result (so passing your tests), but doing so in a more inefficient manner.

I'm still curious how zip2 and hisoft manage such fast results - twice as fast - for basic operations in BM7.

boriel Wrote: Could be. I've changed the FP scheme to consume less (much less) memory, but it's slower. SHL is also a few T-states slower (the bugfix you pointed out). My bet is that it's related to the FP routines. Did you use Float or Fixed? I'm still working on an alternative Fast Floating Point (yes, based on your ideas, and an FP routine I'm programming).


The whole benchmark code is unchanged from where I posted it in this thread on Mar 27, 2011! BM8 uses floating-point LN and SIN and, as an alternative (the figure in brackets), uses my fSin routine, which uses a lookup table to get rough but reasonable results, accurate to about a degree.
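
As a rough idea of what a table-driven sine looks like, here is a Python sketch. It is not the actual fSin.bas source (which isn't shown in this thread): the 1-degree table step, the quarter-wave folding, working in degrees, and the name fsin_degrees are all assumptions made for the illustration.

```python
import math

# One quarter wave sampled every degree: 91 entries for 0..90 degrees.
SINE_TABLE = [math.sin(math.radians(d)) for d in range(91)]


def fsin_degrees(angle: float) -> float:
    """Approximate sin(angle), angle in degrees, by nearest-degree table lookup."""
    d = int(round(angle)) % 360
    if d <= 90:
        return SINE_TABLE[d]
    if d <= 180:
        return SINE_TABLE[180 - d]       # mirror of the first quadrant
    if d <= 270:
        return -SINE_TABLE[d - 180]      # negated first quadrant
    return -SINE_TABLE[360 - d]          # negated mirror


# Worst-case error against the library sine over 0..359.9 degrees in 0.1 steps.
worst = max(
    abs(fsin_degrees(a / 10) - math.sin(math.radians(a / 10)))
    for a in range(3600)
)
print(worst < 0.01)  # True
```

The trade-off is the usual one: a small table and a few comparisons instead of a full floating-point series evaluation, at the cost of accuracy limited by the table step.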

