Compiler Speed Trials
#61
Here's the latest with 1.3.0 s1121:
(using my benchmark suite listed above)

Code:
Compiler (times in s)     BM1    BM2    BM3    BM4    BM5    BM6    BM7     BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44   23.30                      80.18
ZX Basic 1.26 -O3                                                   2.12    20.78
ZX Basic 1.26-r1603 -O3                                             0.94    20.78 (17.14 with fSin)
ZX Basic 1.2.8-r2153 -O3                                            1.36    29.06 (24.18 with fSin)
ZX Basic 1.2.8-s644 -O3                                             1.34    29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3                                             0.88    20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3                                             0.90    20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3                                             0.90    20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3                                             0.90    20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3                                             0.90    20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3                                            0.898   20.818 (17.200 with fSin)  21.40

Mixed results. I increased the iteration count to get a finer view than the frames variable alone gives. Not much change: if anything it's very slightly slower, though once we allow for rounding errors in the previous results it's /very/ close. It's still worth noting that some other compilers (e.g. Zip2) have aced that integer test in half the time.

And again - bugfixes are unfortunately making the code a little bit more convoluted, I think.
#62
This is impressive!!
You're describing exactly what I did: with -O3 and Byte comparison, I had to add an extra CCF instruction (4 T-states).
#63
That would explain it.

Zip2 (and Zip 1.5, as originally tested in post 1) does seem to be quite a lot tighter, finishing BM7 in 0.46 seconds (23 frames). As I noted, however, the downside is that this is an integer-only compiler - but it proves there's a much shorter solution for compiling the integer code.

Demo: https://dl.dropboxusercontent.com/u/4903...erTest.z80

And even that fast result has code that a clever compiler could tighten.

E.g. poke 16384,255

Could be
Code:
LD A,255
LD (16384),A
20 T states (7 + 13), and move on.

It does:
Code:
LD HL,16384
PUSH HL
LD HL,00000
POP DE
EX DE,HL
LD (HL),E

Which is pretty generic code for the simplest case of setting a memory address, but it costs 52 T states. It can make sense if the numbers are calculated - but it does treat all numbers as 16 bit, and just masks to 8 bit where necessary. I can't quite blame it for this - it has to fit inside the ZX Spectrum, alongside the BASIC source and the compiled result. Space is tight there!
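For anyone who wants to check the T-state tallies, here's a minimal sketch that just adds up the standard documented Z80 timings for the two sequences above. The table and function names are mine, purely for illustration, and it ignores Spectrum memory contention (which would matter for a real write to 16384).

```python
# T-state costs for the instruction forms used in the two POKE sequences,
# from the standard Z80 timing tables (n = 8-bit operand, nn = 16-bit operand).
# Contended-memory delays on a real Spectrum are deliberately ignored.
T_STATES = {
    "LD A,n": 7,
    "LD (nn),A": 13,
    "LD HL,nn": 10,
    "PUSH HL": 11,
    "POP DE": 10,
    "EX DE,HL": 4,
    "LD (HL),E": 7,
}


def total_t_states(sequence):
    """Sum the T-states of a straight-line instruction sequence."""
    return sum(T_STATES[op] for op in sequence)


# The hand-written version: load the value, store it directly.
direct_poke = ["LD A,n", "LD (nn),A"]

# The generic stack-based version shown in the disassembly.
generic_poke = [
    "LD HL,nn", "PUSH HL", "LD HL,nn", "POP DE", "EX DE,HL", "LD (HL),E",
]

print("direct :", total_t_states(direct_poke))
print("generic:", total_t_states(generic_poke))
```

By this count the generic sequence costs a bit over two and a half times the direct one.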

Here's what it generates for:
Code:
LET V=INT(k/2)*3+4-5

Code:
LD HL,(54864) ; Fetch variable k

SRA H
RR L   ; Divide

PUSH HL   ;save   

ADD HL,HL
POP DE
ADD HL,DE ; multiply by 3

INC HL
INC HL
INC HL
INC HL ; Add 4

DEC HL
DEC HL
DEC HL
DEC HL
DEC HL ; Subtract 5

LD (54952),HL ; Store back into v

Aaaagh. Starts well. Goes a bit strange. I think you'd at least optimise +4-5 into -1, and save some hassle. (Though, to be fair, it's bad programming to write +4-5.) I wonder when it stops doing INC and DEC? What if that was -200? Or -5000? :) (Simon Goodwin probably says INC is faster for small numbers, and "small" is apparently at least 5.)

So I think a /really/ smart compiler should be faster than this. :D
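A rough sketch of the "when should it stop doing INC and DEC" question, assuming the alternative is the generic LD DE,nn / ADD HL,DE pair and using the standard Z80 timings (INC HL and DEC HL are 6 T states each, LD DE,nn is 10, ADD HL,DE is 11). The function names are made up for illustration; none of this is taken from Zip2 or ZX Basic.

```python
INC_DEC_HL = 6   # T-states for one INC HL or one DEC HL
LD_DE_NN = 10    # T-states for LD DE,nn
ADD_HL_DE = 11   # T-states for ADD HL,DE


def chain_cost(delta):
    """Cost of adjusting HL by delta using only INC HL / DEC HL."""
    return abs(delta) * INC_DEC_HL


def generic_cost():
    """Cost of LD DE,nn / ADD HL,DE (a negative nn works via two's complement)."""
    return LD_DE_NN + ADD_HL_DE


def cheapest(delta):
    """Pick the cheaper way (in T-states) to add a constant to HL."""
    if delta == 0:
        return ("nothing", 0)
    if chain_cost(delta) <= generic_cost():
        return ("inc/dec chain", chain_cost(delta))
    return ("LD DE,nn / ADD HL,DE", generic_cost())


# As emitted: four INC HL followed by five DEC HL.
as_emitted = chain_cost(4) + chain_cost(-5)

# After folding +4-5 into a single constant at compile time.
folded = cheapest(4 - 5)

print("as emitted:", as_emitted, "T-states")
print("folded    :", folded)
for delta in (3, 4, 5, -200, -5000):
    print(delta, cheapest(delta))
```

On these numbers the chain only wins on speed up to three steps; from four onwards the LD/ADD pair is faster (and at four it is the same size, 4 bytes either way), so folding the constants first is where the real saving is.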

(Easy for me to say, isn't it?)
#64
britlion Wrote:That would explain it.

Zip2 (and Zip 1.5, as originally tested in post 1) does seem to be quite a lot tighter, finishing BM7 in 0.46 seconds (23 frames). As I noted, however, the downside is that this is an integer-only compiler - but it proves there's a much shorter solution for compiling the integer code.

Demo: https://dl.dropboxusercontent.com/u/4903...erTest.z80

And even that fast result has code that a clever compiler could tighten.

E.g. poke 16384,255

Could be
Code:
LD A,255
LD (16384),A
20 T states (7 + 13), and move on.

It does:
Code:
LD HL,16384
PUSH HL
LD HL,00000
POP DE
EX DE,HL
LD (HL),E
I can't test it now, but if ZX BASIC is producing such code, then it's most likely a bug. POKE should be a direct LD whenever possible. :?: :?: :?:

britlion Wrote:Which is pretty generic code for the simplest case of setting a memory address, but it costs 52 T states. It can make sense if the numbers are calculated - but it does treat all numbers as 16 bit, and just masks to 8 bit where necessary. I can't quite blame it for this - it has to fit inside the ZX Spectrum, alongside the BASIC source and the compiled result. Space is tight there!

Here's
Code:
LET V=INT(k/2)*3+4-5

Code:
LD HL,(54864) ; Fetch variable k

SRA H
RR L   ; Divide

PUSH HL   ;save  

ADD HL,HL
POP DE
ADD HL,DE ; multiply by 3

INC HL
INC HL
INC HL
INC HL ; Add 4

DEC HL
DEC HL
DEC HL
DEC HL
DEC HL ; Subtract 5

LD (54952),HL ; Store back into v

Aaaagh. Starts well. Goes a bit strange. I think you'd at least optimise +4-5 into -1, and save some hassle. (Though, to be fair, it's bad programming to write +4-5.) I wonder when it stops doing INC and DEC? What if that was -200? Or -5000? :) (Simon Goodwin probably says INC is faster for small numbers, and "small" is apparently at least 5.)

So I think a /really/ smart compiler should be faster than this. :D

(Easy for me to say, isn't it?)
Hmm. This can also be introduced in ZX BASIC with -O3. Let's try...
#65
boriel Wrote:I can't test it now, but if ZX BASIC is producing such code, then it's most likely a bug. POKE should be a direct LD whenever possible. :?: :?: :?:

No. As I tried to explain (but seem to have failed :( ), that's the code produced by the Zip 2 compiler, which, despite being demonstrably poor as I showed above, completes the task twice as quickly as ZX Basic's generated code.
#66
It's time for the new version to get tested!

C:\>zxb --version
zxb 1.4.0-s1779

(using my benchmark suite listed above)

Code:
Compiler (times in s)     BM1    BM2    BM3    BM4    BM5    BM6    BM7     BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44   23.30                      80.18
ZX Basic 1.26 -O3                                                   2.12    20.78
ZX Basic 1.26-r1603 -O3                                             0.94    20.78 (17.14 with fSin)
ZX Basic 1.2.8-r2153 -O3                                            1.36    29.06 (24.18 with fSin)
ZX Basic 1.2.8-s644 -O3                                             1.34    29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3                                             0.88    20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3                                             0.90    20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3                                             0.90    20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3                                             0.90    20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3                                             0.90    20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3                                            0.898   20.818 (17.200 with fSin)  21.40
ZX Basic 1.4.0-s1779 -O3                                            0.892   20.628 (17.420 with fSin)  21.22

The good: it all compiles and runs, and runs very slightly faster. The difference is much less than a frame (only visible with more iterations).

The bad: One test (fsin) actually went slower than previously. And we're still about double the time of zip2 compiler for integer results - as discussed above. Zip2 is actually producing that pretty awful code posted in the last few posts, but it's still twice as fast as ZXB for simple integer maths. I suspect the asm modules are mostly untouched - meaning that most of the code the new version produces is identical. It's how it's getting there that's changed!
#67
britlion Wrote:It's time for the new version to get tested!

C:\>zxb --version
zxb 1.4.0-s1779

The bad: One test (fsin) actually went slower than previously. And we're still about double the time of zip2 compiler for integer results - as discussed above. Zip2 is actually producing that pretty awful code posted in the last few posts, but it's still twice as fast as ZXB for simple integer maths. I suspect the asm modules are mostly untouched - meaning that most of the code the new version produces is identical. It's how it's getting there that's changed!
In fact, ZX BASIC 1.4 is supposed to always produce the same or better code than 1.3. Try compiling with --asm and diffing the output (for both fSin versions).
Regarding Zip: now that ZX BASIC is ready for development again, we can discuss the memory / speed tradeoff (many compilers have one), so that you could choose --optimize-for-speed or --optimize-for-memory.
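One possible way to do that comparison, sketched with Python's standard difflib module. The two listings below are invented placeholder strings, not real compiler output; in practice you would read in the two .asm files produced by each compiler version, and the file names used here are hypothetical.

```python
import difflib


def asm_diff(old_asm, new_asm, old_name="old.asm", new_name="new.asm"):
    """Return a unified diff (as a list of lines) between two assembly listings."""
    old_lines = old_asm.splitlines(keepends=True)
    new_lines = new_asm.splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )


# Placeholder listings for illustration only (not actual compiler output).
old_listing = "ld a, 255\nld (16384), a\nret\n"
new_listing = "ld a, 255\nld (16384), a\nccf\nret\n"

for line in asm_diff(old_listing, new_listing):
    print(line, end="")
```

Lines starting with + or - are the instructions that differ between the two builds; an empty result means the generated code is identical.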
#68
BTW: Happy birthday, Britlion :)
#69
boriel Wrote:BTW: Happy birthday, Britlion :)
Thank you very much ;)
#70
Happy late one :lol:
I'm always on the chat or facebook.
#71
boriel Wrote:So you can choose --optimize-for-speed or --optimize-for-memory

That would be a cool option.

I think better, though, might be to have it as an inline option - so the critical bits can be speed optimised, but the less critical bits can be memory optimised. Routines that need to be fast (e.g. sprites) are one thing, but some others are out of the game loop, or on intro screens etc. For example, I think the redefine key routine is probably not speed critical; but I bet you don't want it bloated. It probably only runs once.
#72
C:\>zxb --version
zxb 1.8.3

(using my benchmark suite listed above)

Code:
Compiler (times in s)     BM1    BM2    BM3    BM4    BM5    BM6    BM7     BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44   23.30                      80.18
ZX Basic 1.26 -O3                                                   2.12    20.78
ZX Basic 1.26-r1603 -O3                                             0.94    20.78 (17.14 with fSin)
ZX Basic 1.2.8-r2153 -O3                                            1.36    29.06 (24.18 with fSin)
ZX Basic 1.2.8-s644 -O3                                             1.34    29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3                                             0.88    20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3                                             0.90    20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3                                             0.90    20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3                                             0.90    20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3                                             0.90    20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3                                            0.898   20.818 (17.200 with fSin)  21.40
ZX Basic 1.4.0-s1779 -O3                                            0.892   20.628 (17.420 with fSin)  21.22
ZX Basic 1.4.0-s1980 -O3                                            0.884   20.818 (17.202 with fSin)  21.40
ZX Basic 1.8.3 -O3                                                  0.874   20.818 (17.192 with fSin)  21.40

All right. It's been a long, long time since I ran this lot, and I thought it was past time we checked to see if Boriel was being kept honest :)

The new refactored version of the compiler works great! I did have to change some code that used the if...then...statement form followed by an else statement - the new one-line IF syntax tripped it up. But those were trivial changes that took no time to fix.

The good news is the code on the latest build is a hair faster - fastest ever, actually. Way to go! The Zip compiler still holds the crown for fastest code (by about a factor of 2, which is startling), finishing Benchmark 7 in 0.47 seconds vs zxb's 0.87 seconds - but Zip is a very cut-down, integer-only compiler with very limited scope. I'm surprised that the optimisation routes it uses for simple code haven't been looked at, however. We certainly had a discussion some years ago about how it honestly doesn't cheat.

I think the reason is that it inlines code for small cases. Looking at the assembly, zxb does do a very simple divide by two on a byte value (it runs SRL A, and we're done). But it doesn't recognise multiplication by three as an easy case: it sets H to three and runs the generic A*H code with a CALL, when it could have inlined PUSH HL, ADD HL,HL, POP DE, ADD HL,DE, and we're done. Most multiplies are probably by less than 5, and certainly less than 8, so optimising for *2, *3, *4 etc. could be a big boost.

Similarly, adding or subtracting one is caught as a simple case, but add 4 and add 5 aren't - Zip sees these as cases to simply run INC a few more times rather than run generic add code, which is why it comes out faster. I think catching small adds and subtracts and running them as INC INC INC INC etc. is probably reasonable. Obviously you can take it too far, but again, most +/- changes are going to be small.
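To make the "inline the small multiplies" idea concrete, here's a minimal sketch of a shift-and-add expansion for a 16-bit value in HL. It is an illustration under my own assumptions, not how zxb or Zip actually does it: the function names are invented, it keeps a copy of the original value in DE (LD D,H / LD E,L rather than the PUSH/POP pair), and the tiny interpreter exists only to check the emitted sequence.

```python
def mul_by_constant(n):
    """Emit Z80 mnemonics computing HL = HL * n (mod 65536), clobbering DE.

    Walks the bits of n below its top bit: double HL for every bit,
    and add the saved original value whenever the bit is set.
    """
    if n < 1:
        raise ValueError("only positive constants are handled in this sketch")
    lower_bits = bin(n)[3:]  # binary digits of n after the leading 1
    code = []
    if "1" in lower_bits:
        code += ["LD D,H", "LD E,L"]  # keep a copy of the original value in DE
    for bit in lower_bits:
        code.append("ADD HL,HL")      # HL = HL * 2
        if bit == "1":
            code.append("ADD HL,DE")  # HL = HL + original value
    return code


def run(code, hl):
    """Interpret the four instruction forms emitted above (for checking only)."""
    de = 0
    for op in code:
        if op == "LD D,H":
            de = (hl & 0xFF00) | (de & 0x00FF)
        elif op == "LD E,L":
            de = (de & 0xFF00) | (hl & 0x00FF)
        elif op == "ADD HL,HL":
            hl = (hl + hl) & 0xFFFF
        elif op == "ADD HL,DE":
            hl = (hl + de) & 0xFFFF
        else:
            raise ValueError("unexpected instruction: " + op)
    return hl


for n in (2, 3, 4, 5):
    print(n, mul_by_constant(n))
```

For small constants this stays at a handful of instructions with no CALL; where the cut-off should sit (by size or by speed) is exactly the tradeoff being discussed.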

Anyway, that said, this is all moving in the right direction - the compiler is more powerful, and actually faster all at once. It's getting smarter, and hats off to Boriel for keeping this a fantastic piece of software.

#73
You're absolutely right. Great analysis.
I've switched to a new job (yes, again!) so have been busy and have paused ZX Basic devel for a little while until I settle.

But the new version 1.9 (still beta) finally allows anyone (i.e. you) to program their own peephole optimizer using a DSL (a domain-specific micro-language). It already works for -O1 and -O2. For -O3 it's a bit harder.

This means that it no longer uses Python to optimize code, but this (much simpler) language instead, and anyone can create their own optimization schemes and even contribute to the compiler that way.
Indeed, this new optimizer already optimizes further (especially for 32-bit values).
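For readers who haven't met the term, a peephole optimizer looks at a small window of adjacent instructions and rewrites patterns it recognises. The sketch below is plain Python, not the ZX Basic DSL (whose syntax isn't shown in this thread), and the function name is invented; it handles just one pattern, cancelling adjacent INC HL / DEC HL pairs, which is the +4-5 case from earlier.

```python
def cancel_inc_dec(code):
    """Remove adjacent INC HL / DEC HL pairs from a list of mnemonics.

    16-bit INC and DEC on the Z80 do not change any flags, so dropping a
    pair that undoes itself does not alter what later instructions see.
    """
    out = []
    for op in code:
        if out and {out[-1], op} == {"INC HL", "DEC HL"}:
            out.pop()        # the new instruction undoes the previous one
        else:
            out.append(op)
    return out


# The +4-5 tail of the Zip2 output quoted earlier in the thread.
before = ["ADD HL,DE"] + ["INC HL"] * 4 + ["DEC HL"] * 5 + ["LD (nn),HL"]
after = cancel_inc_dec(before)

print(len(before), "instructions before")
print(after)
```

A real optimizer would carry many such patterns and would need to know which registers and flags each instruction touches; this only shows the shape of the idea.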
Reply
#74
Think we'll get it as fast as the old Zip 2 compiler, which is nearly 2x faster? :)
#75
Time to run speed trials again? :)