Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Here's the latest with 1.3.0 s1121:
(using my benchmark suite listed above)
Code:
(times in seconds; - = no result listed)
                          BM1    BM2    BM3    BM4    BM5    BM6    BM7    BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44  23.30                      80.18
ZX Basic 1.26 -O3         -      -      -      -      -      -      2.12   20.78                      -
ZX Basic 1.26-r1603 -O3   -      -      -      -      -      -      0.94   20.78 (17.14 with fSin)    -
ZX Basic 1.2.8-r2153 -O3  -      -      -      -      -      -      1.36   29.06 (24.18 with fSin)    -
ZX Basic 1.2.8-s644 -O3   -      -      -      -      -      -      1.34   29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3   -      -      -      -      -      -      0.88   20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3   -      -      -      -      -      -      0.90   20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3   -      -      -      -      -      -      0.90   20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3   -      -      -      -      -      -      0.90   20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3   -      -      -      -      -      -      0.90   20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3  -      -      -      -      -      -      0.898  20.818 (17.200 with fSin)  21.40
Mixed results. I increased iterations to try to get a finer view of it than the frames variable gives. Not much change: if anything it's very, very slightly slower, though if we allow for rounding errors in the previous results it's /very/ close. Still worth noting that some other compilers (e.g. Zip2) have aced that integer test in half the time.
And again - bugfixes are unfortunately making the code a little bit more convoluted, I think.
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
This is impressive!! :shock:
You're describing exactly what I did: with -O3 and Byte comparison, I had to add an extra ccf instruction (4 T-states).
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
11-30-2013, 03:48 PM
(This post was last modified: 01-05-2021, 01:24 PM by boriel.)
That would explain it.
Zip2 (and Zip1.5, as originally tested in post 1) does seem to be quite a lot tighter, finishing BM7 in 0.46 seconds (23 frames). As I noted, however, the downside is that this is an integer-only compiler, but it proves there's a much shorter solution to compiling the integer code.
Demo: https://dl.dropboxusercontent.com/u/4903...erTest.z80
And even that fast result has code that a clever compiler could tighten.
E.g. poke 16384,255
Could be
Code: LD a,255
LD (16384),A
20 T states (7 + 13), and move on.
It does:
Code: LD HL,16384
PUSH HL
LD HL,00000
POP DE
EX DE,HL
LD (HL),E
Which is pretty generic code for the simplest case of setting a memory address, but 52 T states. It can make sense if the numbers are calculated - but it does treat all numbers as 16 bit, and just masks to 8 bit where necessary. I can't quite blame it for this - it has to fit inside the ZX Spectrum, along with the BASIC and the compiled result. Space is tight there!
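For anyone wanting to check the T-state counts for the two POKE sequences above, here's a quick tally in Python. It's only a sketch: the per-instruction figures are the standard documented Z80 timings (ignoring Spectrum memory contention), and the table and function names are mine, purely for illustration.

```python
# Standard documented Z80 timings (T-states) for the instruction forms used above.
# Memory contention on the Spectrum is deliberately ignored here.
T_STATES = {
    "LD A,n": 7,
    "LD (nn),A": 13,
    "LD HL,nn": 10,
    "PUSH HL": 11,
    "POP DE": 10,
    "EX DE,HL": 4,
    "LD (HL),E": 7,
}

def total_t_states(sequence):
    """Sum the T-states of a list of instruction forms."""
    return sum(T_STATES[op] for op in sequence)

# The two-instruction version a smarter compiler could emit.
direct_poke = ["LD A,n", "LD (nn),A"]
# The generic sequence shown in the listing above.
zip2_poke = ["LD HL,nn", "PUSH HL", "LD HL,nn", "POP DE", "EX DE,HL", "LD (HL),E"]

print(total_t_states(direct_poke))  # 20
print(total_t_states(zip2_poke))    # 52
```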
Here's
Code: LET V=INT(k/2)*3+4-5
Code: LD HL,(54864) ; Fetch variable k
SRA H
RR L ; Divide
PUSH HL ;save
ADD HL,HL
POP DE
ADD HL,DE ; multiply by 3
INC HL
INC HL
INC HL
INC HL ; Add 4
DEC HL
DEC HL
DEC HL
DEC HL
DEC HL ; Subtract 5
LD (54952),HL ; Store back into v
Aaaagh. Starts well. Goes a bit strange. I think you'd at least optimise +4-5 into -1, and save some hassle. (Though it's bad programming to put +4 - 5 to be fair) I wonder when it stops doing INC and DEC? What if that was -200? or -5000? (Simon Goodwin probably says inc is faster for small numbers, and "small" is apparently at least 5)
So I think a /really/ smart compiler, should be faster than this.
(Easy for me to say, isn't it?)
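On the question of when it makes sense to stop chaining INC and DEC, here's a rough Python sketch of the decision a compiler could make. It folds the constants first (+4-5 becomes -1), then compares the cost of an INC HL / DEC HL chain (6 T-states each) against LD DE,nn followed by ADD HL,DE (10 + 11 T-states). It assumes DE is free to clobber and that only speed matters, not size; the function name add_constant is hypothetical.

```python
INC_HL_T = 6        # INC HL and DEC HL both take 6 T-states
LD_DE_NN_T = 10     # LD DE,nn
ADD_HL_DE_T = 11    # ADD HL,DE

def add_constant(delta):
    """Return (instructions, t_states) for HL = HL + delta, picking the faster form.

    Assumption: DE may be overwritten by the generic form.
    """
    delta = ((delta + 0x8000) & 0xFFFF) - 0x8000  # wrap to signed 16-bit
    if delta == 0:
        return [], 0
    chain_cost = abs(delta) * INC_HL_T
    generic_cost = LD_DE_NN_T + ADD_HL_DE_T
    if chain_cost <= generic_cost:
        op = "INC HL" if delta > 0 else "DEC HL"
        return [op] * abs(delta), chain_cost
    # Negative deltas are loaded as their 16-bit two's complement.
    return [f"LD DE,{delta & 0xFFFF}", "ADD HL,DE"], generic_cost

print(add_constant(4 - 5))  # (['DEC HL'], 6)
print(add_constant(-200))   # (['LD DE,65336', 'ADD HL,DE'], 21)
```

Under these assumptions the chain only wins up to three steps (18 T-states against 21); from four onwards the generic add is quicker, though it costs 4 bytes and a register pair. The nine-instruction sequence in the listing comes to 54 T-states, against 6 for a single DEC HL.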
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
11-30-2013, 04:03 PM
(This post was last modified: 01-05-2021, 01:25 PM by boriel.)
britlion Wrote:That would explain it.
Zip2 (and Zip1.5, as originally tested in post 1) does seem to be quite a lot tighter, finishing BM7 in 0.46 seconds (23 frames). As I noted, however, the downside is that this is an integer-only compiler, but it proves there's a much shorter solution to compiling the integer code.
Demo: https://dl.dropboxusercontent.com/u/4903...erTest.z80
And even that fast result has code that a clever compiler could tighten.
E.g. poke 16384,255
Could be
Code: LD a,255
LD (16384),A
20 T states (7 + 13), and move on.
It does:
Code: LD HL,16384
PUSH HL
LD HL,00000
POP DE
EX DE,HL
LD (HL),E
I can't test it right now, but if ZX BASIC is producing such code, then it's most likely a bug. POKE should be a direct LD whenever possible. :?: :?: :?:
britlion Wrote:Which is pretty generic code for the simplest case of setting a memory address, but 52 T states. It can make sense if the numbers are calculated - but it does treat all numbers as 16 bit, and just masks to 8 bit where necessary. I can't quite blame it for this - it has to fit inside the ZX Spectrum, along with the BASIC and the compiled result. Space is tight there!
Here's
Code: LET V=INT(k/2)*3+4-5
Code: LD HL,(54864) ; Fetch variable k
SRA H
RR L ; Divide
PUSH HL ;save
ADD HL,HL
POP DE
ADD HL,DE ; multiply by 3
INC HL
INC HL
INC HL
INC HL ; Add 4
DEC HL
DEC HL
DEC HL
DEC HL
DEC HL ; Subtract 5
LD (54952),HL ; Store back into v
Aaaagh. Starts well. Goes a bit strange. I think you'd at least optimise +4-5 into -1, and save some hassle. (Though it's bad programming to put +4 - 5 to be fair) I wonder when it stops doing INC and DEC? What if that was -200? or -5000? (Simon Goodwin probably says inc is faster for small numbers, and "small" is apparently at least 5)
So I think a /really/ smart compiler, should be faster than this.
(Easy for me to say, isn't it?)
Hmm. This can also be introduced in ZX BASIC with -O3. Let's try...
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
boriel Wrote:I can't test it right now, but if ZX BASIC is producing such code, then it's most likely a bug. POKE should be a direct LD whenever possible. :?: :?: :?:
No. As I tried to explain (but seem to have failed), that's code produced by the Zip 2 compiler, which, despite being demonstrably poor as I showed above, completes the task twice as quickly as ZXBasic's generated code.
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
It's time for the new version to get tested!
C:\>zxb --version
zxb 1.4.0-s1779
(using my benchmark suite listed above)
Code:
(times in seconds; - = no result listed)
                          BM1    BM2    BM3    BM4    BM5    BM6    BM7    BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44  23.30                      80.18
ZX Basic 1.26 -O3         -      -      -      -      -      -      2.12   20.78                      -
ZX Basic 1.26-r1603 -O3   -      -      -      -      -      -      0.94   20.78 (17.14 with fSin)    -
ZX Basic 1.2.8-r2153 -O3  -      -      -      -      -      -      1.36   29.06 (24.18 with fSin)    -
ZX Basic 1.2.8-s644 -O3   -      -      -      -      -      -      1.34   29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3   -      -      -      -      -      -      0.88   20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3   -      -      -      -      -      -      0.90   20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3   -      -      -      -      -      -      0.90   20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3   -      -      -      -      -      -      0.90   20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3   -      -      -      -      -      -      0.90   20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3  -      -      -      -      -      -      0.898  20.818 (17.200 with fSin)  21.40
ZX Basic 1.4.0-s1779 -O3  -      -      -      -      -      -      0.892  20.628 (17.420 with fSin)  21.22
The good: It all compiles and runs. And runs very very slightly faster. We're into much less than a frame (Only visible with more iterations).
The bad: One test (fsin) actually went slower than previously. And we're still about double the time of zip2 compiler for integer results - as discussed above. Zip2 is actually producing that pretty awful code posted in the last few posts, but it's still twice as fast as ZXB for simple integer maths. I suspect the asm modules are mostly untouched - meaning that most of the code the new version produces is identical. It's how it's getting there that's changed!
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
britlion Wrote:It's time for the new version to get tested!
C:\>zxb --version
zxb 1.4.0-s1779
The bad: One test (fsin) actually went slower than previously. And we're still about double the time of zip2 compiler for integer results - as discussed above. Zip2 is actually producing that pretty awful code posted in the last few posts, but it's still twice as fast as ZXB for simple integer maths. I suspect the asm modules are mostly untouched - meaning that most of the code the new version produces is identical. It's how it's getting there that's changed!
In fact, ZX BASIC 1.4 is supposed to always produce the same or better code than 1.3. Try compiling with --asm and diffing the output (both fSin versions).
Regarding Zip: now that ZX BASIC is ready for development again, we can discuss the memory / speed tradeoff (many compilers offer one), so you could choose --optimize-for-speed or --optimize-for-memory.
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
BTW: Happy birthday, Britlion
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
boriel Wrote:BTW: Happy birthday, Britlion
Thank you very much
Posts: 105
Threads: 11
Joined: Oct 2013
Reputation:
0
Happy late one :lol:
I'm always on the chat or facebook.
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
boriel Wrote:So you could choose --optimize-for-speed or --optimize-for-memory
That would be a cool option.
I think better, though, might be to have it as an inline option - so the critical bits can be speed optimised, but the less critical bits can be memory optimised. Routines that need to be fast (e.g. sprites) are one thing, but some others are out of the game loop, or on intro screens etc. For example, I think the redefine key routine is probably not speed critical; but I bet you don't want it bloated. It probably only runs once.
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
C:\>zxb --version
zxb 1.8.3
(using my benchmark suite listed above)
Code:
(times in seconds; - = no result listed)
                          BM1    BM2    BM3    BM4    BM5    BM6    BM7    BM8                        BMDRAW
Sinclair                  4.46   8.46   21.56  19.82  25.34  60.82  87.44  23.30                      80.18
ZX Basic 1.26 -O3         -      -      -      -      -      -      2.12   20.78                      -
ZX Basic 1.26-r1603 -O3   -      -      -      -      -      -      0.94   20.78 (17.14 with fSin)    -
ZX Basic 1.2.8-r2153 -O3  -      -      -      -      -      -      1.36   29.06 (24.18 with fSin)    -
ZX Basic 1.2.8-s644 -O3   -      -      -      -      -      -      1.34   29.02 (24.22 with fSin)    30.42
ZX Basic 1.2.8-s682 -O3   -      -      -      -      -      -      0.88   20.56 (16.94 with fSin)    21.14
ZX Basic 1.2.8-s696 -O3   -      -      -      -      -      -      0.90   20.60 (16.98 with fSin)    21.18
ZX Basic 1.2.8-s758 -O3   -      -      -      -      -      -      0.90   20.76 (17.10 with fSin)    21.32
ZX Basic 1.2.9-s815 -O3   -      -      -      -      -      -      0.90   20.54 (16.92 with fSin)    21.08
ZX Basic 1.3.0-s971 -O3   -      -      -      -      -      -      0.90   20.80 (17.16 with fSin)    21.40
ZX Basic 1.3.0-s1121 -O3  -      -      -      -      -      -      0.898  20.818 (17.200 with fSin)  21.40
ZX Basic 1.4.0-s1779 -O3  -      -      -      -      -      -      0.892  20.628 (17.420 with fSin)  21.22
ZX Basic 1.4.0-s1980 -O3  -      -      -      -      -      -      0.884  20.818 (17.202 with fSin)  21.40
ZX Basic 1.8.3 -O3        -      -      -      -      -      -      0.874  20.818 (17.192 with fSin)  21.40
All right. It's been a long, long time since I ran this lot, and I thought it was past time we checked to see if Boriel was being kept honest.
The new refactored version of the compiler works great! I did have to change some code that used if...then...statement and then else statement stuff - the new one-line If syntax tripped it up. But those were trivial changes that took no time to fix.
The good news is the code on the latest build is a hair faster - fastest ever, actually. Way to go! The zip compiler still manages to hold the crown of fastest code (by about a factor of 2, which is startling), getting in Benchmark 7 at 0.47 seconds vs zxb's 0.87 seconds - but zip is a very cut-down, integer-only and very, very limited-scope compiler. I'm surprised that the optimisation routes it uses for simple code haven't been looked at, however. We certainly had a discussion some years ago about how it honestly doesn't cheat.
I think the reason is that it inlines code for small cases. Looking at the assembly, it seems that zxb does do a very simple divide by two on a byte value (it runs srl a, and we're done). But for multiplication by three, it doesn't recognise that as an easy case, and sets h to three, and runs the generic a*h code with a call - it could have inlined push, add hl, hl, pop de, add hl, de, and we're done. Most multiplies are probably lower than 5, and certainly lower than 8, and optimising for *2, *3, *4 etc could be a big boost. Similarly, add one or subtract one is caught as a simple case, but add 4 and add 5 aren't - zip sees these as cases to simply run inc a few more times, rather than run a generic add code, which is why it comes out faster. I think catching cases of small adds and small subtracts and running them as inc inc inc inc etc is probably reasonable. Obviously you can take it too far, but again, most changes for +/- are going to be small.
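To make the small-multiply idea concrete, here's a minimal Python sketch of a code generator that picks an inline sequence for *2, *3 and *4 and falls back to the generic routine otherwise. It uses the 16-bit HL form quoted above rather than zxb's actual byte-sized a*h path, and the function names and the tiny interpreter are hypothetical, there only to check that the sequences really do multiply.

```python
def multiply_by_constant(n):
    """Return an inline HL = HL * n sequence for small n, or None to use the generic call."""
    sequences = {
        2: ["ADD HL,HL"],
        3: ["PUSH HL", "ADD HL,HL", "POP DE", "ADD HL,DE"],
        4: ["ADD HL,HL", "ADD HL,HL"],
    }
    return sequences.get(n)

def run(sequence, hl):
    """Tiny interpreter for the four instruction forms above (16-bit wraparound)."""
    de, stack = 0, []
    for op in sequence:
        if op == "ADD HL,HL":
            hl = (hl + hl) & 0xFFFF
        elif op == "ADD HL,DE":
            hl = (hl + de) & 0xFFFF
        elif op == "PUSH HL":
            stack.append(hl)
        elif op == "POP DE":
            de = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return hl

for n in (2, 3, 4):
    print(n, run(multiply_by_constant(n), 7))  # 7*n in each case
print(multiply_by_constant(10))  # None: fall back to the generic multiply
```

How far to extend the table is the speed/size judgment call: each extra entry saves a call for that constant but adds inline bytes everywhere it's used.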
Anyway, that said, this is all moving in the right direction - the compiler is more powerful, and actually faster all at once. It's getting smarter, and hats off to Boriel for keeping this a fantastic piece of software.
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
You're absolutely right. Great analysis.
I've switched to a new job (yes, again!) so have been busy and have paused ZX Basic devel for a little while until I settle.
But the new version 1.9 (still beta) finally allows anyone (i.e. you) to program their own peephole optimizer using a DSL (a specific micro-language). It already works for -O1 and -O2. For -O3 it's a bit harder.
This means that it no longer uses Python to optimize code, but this (much simpler) language instead, so anyone can create their own optimization schemes and even contribute to the compiler that way.
Indeed, this new optimizer already optimizes further (especially 32-bit values).
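For readers unfamiliar with the term, a peephole optimizer slides a small window over the generated instructions and rewrites known patterns into cheaper ones. The sketch below shows the general idea in plain Python; it is not the ZX Basic DSL (whose syntax isn't shown in this thread), and the rule table and function name are hypothetical.

```python
# Each rule is (pattern, replacement); both are tuples of instruction strings.
RULES = [
    (("INC HL", "DEC HL"), ()),                     # cancels out
    (("DEC HL", "INC HL"), ()),                     # cancels out
    (("PUSH HL", "POP DE"), ("LD D,H", "LD E,L")),  # 21 T-states down to 8
]

def peephole(code):
    """Apply the rules repeatedly until no rule matches anywhere."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        for pattern, replacement in RULES:
            n = len(pattern)
            i = 0
            while i + n <= len(code):
                if tuple(code[i:i + n]) == pattern:
                    code[i:i + n] = replacement
                    changed = True
                    i = max(i - 1, 0)  # a new match may straddle the splice point
                else:
                    i += 1
    return code

# The "+4-5" sequence discussed earlier in the thread collapses to one instruction.
print(peephole(["INC HL"] * 4 + ["DEC HL"] * 5))  # ['DEC HL']
```

A real optimizer also has to know which registers and flags are still live before applying a rule; this sketch skips that entirely.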
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Think we'll get it as fast as the old Zip 2 compiler, which is nearly 2x faster?
Posts: 1,770
Threads: 55
Joined: Aug 2019
Reputation:
24
01-05-2021, 01:26 PM
Time to run speed trials again?