Compiler Speed Trials - Printable Version
+- Forum (https://www.boriel.com/forum)
+-- Forum: Compilers and Computer Languages (https://www.boriel.com/forum/forumdisplay.php?fid=12)
+--- Forum: ZX Basic Compiler (https://www.boriel.com/forum/forumdisplay.php?fid=11)
+--- Thread: Compiler Speed Trials (/showthread.php?tid=161)
Compiler Speed Trials - britlion - 02-18-2010

I know that ZX Basic is amazing, but I was wondering how it stood up to the other BASIC compilers that were around for the ZX Spectrum. We know that Hisoft Basic was pretty fast, for example, and LCD mentioned another compiler the other day that was pretty amazing too.

Let me borrow from an article in Crash Magazine: http://www.crashonline.org.uk/19/compilers.htm

In this article, Simon Goodwin talks about several compilers. Hisoft Basic isn't one of them - it wasn't out yet. He doesn't list the benchmarks, either, but they can be worked out from this:

Code:
Benchmark BM1 : A null-action FOR, REPEAT or DO loop, executed

Simon didn't use Benchmark 9, and I can see why - it's not clearly specified. BM1 to BM8 are pretty clear, however.

My own testing with Sinclair Basic gave slightly different results. In all cases, my programs were very slightly faster than the timings Goodwin gave in the magazine article. Perhaps he specified things a little differently; perhaps he was timing with a stopwatch in hand, and human error crept in; perhaps he used a different version of the ZX Spectrum. I got the computer to time the programs using the 50 frames per second interrupt timer. For very fast running programs I increased the number of loops by a factor of 10 or 100 and scaled the result back down.

The compilers Goodwin tested were:
A Mehmood's "Compiler"
MCODER
Softek's FP and IS
And, a little cheekily, Zip 1.5. He wrote that himself, I believe.

The first two rows are for Sinclair Basic: the first is Simon Goodwin's numbers, the second is my own. All times are in seconds; smaller is better.

Code:
BM1 BM2 BM3 BM4 BM5 BM6 BM7 BM8 BMDRAW

Code:
REM BM7

BM 8 replaces most of the code with:

Code:
REM BM8

RESULTS and DISCUSSION

First up, passing all the benchmarks and more, clearly Boriel's work is by far the most flexible and comprehensive compiler available.
It blows the spots off everything else in terms of WHAT it can compile, and all credit to him for creating it. It is excellent!

In terms of performance, it's pretty amazing, too. It's the second fastest of all the compilers listed here. Only ZIP goes faster, generally. BM7 is a little disappointing, in that the produced code seems to be slower than both MCODER 2 and Zip by quite a significant margin. Perhaps some examination of the array handling code could improve this. With version 1.25 beta, sadly, I couldn't use -O3 - the programs all failed to compile with this option enabled, so I couldn't see whether peephole optimization would make a difference.

It's worth noting that most on-Spectrum compilers refused to deal with floating point numbers. In this roundup, only Softek FP could do it, and that was barely faster than Basic. Boriel's compiler blew me away with the FP result, frankly. I had to check to see if it was doing it correctly, it was so amazing! There might be some sneaky optimization happening, but printing the numbers as it created them did seem to work fine.

(Note: It WAS cheating. It was putting in constants at compile time. A clever option, but not what we were aiming to test. This number has been changed.)

Fixed Hisoft Basic numbers. These corrected numbers do in fact show it produces some of the fastest code available, sometimes beaten by ZIP 1.5. It far outmatches what ZIP can do, however, in that it deals with FP as well as integer - and it seems to do both faster than the competition. Of course, ZX BASIC excels at being FP and integer aware as well.

Added in Tobos. It's fully FP, so it tends to be slow where integer math could improve things. But look at BM8!

ZX BASIC
In short: Solid and well optimized. Seems to be slow in BM7 (array handling). Very clever use of constant insertion to produce a BM8 time of 0.1, but the times are now corrected because that was cheating a little!
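The timing method described earlier in this post (counting 50 Hz frames, and scaling back down when the loop count was multiplied) comes down to a small piece of arithmetic. A minimal sketch in Python, where the function name and the sample frame counts are made up for illustration and are not measurements from this thread:

```python
FRAMES_PER_SECOND = 50  # frame interrupt rate quoted in the post


def seconds_per_run(frames_elapsed, loop_multiplier=1):
    """Convert a frame count into seconds for one standard-length run.

    loop_multiplier is the factor (1, 10 or 100) by which the loop count
    was increased to make a very fast program measurable.
    """
    if loop_multiplier < 1:
        raise ValueError("loop_multiplier must be at least 1")
    return frames_elapsed / FRAMES_PER_SECOND / loop_multiplier


# Hypothetical readings, purely to show the scaling:
print(seconds_per_run(250))      # 250 frames at normal length
print(seconds_per_run(150, 10))  # 150 frames for ten times the loops
```

One frame is 1/50 of a second, so a single run cannot be resolved more finely than 0.02 seconds. That is why multiplying the loop count helps for the fastest compilers.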
[Edit] - Array handling speed has been dramatically increased in later versions. Boriel has stated that he will be looking into further array optimizations similar to Hisoft Basic's methods - so we can hope for another doubling of speed, perhaps!

Re: Compiler Speed Trials - boriel - 02-19-2010

First of all, a big thank you. Wow! What impressive work! :o

britlion Wrote: I know that ZX Basic is amazing, but I was wondering how it stood up to other basic compilers that were around for use on the ZX Spectrum. We know that Hisoft basic was pretty fast, for example, and LCD mentioned another compiler the other day that was pretty amazing too.

The above benchmarks are interesting. I'm somewhat surprised by BM7. BTW, do these compilers handle multidimensional arrays?

Britlion Wrote: First up, passing all the benchmarks and more, clearly Boriel's work is by far the most flexible and comprehensive compiler available. It blows the spots off everything else in terms of WHAT it can compile, and all credit to him for creating it. It is excellent!

:oops: Thank you. I now feel more motivated!!! :twisted:

Quote: In terms of performance, it's pretty amazing, too. It's the second fastest of all the compilers listed here. Only ZIP goes faster, generally. BM7 is a little disappointing, in that the produced code seems to be slower than both MCODER 2 and Zip by a quite significant margin. Perhaps some examination of array handling code could improve this. With version 1.25 beta, sadly, I couldn't use -O3 as an option - the programs all failed to compile with this option enabled, so I couldn't see if peephole optimization would make a difference.

Yes, -O3 definitely makes a difference in array-access speed :!: This is something I'm currently fixing. It seems I reintroduced 2 old bugs (one in the peephole optimizer and another in the comparators, which had already been fixed). I'm currently working on them.
Quote: It's worth noting that most On Spectrum compilers refused to deal with floating point numbers. In this roundup, only Softek FP could do it, and that barely faster than Basic. Boriel's compiler blew me away with the FP result, frankly. I had to check to see if it was doing it correctly, it was so amazing! There might be some sneaky optimization happening, but printing the numbers as it created them did seem to work fine.

I'm happy to read this. As for the FP result, it's somewhat odd: I just use constant folding (precalculation) and ROM-CALC for that. Please check that the FP results are right... I mean, do you print the result of the FP calculation on the screen? Do the values match? How strange... :| I like this FP result, but... like you, I'm surprised too.

Re: Compiler Speed Trials - britlion - 02-19-2010

boriel Wrote: First of all, a big Thank you. Wow! What an impressive work! :o

I like ZX Basic A LOT. I want it to produce the best code possible. *grin* Perhaps we can get it to the point where it gives that upstart C compiler a run for its money. *hmmph* A ZX Spectrum should be coded in Basic *laugh*

boriel Wrote: The above benchmarks are interesting. I'm somewhat surprised of BM7

No. Yours is the only one on this list that will do that. I think Hisoft Basic did, though. Some of them won't do it at all. ZIP might be fast, but it's VERY limited in what it can do - not even string handling, I believe.

Boriel Wrote: Yes, -O3 definitely makes a difference in array-access speed :!: This is something I'm currently fixing. It seems I reintroduced 2 old bugs back (one in the peephole and another on comparators already fixed). I'm currently working on them.

When the optimizer is back together, I'll re-run the speed tests, certainly. Sadly, -O1 seems to work, but anything higher than 1 just fails at the moment.

Boriel Wrote: I'm happy to read this.
I was too, especially given the little tutorial I wrote about using smaller data types when possible to get the fastest code - we know that handling five-byte numbers (and in the ROM routines, too) is slower than handling integers. You can see the code I wrote and ran - it finds sin(2), ln(2) and 2^2 in the loop. I'm pretty sure the 2^2 uses integer math, and rightly so; but sin and ln... well... if you're using the ROM routines, I have no idea why it came back that fast. Does it work out a fixed result for sin(2) and just use that? That would be one reason it works so quickly!

It ISN'T printing the numbers, because that would make it very slow - PRINT is a pretty time-consuming thing to do. I'm wondering if it might be possible to have a faster print routine that doesn't do all the bounds checking and control character checking.

I tested the loop with printing on - and it spun through some decimals happily. I timed it with printing off. I did wonder if the compiler would optimize out data that wasn't being used. The time results for this test were staggeringly fast.

Re: Compiler Speed Trials - britlion - 02-19-2010

Added in Hisoft trials. All the reviews, and Hisoft, said it was the fastest. That's not what I found here, by a long margin. It was certainly the most flexible - in its ability to deal with large programs and floating point as well as integer. But the integer benchmarks I managed ran very slowly compared to the competition. Oh dear. Did I make a mistake?

Here's the optimized BM7 program - written to avoid using DEF FN (which Hisoft allows, but most compilers don't):

Code:
7 REM :INT +a,k,v,i,m()

Re: Compiler Speed Trials - britlion - 02-19-2010

I tested Tobos with
let v=SIN(i)
let v=i^2
let v=LN(i)
instead of numbers that could be replaced with fixed constants (sin(2) could be replaced, for example). It ran in 0.74 seconds instead of the 0.5 for constants. I don't think it's cheating.

When I did the same thing with ZX Basic, it took :!: 24 seconds instead....
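The gap between the constant and the variable version of this benchmark comes down to what a compiler can evaluate ahead of time. Here is a toy illustration in Python. It is not ZX BASIC's actual optimizer: the tuple-based expression format and the fold function are invented for this sketch.

```python
import math

# Functions the toy folder may evaluate at "compile time".
FOLDABLE = {"SIN": math.sin, "LN": math.log}


def fold(expr):
    """Fold constant sub-expressions of a tiny expression tree.

    An expression is a number, a variable name (str), or a tuple:
    ("call", name, arg) or ("pow", base, exponent).
    """
    if isinstance(expr, (int, float, str)):
        return expr
    kind = expr[0]
    if kind == "call":
        _, name, arg = expr
        arg = fold(arg)
        if isinstance(arg, (int, float)) and name in FOLDABLE:
            return FOLDABLE[name](arg)  # evaluated once, ahead of time
        return ("call", name, arg)
    if kind == "pow":
        _, base, exponent = expr
        base, exponent = fold(base), fold(exponent)
        if isinstance(base, (int, float)) and isinstance(exponent, (int, float)):
            return base ** exponent
        return ("pow", base, exponent)
    raise ValueError(f"unknown expression kind: {kind!r}")


print(fold(("call", "SIN", 2)))    # a plain number: nothing left to do at run time
print(fold(("pow", 2, 2)))         # 4
print(fold(("call", "SIN", "i")))  # unchanged: has to be computed on every pass
```

With SIN(2) the loop body shrinks to storing a known value. With SIN(i) the full floating point routine has to run on each pass, which is the case the 24 second figure measures.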
Re: Compiler Speed Trials - boriel - 02-19-2010

britlion Wrote: I was too, especially given the little tutorial I wrote about using smaller data types when possible to get the fastest code - we know handling five byte numbers (and in the rom routines too) is slower than integers.

OK, that's the explanation: precalculation and constant folding => 2^2 => 4, and so on. All those values are constant, and they're calculated at compile time. Even more, if O3 were in use, this program could be reduced to a single NOP, as it does nothing (it does not print on the screen). :!:

Try declaring a Float variable a = 2, and use Sin(a), Ln(a), 2^a, etc. It should run at a speed similar to the ROM BASIC FP calc (i.e. slow).

Re: Compiler Speed Trials - boriel - 02-19-2010

britlion Wrote: I tested tobos with

OK, this is more in line with the ROM FP-CALC. The ROM FP calculator is very powerful... but slow. Tobos and SOFTEK are using their own optimized FP routines, so they presumably use more memory and/or offer less precision. I have some FP calc routines for the Z80 with a 3-byte mantissa (ZX Basic uses 4 bytes), or I could write my own, but this would require a lot of testing and I don't know if it would be worth the hassle. E.g. who will use FP for games?

Re: Compiler Speed Trials - britlion - 02-20-2010

boriel Wrote: Ok, this is more in consonance with the FP-CALC Rom. The FP ROM CALC is very powerful... but slow. Tobos and SOFTEK are using their own optimized FP routines so they should get more memory and/or less precision. I have some 3 bytes mantisa FP calc routines (ZX Basic uses 4 bytes) for Z80, or program my ones, but this would require a lot of testing and I don't know if it would worth the hassle). E.g. who will use FP for games?

Yes. Well, first up - congrats on spotting constants and optimizing them. No other compiler back in the day did that. Brilliant move!
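To put a rough number on the precision side of that memory/precision trade-off: a binary mantissa carries about bits x log10(2) decimal digits. A quick Python sketch follows. It assumes every bit of the 3-byte or 4-byte mantissa mentioned above counts towards precision, and the function name is made up for illustration.

```python
import math


def decimal_digits(mantissa_bytes):
    """Approximate decimal digits of precision for a binary mantissa."""
    bits = 8 * mantissa_bytes
    return bits * math.log10(2)


for size in (3, 4):
    print(f"{size}-byte mantissa: about {decimal_digits(size):.1f} decimal digits")
```

On that estimate, dropping from a 4-byte to a 3-byte mantissa costs a little over two decimal digits.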
As for the "is it worth it" question - it depends what you want to make here: something that's special purpose, or something that's the best all-rounder.

I just had a look for math routines, and ran across a package of 48-bit floating point routines. Hmm. That would be interesting - being able to go to 32, 40 or 48-bit FP. I really don't know if anyone would use that at this stage, though. That said, the compiler isn't a long hop from being able to work with other z80 devices, like a TI-89 or a Gameboy.

Know of any good routines that work on the FIXED type?

Re: Compiler Speed Trials - britlion - 02-20-2010

I was doing something 'wrong' with Hisoft Basic - it was using a floating point division. It always uses floating point division unless the expression is in the form INT(a/b), in which case it uses integer division. I'll be retesting and posting new times - only fair, since I assume the other compilers with integer variables are using integer division at the v=k/2*3+4-5 stage. I'm changing that to INT(k/2)*3+4-5. The time for benchmark 7 went down to 0.5 seconds. Much improved.

Re: Compiler Speed Trials - boriel - 02-24-2010

Well, -O2 and -O3 seem to be fixed (most of the -On problems were related to previous fixes, in fact). Compiling BM7 with -O3 reduces the execution time to 0.16 seconds :!: :wink: So, as I said, -O3 has a big positive impact on array access performance.

(Screenshot)

Suggestion: try the benchmarks again using -O3, to see if it improves the times on the other benchmarks too (download ZX Basic v1.2.5-r1489b here: http://www.boriel.com/files/zxb/zxbasic-1.2.5r1489b.msi )

Re: Compiler Speed Trials - britlion - 05-25-2010

Finally got a quick chance to test this. Yes, -O3 does improve BM7 - but the 0.16 seconds value isn't really fair. -O3 reports that variables are not used and optimizes out all the loops!
Putting a print M(1),k,i,var at the end of the program makes it actually do the work rather than skip it (and because the print comes AFTER the time is recorded, it doesn't extend the time), and it duly recorded a time of 2.12 seconds.

This is a noticeable improvement, but still a long way behind the code that Hisoft Basic and other integer compilers make. It still looks as though the array handling is somewhat behind other implementations.

Incidentally, I also tested BM8 with -O3, which shows about a 16% improvement! It's clear that things like Tobos use very highly optimized FP math routines. Swapping sin for the fSin function listed in the library (which breaks with -O3, btw) ran in 17 seconds instead - a speed-up of 7 seconds (41%). There might be a very strong case, at some point, for looking into optional faster FP functions.

Re: Compiler Speed Trials - boriel - 05-25-2010

britlion Wrote: Finally got a quick chance to test this. Yes, -O3 does improve BM7 - but the 0.16 seconds value isn't really fair. -O3 reports that variables are not used and optimizes out all the loops!

I don't remember that optimization. Which code snippet are you using? BM7 above?

Quote: Putting a print M(1),k,i,var at the end of the program makes it actually do the work rather than skip it (but putting the print AFTER the time is recorded doesn't extend the time), and it duly recorded a time of 2.12 seconds.

Did you also run this modified version on the other compilers? (They might also be doing some optimizations, so just to be sure.) Please paste the benchmark code here, or tell me which one you are using. If you made any modifications to BM7 and/or BM8, please paste them here. Also, the code MUST be the same for every compiler (e.g. no function calls for any compiler, or function calls for all of them, etc.).

Quote: This is a noticeable improvement, but still a long way behind the code that Hisoft Basic and other integer compilers make.
It still looks as though the array handling is somewhat behind other implementations.

Do the other compilers allow multidimensional arrays of float / string / Integers?

Quote: Incidentally, also tested BM 8, with -O3, which shows about a 16% improvement!

As mentioned before, we could add a --fast-floating-point option to include fast FP routines instead of the ROM ones (most compilers do). This will eat memory for sure (in fact, z88dk uses the ROM calc routines too). FP routines aren't used in games. The most common technique is to use tables of precomputed values (mostly in demos).

Re: Compiler Speed Trials - britlion - 05-26-2010

boriel Wrote: britlion Wrote: Finally got a quick chance to test this. Yes, -O3 does improve BM7 - but the 0.16 seconds value isn't really fair. -O3 reports that variables are not used and optimizes out all the loops! I don't remember that optimization. Which code snippet are you using? BM7 above?

Hmm. You yourself said:

boriel Wrote: if O3 were in use, this program could be reduced to a single NOP, as it does nothing (it does not print on the screen).

All I did was add the line print M(1),k,i,var to the end of BM7 listed above. It prints once, and it prints AFTER it has worked out how long the run took, so it's not as if the print line took 2 seconds to print. It HAS to be that -O3 recognizes the loops and variables aren't being "used" and deletes them from the code. Normally it would be quite right to do so, as well. Using them in a print statement slows the whole thing back down.

Boriel Wrote: Did you also run this modified version on the other compilers? (they might also be doing some optimizations: so just to be sure). Please paste the benchmark code here, or tell me witch one are you using.

It's really not modified, apart from asking it to print the values of some variables right before it ends. As for running it on other compilers: I didn't - I don't even /have/ all of them handy. I do have Hisoft Basic, however, and I tested that one myself.
I'll get round to remaking it and adding in that print statement to be sure, but I don't think it even has anything like the -O3 optimization. It does seem to treat array variables as fast as any other type of variable.

Boriel Wrote: Do the other compilers allow multidimensional arrays of float / string / Integers?

You asked that before, and I'll say again: almost all of them only allow single-dimension number arrays. (The Zip compiler won't deal with strings AT ALL!) Hisoft Basic does allow multidimensional arrays and string arrays:

Hisoft Basic Manual Wrote: HiSoft BASIC supports numeric and string arrays of up to 2 dimensions. Ordinary string variables behave as in BASIC except that they must not exceed in length the amount of space reserved for them at compile time. By default this is 257 bytes (to allow a string of up to 255 characters, plus 2 bytes for the length) but it can be changed by means of the REM : LEN directive.

Quite apart from the array issues, I think Hisoft Basic's string handling is vastly less flexible, but far faster than yours. Your strings are always mutable, and of variable size. Hisoft uses a far less memory-efficient fixed string system, not in the heap, which is faster to use. As always, it's memory size and waste vs speed, here. Luckily I've never seen your neat and efficient strings running too slowly, so I think the only "issue" here is array handling.

I don't know why your version of the array handling code is the slowest of the tested integer compilers. Is it to do with it handling more variable types, and having to deal with that? Does it internally know the difference between an array of strings and an array of bytes? Are these different types to the compiler? A fixed-size array of numbers is really just a lookup table. Your variable-sized string handling (while brilliant in its memory efficiency) makes string arrays much, much more awkward to deal with.
If you're using the same code to handle arrays of ANYTHING, I can see why Hisoft seems to be going faster, here.

Re: Compiler Speed Trials - boriel - 05-26-2010

britlion Wrote: boriel Wrote: britlion Wrote: Finally got a quick chance to test this. Yes, -O3 does improve BM7 - but the 0.16 seconds value isn't really fair. -O3 reports that variables are not used and optimizes out all the loops! I don't remember that optimization. Which code snippet are you using? BM7 above?

This "reduced to a single NOP" optimization was removed in 1.2.5, since an empty loop makes sense for programmers who need an execution delay. But there are other "unused variables" optimizations, which only optimize SPACE (memory), not SPEED (code). So I still don't understand this difference in speed. Anyway, here is your latest BM7 code:

Code:
7 REM :INT +a,k,v,i,m()

Britlion Wrote: Boriel Wrote: Do the other compilers allow multidimensional arrays of float / string / Integers?

ZX Basic allows strings of up to 65535 characters per element (2-byte length). I could reduce that to 1 byte (up to 255), but I know of many Sinclair BASIC programs that use long strings (e.g. 1000 chars) for strange purposes. In fact, I used this technique very often. Thus, not supporting strings of 256+ characters would break compatibility with Sinclair Basic (even more).

Britlion Wrote: Does it internally know the difference between an array of strings and an array of bytes? Are these different types to the compiler? A fixed size array of numbers is really just a lookup table. Your variable sized string handling (while brilliant in its memory efficiency) makes string arrays much much more awkward to deal with. If you're using the same code to handle arrays of ANYTHING, I can see why hisoft seems to be going faster, here.

Yes, having a 1-dimensional array (a vector) is very fast. For a vector of bytes, you can even use IX + n indirections.
Arrays of up to 2 dimensions can also be optimized (in fact, z88dk also uses 2-dimensional arrays or vectors, I can't recall now, but in the end you have to compute the element offset yourself).

ZX BASIC tries to be as compatible as possible with Sinclair Basic, so it allows multidimensional arrays. Each dimension carries out a multiplication. It also allows "any size" elements (BTW, strings are pointers to the heap, hence 2-byte elements). Currently, only element sizes of 1, 2, 4 and 5 are allowed. The multiply chain (1 multiplication per dimension) ends with an extra multiplication (by the element size) to get the final element offset. This final element-size multiplication could be slightly optimized:
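As an illustration only, the offset calculation described above can be sketched in Python. The function names are made up, and the shift-and-add treatment of element sizes 1, 2, 4 and 5 is an assumption about what such an optimization might look like. It is not ZX BASIC's actual generated code.

```python
def element_index(indices, dims):
    """Row-major multiply chain: one multiply-and-add step per dimension."""
    if len(indices) != len(dims):
        raise ValueError("one index per dimension is required")
    index = 0
    for i, size in zip(indices, dims):
        if not 0 <= i < size:
            raise IndexError("index out of bounds")
        index = index * size + i
    return index


def scale_by_element_size(index, element_size):
    """Final element-size step, using shifts and adds where possible."""
    if element_size == 1:
        return index
    if element_size == 2:
        return index << 1
    if element_size == 4:
        return index << 2
    if element_size == 5:
        return (index << 2) + index  # 5 * x = 4 * x + x
    return index * element_size  # any other size needs a general multiplication


def byte_offset(indices, dims, element_size):
    """Byte offset of an element from the start of the array data."""
    return scale_by_element_size(element_index(indices, dims), element_size)


# Element (2, 3) of a 10 x 20 array of 5-byte floats:
print(byte_offset((2, 3), (10, 20), 5))  # (2*20 + 3) * 5 = 215
```

The fixed sizes avoid a general multiply routine for the last step, which is the expensive operation on a CPU with no multiply instruction.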
But in the future, when objects / structs are available, "any size" elements will appear => multiplication.

Update: You can implement an array as a cascade of look-up tables. This is the way I implement them in C (it's really fast), but on a machine with 48K of memory this is prohibitive! :|

More notes... - boriel - 05-27-2010

OK: I downloaded Hisoft Basic 1.1 from World of Spectrum, then compiled and ran your BM7 test. It prints 4.8 seconds. I recompiled it with ZX BASIC (just declaring the variables as UIntegers, which is the +INT equivalent) and executed it. It prints 2 seconds. This is 100% faster (or x2 speed).

Conclusion: Running *THE SAME* program, ZX Basic compiles better than Hisoft 1.1.

Observations: