05-28-2010, 04:58 AM (This post was last modified: 11-03-2024, 09:29 AM by boriel. Edit Reason: Fix url)
Boriel, I owe you an apology - there is an issue with your HiSoft BASIC test. I absolutely should have made it clearer - and I would have if I'd thought you were going to replicate the tests!
If you recall, I noted that HiSoft ran far more slowly than I expected. The code listed above that you used does indeed take 4.8 seconds, but it's using floating point math to calculate k/2 on line 120.
As specified in the HiSoft BASIC manual, and as I noted, I had to tweak it to read:
Code:
120 LET V=INT(k/2)*3+4-5
in order to use integer math there. I should have made that MUCH clearer.
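To make the difference concrete, here is a small Python sketch (not Spectrum code: Python's `//` and `/` simply stand in for integer and floating point division) of what line 120 works out to with k=5 under each kind of math. It only shows that the two variants are different calculations; it says nothing about how either compiler implements them.

```python
def line_120_integer(k):
    # Models 120 LET v=INT(k/2)*3+4-5 for non-negative k
    return (k // 2) * 3 + 4 - 5

def line_120_float(k):
    # Models 120 LET v=k/2*3+4-5 evaluated in floating point
    return k / 2 * 3 + 4 - 5

print(line_120_integer(5))  # 5
print(line_120_float(5))    # 6.5
```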
It's also a little misleading to say "using the same code": ZX BASIC doesn't run that program in two seconds unless you use DIM to specify integers. That's fair of course, since the REM does the same thing for HiSoft BASIC. But it is worth noting the code does have to be very slightly different. An END IF if nothing else!
Try comparing this version (arranged so Hisoft Basic uses Integer math):
Code:
7 REM :INT +a,k,v,i,m()
8 REM : OPEN#
9 CLS
10 POKE 23672,0: POKE 23673,0
90 POKE 23672,0
100 LET a=0: LET k=5: LET v=0
110 LET a=a+1
120 LET v=INT (k/2) *3+4-5
130 GO SUB 1000
140 DIM m(5)
150 FOR i=1 TO 5 : REM change to 0 to 4 for ZX BASIC
160 LET m(i)=a
170 NEXT i
200 IF a<1000 THEN GO TO 110 : REM NEEDS AN END IF FOR ZX BASIC
210 PRINT (PEEK 23672+256*PEEK 23673)/50
220 PRINT m(1),k,i,var
999 STOP
1000 RETURN
And you'll find the Hisoft basic version does indeed return a 0.5 second time.
What version did you use to get it to work on ZX BASIC precisely? I would guess something like (arranged so ZX BASIC uses Integer math):
Code:
7 REM :INT +a,k,v,i,m()
8 REM : OPEN#
DIM a,k,v,i as uInteger
9 CLS
10 POKE 23672,0: POKE 23673,0
90 POKE 23672,0
100 LET a=0: LET k=5: LET v=0
110 LET a=a+1
120 LET v=INT (k/2)*3+4-5
130 GO SUB 1000
140 DIM m(5) as uInteger
150 FOR i=0 TO 4
160 LET m(i)=a
170 NEXT i
200 IF a<1000 THEN GO TO 110 : END IF
210 PRINT (PEEK 23672+256*PEEK 23673)/CAST (FIXED, 50)
220 PRINT m(1),k,i,var
999 STOP
1000 RETURN
Under -O3, I get this to return in 2.06 seconds.
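For anyone checking the arithmetic on line 210: addresses 23672 and 23673 hold the two low bytes of the Spectrum's FRAMES counter, which ticks 50 times a second on a PAL machine. A rough Python sketch of the conversion (the function name is made up for illustration):

```python
FRAMES_PER_SECOND = 50  # PAL Spectrum interrupt rate

def frames_to_seconds(low_byte, high_byte):
    """Same arithmetic as (PEEK 23672+256*PEEK 23673)/50."""
    return (low_byte + 256 * high_byte) / FRAMES_PER_SECOND

print(frames_to_seconds(103, 0))  # 2.06
print(frames_to_seconds(25, 0))   # 0.5
```

So a 2.06 second result corresponds to 103 frames, and the timer can't resolve anything finer than 0.02 seconds. Only two of the three FRAMES bytes are read, so the figure wraps after 65536 frames.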
These two are as near as I can get to source code designed to do the same thing, making the same optimizations in each compiler - that is, using integer values where possible. I think it's fair to compare them both in fully integer mode. I'm sorry to say that the numbers used in the table at the start of this thread are ones I can stand by, certainly for HiSoft and ZX BASIC, because I tested them personally; and I have edited the post to show the best times I achieved in all cases.
britlion Wrote:Boriel, I owe you an apology - there is an issue with your hisoft basic test. I absolutely should have made it clearer - and I would have if I thought you were going to replicate the tests!
It's also a little misleading to say "using the same code": ZX BASIC doesn't run that program in two seconds unless you use DIM to specify integers. That's fair of course, since the REM does the same thing for HiSoft BASIC. But it is worth noting the code does have to be very slightly different. An END IF if nothing else!
That's why I was asking which code you were using exactly!
The code I use for both tests was this:
Code:
6 DIM a,k,v,i as UInteger: DIM m(5) as UInteger: REM comment this line in HiSoft Basic
7 REM :INT +a,k,v,i,m()
8 REM : OPEN#
9 CLS
10 POKE 23672,0: POKE 23673,0
90 POKE 23672,0
100 LET a=0: LET k=5: LET v=0
110 LET a=a+1
120 LET v=k/2*3+4-5 : REM Int(K/2) rounds to ILong
130 GO SUB 1000
140 DIM m(5)
150 FOR i=1 TO 5 : REM change to 0 to 4 for ZX BASIC
160 LET m(i)=a
170 NEXT i
200 IF a<1000 THEN GO TO 110 : REM NEEDS AN END IF FOR ZX BASIC
210 PRINT (PEEK 23672+256*PEEK 23673)/50
999 STOP
1000 RETURN
1010 PRINT m(1),k,i,var : REM Never executed
Britlion Wrote:And you'll find the Hisoft basic version does indeed return a 0.5 second time.
Then the compiler might even be doing an "unused variable removal" optimization. I need to get the generated ASM code to see the HiSoft routines, but my bet is that the program is being optimized that way, or that the m(i) access is optimized.
BTW I've optimized the array access a little (just 8 T-states per dimension).
I've created a benchmark directory in the compiler source and put this one in it, so the benchmarks we create will live there.
Okay, I've reverse engineered the HiSoft machine code you put above (see attached file).
The routine is fair, and it's even doing an array initialization on *each pass* :!: :o
The bad news first: it's effectively working on a vector. Doing this might break the ZX BASIC compiler, since it's supposed to be multi-architectural.
The good news: most of the time is spent in the 16-bit multiplication used for array accesses. Since most array and element sizes are small, this routine now uses its own array multiplication (HiSoft BASIC does the same for some multiplications). This will reduce the execution time to 1.11 seconds.
Even better, using multidimensional arrays only adds a little overhead (about 1.39 seconds for 2 dimensions).
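As a rough illustration only (this is not the code from either compiler, and the function names are made up), a shift-and-add multiply can finish early when one operand is small, which is one way a dedicated routine could beat a general 16-bit multiply for array offsets. A Python sketch of that idea:

```python
def small_multiply(multiplier, multiplicand):
    """16-bit shift-and-add multiply that stops once no multiplier bits remain.

    Returns (product mod 65536, number of loop passes)."""
    product = 0
    passes = 0
    while multiplier:
        if multiplier & 1:
            product = (product + multiplicand) & 0xFFFF
        multiplicand = (multiplicand << 1) & 0xFFFF
        multiplier >>= 1
        passes += 1
    return product, passes

def element_offset(index, element_size):
    # Hypothetical helper: byte offset of element `index` in a 1-dimensional array
    return small_multiply(element_size, index)

print(element_offset(4, 2))  # (8, 2)
```

With a 2-byte element the loop runs twice, where a multiply that always walks all 16 bits would run sixteen times.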
The FOR...NEXT does the comparison in reverse. This is 10 T-states faster, so I've changed the FOR...NEXT scheme that way too.
Now this REALLY needs intensive testing (arrays and FOR...NEXT loops). I'm uploading a new 1.2.6 beta-r1571, if anyone is interested.
I'm pretty sure this is setting HL' to the required value for BASIC before exiting. The BASIC interpreter (and in particular the interrupt service routine) assumes that IY points at the system variables and that HL' is set to a specific value. I assume this is that value. Otherwise the Spectrum can crash hard when dropping to BASIC from machine code.
Yes, I guess it's just that. If you look at the code ZX BASIC generates, it stores IX (not needed), IY and HL', and restores them on exit (unless returning with an error, RST #8). It's a pity IY is used by the TIME interrupt, because IY could be very handy for managing data structures like objects and structs. I think it would be a good idea to use IM2 for people who want to count time frames without using IY.
;Instead of
neg
add a,N ;you want to calculate N-A
;Do it this way:
cpl
add a,N+1 ;neg is practically equivalent to cpl \ inc a
; -> save 1 byte and 4 T-states
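If you want to convince yourself the two sequences leave the same value in A, here is a quick Python model of the 8-bit arithmetic. It only models the result in A; the flags are not modelled and can differ (carry, for example, when A=0), so it is a sanity check rather than a proof that the swap is safe everywhere.

```python
def neg_then_add(a, n):
    a = (-a) & 0xFF            # neg
    return (a + n) & 0xFF      # add a,N

def cpl_then_add(a, n):
    a = (~a) & 0xFF            # cpl
    return (a + n + 1) & 0xFF  # add a,N+1 (N+1 is folded into the constant)

same = all(neg_then_add(a, n) == cpl_then_add(a, n)
           for a in range(256) for n in range(256))
print(same)  # True
```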
And this one, when you learn it, is solid gold:
Looping with 16 bit counter
There are two ways to make loops with a 16-bit counter:
* the naive one, which results in smaller code but increased loop overhead (24 * n T-states) and destroys a
Code:
ld bc, ...
loop:
; loop body here
dec bc
ld a, b
or c
jp nz,loop
* the slightly trickier one, which takes a couple more bytes but has a much lower overhead (about 13 * n + 9 * (n / 256) T-states, since the dec d / jp nz pair only runs once per 256 passes)
(It's harder to see why this works, but it is MUCH faster - almost as fast as djnz with an 8 bit counter. It's oddly written, in that it uses B and D as loop counters; personally I'd use B and C, I think.)
Code:
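dec de          ; DE = count - 1
ld b, e         ; B = low byte of (count - 1)
inc b           ; B = passes in the first, partial run of the inner loop
inc d           ; D = number of inner-loop runs (outer counter)
loop2:
; loop body here
djnz loop2      ; inner loop: B passes (B = 0 means 256)
dec d
jp nz,loop2     ; outer loop: further full runs of 256 passes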
It works because DJNZ loops 256 times if B=0. So we start with B set to the number of loops past a multiple of 256, and then go round again for each further lot of 256, counted in D. It saves a lot of time over a reasonably large loop - for example, it's about half the overhead, even for something as small as a 1000-pass loop.
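A Python model of both loops, for anyone who wants to check the pass counts and the overhead. It is only a sketch: the registers are plain integers, the loop body is assumed to cost nothing, the setup instructions are not counted, and the timings are the standard Z80 figures (dec bc 6, ld a,b 4, or c 4, jp nz 10, djnz 13 taken / 8 not taken, dec d 4).

```python
def naive_loop(n):
    """ld bc,n / loop: body / dec bc / ld a,b / or c / jp nz,loop"""
    bc = n & 0xFFFF
    passes = tstates = 0
    while True:
        passes += 1                      # loop body runs here
        bc = (bc - 1) & 0xFFFF           # dec bc
        tstates += 6 + 4 + 4 + 10        # dec bc, ld a,b, or c, jp nz
        if ((bc >> 8) | (bc & 0xFF)) == 0:
            return passes, tstates

def djnz_loop(n):
    """dec de / ld b,e / inc b / inc d / loop2: body / djnz / dec d / jp nz"""
    de = (n - 1) & 0xFFFF                # dec de
    b = ((de & 0xFF) + 1) & 0xFF         # ld b,e / inc b
    d = ((de >> 8) + 1) & 0xFF           # inc d
    passes = tstates = 0
    while True:
        passes += 1                      # loop body runs here
        b = (b - 1) & 0xFF               # djnz loop2
        if b != 0:
            tstates += 13                # djnz taken
            continue
        tstates += 8 + 4 + 10            # djnz not taken, dec d, jp nz
        d = (d - 1) & 0xFF
        if d == 0:
            return passes, tstates

print(naive_loop(1000))  # (1000, 24000)
print(djnz_loop(1000))   # (1000, 13036)
```

Both versions run the body exactly 1000 times; the second spends a little over half as many T-states on loop control.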
There are quite a few other similar tricks listed. You might find some are handy.
:o The last one is really nice :!: I could use it for string management, maybe...
I've improved the DRAW routine (slightly, but only a nano-bit faster).
Incidentally, Boriel - is the compiler correctly removing all the un-needed run time routines?
I noted that the Hisoft version of BM7 was around 350 bytes, and the ZX BASIC version was almost 1,800 bytes, with -O3 enabled.
I'm not sure if that's a problem or not, given that most of them would be needed for reasonably large programs; but if we're considering an option of a 45 byte overhead for a faster SQR function, for example, then not having the runtime packages we don't need becomes a little more important.
britlion Wrote:Incidentally, Boriel - is the compiler correctly removing all the un-needed run time routines?
It's OK, it's because of the PRINT. The PRINT routine is about 580 bytes (ITALIC, BOLD, ATTRIBUTES, etc...). But a --use-rom-print is on the way if you need neither ITALIC/BOLD nor speed (I wonder if the PRINT routine could be optimized more, BTW).
britlion Wrote:I think Hisoft optimized down to whether ink or paper changes were made, and didn't include code for them if not used.
I suppose you could arrange for the print routine to not include the BOLD and ITALIC (and INVERSE and OVER) routines, if it was made fairly modular?
I just looked at the print.asm - and it looks like it IS modular. All those #includes at the start - could the compiler make them optional, if not needed?
Because people asked me to support attribute control codes (e.g. CHR$(22, 1, 10) = AT 1, 10; etc...). Since PRINT a$ could also mean PRINT INVERSE 1; ITALIC 1; "xxx" in a single string, all of them must be included.
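A small Python sketch of why that is: which handlers a PRINT touches depends on the bytes in the string at run time, so the source alone doesn't tell the compiler what it can leave out. The table uses the standard Spectrum control codes; ZX BASIC's ITALIC and BOLD codes are left out because their values aren't given in this thread, and the function name is made up.

```python
# Standard Spectrum control codes and how many parameter bytes follow each one.
CONTROL_CODES = {
    16: ("INK", 1), 17: ("PAPER", 1), 18: ("FLASH", 1), 19: ("BRIGHT", 1),
    20: ("INVERSE", 1), 21: ("OVER", 1), 22: ("AT", 2), 23: ("TAB", 2),
}

def handlers_needed(data):
    """Return the set of control-code handlers a PRINT of `data` would touch."""
    needed = set()
    i = 0
    while i < len(data):
        name, n_params = CONTROL_CODES.get(data[i], (None, 0))
        if name is not None:
            needed.add(name)
        i += 1 + n_params          # skip the code and its parameter bytes
    return needed

a_string = bytes([22, 1, 10, 20, 1]) + b"xxx"   # AT 1,10; INVERSE 1; "xxx"
print(sorted(handlers_needed(a_string)))         # ['AT', 'INVERSE']
```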