Posts: 615
Threads: 49
Joined: Feb 2009
Reputation:
2
This puts tiles of 2x2 characters, with attributes, on screen. It is not restricted to a particular number of tiles, because it takes a memory pointer to the address of the tile data, and it is almost fast enough for games (until someone writes an ASM version).
Note: lines(23) is used for the screen address calculation. The table takes only 24 bytes (without a bAND function I need a table for this). Using inline ASM DEFB data instead of an array could save 3 bytes.
POKE UINTEGER and PEEK UINTEGER transfer two bytes at a time, so this is much faster than transferring single bytes.
The screen address calculation is split in two (which is why the variable a had to be created) because of a calculation problem. Usually I would write these lines:
Code: a=peek(@lines+y+3)
scr=(a<<5)+x+16384
as:
Code: scr=peek(@lines+y+3)<<5+x+16384
but that does not work (yet).
Also, if y is a uByte rather than a uInteger, scr is calculated wrongly. So even if y is within uByte range, do not change its type: a binary shift on a uByte also yields a uByte.
Code: Dim lines(23) As uByte => { _
    0,1,2,3,4,5,6,7,64,65,66,67,68,69,70,71,128,129,130,131,132,133,134,135 }

' Draw one 2x2-character tile at character column x, character row y.
' adr points to 36 bytes: 32 bytes of pixel data followed by 4 attribute bytes.
sub puttile(x as Uinteger,y as Uinteger,adr as Uinteger)
    dim scr as Uinteger
    dim a as Uinteger
    ' Top character row of the tile: 8 pixel lines, two bytes per line
    a=peek(@lines+y+3)
    scr=(a<<5)+x+16384
    poke Uinteger scr,peek(Uinteger,adr)
    poke Uinteger scr+256,peek(Uinteger,adr+2)
    poke Uinteger scr+512,peek(Uinteger,adr+4)
    poke Uinteger scr+768,peek(Uinteger,adr+6)
    poke Uinteger scr+1024,peek(Uinteger,adr+8)
    poke Uinteger scr+1280,peek(Uinteger,adr+10)
    poke Uinteger scr+1536,peek(Uinteger,adr+12)
    poke Uinteger scr+1792,peek(Uinteger,adr+14)
    ' Attributes for the top two characters
    poke Uinteger 22528+x+(y<<5),peek(Uinteger,adr+32)
    ' Bottom character row of the tile
    a=peek(@lines+y+4)
    scr=(a<<5)+x+16384
    poke Uinteger scr,peek(Uinteger,adr+16)
    poke Uinteger scr+256,peek(Uinteger,adr+18)
    poke Uinteger scr+512,peek(Uinteger,adr+20)
    poke Uinteger scr+768,peek(Uinteger,adr+22)
    poke Uinteger scr+1024,peek(Uinteger,adr+24)
    poke Uinteger scr+1280,peek(Uinteger,adr+26)
    poke Uinteger scr+1536,peek(Uinteger,adr+28)
    poke Uinteger scr+1792,peek(Uinteger,adr+30)
    ' Attributes for the bottom two characters
    poke Uinteger 22560+x+(y<<5),peek(Uinteger,adr+34)
End sub

dim x,y as ubyte
dim adr as Uinteger
adr=0
' Fill the whole screen with 16x12 tiles
for y=0 to 11
    for x=0 to 15
        puttile(x<<1,y<<1,adr)
        'adr=adr+36
    next x
next y
end
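For anyone wondering where the table values come from: each entry packs the character row within its screen third into the low three bits and the number of the third into bits 6 and 7, so that shifting the entry left by 5 gives the offset of that character row from 16384. A minimal sketch of that relationship, assuming a later compiler version that provides bAND (the post above notes it was not available at the time); lineEntry is a hypothetical name, not part of the routine:

```basic
' Hypothetical helper, for illustration only: computes the value that
' the lines() table stores for a given character row (0..23).
' Assumes bAND and the >> operator are available in the compiler used.
FUNCTION lineEntry(row AS UByte) AS UByte
    ' low 3 bits: row inside the third; bits 6-7: which third (0, 1 or 2)
    RETURN (row bAND 7) + ((row >> 3) << 6)
END FUNCTION

DIM r AS UByte
FOR r = 0 TO 23
    PRINT lineEntry(r); " ";
NEXT r
```

The printed values should match the 24 entries of the table. The table lookup is still likely to be faster inside puttile; this only shows the pattern behind the numbers.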
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
PutTile? I'm wondering if the latest version of the fourspriter code could be reworked for that. Hmm. Maybe. It uses data in an awkward format - but it's /really/ fast. It uses PUSH and POP to write its four character squares.
Of course, I got it tied up in knots and handed it to Boriel to fix :-) When that happens: Call a Genius, I always say.
Posts: 1,763
Threads: 55
Joined: Aug 2019
Reputation:
24
britlion Wrote:PutTile? I'm wondering if the latest version of the fourspriter code could be reworked for that. Hmm. Maybe. It uses data in an awkward format - but it's /really/ fast. It uses PUSH and POP to write its four character squares.
Of course, I got it tied up in knots and handed it to Boriel to fix :-) When that happens: Call a Genius, I always say.
Sorry, I didn't understand very well :oops: Did you already pass me the FINAL version of fourspriter to be included in the library?
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Nope. It's a little bit broken - though the actual sprite code should be solid. The only problem it has is when sprites overlap, right now.
I'm not sure of the best solution:
1> Instead of Sprite 1: Erase-Print, Sprite 2: Erase-Print.... etc; could do Erase-Erase-Erase-Erase-print-print-print-print (as the original fourspriter does)
2> Outside the sprite printing loop, flag sprites that overlap, and turn one of them off while overlapping.
I was looking at doing version 2, because it keeps the main loop faster; but it is less elegant. On the plus side, it does mean that a collision detection function could be implemented very easily.
[What I was talking about was using the Fourspriter fast push-pop code as part of a 4 character Put-Tiles function, as listed here]
Posts: 615
Threads: 49
Joined: Feb 2009
Reputation:
2
britlion, if you can adapt the fast fourspriter routine, it would be great.
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
I'm sure it wouldn't be too hard to rip part of that for that purpose.
Though the data would have to be stored in 16 bit sections - so you'd need the data for the top line of both characters, then the data for the second line of two characters, and then the 16 bits for the third line and so on.
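The interleaved layout described here can be sketched as a data block. This is only an illustration, assuming the 36-byte format read by puttile in the first post (16 two-byte pixel lines, then two attribute bytes for the top row and two for the bottom row). The label name tileData and the pixel patterns are made up, and it assumes the compiler accepts @tileData before the label is defined:

```basic
' Fragment: relies on the puttile routine from the first post.
puttile(0, 0, @tileData)
END                     ' keep execution from running into the data

tileData:
ASM
    ; top character row: 8 pixel lines, each "left byte, right byte"
    defb 255, 255
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    ; bottom character row: 8 pixel lines, same ordering
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 128, 1
    defb 255, 255
    ; attributes: top-left, top-right, then bottom-left, bottom-right
    defb 56, 56         ; 56 = black ink on white paper
    defb 56, 56
END ASM
```

With this ordering each 16-bit read picks up one pixel line of both characters at once, which is what both the POKE UINTEGER approach and a PUSH/POP approach need.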
Posts: 615
Threads: 49
Joined: Feb 2009
Reputation:
2
britlion Wrote:I'm sure it wouldn't be too hard to rip part of that for that purpose.
Though the data would have to be stored in 16 bit sections - so you'd need the data for the top line of both characters, then the data for the second line of two characters, and then the 16 bits for the third line and so on.
Great. Anyway, the data format looks exactly the same as for my routine; POKE UINTEGER stores two bytes too. Looking forward to seeing it.
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Here's a starting point for you, LCD:
(We'll need these functions for the FourSpriter version anyway - and that will be quite long, if fast....)
Code: FUNCTION scrAddress(x as uByte, y as uByte) as Uinteger
asm
; This function returns the address into HL of the screen address
; x,y in character grid notation.
; Original code was extracted by BloodBaz - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
; x Arrives in A, y is in stack.
and 31
ld l,a
ld a,(IX+7) ; Y value
ld d,a
and 24
add a,64
ld h,a
ld a,d
and 7
rrca
rrca
rrca
or l
ld l,a
end asm
END FUNCTION
Code: FUNCTION attrAddress (x as uByte, y as uByte) as uInteger
';; This function returns the memory address of the Character Position
';; x,y in the attribute screen memory.
';; Adapted from code by Jonathan Cauldwell - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
asm
ld a,(IX+7) ;ypos
rrca
rrca
rrca ; Multiply by 32
ld l,a ; Pass to L
and 3 ; Mask with 00000011
add a,88 ; 88 * 256 = 22528 - start of attributes.
ld h,a ; Put it in the High Byte
ld a,l ; We get y value *32
and 224 ; Mask with 11100000
ld l,a ; Put it in L
ld a,(IX+5) ; xpos
add a,l ; Add it to the Low byte
ld l,a ; Put it back in L, and we're done. HL=Address.
end asm
END FUNCTION
Posts: 1,763
Threads: 55
Joined: Aug 2019
Reputation:
24
There's already such a function in attr.asm library, but this one uses a different approach:
Code: __ATTR_ADDR:
; calc start address in DE (as (32 * d) + e)
; Contributed by Santiago Romero at http://www.speccy.org
ld h, 0
ld l, d
add hl, hl ; HL = HL*2
add hl, hl ; HL = HL*4
add hl, hl ; HL = HL*8
add hl, hl ; HL = HL*16
add hl, hl ; HL = HL*32
;; Note: *THIS IS WRONG*
; the addition of 6144 could be optimized to
; ld a, 18h
; add a, h
; ld h, a
; this saves 3 T-States (The above calculation misses E register)
;; adds 6144 for attribute start
ld d, 18h ; DE = 6144 + E. Note: 6144 is the screen size (before attr zone)
add hl, de
ld de, (SCREEN_ADDR) ; Adds the screen address
add hl, de
; Return current screen address in HL
ret
Please check which one is faster (I'm not sure): This function takes 102 T-States (RET included), yours takes 67 (RET included). With the commented optimization it will take 99 T-States.
The problem here is that __ATTR_ADDR takes into account a configurable SCREEN_ADDRESS variable (previously discussed here). So if you want your program to work with a variable screen address, you should work from a 0 offset and add (SCREEN_ADDRESS) at the end. You should change it this way:
Code: FUNCTION FASTCALL attrAddress(x as uByte, y as uByte) as Uinteger : REM Will issue a Warning
asm
; This function returns the address into HL of the screen address
; x,y in character grid notation.
; Original code was extracted by BloodBaz - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
ld e, a ; X comes in A register (fastcall). Saves X in E register
pop hl ; ret Address
pop af ; Get Y in A [F not used]
ld d, a ; Saves Y in D register, So DE register stores YX coords.
push hl ; Put RET address back in the Stack
;; 39 T-States up to here, but we don't count this to be fair with the above
;; So 0 T-States. Start counting them from here
;; At this point, A already has Y coord
rrca
rrca
rrca ; Multiply by 32
ld l,a ; Pass to L
and 3 ; Mask with 00000011
;; 23 T-States up to here
;; The following must be changed, start from 0 and add (SCREEN_ADDRESS) later
; add a,88 ; 88 * 256 = 22528 - start of attributes.
add a,24 ; 24 * 256 = 6144 - start of attributes from address 0x0000h
ld h,a ; Put it in the High Byte
ld a,l ; We get y value *32
and 224 ; Mask with 11100000
;; The following can be optimized ???
; ld l,a ; Put it in L
; ld a,d ; xpos
; add a,l ; Add it to the Low byte
add a, e ; add xpos
ld l,a ; Put it back in L, and we're done. HL=Address.
;; 53 T-States up to here
ld de, (SCREEN_ADDRESS)
add hl, de ;; Total T-States = 84 T-States
;; (implicit RET) + 10 T-States = 94 T-States
end asm
END FUNCTION
So this function, adapted for a configurable screen address is 8 T-states faster than the original in ZX Basic (only 5 if the commented optimization is done) :?:
Posts: 615
Threads: 49
Joined: Feb 2009
Reputation:
2
britlion Wrote:Here's a start point for you, LCD:
(We'll need these functions for the FourSpriter version anyway - and that will be quite long, if fast....)
Currently just tested by filling the screen completely with tiles 8 times:
131 Frames with my table
128 Frames with your screen/attr calculation.
Not bad after all, because now only the drawing needs optimisation.
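How these frame counts were taken is not stated. One possible way to make such a measurement, sketched here as an assumption, is to read the low 16 bits of the FRAMES system variable (address 23672) before and after drawing. This relies on the standard ROM interrupt routine being active and on the puttile routine from the first post:

```basic
' Sketch only: counts 1/50 s frames elapsed while filling the screen 8 times.
' Assumes interrupts are enabled so the ROM keeps FRAMES (23672) updated.
DIM t0, t1 AS UInteger
DIM n, x, y AS UByte

t0 = PEEK(UInteger, 23672)          ' low 16 bits of FRAMES
FOR n = 1 TO 8
    FOR y = 0 TO 11
        FOR x = 0 TO 15
            puttile(x << 1, y << 1, 0)
        NEXT x
    NEXT y
NEXT n
t1 = PEEK(UInteger, 23672)

PRINT AT 0, 0; t1 - t0              ' elapsed frames
```

The subtraction stays correct if the 16-bit counter wraps once during the run, since both values are unsigned.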
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
boriel Wrote:Code: ld de, (SCREEN_ADDRESS)
add hl, de ;; Total T-States = 84 T-States
;; (implicit RET) + 10 T-States = 94 T-States
end asm
END FUNCTION
So this function, adapted for a configurable screen address is 8 T-states faster than the original in ZX Basic (only 5 if the commented optimization is done) :?:
I haven't looked yet, but I will. Could the assembler/compiler optimize this by KNOWING if the screen address is changed ever? I would consider it unusual code to move it on the spectrum?
(I know, multi-platform). One optimization would be spectrum specific. If screen address=16384, then go short on the code :-)
Incidentally, I was looking at some asm:
Code: poke Uinteger @ptDatapoint+3,value becomes:
ld hl, __LABEL__ptDataPoint
inc hl
inc hl
inc hl
Can't the assembler deal with
Code: ld hl, __LABEL__ptDataPoint+3
Being able to do that would reduce the code quite a lot, I would think - if the assembler could code the actual value there, instead of the base and then inc three times.
Posts: 1,763
Threads: 55
Joined: Aug 2019
Reputation:
24
LCD Wrote:britlion Wrote:Here's a start point for you, LCD:
(We'll need these functions for the FourSpriter version anyway - and that will be quite long, if fast....)
Currently just tested by filling the screen completely with tiles 8 times:
131 Frames with my table
128 Frames with your screen/attr calculation.
Not bad after all, because now only the drawing needs optimisation.
Then I suggest you use the 2nd routine if you want SCREEN_ADDRESS relocation. In fact, drawing routines can also be made relocatable just by modifying the PLOT.ASM low-level routine. :wink:
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Wait, wait....
Boriel - the "attrAddress" function you timed there is actually the SCREEN address code I listed above, not the attribute code...
I'm actually also still hopelessly confused by the changes you made to get the values out with pop. I've stared at that stack, and I don't see how two pops get you values 5,6 and 7,8. Surely that gets you values 1,2 and 3,4.
Guess I still haven't quite got the hang of how you stack variables. To the point where I'm writing putTile, and poking the screen address into a memory value instead of trying to work out where it is. (in IX-3,4 apparently...)
Finally, how can you fastcall with multiple parameters???
Posts: 1,763
Threads: 55
Joined: Aug 2019
Reputation:
24
britlion Wrote:Wait, wait....
Boriel - the "attrAddress" function you timed there is actually the SCREEN address code I listed above, not the attribute code...
Not exactly. I just copied the comment line of the SCREEN address function by mistake, but the function body is actually the ATTR address.
Quote:I'm actually also still hopelessly confused with the changes you made to get the values out with pop. I've stared at that stack, and I don't see how two pops gets you values 5,6 and 7,8. Surely that gets you values 1,2 and 3,4.
Guess I still haven't quite got the hang of how you stack variables. To the point where I'm writing putTile, and poking the screen address into a memory value instead of try to work out where it is. (in IX-3,4 apparently...)
By values 5,6 etc... you mean (IX + 5) and (IX + 6)... don't you? :roll:
Well, I think I've somewhat already explained it here.
Now, this is important:
The FASTCALL calling convention is like STDCALL, but:
- The 1st parameter is passed in registers, following the register convention (A => 8 bits; HL => 16 bits; DEHL => 32 bits; AEDCB => 5 bytes, Float).
- If more parameters are specified, the compiler will issue a warning and push them onto the stack in reverse order, as in STDCALL.
- Finally the routine is called, but at the entry point NOTHING is done to the IX register. Everything must be done by the user.
FASTCALL routines are ideal for functions with one or no parameters, and for routines that have no local variables and are written mainly in ASM. When there is more than one parameter, the user *must* clean up the stack on return or the program might crash. That's why in normal functions you have to go to the LEAVE point (the end of the function) instead of using an asm RET directly. If you clean up the stack at the beginning, you can RET at any point. With STDCALL, the compiler writes everything required to pop the parameters and return safely.
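As a small illustration of the simplest case (a single parameter, so there is nothing on the stack to clean up), here is a minimal sketch. The function name timesTwo is hypothetical, and the sketch assumes a 16-bit result is returned in HL, as the address functions earlier in this thread do:

```basic
' Hypothetical single-parameter FASTCALL function, for illustration only.
FUNCTION FASTCALL timesTwo(n AS UInteger) AS UInteger
    ASM
        add hl, hl      ; n arrives in HL (16-bit); result is left in HL
    END ASM
END FUNCTION

PRINT timesTwo(21)
```

With two or more parameters the extra ones arrive on the stack above the return address, which is why the attrAddress example earlier pops the return address, pops the parameter, and pushes the return address back.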
Posts: 805
Threads: 135
Joined: Apr 2009
Reputation:
5
Oh, so it's not passing the first value in the stack - stdcall passes it in the register AND in the stack, which is a little confusing.
So we CAN use FASTCALL, even if the compiler complains to shortcut that, and use a shorter stack. I think I see.
There's the confusion - fastcall would say "hey, you can't do that!" and so I sort of went with that.