LCD · 05-29-2010, 10:12 AM

This puts tiles of 2x2 characters with attributes on screen. It is not restricted to a particular number of tiles, as it takes a memory pointer to the address of the tile data, and it is almost fast enough for games (until someone writes an ASM version).
Note: lines(23) is used for the screen address calculation. I used only 24 bytes for this table (without a bAND function I need a table for this). Using inline ASM DEFB data instead of an array can save 3 bytes.
POKE and PEEK UINTEGER transfer two bytes at once, so this is much faster than transferring single bytes.
The screen address calculation is split in two (so variable a had to be created) because of a calculation problem. Usually I would write these lines:

Code:
a=peek(@lines+y+3)
scr=(a<<5)+x+16384

as:

Code:
scr=peek(@lines+y+3)<<5+x+16384

but it does not work (yet).
Also, if y is a ubyte rather than a uinteger, the scr value is calculated wrongly. So even if y fits in the range of a ubyte, do not change its type, because a binary shift on a ubyte also results in a ubyte.
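
To see why the ubyte case goes wrong, here is a small Python model of the arithmetic. It is only a sketch: it assumes, as described above, that a shift on a ubyte is cut down to 8 bits before the addition, and `shl_ubyte` and `shl_uinteger` are hypothetical helper names, not ZX BASIC functions.

```python
def shl_ubyte(value, bits):
    """Model a left shift whose result stays in an 8-bit ubyte."""
    return (value << bits) & 0xFF

def shl_uinteger(value, bits):
    """Model a left shift whose result stays in a 16-bit uinteger."""
    return (value << bits) & 0xFFFF

ATTR_BASE = 22528  # start of the attribute area

def attr_address(x, y, shift):
    """Attribute address as in 22528+x+(y<<5), with a pluggable shift."""
    return ATTR_BASE + x + shift(y, 5)

good = attr_address(4, 10, shl_uinteger)  # y held in a uinteger
bad = attr_address(4, 10, shl_ubyte)      # y held in a ubyte: 320 wraps to 64
print(good, bad)
```

With y=10 the shift should give 320, but in 8 bits only 64 survives, so the attribute lands eight rows too high.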

Code:
Dim lines(23) As uByte => { _
    0,1,2,3,4,5,6,7,64,65,66,67,68,69,70,71,128,129,130,131,132,133,134,135 }

sub puttile(x as Uinteger,y as Uinteger,adr as Uinteger)
    dim scr as Uinteger
    dim a as Uinteger
    ' top character row: 8 pixel lines, 256 bytes apart, 2 bytes each
    a=peek(@lines+y+3)
    scr=(a<<5)+x+16384
    poke Uinteger scr,peek(Uinteger,adr)
    poke Uinteger scr+256,peek(Uinteger,adr+2)
    poke Uinteger scr+512,peek(Uinteger,adr+4)
    poke Uinteger scr+768,peek(Uinteger,adr+6)
    poke Uinteger scr+1024,peek(Uinteger,adr+8)
    poke Uinteger scr+1280,peek(Uinteger,adr+10)
    poke Uinteger scr+1536,peek(Uinteger,adr+12)
    poke Uinteger scr+1792,peek(Uinteger,adr+14)
    ' attributes of the two top cells
    poke Uinteger 22528+x+(y<<5),peek(Uinteger,adr+32)
    ' bottom character row
    a=peek(@lines+y+4)
    scr=(a<<5)+x+16384
    poke Uinteger scr,peek(Uinteger,adr+16)
    poke Uinteger scr+256,peek(Uinteger,adr+18)
    poke Uinteger scr+512,peek(Uinteger,adr+20)
    poke Uinteger scr+768,peek(Uinteger,adr+22)
    poke Uinteger scr+1024,peek(Uinteger,adr+24)
    poke Uinteger scr+1280,peek(Uinteger,adr+26)
    poke Uinteger scr+1536,peek(Uinteger,adr+28)
    poke Uinteger scr+1792,peek(Uinteger,adr+30)
    ' attributes of the two bottom cells (22560 = 22528+32)
    poke Uinteger 22560+x+(y<<5),peek(Uinteger,adr+34)
End sub

dim x,y as ubyte
dim adr as Uinteger
adr=0
for y=0 to 11
    for x=0 to 15
        puttile(x<<1,y<<1,adr)
        'adr=adr+36
    next x
next y
end
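
The table lookup can be checked off the Spectrum with a short Python model. This is a sketch, not part of the routine: `scr_from_table` mirrors the `(a<<5)+x+16384` line above, and `scr_from_bits` is the usual bit-layout formula for the display file, written here only for comparison.

```python
# Copy of the lines() table from the listing above.
LINES = [0, 1, 2, 3, 4, 5, 6, 7,
         64, 65, 66, 67, 68, 69, 70, 71,
         128, 129, 130, 131, 132, 133, 134, 135]

SCREEN_BASE = 16384

def scr_from_table(x, y):
    """Address of the top pixel line of character cell (x, y), table version."""
    return (LINES[y] << 5) + x + SCREEN_BASE

def scr_from_bits(x, y):
    """Same address computed from the bit layout, without a table."""
    third = y & 24   # which third of the screen (0, 8 or 16)
    row = y & 7      # character row inside that third
    return SCREEN_BASE + (third << 8) + (row << 5) + x

mismatches = [(x, y) for y in range(24) for x in range(32)
              if scr_from_table(x, y) != scr_from_bits(x, y)]
print(mismatches)
print(scr_from_table(0, 8))
```

Both versions agree for every cell, so the 24-byte table is simply a precomputed form of the masking that a bAND function would otherwise do.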

britlion · 05-29-2010, 02:52 PM

PutTile? I'm wondering if the latest version of the fourspriter code might be reworked for that. Hmm. Maybe. It uses data in an awkward format - but it's /really/ fast. It uses PUSH and POP to write its four character squares.

Of course, I got it tied up in knots and handed it to Boriel to fix :-) When that happens: Call a Genius, I always say.

***boriel*** · 05-29-2010, 03:55 PM

britlion Wrote:PutTile? I'm wondering if the latest version of the fourspriter code might be reworked for that. Hmm. Maybe. It uses data in an awkward format - but it's /really/ fast. It uses PUSH and POP to write its four character squares.

Of course, I got it tied up in knots and handed it to Boriel to fix :-) When that happens: Call a Genius, I always say.

Sorry, I didn't understand very well :oops: Did you already pass me the FINAL version of fourspriter to be included in the library?

britlion · 05-29-2010, 03:59 PM

Nope. It's a little bit broken - though the actual sprite code should be solid. The only problem it has is when sprites overlap, right now.

I'm not sure of the best solution:

1> Instead of Sprite 1: Erase-Print, Sprite 2: Erase-Print.... etc; could do Erase-Erase-Erase-Erase-print-print-print-print (as the original fourspriter does)

2> Outside the sprite printing loop, flag sprites that overlap, and turn one of them off while overlapping.

I was looking at doing version 2, because it keeps the main loop faster; but it is less elegant. On the plus side, it does mean that a collision detection function could be implemented very easily.

[What I was talking about was using the Fourspriter fast push-pop code as part of a 4 character Put-Tiles function, as listed here]

LCD · 05-29-2010, 07:40 PM

britlion, if you can adapt the fast fourspriter routine, it would be great.

britlion · 05-30-2010, 12:29 PM

I'm sure it wouldn't be too hard to rip part of that for that purpose.

Though the data would have to be stored in 16 bit sections - so you'd need the data for the top line of both characters, then the data for the second line of two characters, and then the 16 bits for the third line and so on.
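
As a rough illustration of that layout, this Python sketch packs two ordinary 8-byte character bitmaps into 16-bit sections. The function name `interleave_rows` is made up for the example, and it assumes the left character's byte comes first in memory, as it does when LCD's routine copies the data with POKE UINTEGER.

```python
def interleave_rows(left_char, right_char):
    """Merge two 8-byte character bitmaps into 16 bytes, one pixel line at a time."""
    if len(left_char) != 8 or len(right_char) != 8:
        raise ValueError("each character needs exactly 8 pixel lines")
    out = bytearray()
    for left_row, right_row in zip(left_char, right_char):
        out.append(left_row)   # first byte of the pair, lands at scr
        out.append(right_row)  # second byte of the pair, lands at scr+1
    return bytes(out)

left = bytes([0x11] * 8)
right = bytes([0x22] * 8)
packed = interleave_rows(left, right)
print(packed.hex())
```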

LCD · 05-30-2010, 09:47 PM

britlion Wrote:I'm sure it wouldn't be too hard to rip part of that for that purpose.

Though the data would have to be stored in 16 bit sections - so you'd need the data for the top line of both characters, then the data for the second line of two characters, and then the 16 bits for the third line and so on.

Great. Anyway, the data format looks exactly the same as for my routine; POKE UINTEGER stores two bytes too. Looking forward to seeing it.

britlion · 05-31-2010, 01:51 AM

Here's a start point for you, LCD:

(We'll need these functions! for the FourSpriter version anyway - and that will be quite long, if fast....)

Code:
FUNCTION scrAddress(x as uByte, y as uByte) as Uinteger
asm
; This function returns the address into HL of the screen address
; x,y in character grid notation.
; Original code was extracted by BloodBaz - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
         ; x Arrives in A, y is in stack.
         and     31
         ld      l,a
         ld      a,(IX+7) ; Y value
         ld      d,a
         and     24
         add     a,64
         ld      h,a
         ld      a,d
         and     7
         rrca
         rrca
         rrca
         or      l
         ld      l,a
end asm
END FUNCTION

Code:
FUNCTION attrAddress (x as uByte, y as uByte) as uInteger
';; This function returns the memory address of the Character Position
';; x,y in the attribute screen memory.
';; Adapted from code by Jonathan Cauldwell - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
asm
         ld      a,(IX+7)        ;ypos
         rrca
         rrca
         rrca               ; Multiply by 32
         ld      l,a        ; Pass to L
         and     3          ; Mask with 00000011
         add     a,88       ; 88 * 256 = 22528 - start of attributes.
         ld      h,a        ; Put it in the High Byte
         ld      a,l        ; We get y value *32
         and     224        ; Mask with 11100000
         ld      l,a        ; Put it in L
         ld      a,(IX+5)   ; xpos
         add     a,l        ; Add it to the Low byte
         ld      l,a        ; Put it back in L, and we're done. HL=Address.
end asm
END FUNCTION
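
For anyone who wants to check the bit juggling on a desktop machine, here is a Python model of both routines. It is a sketch that follows the asm step by step (the function names are just for this example) and compares the resulting HL with the plain arithmetic formulas.

```python
def rrca(a):
    """Rotate an 8-bit value right by one bit, as RRCA does to register A."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF

def rrca3(a):
    """Three RRCAs in a row: the low 3 bits end up in bits 5..7."""
    for _ in range(3):
        a = rrca(a)
    return a

def scr_address(x, y):
    """Step-by-step model of the scrAddress asm; returns HL."""
    l = x & 31                   # and 31 / ld l,a
    h = ((y & 24) + 64) & 0xFF   # and 24 / add a,64 / ld h,a
    a = rrca3(y & 7)             # and 7 / rrca x3, i.e. (y & 7) * 32
    l = a | l                    # or l / ld l,a
    return (h << 8) | l

def attr_address(x, y):
    """Step-by-step model of the attrAddress asm; returns HL."""
    l = rrca3(y)                 # rrca x3 / ld l,a
    h = ((l & 3) + 88) & 0xFF    # and 3 / add a,88 / ld h,a
    l = ((l & 224) + x) & 0xFF   # and 224 / add xpos / ld l,a
    return (h << 8) | l

ok = all(scr_address(x, y) == 16384 + ((y & 24) << 8) + ((y & 7) << 5) + x
         and attr_address(x, y) == 22528 + y * 32 + x
         for y in range(24) for x in range(32))
print(ok)
```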

***boriel*** · 05-31-2010, 11:02 AM

There's already such a function in attr.asm library, but this one uses a different approach:

Code:
__ATTR_ADDR:
    ; calc start address in DE (as (32 * d) + e)
    ; Contributed by Santiago Romero at http://www.speccy.org
    ld h, 0
    ld l, d
    add hl, hl   ; HL = HL*2
    add hl, hl   ; HL = HL*4
    add hl, hl   ; HL = HL*8
    add hl, hl   ; HL = HL*16
    add hl, hl   ; HL = HL*32

    ;; Note: *THIS IS WRONG*
    ; the addition of 6144 could be optimized to
    ; ld a, 18h
    ; add a, h
    ; ld h, a
    ; this saves 3 T-States  (The above calculation misses E register)

    ;; adds 6144 for attribute start
    ld d, 18h ; DE = 6144 + E. Note: 6144 is the screen size (before attr zone)
    add hl, de
    ld de, (SCREEN_ADDR)    ; Adds the screen address
    add hl, de
    ; Return current screen address in HL
    ret

Please check which one is faster (I'm not sure): this function takes 102 T-States (RET included), yours takes 67 (RET included). With the commented optimization it will take 99 T-States.

The problem here is that __ATTR_ADDR takes into account a configurable SCREEN_ADDRESS variable (previously discussed here). So if you want your program to work with a variable screen address, you should compute from offset 0 and add (SCREEN_ADDRESS) at the end, so it should be changed this way:

Code:
FUNCTION FASTCALL attrAddress(x as uByte, y as uByte) as Uinteger : REM Will issue a Warning
asm
; This function returns the address into HL of the screen address
; x,y in character grid notation.
; Original code was extracted by BloodBaz - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
         ld e, a    ; X comes in A register (fastcall). Saves X in E register
         pop hl    ; ret Address
         pop af    ; Get Y in A [F not used]
         ld d, a    ; Saves Y in D register, So DE register stores YX coords.
         push hl   ; Put RET address back in the Stack
         ;; 39 T-States up to here, but we don't count this to be fair with the above
         ;; So 0 T-States. Start counting them from here
         ;; At this point, A already has Y coord
         rrca
         rrca
         rrca               ; Multiply by 32
         ld      l,a        ; Pass to L
         and     3          ; Mask with 00000011
         ;; 23 T-States up to here
         ;; The following must be changed, start from 0 and add (SCREEN_ADDRESS) later
         ; add     a,88       ; 88 * 256 = 22528 - start of attributes.
         add     a,24       ; 24 * 256 = 6144 - start of attributes from address 0x0000
         ld      h,a        ; Put it in the High Byte
         ld      a,l        ; We get y value *32
         and     224        ; Mask with 11100000
         ;; The following can be optimized ???
         ; ld      l,a        ; Put it in L
         ; ld      a,d   ; xpos
         ; add     a,l        ; Add it to the Low byte
         add  a, e        ; add xpos
         ld      l,a        ; Put it back in L, and we're done. HL=Address.
         ;; 53 T-States up to here
         ld de, (SCREEN_ADDRESS)
         add hl, de  ;; Total T-States = 84 T-States
         ;; (implicit RET)  + 10 T-States = 94 T-States
end asm
END FUNCTION

So this function, adapted for a configurable screen address, is 8 T-states faster than the original in ZX Basic (only 5 if the commented optimization is done) :?:
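
The running totals in the comments of the FASTCALL listing can be re-added with a few lines of Python. This is only a tally sketch using the standard Z80 instruction timings; the dictionary keys are informal labels, not assembler syntax.

```python
# T-states per instruction (standard Z80 timings).
T = {
    "rrca": 4, "ld r,r": 4, "and n": 7, "add a,n": 7, "add a,r": 4,
    "pop": 10, "push": 11,
    "ld de,(nn)": 20, "add hl,de": 11, "ret": 10,
}

# ld e,a / pop hl / pop af / ld d,a / push hl (parameter fetching)
prologue = ["ld r,r", "pop", "pop", "ld r,r", "push"]

body = [
    "rrca", "rrca", "rrca",  # rotate Y right three times
    "ld r,r",                # ld l,a
    "and n",                 # and 3
    "add a,n",               # add a,24
    "ld r,r",                # ld h,a
    "ld r,r",                # ld a,l
    "and n",                 # and 224
    "add a,r",               # add a,e
    "ld r,r",                # ld l,a
]
relocation = ["ld de,(nn)", "add hl,de"]

prologue_t = sum(T[i] for i in prologue)
body_t = sum(T[i] for i in body)
total_t = body_t + sum(T[i] for i in relocation) + T["ret"]
print(prologue_t, body_t, total_t)
```

The tally gives 39 for the parameter fetching, 53 for the address calculation and 94 including relocation and RET, matching the comments in the listing.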

LCD · 05-31-2010, 12:05 PM

britlion Wrote:Here's a start point for you, LCD:

(We'll need these functions! for the FourSpriter version anyway - and that will be quite long, if fast....)

Currently just tested by filling the screen completely with tiles 8 times:
131 Frames with my table
128 Frames with your screen/attr calculation.
Not bad after all, because now only the drawing needs optimisation.

britlion · 05-31-2010, 12:29 PM

boriel Wrote:
Code:
         ld de, (SCREEN_ADDRESS)
         add hl, de  ;; Total T-States = 84 T-States
         ;; (implicit RET)  + 10 T-States = 94 T-States
end asm
END FUNCTION
So this function, adapted for a configurable screen address, is 8 T-states faster than the original in ZX Basic (only 5 if the commented optimization is done) :?:

I haven't looked yet, but I will. Could the assembler/compiler optimize this by KNOWING whether the screen address is ever changed? I would consider it unusual for code to move it on the Spectrum.
(I know, multi-platform.) One optimization would be Spectrum specific: if the screen address is 16384, take the short code path :-)

Incidentally, I was looking at some asm:

Code:
poke Uinteger @ptDatapoint+3,value

becomes:

Code:
ld hl, __LABEL__ptDataPoint
    inc hl
    inc hl
    inc hl

Can't the assembler deal with

Code:
ld hl, __LABEL__ptDataPoint+3

Being able to do that would reduce the code quite a lot, I would think - if the assembler could code the actual value there, instead of the base and then inc three times.

***boriel*** · 05-31-2010, 12:34 PM

LCD Wrote:
britlion Wrote:Here's a start point for you, LCD:

(We'll need these functions! for the FourSpriter version anyway - and that will be quite long, if fast....)
Currently just tested by filling the screen completely with tiles 8 times:
131 Frames with my table
128 Frames with your screen/attr calculation.
Not bad after all, because now only the drawing needs optimisation.

Then I suggest you use the 2nd routine if you want SCREEN_ADDRESS relocation. In fact, drawing routines can also be made relocatable just by modifying the PLOT.ASM low-level routine. :wink:

britlion · 05-31-2010, 12:45 PM

Wait, wait....

Boriel - the "attrAddress" function you timed there is actually the SCREEN address code I listed above, not the attribute code...

I'm actually also still hopelessly confused by the changes you made to get the values out with pop. I've stared at that stack, and I don't see how two pops get you values 5,6 and 7,8. Surely that gets you values 1,2 and 3,4.

Guess I still haven't quite got the hang of how you stack variables. To the point where I'm writing putTile, and poking the screen address into a memory value instead of trying to work out where it is. (in IX-3,4 apparently...)

Finally, how can you fastcall with multiple parameters???

***boriel*** · 05-31-2010, 04:27 PM

britlion Wrote:Wait, wait....

Boriel - the "attrAddress" function you timed there is actually the SCREEN address code I listed above, not the attribute code...

Not exactly. I just copied the comment line of the SCREEN address function by mistake, but the function body is actually the ATTR address.

Quote:I'm actually also still hopelessly confused by the changes you made to get the values out with pop. I've stared at that stack, and I don't see how two pops get you values 5,6 and 7,8. Surely that gets you values 1,2 and 3,4.

Guess I still haven't quite got the hang of how you stack variables. To the point where I'm writing putTile, and poking the screen address into a memory value instead of trying to work out where it is. (in IX-3,4 apparently...)

By values 5,6 etc... you mean (IX + 5) and (IX + 6)... don't you? :roll:
Well, I think I've somewhat already explained it here.

Now, this is important:
The FASTCALL calling convention is like STDCALL, but:

The 1st parameter is passed in registers following the register convention (A => 8 bits, HL => 16 bits, DEHL => 32 bits, AEDCB => 5 bytes, Float).
If more parameters are specified, the compiler issues a warning and pushes them onto the stack in reverse order, as in STDCALL.
Finally the routine is called. But at the entry point NOTHING is done to the IX register. Everything must be done by the user.

FASTCALL routines are ideal for functions with one or no parameters, and for routines that have no local variables and are written mainly in ASM. When there is more than 1 parameter, the user *must* clean up the stack on return or the program might crash. That's why in normal functions you have to jump to the LEAVE point (the end of the function) instead of directly using an asm RET. If you clean up the stack at the beginning, you can RET at any point. With STDCALL the compiler writes everything required to pop the parameters and return safely.
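
The stack handling can be pictured with a small Python model of a two-parameter FASTCALL with byte arguments. It is a sketch under assumptions: the extra byte parameter is taken to sit in the high half of the pushed word (which is what reading it back with POP AF implies), and `call_fastcall` and the return address 0x8000 are invented for the example.

```python
def call_fastcall(x, y, ret_addr=0x8000):
    """Model the stack traffic of a two-parameter FASTCALL byte function."""
    stack = []
    # Caller side: Y is pushed as a 16-bit word with the byte in the high
    # half (assumption, see above); X travels in register A.
    stack.append((y & 0xFF) << 8)
    a = x & 0xFF
    stack.append(ret_addr)   # CALL pushes the return address

    # Callee side: ld e,a / pop hl / pop af / ld d,a / push hl
    e = a
    hl = stack.pop()         # return address
    af = stack.pop()         # word holding Y
    d = af >> 8              # A is the high byte of AF
    stack.append(hl)         # put the return address back

    return d, e, stack

d, e, stack = call_fastcall(5, 7)
print(d, e, [hex(v) for v in stack])
```

After the prologue only the return address is left on the stack, which is why a routine that cleans up at the beginning can RET at any point.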

britlion · 05-31-2010, 06:15 PM

Oh, so it's not passing the first value in the stack - stdcall passes it in the register AND in the stack, which is a little confusing.

So we CAN use FASTCALL, even if the compiler complains, to shortcut that and use a shorter stack. I think I see.

There's the confusion - fastcall would say "hey, you can't do that!" and so I sort of went with that.
