Post Tue Oct 25, 2011 1:32 am

Attribute Address

To unearth an aside I hadn't ever responded to in an old topic...I got playing.

boriel wrote: There's already such a function in the attr.asm library, but this one uses a different approach:
  Code:
__ATTR_ADDR:
    ; calc start address in DE (as (32 * d) + e)
    ; Contributed by Santiago Romero at http://www.speccy.org
    ld h, 0
    ld l, d
    add hl, hl   ; HL = HL*2
    add hl, hl   ; HL = HL*4
    add hl, hl   ; HL = HL*8
    add hl, hl   ; HL = HL*16
    add hl, hl   ; HL = HL*32

    ;; Note: *THIS IS WRONG*
    ; the addition of 6144 could be optimized to
    ; ld a, 18h
    ; add a, h
    ; ld h, a
    ; this saves 3 T-States  (The above calculation misses E register)
   
    ;; adds 6144 for attribute start
    ld d, 18h ; DE = 6144 + E. Note: 6144 is the screen size (before attr zone)
    add hl, de

    ld de, (SCREEN_ADDR)    ; Adds the screen address
    add hl, de
   
    ; Return current screen address in HL
    ret

Please check which one is faster (I'm not sure): this function takes 102 T-States (RET included), yours takes 67 (RET included). With the commented optimization it will take 99 T-States.

The problem here is that __ATTR_ADDR takes into account a configurable SCREEN_ADDRESS variable (previously discussed here). So if you want your program to work with a variable screen address, you should work from offset 0 and add (SCREEN_ADDRESS) at the end, so you should change it this way:
  Code:
FUNCTION FASTCALL attrAddress(x as uByte, y as uByte) as Uinteger : REM Will issue a Warning
asm
; This function returns the attribute address in HL for
; x,y in character grid notation.
; Original code was extracted by BloodBaz - Adapted for ZX BASiC by Britlion from Na_TH_AN's fourspriter
         ld e, a    ; X comes in A register (fastcall). Saves X in E register
         pop hl    ; ret Address
         pop af    ; Get Y in A [F not used]
         ld d, a    ; Saves Y in D register, So DE register stores YX coords.     
         push hl   ; Put RET address back in the Stack
         ;; 39 T-States up to here, but we don't count this to be fair with the above
         ;; So 0 T-States. Start counting them from here
         ;; At this point, A already has Y coord
         rrca       
         rrca
         rrca               ; Multiply by 32
         ld      l,a        ; Pass to L
         and     3          ; Mask with 00000011
         ;; 23 T-States up to here
         ;; The following must be changed: start from 0 and add (SCREEN_ADDRESS) later
         ; add     a,88       ; 88 * 256 = 22528 - start of attributes.
         add     a,24       ; 24 * 256 = 6144 - start of attributes counted from address 0000h
         ld      h,a        ; Put it in the High Byte
         ld      a,l        ; We get y value *32
         and     224        ; Mask with 11100000
         ;; The following can be optimized ???
         ; ld      l,a        ; Put it in L
         ; ld      a,d   ; xpos
         ; add     a,l        ; Add it to the Low byte
         add  a, e        ; add xpos
         ld      l,a        ; Put it back in L, and we're done. HL=Address.
         ;; 53 T-States up to here
         ld de, (SCREEN_ADDRESS)
         add hl, de  ;; Total T-States = 84 T-States
         ;; (implicit RET)  + 10 T-States = 94 T-States
end asm
END FUNCTION

So this function, adapted for a configurable screen address is 8 T-states faster than the original in ZX Basic (only 5 if the commented optimization is done) :?:
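As a sanity check on the two quoted routines, here is a rough Python model of their register arithmetic. It is only a sketch: the function names are made up for illustration, and the default base of 16384 is an assumption standing in for whatever (SCREEN_ADDRESS) holds. It suggests both land on base + 6144 + 32*y + x.

```python
def attr_addr_shift(x, y, screen_addr=16384):
    """Model of the add hl,hl routine: HL = 32*y, plus 6144 + x, plus the base."""
    hl = y                          # ld h,0 / ld l,d
    for _ in range(5):              # five add hl,hl -> HL = 32*y
        hl = (hl + hl) & 0xFFFF
    de = (0x18 << 8) | x            # ld d,18h with X already sitting in E
    hl = (hl + de) & 0xFFFF         # add hl,de
    return (hl + screen_addr) & 0xFFFF


def rrca(a):
    """Rotate an 8-bit value right by one, bit 0 wrapping round to bit 7."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF


def attr_addr_rotate(x, y, screen_addr=16384):
    """Model of the rrca/mask routine from the FASTCALL function."""
    a = y
    for _ in range(3):              # three rrca = y*32 spread over 8 bits
        a = rrca(a)
    l = a                           # ld l,a
    h = ((a & 3) + 24) & 0xFF       # and 3 / add a,24 / ld h,a
    l = ((l & 224) + x) & 0xFF      # and 224 / add a,e / ld l,a
    return (((h << 8) | l) + screen_addr) & 0xFFFF


if __name__ == "__main__":
    for y in range(24):
        for x in range(32):
            assert attr_addr_shift(x, y) == attr_addr_rotate(x, y)
    print(attr_addr_rotate(0, 0))   # top-left attribute cell
```

This only checks the addresses, not the timings; the T-state counts still have to be tallied by hand from the instruction tables.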



Just to be completely oddball, I started toying with this.
  Code:
; Takes Y value in D and X value in E; returns attr address in HL.
; T states for each instruction are given in the comment column.

    AND A       ;  4  clear carry flag (this may optimise out if we know it's cleared?)

    LD H,22     ;  7
    LD A,D      ;  4
; we know left three bits are zero, since d <= 23 so rotate around:
    RLCA        ;  4
    RLCA        ;  4
    RLCA        ;  4

; now it /could/ get interesting: we have to rotate potential 1's into H,
; simultaneously turning that 22 into the 88 H needs to point to address 22528.
    RLA         ;  4
    RL H        ;  8
    RLA         ;  4
    RL H        ;  8
    ADD A,E     ;  4  add xpos (X is in E, not L)
    LD L,A      ;  4

59 T states (RET not counted).


I haven't tested this, and it's after midnight. So it might not work. But I think that doing left shifts all the way might be faster?
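Since the routine is untested, here is a quick Python model of the rotate sequence to stand in for running it on real hardware. It is a sketch under assumptions: the helper names are invented, the final add is taken to pick up X from E (where the routine says X lives), and the starting carry is passed in as a parameter instead of modelling AND A.

```python
def rlca(a):
    """RLCA: rotate A left; old bit 7 goes to both bit 0 and the carry."""
    carry = (a >> 7) & 1
    return ((a << 1) | carry) & 0xFF, carry


def rl(value, carry_in):
    """RLA / RL r: rotate left through carry (old carry enters at bit 0)."""
    carry_out = (value >> 7) & 1
    return ((value << 1) | carry_in) & 0xFF, carry_out


def attr_addr_left_shifts(x, y, carry=0):
    """Model of the LD H,22 routine; 'carry' is the flag state on entry."""
    h = 22                        # LD H,22
    a = y                         # LD A,D
    for _ in range(3):            # RLCA x3
        a, carry = rlca(a)
    a, carry = rl(a, carry)       # RLA
    h, carry = rl(h, carry)       # RL H
    a, carry = rl(a, carry)       # RLA
    h, carry = rl(h, carry)       # RL H
    l = (a + x) & 0xFF            # ADD A,E / LD L,A
    return (h << 8) | l


if __name__ == "__main__":
    for y in range(24):
        for x in range(32):
            for carry in (0, 1):
                assert attr_addr_left_shifts(x, y, carry) == 22528 + 32 * y + x
    print("all 768 cells match, with carry set or clear on entry")
```

In this model the result is the same whether carry starts set or clear, because each RLCA overwrites carry with the bit it rotates out, and for Y <= 23 those bits are zero. If the model is faithful, that suggests the opening AND A could be dropped.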

I think this shaves about 40 T states off each Attr lookup in the current assembly (nearer 30 once a RET is counted on this side too). More if it can be assured it doesn't need to clear the carry flag.

This method does demand that the screen address sits on a 1K boundary, so that the high byte of the attribute address can be divided by 4 and stored in H. The default of 16384 for the screen memory start fits (attr=22528, high byte 88, so H starts at 22).
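To make the boundary condition concrete, here is a small Python sketch that works out the value to load into H for a given screen base, or reports that the base doesn't fit. The function name and the None return are just illustrative choices; 6144 is the attribute offset used throughout the thread.

```python
ATTR_OFFSET = 6144  # attributes start 6144 bytes after the screen base


def h_seed_for_screen(screen_addr):
    """Return the seed for LD H,n, or None if the base is not 1K aligned.

    The two RL H instructions multiply the seed by 4 and fill the bottom
    two bits from Y, so the attribute base itself must be a multiple of 1024.
    """
    attr_base = (screen_addr + ATTR_OFFSET) & 0xFFFF
    if attr_base % 1024 != 0:
        return None
    return attr_base >> 10        # high byte divided by 4


if __name__ == "__main__":
    print(h_seed_for_screen(16384))        # default screen
    print(h_seed_for_screen(16384 + 256))  # 256-byte aligned only: no fit
```

Because 6144 is itself a multiple of 1024, the attribute base is 1K aligned exactly when the screen base is, which is why the condition can be stated on either address.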