05-28-2013, 09:57 AM
Some ideas 

Is
Code:
RL (HL)
At 15 T states
A better choice than:
Code:
LD A,(HL)
RLA
LD (HL),A
At 7+4+7=18 T states perhaps? (Also 1 byte shorter, and doesn't mess up your A register, for what it's worth - which is usually the biggest reason it's useful).
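To spell out that arithmetic, here is a side-by-side sketch with the opcode bytes and T states in the comments. The figures are the standard documented Z80 timings; the layout is just my way of showing it.

```
; Version 1: rotate through A (3 bytes, 18 T states, clobbers A)
        LD   A,(HL)     ; 7E       7 T states
        RLA             ; 17       4 T states
        LD   (HL),A     ; 77       7 T states

; Version 2: rotate in place (2 bytes, 15 T states, A untouched)
        RL   (HL)       ; CB 16   15 T states
```

One thing to watch: RLA only changes carry (and resets H and N), whereas RL (HL) also sets S, Z and P/V from the result. If anything after the rotate relies on the old Z flag, check it before swapping.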
Similarly:
Code:
LD A,31
LD L,A
LD A,(__LABEL__bit0)
You are faffing with A, where:
Code:
LD L,31
LD A,(__LABEL__bit0)
is more direct, surely?
I suspect that:
Code:
LD A,(HL)
OR 1
LD (HL),A
Again faffs with the A register (for 7+7+7=21 T states, since OR 1 takes an immediate operand and costs 7, not 4), where
Code:
SET 0, (HL)
Is 15 T states and doesn't faff with the A register.
Code:
LD A,(__LABEL__bit0)
CP 0
JP Z,dontset0
CP 0 works, but is two bytes and 7 T states. You can set the flags with something like AND A, which is a single-byte opcode and takes 4 T states.
Code:
LD A,(__LABEL__bit0)
AND A
JP Z,dontset0
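OR A does the same job, if you prefer it. A quick sketch of that variant, with standard Z80 opcode bytes and timings in the comments (nn nn stands for whatever address your assembler resolves the label to):

```
        LD   A,(__LABEL__bit0)  ; 3A nn nn  13 T states
        OR   A                  ; B7         4 T states, Z set if A = 0
        JP   Z,dontset0         ; CA nn nn  10 T states
```

Both AND A and OR A leave A unchanged and clear carry. The only difference is the H flag (AND sets it, OR resets it), which rarely matters.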
Finally, I get the feeling that using a bunch of bytes in memory for the bit counter could be optimized by using bits of a byte for the bit information. But I'm way too distracted to try to refactor that right now.
So I'll stick to simple tweak suggestions.
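For what it's worth, here is a rough sketch of the packed-bits idea, not a refactor of your code. The label names flags and check_bit0 are made up for illustration, the "work" is a placeholder, and DEFB is assumed to be the byte directive in your assembler (some use DB instead).

```
; Hypothetical layout: "flags" is one byte holding eight 1-bit values,
; replacing separate bytes such as __LABEL__bit0.
check_bit0:
        LD   HL,flags       ; 10 T states, point HL at the packed byte
        BIT  0,(HL)         ; 12 T states, Z set when bit 0 is 0, A untouched
        JP   Z,dontset0     ; skip the work when the bit is clear
        SET  1,(HL)         ; placeholder work: set bit 1 in the same byte
dontset0:
        RET

flags:  DEFB 0              ; all eight bits start cleared
```

BIT, SET and RES all work directly on (HL), so one byte can stand in for eight separate flag bytes and A never gets involved.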

All that said: wow. You're using assembler, and it's working like a charm! Well done! This isn't easy stuff.
