The author of the original routine, na_th_an, told me they have released v2.0, which is faster:
http://www.mojontwins.com/warehouse/fsp2.0.asm
They also gave me permission to include it in the compiler, provided they are credited (they will be).
We need to wrap this code in sub/end sub, check that it works, and measure whether it runs faster than the previous version.
In this version they use LDI, as we did. I see a tentative optimization: the push bc/pop bc sequences could be replaced with inc bc/inc bc.
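To illustrate the idea, here is a rough sketch of that replacement. It is not taken from fsp2.0.asm; it assumes that exactly two LDI instructions run between each push bc and pop bc, that nothing else changes BC in between, and that no later code depends on the flags left by LDI. If any of those assumptions does not hold for the real routine, the swap is not safe as written.

```
; Before (assumed shape): BC is saved on the stack around two LDIs
    push bc        ; 11 T-states
    ldi            ; (DE) <- (HL), HL++, DE++, BC--
    ldi            ; second byte, BC-- again
    pop bc         ; 10 T-states, BC restored from the stack

; After: undo the two decrements directly, no stack access
    ldi
    ldi
    inc bc         ; 6 T-states
    inc bc         ; 6 T-states, BC is back to its original value
```

Under those assumptions each pair costs 12 T-states instead of 21, with the same code size, since all four instructions are one byte each. The saving should still be confirmed by timing the wrapped routine.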