WinSTon Branching/Cycle Count Bug
23 November 2004
Apparently there is a problem in WinSTon where the cycle counts come out incorrectly in certain situations, as described here and here in more detail.
We took a look, and found the problem, which is described below.
First Thought
Our first thought was that it was bus arbitration locking the CPU (1) and/or bad instruction times (2).
- The video circuitry needs access to host memory at given times, which blocks the CPU from accessing the bus, so the instruction time varies. To emulate this you would need to know the exact “DMA slots” taken by the video circuitry. People writing cycle-exact Commodore 64 emulators know about this kind of thing.
- CLR on a 68000 is a read-modify-write (RMW) cycle instruction. This means the address is first read, then modified, then written. However stupid this sounds, it simplified the original design.
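To illustrate the second point, here is a minimal sketch of the read-before-write behaviour of CLR on a memory operand. The LoggingMemory class and clr_memory function are hypothetical names made up for this illustration; they are not taken from WinSTon or any other emulator, and real 68000 bus behaviour is more involved than this.

```python
class LoggingMemory:
    """Toy memory that records every bus access made to it."""

    def __init__(self):
        self.cells = {}
        self.accesses = []

    def read(self, address):
        self.accesses.append(("read", address))
        return self.cells.get(address, 0)

    def write(self, address, value):
        self.accesses.append(("write", address))
        self.cells[address] = value


def clr_memory(memory, address):
    """Model CLR on a memory operand: read first, then write zero."""
    memory.read(address)      # the read happens even though the value is discarded
    memory.write(address, 0)  # the actual clear


memory = LoggingMemory()
memory.write(0x1000, 0x1234)  # put something there to clear
memory.accesses.clear()       # only log what CLR itself does
clr_memory(memory, 0x1000)
print(memory.accesses)
```

The point of the sketch is only that the log contains a read as well as a write, so an emulator that counts just the write will undercount bus accesses for this instruction.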
The confusion is not really surprising
If you cannot completely disable the video DMA circuitry when sampling CPU timing on the ST, the measured timings will be slightly incorrect. This is due to bus arbitration - whenever the video needs bus access, the CPU is halted.
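A rough sketch of why video DMA skews measurements: if the CPU asks for the bus during a cycle claimed by the video circuitry, its access is delayed until the bus is free. The function name and the slot pattern below are assumptions chosen purely for illustration; they are not the real ST slot layout.

```python
def stalled_access_start(requested_cycle, video_busy_cycles):
    """Return the first cycle at or after requested_cycle not claimed by video."""
    cycle = requested_cycle
    while cycle in video_busy_cycles:
        cycle += 1
    return cycle


# Hypothetical pattern: video owns cycles 2 and 3 of every group of 4.
video_busy = {c for c in range(64) if c % 4 in (2, 3)}

# With video DMA active, a request landing in a video slot is pushed back.
delayed = stalled_access_start(6, video_busy)

# With all DMA disabled (empty set), the CPU gets the bus when it asks.
undelayed = stalled_access_start(6, set())

print(delayed - undelayed)  # extra cycles the measurement would pick up
```

With the busy set empty, the measured time is the CPU's own time, which is why disabling all DMA gives clean readings.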
Fortunately, on the Amiga it is possible to sample CPU cycles properly, since all DMA access can be disabled (which is actually what our dumping software does in order to get accurate readings), and so we were able to find out what they were.
The cycle counts for both the branch and the compare instruction are incorrect.
Calculations
The cmpa.l An,An instruction timing is incorrect. It should be 6 cycles, which takes 2 off the 100.
The bcc instruction timing is incorrect. It should be 8 cycles if the branch is not taken and 10 if it is taken. This is because setcpu pc prefetches the data from the destination address, which is not visible here; that is 4 cycles, discounting arbitration. Instruction prefetch cycles are not seen here either, as they are subject to bus arbitration, so that is another 4 cycles not visible in the source excerpt. This is how the values come to be. So 10 cycles takes another 2 off the 100.
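The timings given above can be written down as a small lookup, sketched here with hypothetical names (this is not WinSTon code). It only encodes the figures stated in the text: 6 cycles for cmpa.l An,An, and 8 or 10 cycles for bcc depending on whether the branch is taken.

```python
CMPA_L_AN_AN_CYCLES = 6    # cmpa.l An,An
BCC_NOT_TAKEN_CYCLES = 8   # bcc, branch not taken
BCC_TAKEN_CYCLES = 10      # bcc, branch taken


def bcc_cycles(taken):
    """Cycle count for bcc, using the figures quoted in the text."""
    return BCC_TAKEN_CYCLES if taken else BCC_NOT_TAKEN_CYCLES


def compare_and_branch_cycles(taken):
    """Cost of a cmpa.l An,An followed by a bcc."""
    return CMPA_L_AN_AN_CYCLES + bcc_cycles(taken)


print(compare_and_branch_cycles(taken=True))   # 16
print(compare_and_branch_cycles(taken=False))  # 14
```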
So...
100 - (2+2) = 96
Without the two bugs, this particular timing case comes to exactly 96 cycles, but there may be more bugs. If there are timing problems with spectrum pictures or anything else, the instructions used in them should be reviewed as well...
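The same correction, spelled out as a tiny script. The variable names are made up for this illustration; the numbers are the ones from the calculation above.

```python
reported_total = 100  # cycle count reported for the code in question

# Cycles overcounted per instruction, as worked out above.
overcount = {
    "cmpa.l An,An": 2,
    "bcc": 2,
}

corrected_total = reported_total - sum(overcount.values())
print(corrected_total)  # 96
```

Keeping the per-instruction figures in one table makes it easy to add further entries if other instructions turn out to be miscounted as well.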
Not Currently Fixed
Please note that we have not fixed this bug in the code, just investigated what is wrong and what to do to correct it. We have not touched anything in the emulator unless a change was necessary (FDC, init, exit, system clock and sound routines had to be altered for various reasons explained in previous WIPs). We do not want to change anything in the CPU core because we do not know it, and we would hate to break anything.