Analyser: Generic MFM Support - Gap

2010-04-08

In the previous WIP, we talked about the new gap analysis algorithm we developed after testing found issues with the previous one. This WIP goes into more detail on exactly how the gap is used, to help highlight the problems in analysing it. In particular, we should explain what we meant when we said “data can be read in two directions between sectors”.

Block Gap

From now on a (logical) block refers to a logically grouped entity on the disk surface written in one go. This might be a complete block to be read by a controller, or just a header or data block as in MFM, or header and data written together like in AmigaDOS, but exactly what is irrelevant; what matters is that in order to enable any controller to read it, it has to be written continuously, in one go.

Here is how a logical sector/block is written. First there is some lead-in, which allows the controller to lock onto the cells. Within tolerance, the cell timing is auto-detected (sort of) and locked by PLLs or other means of compensating for the speed wobble of the drive, so a lead-in is always present. Then come the marks (sync), then whatever else is needed for the block, and finally, usually, some lead-out.

To visualize what you would see (if considering what is between two blocks):

...
<gap>
lead-in
 <block N>
lead-out
<gap>
lead-in
 <block N+1>
lead-out
...

Since the disk is circular, you will eventually start over at whatever block you started with first, but the order of things remains the same.

  • If you are reading before a block, you can possibly see a lead-in.
  • If you are reading after a block, you will see a block and a lead-out.
  • If you are reading between blocks you can see lead-out, gap, lead-in.

The gap above may or may not contain data patterns, or anything else: a single value, different values, zones of patterns, etc.

For simplicity the entire area of lead-out, gap, and lead-in is usually referred to just as “gap”.
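As a rough illustration of this naming convention, here is a minimal Python sketch that models a track as a list of regions and collapses every lead-out, gap, lead-in run into a single “gap”. The Region type, the merge_gaps function and all lengths are hypothetical, chosen only for the example; they are not taken from the analyser.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str     # "lead-in", "block", "lead-out" or "gap"
    length: int   # length in cells (illustrative values only)


def merge_gaps(track: list[Region]) -> list[Region]:
    """Collapse each lead-out, gap, lead-in run into one "gap" region."""
    merged: list[Region] = []
    for region in track:
        kind = "block" if region.kind == "block" else "gap"
        if merged and merged[-1].kind == "gap" and kind == "gap":
            merged[-1].length += region.length
        else:
            merged.append(Region(kind, region.length))
    # The track is circular: a gap at the end continues into a gap at the start.
    if len(merged) > 1 and merged[0].kind == "gap" and merged[-1].kind == "gap":
        tail = merged.pop()
        merged[0].length += tail.length
    return merged


track = [
    Region("lead-in", 12), Region("block", 500), Region("lead-out", 3),
    Region("gap", 40),
    Region("lead-in", 12), Region("block", 500), Region("lead-out", 3),
    Region("gap", 90),
]

for region in merge_gaps(track):
    print(region.kind, region.length)
```

Because the list is treated as circular, the lead-in at the start of the list is folded into the same “gap” as the lead-out and gap at the end of it.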

Track Gap

Block (aka sector) gaps lie between normal consecutive blocks in the order of writing. The track gap is the one gap between the last block written and the first one, so there is only one track gap per track. The main difference between a block gap and the track gap is that the track gap is used to compensate for the track length differences caused by drive speed and wobble, and is usually sufficiently long for that purpose. E.g. if you have a slightly faster drive, your track gap shrinks; with a slower drive it gets larger, without ever affecting the rest of the recording.
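The effect of drive speed on the track gap can be shown with a little arithmetic. The sketch below assumes a nominal 300 RPM drive, a 2 microsecond cell (typical for double density MFM) and a made-up recording of 98,000 cells; none of these figures come from the analyser, and the function names are hypothetical.

```python
CELL_TIME_US = 2.0   # assumption: 2 us cell, as in double density MFM


def cells_per_revolution(rpm: float, cell_time_us: float = CELL_TIME_US) -> float:
    """How many cells fit into one revolution at the given drive speed."""
    revolution_us = 60.0 / rpm * 1_000_000
    return revolution_us / cell_time_us


def track_gap_cells(written_cells: int, rpm: float) -> float:
    """Cells left over for the track gap once all blocks have been written."""
    return cells_per_revolution(rpm) - written_cells


WRITTEN = 98_000     # assumption: total cells of all blocks and block gaps

for rpm in (297.0, 300.0, 303.0):
    print(f"{rpm:5.1f} RPM: track gap ~ {track_gap_cells(WRITTEN, rpm):.0f} cells")
```

The recording itself always takes the same number of cells; only the room left for the track gap changes, which is why it shrinks on the faster drive and grows on the slower one.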

Track gaps can be referred to as fillers as well; their values may or may not differ depending on whether they come after the lead-out or before the lead-in.

Algorithm Considerations

Now imagine that (for simplicity) you have an area for a lead-out, gap, lead-in sequence that divides perfectly into data elements, say bytes, with no remainder. If you start decoding at the lead-out towards the next block, chances are you would read exactly the same values as you would if you instead started reading from the lead-in of the next block towards the lead-out of the current block.

OK, now imagine the entire gap can NOT be divided into complete bytes: you have a fractional data element somewhere. This can easily happen with:

  • Write splice due to rewritten data (the gap between a sector header and sector data part is used exactly to compensate for non-precise timing)
  • Track gaps
  • Gap was designed that way in the first place (not likely though for commercial mastering due to limitations)
  • ...other special cases such as bad reads, etc.

Now to make life a bit more complicated:

  • You do not know and cannot assume the length of the lead-out, gap, or lead-in. It is possible that one is short and the other is long.
  • It is possible that the gap is completely unwritten or just partially written.
  • There may intentionally be data in any part of the gap.
  • Sync marks may have been added that would cause the controller to resync and read data differently to how you would see them.
  • “Weak” bits may have been added.
  • Many other things!

You do not want to keep the fractional data in your analysis, and it may be a single byte or more. For a track gap, if the drive was slow or the track gap is unwritten (noise), it is very likely to be a sequence of rubbish.

So... once you read a gap that has fractional data, at the point of the fraction your reading “slips” by the length of the fraction - but you can still see legible data, which will not be correct if you replicate it. Why? Because if real data was hidden there, or even if just pattern values are checked, they would now read as different values, since the position to start reading from is incorrect.

It’s easy to see that reading in either direction between the same blocks (<lead-out, gap, lead-in> vs <lead-in, gap, lead-out>) would produce first correct values, then some garbage, then some seemingly correct but in reality shifted data that, if checked by the game, would be found incorrect.

  • <lead-out, gap, lead-in> is referred to by the analyser as forward or alignment ‘0’ since you continue reading in the order you would try to decode the data.
  • <lead-in, gap, lead-out> is referred to as backward or alignment ‘1’.
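The two alignments can be illustrated with a minimal Python sketch. It is not the analyser's code. It assumes the gap is available as a string of already decoded data bits (not raw MFM cells), takes “backward” to mean framing the bytes so that they line up with the start of the next block, and uses made-up filler values (0x4E for the lead-out, 0x12 for the lead-in) with a hypothetical 3-bit fractional element between them. All function names are illustrative only.

```python
def frame_forward(bits: str) -> list[int]:
    """Alignment 0: byte boundaries continue from the end of the previous block."""
    usable = len(bits) - len(bits) % 8   # a trailing partial byte is dropped
    return [int(bits[i:i + 8], 2) for i in range(0, usable, 8)]


def frame_backward(bits: str) -> list[int]:
    """Alignment 1: byte boundaries line up with the start of the next block."""
    skip = len(bits) % 8                 # a leading partial byte is dropped
    return [int(bits[i:i + 8], 2) for i in range(skip, len(bits), 8)]


def to_bits(values: list[int]) -> str:
    """Render byte values as a string of '0'/'1' characters, MSB first."""
    return "".join(f"{value:08b}" for value in values)


lead_out = to_bits([0x4E] * 3)   # assumption: filler after the block
lead_in = to_bits([0x12] * 3)    # assumption: filler before the next block

whole = lead_out + lead_in               # gap divides into complete bytes
fractional = lead_out + "101" + lead_in  # gap with a 3-bit fractional element

for name, gap in (("whole", whole), ("fractional", fractional)):
    print(name, "forward ", [hex(b) for b in frame_forward(gap)])
    print(name, "backward", [hex(b) for b in frame_backward(gap)])
```

With the whole-byte gap, both framings return the same six values. With the 3-bit fraction, the forward framing is right for the three 0x4E bytes and then slips (0xa2, 0x42, 0x42), while the backward framing is right for the three 0x12 bytes but misreads everything before them (0x72, 0x72, 0x75).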

Remember, as long as the gap size is fractional, neither is correct, they are only correct until the fractional element. But... Which one is the fractional element? How long is it? Are you sure it’s not data/protection/signature whatever? Is there any data in the gap? Just some of the questions a gap resolving algorithm needs to answer.
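To make the “which one, and how long?” question concrete, here is a deliberately naive Python sketch that brackets the fractional element in the simplest possible case. It assumes that each side of the gap is a run of one repeated filler byte and that the gap is given as a string of decoded data bits. It is not the analyser's algorithm, the bracket_fraction name is hypothetical, and it breaks down under most of the complications listed earlier (data or sync marks in the gap, unwritten areas, weak bits).

```python
def bracket_fraction(bits: str) -> tuple[int, int]:
    """Return (start, end) bit offsets of the part neither alignment explains.

    Walks forward while whole bytes repeat the first byte, then walks
    backward while whole bytes repeat the last byte. What is left in
    between is the candidate fractional element.
    """
    n = len(bits)
    if n < 8:
        raise ValueError("need at least one whole byte")

    first = bits[:8]
    start = 0
    while start + 8 <= n and bits[start:start + 8] == first:
        start += 8

    last = bits[n - 8:]
    end = n
    while end - 8 >= start and bits[end - 8:end] == last:
        end -= 8

    return start, end


# Assumed example: three 0x4E bytes, a 3-bit fraction, three 0x12 bytes.
gap = "01001110" * 3 + "101" + "00010010" * 3
start, end = bracket_fraction(gap)
print(f"unexplained bits {start}..{end}: {gap[start:end]!r}")
```

Note that fillers whose bit pattern repeats when shifted (0x00 or 0xFF, for example) read the same in both alignments, so a sketch like this cannot place the fraction inside them; it only shows why the questions above need answering.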

Our implementation uses different views to understand the gaps: the gap between blocks (lead-out, gap, lead-in) and the gap before and after a block (gap, lead-in, block, lead-out, gap) depending on what piece of the puzzle it is trying to resolve.

Note: It is impossible to (properly!) analyse the gap if it is read through a hardware controller (such as the ones found in a PC), as the controller would resync the data on sync, and hence the gaps would never appear to have a fractional size and part of them would not be readable. You would have to analyse the code that reads the gap data to see the original intent, and create a data sequence that simulates the data expected by the program - as you would not be able to work it out from the partial data.

Thalion ST games use gap sequences that partially can’t be read by generic MFM controllers, and as such the original mastering data cannot be determined and thus cannot be re-imaged - but there are other games using this trick too.