Analyser: Generic MFM Support - Gap Analyser

2010-04-07

The analyser can now properly detect the exact patterns used to generate the gap, regardless of how many different filler values were used.

As mentioned in a previous WIP, the algorithms developed for gap analysis have been replaced due to issues found in testing. The new code is very complex, so now is a good time to explain how it works.

First it decides whether the gap contains any real-world data, such as serial numbers or other marks. It then works out how the rest of the gap area was filled, i.e. which filler values were used. While this might sound easy, in practice it can get very complex: gaps are often partially unwritten or overlapping, there are other complications too, and not least the write splice has to be filtered out. The goal is to restore the original mastering data, not to replicate imperfections caused by writing; data containing such imperfections could not be compared later (e.g. to see whether two images differ) or prepared properly for writing.
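As a rough illustration of the filler part of that task, the Python sketch below splits an already decoded gap into runs of identical byte values and accepts only sufficiently long runs as filler. The function name `filler_runs`, the `min_len` threshold and the sample bytes are assumptions made for this example. It is not the analyser's actual code, which has to work on real MFM data with write splices and overlaps that this sketch ignores.

```python
from itertools import groupby


def filler_runs(gap: bytes, min_len: int = 4):
    """Split a decoded gap into (offset, length, value) runs.

    Runs shorter than min_len get value None, meaning
    "not explained as filler" (hypothetical rule for this sketch).
    """
    runs = []
    offset = 0
    for value, group in groupby(gap):
        length = len(list(group))
        runs.append((offset, length, value if length >= min_len else None))
        offset += length
    return runs


# Two filler values (0x4E and 0x00) with two stray bytes in between.
gap = bytes([0x4E] * 6 + [0x12, 0x34] + [0x00] * 5)
print(filler_runs(gap))
# [(0, 6, 78), (6, 1, None), (7, 1, None), (8, 5, 0)]
```

The stray bytes are exactly the hard part: they might be real data, or they might be rubbish left behind by writing, and a length threshold alone cannot tell the two apart.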

In fact, to deal with the complexity, the new code works much like the data detection algorithm. It maps anything that can be understood for certain as some kind of data, then continues with the remaining data by trying to identify patterns, checks whether something new was identified positively, and then repeats the process (maybe even somewhere else on the track) until nothing more can be identified. Anything that still cannot be mapped must be random data generated when writing or reading.
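The repeat-until-nothing-new loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the real algorithm: the name `explain_gap`, the `certain_len` threshold and the rule "a long run is certain, a short run is accepted once its value is known" are all invented for the example.

```python
from itertools import groupby


def explain_gap(gap: bytes, certain_len: int = 4):
    """Iteratively explain runs of a decoded gap.

    Returns (known filler values, unexplained (value, length) runs).
    """
    # Each run is [value, length, explained?].
    runs = [[v, len(list(g)), False] for v, g in groupby(gap)]
    known = set()
    progress = True
    while progress:  # repeat until a full pass identifies nothing new
        progress = False
        for run in runs:
            value, length, explained = run
            if explained:
                continue
            # Certain on its own, or matching something identified
            # elsewhere on the track in this or an earlier pass.
            if length >= certain_len or value in known:
                run[2] = True
                known.add(value)
                progress = True
    leftovers = [(v, n) for v, n, ok in runs if not ok]
    return known, leftovers


gap = bytes([0x00] * 2 + [0x4E] * 6 + [0x12] + [0x00] * 5 + [0x4E])
known, leftovers = explain_gap(gap)
print(sorted(known), leftovers)
# [0, 78] [(18, 1)]
```

The two leading 0x00 bytes cannot be explained on the first pass. They are only mapped on the second pass, after the longer 0x00 run further along has been identified. The single 0x12 byte is never explained and ends up as a leftover.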

If it helps to visualise this, it is a bit like code that solves a puzzle by filling in the parts of the picture that are certain, leaving questionable elements for later, and trying until it can see that some elements were not needed at all or that the remaining blanks can only be placed in one particular way. The interesting part, though, is that the puzzle has not been cut beforehand, so you first have to work out how to cut the picture in order to assemble the correct one. In our case, the pieces are the patterns found, and the more you find the more you can resolve, until nothing remains.

All of this seems to be working very reliably, detecting the structure of some very sneaky protections too without any scripting patterns defined. If anything goes unnoticed, it can still be scripted specifically as usual. Based on this pattern data, it is now possible to add commands to the scripts describing what’s been found - we still need to implement those commands though.

Complications

For those interested, here are the kinds of complications we have to deal with.

Data can be read in two directions between sectors:

From the end of the current sector to the start of the next sector.
From the beginning of the next sector back to the end of the current sector.

You may or may not get identical reads, and you do not know which one is correct. We’ll have more about the reasoning behind this in a later WIP.

A game does not need to find out the correct reading direction(s!), or where legible data stops and rubbish (such as write splice or track gap) starts. It can define the values used, or can be told what to look out for. Since we cannot interpret the game code in an automated way, we can assume nothing about gaps.

Now to add more fun:

Imagine that some of the rubbish in the gap can be real data. The game knows, but we have to decide if it is or not.
Some data may be sync markers that you must preserve as is.

These are just some of the problems that had to be resolved.

We have done lots of testing, and we have found that the more samples you verify your solution against, the more you realise how complex it can get.