The Analyser
6 February 2002
The disk image analyser work is going very well. The disk format description framework is complete, and it allows us to describe each part of a disk.
Basically:
- The Format descriptor: Description of a whole track.
- The Block descriptor: Description of a block.
- The Data descriptor: Description of a data element of a block.
Let's explain that a little...
A track is made of one or more blocks, and these blocks contain one or more data elements. All this is based on reverse engineering the dumps and the structure (or copy protection) that they contain. These descriptors are then able to recognise the same structure on any other dump (say, of a different game with the same disk protection).
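As a rough illustration of that hierarchy, here is a minimal Python sketch. It is not the analyser's actual code: the class names simply mirror the three descriptor types above, and the fields (`kind`, `size_bits`) and all sizes are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataDescriptor:
    """One data element of a block (hypothetical fields)."""
    kind: str       # illustrative label, e.g. "sync", "header", "data"
    size_bits: int  # length of the element in raw bits


@dataclass
class BlockDescriptor:
    """A block: an ordered list of data elements."""
    name: str
    elements: List[DataDescriptor] = field(default_factory=list)

    def size_bits(self) -> int:
        return sum(e.size_bits for e in self.elements)


@dataclass
class FormatDescriptor:
    """A whole track: an ordered list of blocks."""
    name: str
    blocks: List[BlockDescriptor] = field(default_factory=list)

    def size_bits(self) -> int:
        return sum(b.size_bits() for b in self.blocks)


# Made-up sizes, purely to show how the three levels nest.
sector = BlockDescriptor("sector", [
    DataDescriptor("sync", 32),
    DataDescriptor("header", 64),
    DataDescriptor("data", 8192),
])
track = FormatDescriptor("example-format", [sector] * 11)
print(track.size_bits())
```

The real descriptors have to carry far more than a size (sync behaviour, encoding, values that may vary), but the nesting is the same: format, then blocks, then data elements.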
This reverse engineering takes a lot of time to begin with, but once it is done, all future games with the same disk structure are recognised immediately. For example, the AmigaDOS and Copylock (old and new) Amiga disk structures were the first to go in, so any dump that uses these in part or in full is recognised automatically. If a dump varies slightly from them, we will be able to tell immediately. Another example would be Firebird games for the Amiga, most of which use the same (or a very similar) copy protection scheme.
This all gets very involved when you consider the different things about a disk that need to be handled. Just take sync parameters: each part of a disk can have many sync variables (lead clocking, lead MFM clocking, multi sync, variable sync, variable data block, data block MFM - we could go on), and that is just for the sync! Then you can start describing each block and, at the most basic data level possible, each tiny data element (sync value, check bits, start bits, stop bits, encoded bits, data areas, area sizes, etc.).
Okay, let's look at Amiga Copylock in greater detail, simply because it is one of the schemes that the most research has gone into. The game used is called “Custodian”, and below is the layout of the disk, colour-coded by the copy protection scheme found on each track.
Summarising the possible types of track by colour:
- Grey: Represents noise - basically an unformatted track.
- White: Tracks that were not dumped, since many drives cannot read these tracks anyway.
- Blue: The Copylock track (track zero, second side).
Other colours are also used (Red for long tracks, etc.) but since other density protections are not used in this particular game, they are not shown.
Let's have a look at the Copylock track in greater detail...
Here we have the Copylock protection density graph as shown in previous WIPs. You may notice that the samples (pink line) are more finely grained than in most previous graphs. This is purely because this game was dumped on a faster Amiga, so more samples were taken per unit length of track; it doesn't make any difference. The blue line is the analysed density.
Let's look at some data...
This is a small part of the raw bit patterns on the Copylock track. It is used to aid analysis and the reverse engineering of the copy protection. This is just for illustrative purposes, so don't take much notice of the actual values, or... erm, text.
What we have basically done is develop a language for the analyser to allow us to describe abstract data that may or may not vary at certain points. This is no small task, but it is done now. Any work done without this would have been wasted effort. But now, when the analyser searches a track it finds all the parts already described. Data that cannot be marked with the current descriptors is highlighted for human checking, and probably needs a new descriptor made for it.
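The search-and-highlight idea can be sketched in a few lines of Python. This is a heavy simplification under stated assumptions, not the analyser's description language: it works on whole bytes rather than raw bits, each hypothetical `Descriptor` is just a fixed sync pattern plus a fixed length, and the sync value `0x44 0x89` is only an example.

```python
from typing import List, NamedTuple, Optional


class Descriptor(NamedTuple):
    name: str
    sync: bytes  # byte pattern marking the start of a block (assumed fixed)
    length: int  # total block length in bytes, sync included


class Region(NamedTuple):
    start: int
    end: int             # exclusive
    name: Optional[str]  # None means no descriptor matched: needs human checking


def analyse(track: bytes, descriptors: List[Descriptor]) -> List[Region]:
    """Split a track into described blocks and unexplained gaps."""
    regions: List[Region] = []
    pos = 0
    unknown_start: Optional[int] = None
    while pos < len(track):
        # First descriptor whose sync matches here and whose block fits.
        match = next(
            (d for d in descriptors
             if track.startswith(d.sync, pos) and pos + d.length <= len(track)),
            None,
        )
        if match is None:
            if unknown_start is None:
                unknown_start = pos
            pos += 1
            continue
        if unknown_start is not None:
            regions.append(Region(unknown_start, pos, None))
            unknown_start = None
        regions.append(Region(pos, pos + match.length, match.name))
        pos += match.length
    if unknown_start is not None:
        regions.append(Region(unknown_start, len(track), None))
    return regions


example = Descriptor("example-block", b"\x44\x89", 6)
track = b"\x00\x00" + b"\x44\x89" + b"abcd" + b"\xff"
for region in analyse(track, [example]):
    print(region)
```

Regions whose `name` is `None` correspond to the data that would be highlighted for a human to look at, and possibly given a new descriptor.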
You may ask, “why does all this need to be done?” Well...
- There is otherwise no way to check the integrity of the dumped data. If we do not know how the data is organized and how it is checked against failures when loading the game (parity, checksum, etc.), we just can't tell whether the data read (dumped) is good or not. We might perhaps just hope that we are lucky, or say “it seems to work as far as we have played it”. But that is just not good enough for a preservation project.
- There are some disk loader systems that push their luck by simply not doing any type of integrity check on the data read. They simply expect the data read to be correct all the time. NOT good for the user. However, reverse engineering the raw format stored may show that there is integrity information among the raw data; it is just not used by the game itself - it was probably used to verify the data during the duplication process.
- There is no way to reliably write encoded magnetic data without knowing how to write it. Knowing what to write - the data itself - is just not enough: what you read back will normally be different from what you wanted to write (at least, by reading back through the machine - i.e. loading and playing the game).
- There is no way of verifying what was written during mastering since there are no checkpoints.
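To make the first point concrete, here is a minimal Python sketch of what knowing the integrity scheme buys you. The checksum used (an XOR over 32-bit big-endian words) is an assumption chosen for illustration, not a claim about any particular disk format, and both function names are hypothetical.

```python
import struct


def xor_checksum(data: bytes) -> int:
    """XOR all 32-bit big-endian words together (illustrative scheme only)."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    checksum = 0
    for (word,) in struct.iter_unpack(">I", data):
        checksum ^= word
    return checksum


def block_is_intact(data: bytes, stored_checksum: int) -> bool:
    """Compare the recomputed checksum with the one stored in the block."""
    return xor_checksum(data) == stored_checksum


good = bytes(range(16))
stored = xor_checksum(good)

# Flip a single bit to simulate a bad read.
damaged = bytes([good[0] ^ 0x01]) + good[1:]

print(block_is_intact(good, stored))     # True
print(block_is_intact(damaged, stored))  # False
```

Without a descriptor saying where the stored checksum lives and how it is calculated, neither call is possible, and a damaged dump looks exactly like a good one.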
You can find further discussion about this here. But to put it very simply - disk formats had to be specified before disks could be mastered by the duplicators. In comparison to what we are doing, that was relatively easy. They had a specification to work with from whoever developed the format/protection and did not have to reverse engineer it.
Now, having already reverse engineered some copy protection schemes (Copylock, Firebird, and a few custom ones), many games can be verified, exported to the release format and remastered with no trouble at all.
Hopefully you now have a slightly better idea of what is going on. All this is getting us closer... But there is still lots of work to be done...