Why do you need to describe disk formats?

There seems to be widespread confusion about this particular part of what we do, so it warrants extra coverage. It is indeed a rather different concept from what most people are used to - at least for those who were not in the industry at the time.

What?

On the original duplication equipment (normally from Trace) there was a “master” or “gold” disk, and a “disk layout description file” which described all sorts of things about the disk:

  • The Encoding used
  • The structure of each Track
  • The make-up of the blocks that each track contains
  • Right down to the data elements that make up each block

The master disk was read into the memory of the “host” machine controlling the duplicator (a Unix-based machine at the time). The “description file” was written in a special language called Freeform, in which you could describe (almost) anything about the disk. It didn’t matter what system the disk was for (PC, Atari, Amiga, Mac, anything); the scripting facility gave you this flexibility.
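
To give a rough idea of the layers such a description has to cover, here is a minimal sketch in C. These structures are purely hypothetical illustrations - the real Freeform language had its own syntax, which is not reproduced here - but the nesting mirrors the list above: a disk made of tracks, tracks made of blocks, blocks made of data elements.

  /* Hypothetical sketch only - not the Freeform language itself. */

  enum encoding { ENC_FM, ENC_MFM, ENC_GCR, ENC_CUSTOM };

  struct data_element {              /* smallest unit: a field inside a block */
      const char *name;              /* e.g. "header", "payload", "gap"       */
      unsigned    offset_bits;       /* where it starts within the block      */
      unsigned    length_bits;       /* how long it is                        */
  };

  struct block_desc {                /* one block (e.g. a sector) on a track  */
      unsigned             element_count;
      struct data_element *elements;
  };

  struct track_desc {                /* layout of a single track              */
      enum encoding      encoding;   /* how the cells are encoded             */
      unsigned           bit_count;  /* nominal number of cells on the track  */
      unsigned           block_count;
      struct block_desc *blocks;
  };

  struct disk_desc {                 /* the whole "description file"          */
      unsigned           track_count;
      struct track_desc *tracks;
  };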

Who?

If somebody created a new Copy Protection and/or disk format, they also had to write the description file for it to be properly duplicated. In many cases this was likely done by the duplicator, who probably offered a service to create these scripts and even the protection itself. It was also done by independent entities like Rob Northen Computing, and by the game developers themselves.

Why?

There are several reasons the disks need to be described.

“A raw bit stream of 0s and 1s provides few clues to how it is to be interpreted, let alone its meaning, and low level context information such as block size, encoding standards, and file structure, will prove essential. It might be feasible to draw out a text document or a data set relatively easily, but a digital image whose format was completely unknown and appearing as a single bit stream, would require significant analysis before it could be rendered.”

Changing Trains at Wigan: Digital Preservation and the Future of Scholarship
Dr. Seamus Ross, HATII, University of Glasgow
http://www.bl.uk/services/npo/occpaper.pdf

This quote refers to reading bits off magnetic media as they appear on the disk, using a process known as Magnetic Force Microscopy. Of course, we have our own methods that make such expensive equipment unnecessary. The “significant analysis” mentioned is only part of what we do; we actually do a whole lot more, such as integrity checks, authenticity checks, and so on.

You need to know how to write the data in order to write it reliably.

This is true for any system. The Amiga “knows” how to write its own disks, as does the ST, the PC, and so on. This is obvious. But for a non-specific machine to write such a disk, or any other custom format, it needs to be told how; otherwise the written disk would almost certainly suffer the effects of bit rot more quickly.

It is the only generic way to verify the written data.

The script contains information on what to expect when verifying the format, from where the integrity checksum data is located to which algorithms are used to calculate it.
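
As a simple illustration of the principle, a verifier driven by such a description might look like the C sketch below. The descriptor fields and the XOR algorithm are stand-ins invented for this example, not anything actually used by a particular duplicator or by us.

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical descriptor: where the checksum lives and how to compute it. */
  enum checksum_algo { CK_XOR32 /* , CK_CRC16, ... */ };

  struct checksum_desc {
      enum checksum_algo algo;
      size_t data_offset;            /* start of the data covered             */
      size_t data_length;            /* number of bytes covered               */
      size_t sum_offset;             /* where the stored checksum is located  */
  };

  static uint32_t xor32(const uint8_t *p, size_t n)
  {
      uint32_t sum = 0;
      for (size_t i = 0; i + 4 <= n; i += 4)
          sum ^= (uint32_t)p[i] << 24 | (uint32_t)p[i + 1] << 16 |
                 (uint32_t)p[i + 2] << 8 | (uint32_t)p[i + 3];
      return sum;
  }

  /* Returns non-zero when the block matches what the description expects. */
  static int verify_block(const uint8_t *block, const struct checksum_desc *d)
  {
      uint32_t stored = (uint32_t)block[d->sum_offset]     << 24 |
                        (uint32_t)block[d->sum_offset + 1] << 16 |
                        (uint32_t)block[d->sum_offset + 2] << 8  |
                        (uint32_t)block[d->sum_offset + 3];

      if (d->algo == CK_XOR32)
          return xor32(block + d->data_offset, d->data_length) == stored;
      return 0;
  }

Without the description, neither the location of the stored value nor the algorithm behind it would be known, so the data could not be checked generically.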

To allow flexibility of track format.

Describing the disk format meant you could then write disks for any system, and nearly any copy protection (there were some things you could not do on a Trace duplicator). For example, you could even say how densely the bits are distributed across a track - as you may see from a Copylock graph. It was very accurate, too. Take Dungeon Master, with its Flakey Bits. The script would say “the bits at XXX position are flakey”. You cannot blindly write that sort of thing. You need to “create” the effect.
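
To make that a little more concrete, here is a C sketch of the idea - purely an illustration of the principle, not the Trace or SPS method. One common way to create unstable bits is to place flux reversals at an ambiguous spacing, so the drive’s data separator resolves them differently on every read, while a density value scales the bitcell timing across a region.

  #include <stddef.h>
  #include <stdint.h>

  #define CELL_NS 2000u              /* nominal 2us bitcell, double density   */

  struct track_region {
      size_t start_bit;
      size_t bit_count;
      int    flakey;                 /* 1: deliberately unstable              */
      double density;                /* relative cell density, 1.0 = nominal  */
  };

  /* Append one flux interval (nanoseconds between reversals) to the stream. */
  static void emit_interval(uint32_t *stream, size_t *pos, uint32_t ns)
  {
      stream[(*pos)++] = ns;
  }

  static void write_region(uint32_t *stream, size_t *pos,
                           const struct track_region *r)
  {
      uint32_t cell = (uint32_t)((double)CELL_NS / r->density);

      for (size_t i = 0; i < r->bit_count; i++) {
          if (r->flakey)
              /* halfway between two legal spacings: reads back unpredictably */
              emit_interval(stream, pos, cell * 2 + cell / 2);
          else
              /* a stable interval exactly two cells long                     */
              emit_interval(stream, pos, cell * 2);
      }
  }

The point is that the descriptor records the intended effect and the writer recreates it; copying the bits read back on any single pass would freeze one random outcome and destroy the protection.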

SPS and Now

Back to the present. Nothing has changed. What we do is quite similar to what you used to have to do to get a disk written in a commercial environment, except we hold the data (master or gold copy) and the description (Freeform script) in the same “virtual” disk: an IPF (Interchangeable Preservation Format) file.

The information contained in an IPF file can be used to write a real disk. That is in fact what they were designed for. Perhaps in the future a DIY hardware-based device will be available that acts very much like the Trace duplicator, just a bit cheaper. ;-)

Of course, the story does not end there. With a real disk, you simply put it in the drive and you are set, since disks designed for the system are read by the disk controller in the format it expects. However, IPF files used in an emulator need to be interpreted back into the same thing the emulated disk controller expects.

Rather simplified, this means that the IPF library takes the disk data and format descriptors from an IPF file and generates the data in the form expected by a real disk controller - not by the system itself, which only ever gets post-processed controller data. This is quite complicated, since we are not just talking about track data, nor even just MFM (Modified Frequency Modulation) (or whatever) encoded data. It is raw signal data as it is contained on the surface of the disk. It might be MFM, but it might not be legal MFM. You need to emulate the FDC (Floppy Disk Controller) very accurately - or at least its effects - to get the data you expect, since what you can normally read through an FDC is not actually what is contained on the disk!
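
To illustrate the difference between the raw cells and what an FDC hands back, here is a minimal MFM encoder sketch in C. This is the standard textbook MFM rule used only as an example, not SPS code: each data bit becomes a clock cell plus a data cell, and the clock cell is set only when the previous and the current data bits are both 0.

  #include <stdint.h>

  /* Encode one data byte into 16 MFM cells (clock, data pairs).
   * prev_data_bit is the last data bit of the preceding byte.              */
  static uint16_t mfm_encode_byte(uint8_t data, int prev_data_bit)
  {
      uint16_t out = 0;

      for (int i = 7; i >= 0; i--) {
          int bit   = (data >> i) & 1;
          int clock = (prev_data_bit == 0 && bit == 0);

          out = (uint16_t)((out << 2) | (clock << 1) | bit);
          prev_data_bit = bit;
      }
      return out;
  }

Under this rule the data byte 0xA1 encodes to 0x44A9, yet the sync mark actually written to disk is 0x4489 - the same pattern with one clock cell deliberately dropped. That is not legal MFM and can never appear in normal data, which is how a controller finds the start of a block, and it is exactly the kind of information that is lost if you only ever look at the bytes the FDC returns instead of the raw cells.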

This describing of formats, coupled with our complementary “modification detection” technology - which can tell whether the same high-quality duplicator drive wrote the whole disk, or whether it was later modified by another drive - means we can verify that the disks are 100% authentic and true to their original mastering.

Our technology took over two years to develop; if we did not really care about data integrity, we could have started releasing original games after the first couple of months. It is all a bit extreme, but we believe that anything less is not appropriate for preservation. After all, from a preservation point of view, if you cannot guarantee the authenticity of your items, they are worthless. Ask a museum curator.