This document describes the stream files produced by the KryoFlux device. It is a technical description of a binary stream, but some effort has been made to keep it accessible to a wider audience, including references to further information.
It is important to note that stream files are not intended as a file format as such. They are part of a protocol between the KryoFlux device and the host system: a stream file is exactly the byte stream communicated over USB, saved by the host system when a disk is imaged.
Important: This stream format is optimised for running over “full-speed” (version 1.1) USB, which has a theoretical maximum data transfer of only 12 Mbit/s. Since the stream is sent over the wire while a track is being read, the stream is encoded for transfer and processing efficiency - some complexity arises from this. This stream requires decoding before it is suitable for further interpretation.
Stream files are not intended for long term preservation, and they are currently hardware specific (they require a KryoFlux device). Disk data intended for long term storage should be kept in the DRAFT format. DRAFT is not yet complete. However, this should not delay preservation activity, as a converter will be made available around the same time.
WARNING: This file format is WORK IN PROGRESS and you will need to update your application frequently. We therefore recommend switching to DRAFT once implemented. We understand that working with stream files currently is the only option to work with raw data ingested with KryoFlux. Version 2.0 of DTC which is already in beta testing will come with a slightly enhanced protocol that will transfer additional data (hardware clock, time stamp, etc.). Once published this will break applications that still use the older protocol. Developers should get in touch, especially if there is interest in obtaining the source for the protocol.
There are two items of logical data that are transferred in a stream: the timing of flux transitions and the timing of the index signal. These two pieces of information are preceded by control code structures, referred to in this document as “OOB” (Out-Of-Band) information.
- Stream position - position in the data stream represented by a stream file.
- Cell position - position in the decoded cell buffer.
- OOB (Out-Of-Band) - other information that is not sample values. This can be control structures to guide decoding, as well as index signal information.
In order to capture everything on a disk, you need everything between two index signals. Since a KryoFlux device will stream the data from a drive, the stream will start before the first index signal, and end after the last index signal. This is important to ensure everything from a disk is captured. Data before the first index signal or after the last one cannot be meaningfully decoded, and should be ignored.
For various reasons, especially for games, multiple revolutions of data should be captured in one continuous stream. This means a stream file may contain more than two index signals. For SPS purposes, the minimum requirement is five. Having multiple revolutions is generally helpful for analysis.
Data is byte-aligned for processing efficiency on low powered devices. This means that no information is encoded at the bit level: there is no need to break a byte down into bits in order to interpret it further.
Numeric data is little-endian encoded. Values that do not fit into a single byte are stored in consecutive bytes with the least-significant byte first.
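As a minimal sketch of what little-endian means for a multi-byte field such as the 2-byte OOB Size, the Python snippet below uses the standard int.from_bytes. The byte values are made up for illustration. Note that the Value and Value16 sample encodings described later list their upper byte first.

```python
# Hypothetical raw bytes as they would appear in the stream.
size_bytes = bytes([0x0C, 0x00])                 # 2-byte field
position_bytes = bytes([0x78, 0x56, 0x34, 0x12]) # 4-byte field

# Least-significant byte comes first, so 0x0C 0x00 is 0x000C.
size = int.from_bytes(size_bytes, "little")
position = int.from_bytes(position_bytes, "little")

print(size)           # 12
print(hex(position))  # 0x12345678
```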
Each piece of meaningful information starts with an OOB header, followed by the type of data, followed by the data itself. Data seen before the first header is possible, but can be ignored.
All control values and encoded timing values are denoted here in hexadecimal by the prefix “0x”. Decoded numeric values are shown in decimal (no prefix).
Important: All offset values are relative to the position of the encoding marker.
The first byte of a decode loop is a special value that indicates how to proceed with processing, which might be the type of data to decode, be part of a longer code sequence, or might be data itself. It has the following possible meanings.
|Header||Name||NOffset||Description|
|<= 0x07||Value||2||New cell value: upper 8 bits are *this* value, lower 8 bits are the byte at the next stream position|
|0x08||Nop1||1||Ignore value. Continue decoding at current position + 1|
|0x09||Nop2||2||Ignore value. Continue decoding at current position + 2|
|0x0A||Nop3||3||Ignore value. Continue decoding at current position + 3|
|0x0B||Overflow16||1||Next cell value is increased by 0x10000 (16 bits). Decoding of *this* cell should continue at the next stream position|
|0x0C||Value16||3||New cell value: upper 8 bits are at offset+1 in the stream, lower 8 bits are at offset+2|
|0x0D||OOB||Variable||First byte of OOB data header (see OOB section for decoding rules)|
|>= 0x0E||Sample||1||New cell value: one that can be represented in the remaining 0x0E-0xFF range|
The NOffset values here indicate the next position to decode after the current one has been processed. A value of 1 indicates that the next decoding iteration starts on the following byte, and a value of 2 indicates that it starts at current position + 2. No fixed NOffset can be given for OOB data, because the size of an OOB block depends on its type. Please see the OOB section for decoding rules.
Nop (No-operation) codes are just used to skip data in the buffer, ignoring the affected stream area. This makes it possible for the firmware to create data in its ring buffer without the need to break up a single code sequence when the filling of the ring buffer wraps.
The range used for ‘Value’ (indicating a “short 16-bit value”) can be used to hold sample values from 0x00 up to 0x07FF (0x07 in the high-order byte, and the next byte in the stream in the low-order byte). However, it can also be used to hold sample values <= 0x0D, which cannot be represented using a normal sample (at least, not without further complicating decoding) because of the special meaning of the 0x08 to 0x0D control values in the encoding marker byte.
The Overflow16 code indicates a cell value that does not fit in the maximum space supported by the Value16 encoding (16 bits). It supports a variable length of precision for any value above 16 bits. Finding this code signifies that the final resulting cell value should be increased by 0x10000 (2^16), but decoding of the cell should continue at the next stream position.
For example, a stream byte pattern of:
0x0B, 0x0C, 0xDD, 0x87
would produce a cell value of:
0x10000 + 0xDD87 = 0x1DD87 (122247)
There is no limit on the number of overflows present in a stream, and so the counter resolution is virtually unlimited, although the decoder in the KryoFlux host software currently uses 32 bits.
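The decoding rules described so far can be sketched as a small Python loop. This is an illustrative reading of the table and not the KryoFlux host decoder: the function name decode_cells is hypothetical, OOB blocks are simply skipped using their Size field, and a truncated stream is not handled (it would raise IndexError).

```python
def decode_cells(stream: bytes) -> list[int]:
    """Decode flux cell values from a raw stream, skipping OOB blocks."""
    cells = []
    overflow = 0  # pending 0x10000 increments for the cell being built
    pos = 0
    while pos < len(stream):
        marker = stream[pos]
        if marker <= 0x07:
            # Value: marker is the upper byte, next byte is the lower byte.
            cells.append(overflow + (marker << 8) + stream[pos + 1])
            overflow = 0
            pos += 2
        elif marker <= 0x0A:
            # Nop1..Nop3: skip 1, 2 or 3 bytes.
            pos += marker - 0x07
        elif marker == 0x0B:
            # Overflow16: add 0x10000 and keep decoding the same cell.
            overflow += 0x10000
            pos += 1
        elif marker == 0x0C:
            # Value16: upper byte at offset+1, lower byte at offset+2.
            cells.append(overflow + (stream[pos + 1] << 8) + stream[pos + 2])
            overflow = 0
            pos += 3
        elif marker == 0x0D:
            # OOB header: sign, type, 2-byte little-endian size.
            if stream[pos + 1] == 0x0D:  # type "End": stop decoding
                break
            size = int.from_bytes(stream[pos + 2:pos + 4], "little")
            pos += 4 + size
        else:
            # Sample: the byte itself is the cell value (0x0E..0xFF).
            cells.append(overflow + marker)
            overflow = 0
            pos += 1
    return cells


# One overflow followed by a Value16 of 0xDD87.
print(decode_cells(bytes([0x0B, 0x0C, 0xDD, 0x87])))  # [122247]
```

Python integers are unbounded, so this sketch does not share the 32-bit limit of the host software.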
Cell values that do not fit into 16 bits like this are quite unusual, but have been found in games that attempt to fool the AGC (Automatic Gain Control) of the drive electronics.
The encoding of sample values, working from value to how it is encoded, is as follows:
|Sample value||Encoding|
|0x0100...0x07FF||sample 15:8, sample 7:0|
|0x0800...0xFFFF||<value16>, sample 15:8, sample 7:0|
|0x10000...||<overflow> repeated for each 65536 sampling period counted, until terminated by a normal sample|
Where ‘n:m’ indicates the range of bits this sample occupies in the result. E.g. 15:8 would indicate that part of the sample resides in bit 8 to 15 of the result.
Encodings separated by commas indicate each byte encoding required to represent that sample.
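Working in the other direction, from value to bytes, the encoding can be sketched as follows. The function name encode_sample is hypothetical and this is not the firmware's code. It assumes, based on the marker descriptions earlier, that values 0x0E to 0xFF are sent as a single sample byte and that values up to 0x0D use the two-byte ‘Value’ form.

```python
def encode_sample(value: int) -> bytes:
    """Encode one cell value into stream bytes (illustrative sketch)."""
    if value < 0:
        raise ValueError("cell values are non-negative")
    out = bytearray()
    # One Overflow16 code (0x0B) for each 65536 counted.
    out += b"\x0B" * (value >> 16)
    low = value & 0xFFFF
    if 0x0E <= low <= 0xFF:
        out.append(low)                             # single sample byte
    elif low <= 0x07FF:
        out += bytes([low >> 8, low & 0xFF])        # Value: 15:8, 7:0
    else:
        out += bytes([0x0C, low >> 8, low & 0xFF])  # Value16 prefix
    return bytes(out)


print(encode_sample(0x1DD87).hex(" "))  # 0b 0c dd 87
```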
The data header indicates the start of a new OOB information block, e.g. index signal, transfer status or stream information. A repeated OOB code makes it possible to detect the end of the read stream while reading from the device with a simple check, regardless of the current stream alignment.
|Offset||Size||Value||Name||Description|
|0||1||0x0D||Sign||Constant value indicating start of OOB header|
|1||1||-||Type||OOB block data type, see below for possible values|
|2||2||-||Size||The number of bytes of OOB data that follow the header|
The type value indicates the following OOB data section, or no following data section if it is type “End”.
|Type||Name||Description|
|0x01||Stream Read||Start of flux transition timing data block (multiple per track)|
|0x02||Index||Index signal data|
|0x03||Stream End||Signifies there are no more stream read blocks (one per track)|
|0x0D||End||End of data (no more data to process)|
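A minimal sketch of reading the 4-byte OOB header with Python's struct module is shown below. The names OobHeader, OOB_TYPES and parse_oob_header are hypothetical, and the type table only contains the values listed in this document.

```python
import struct
from typing import NamedTuple

# Type codes as listed in this document.
OOB_TYPES = {0x01: "Stream Read", 0x02: "Index", 0x03: "Stream End", 0x0D: "End"}


class OobHeader(NamedTuple):
    type_code: int
    type_name: str
    size: int  # number of OOB data bytes following the header


def parse_oob_header(stream: bytes, pos: int) -> OobHeader:
    """Parse the OOB header that starts at stream[pos]."""
    # "<BBH": little-endian, 1-byte sign, 1-byte type, 2-byte size.
    sign, type_code, size = struct.unpack_from("<BBH", stream, pos)
    if sign != 0x0D:
        raise ValueError(f"no OOB header at position {pos}")
    return OobHeader(type_code, OOB_TYPES.get(type_code, "Unknown"), size)


print(parse_oob_header(bytes([0x0D, 0x02, 0x0C, 0x00]), 0))
# OobHeader(type_code=2, type_name='Index', size=12)
```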
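A Stream Read block (type 0x01) carries the following fields. Offsets are counted from the first byte of the OOB header.
|Offset||Size||Name||Description|
|4||4||Stream Position||Start offset of this transfer (at first byte of OOB header)|
|8||4||Tr Time||Elapsed time since last transfer in milliseconds|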
A Stream End block (type 0x03) carries the following fields:
|Offset||Size||Name||Description|
|4||4||Stream Position||End offset of transfer (at first byte of OOB header)|
|8||4||Result||Result code, see below for possible values|
|Result||Name||Description|
|0x00||Ok||Transfer success (does not imply data is good, just that streaming was successful)|
|0x01||Buffer||Buffering problem - data transfer delivery to host could not keep up with disk read|
|0x02||No Index||No index signal detected|
An Index block (type 0x02) carries the following fields:
|Offset||Size||Name||Description|
|4||4||Stream Position||Position in stream where index is detected (at first byte of OOB header)|
|8||4||Timer||The timer value at the moment an index is detected. Used to get the accurate time of the index after the last flux transition|
|12||4||Sys Time||The system clock time when the index is detected|
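As an illustration, the three index fields can be read in one step. This sketch assumes the offsets are counted from the first byte of the OOB header and that each field is an unsigned 32-bit little-endian integer. IndexBlock and parse_index_block are hypothetical names, and the sample values are made up.

```python
import struct
from typing import NamedTuple


class IndexBlock(NamedTuple):
    stream_position: int
    timer: int
    sys_time: int


def parse_index_block(stream: bytes, header_pos: int) -> IndexBlock:
    """Read the index fields of the OOB block whose header is at header_pos."""
    # The fields start at offset 4, right after the 4-byte OOB header.
    return IndexBlock(*struct.unpack_from("<III", stream, header_pos + 4))


# Hypothetical block: header (sign, type 0x02, size 12) plus three fields.
block = bytes([0x0D, 0x02, 0x0C, 0x00]) + struct.pack("<III", 100, 7, 600686)
print(parse_index_block(block, 0))
# IndexBlock(stream_position=100, timer=7, sys_time=600686)
```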
The clock frequencies required to convert the flux transition timing to absolute timing values are specified on the KryoFlux hardware, and can be queried using a device command. If the KryoFlux hardware changes at some point in the future, these frequencies may change. For documentation and illustration, these values are reproduced here for the current hardware.
All timings here are represented as (64-bit) floating point values.
|Symbol||Name||Value|
|mck||Master Clock Frequency||((18432000 * 73) / 14) / 2|
|sck||Sample Frequency||mck / 2|
|ick||Index Frequency||mck / 16|
|-||Sample to Index Ratio||sck / ick|
|-||Index to Sample Ratio||ick / sck|
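Evaluating these expressions in double precision gives a feel for the magnitudes involved. The constant names below are chosen for illustration. The figures apply to the current hardware only and should ideally be queried from the device instead of being hard-coded.

```python
# Clock frequencies in Hz, as Python floats (64-bit doubles).
MCK = ((18432000 * 73) / 14) / 2   # master clock, roughly 48.05 MHz
SCK = MCK / 2                      # sample clock, roughly 24.03 MHz
ICK = MCK / 16                     # index clock, roughly 3.00 MHz

SAMPLE_TO_INDEX = SCK / ICK        # 8.0
INDEX_TO_SAMPLE = ICK / SCK        # 0.125

print(f"{MCK:.2f} {SCK:.2f} {ICK:.2f}")
```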
Flux transition timing contained in the stream only makes any sense when the disk index timing is accounted for. Once all of the stream data has been read, several calculations are required. In particular, we need to know the speed of the drive.
To do this, we need to have determined:
- Cell Position - the position in the cell buffer where the index occurred. This can be done during decoding by storing the current cell position as each index occurs.
- Index Time - This is the time taken for one complete revolution of the disk, the number of clock cycles since the last index occurred. It is calculated by summing all cell values that we recorded since the previous index, accounting for the timer value at which the index itself was generated (see OOB Disk Index - Timer) in order to correctly support indexes that happen during a flux transition timing period. This is done in the KryoFlux decoder by taking the clock cycles since last index, adding the index timer value, and subtracting the timer value of the previous index.
Note that up until the first index, a reliable index time cannot be generated as it will always be a partial revolution.
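The index time calculation described above can be sketched as follows. This is one possible reading, not the KryoFlux decoder itself. It assumes index_cells[n] is the number of cells fully decoded before index n (so the index falls inside cells[index_cells[n]]) and that index_timers[n] is the Timer value of that index. All names are hypothetical.

```python
def index_times(cells, index_cells, index_timers):
    """Return the revolution time, in sample clock cycles, between indexes."""
    times = []
    for n in range(1, len(index_cells)):
        # Clock cycles of all cells starting from the one the previous
        # index fell into, up to (not including) the one this index is in.
        cycles = sum(cells[index_cells[n - 1]:index_cells[n]])
        # Add the part of the current cell before this index, and remove
        # the part of the first cell that preceded the previous index.
        times.append(cycles + index_timers[n] - index_timers[n - 1])
    return times


# Ten cells of 100 cycles; indexes fall 30 cycles into cell 2
# and 50 cycles into cell 7.
print(index_times([100] * 10, [2, 7], [30, 50]))  # [520]
```

With a single index the result is empty, which matches the note that no reliable index time exists before the first index.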
To calculate the RPM for each revolution, using double-precision arithmetic, the following algorithm is used.
index time = system time at index (n) - system time at index (n-1)
rpm of index n to n-1 = (index frequency * 60) / index time
Note that the RPM of the data before the first index cannot be reliably determined.
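A minimal sketch of this RPM calculation is given below. The function name is hypothetical, the index frequency is the value for the current hardware, and the masking with 0xFFFFFFFF is an assumption: it treats Sys Time as a 32-bit counter that may wrap between two indexes.

```python
ICK = ((18432000 * 73) / 14) / 2 / 16  # index frequency in Hz


def rpm_per_revolution(sys_times, index_frequency=ICK):
    """Return the RPM between each pair of consecutive index sys times."""
    rpms = []
    for previous, current in zip(sys_times, sys_times[1:]):
        # Assumption: 32-bit counter, so take the difference modulo 2**32.
        index_time = (current - previous) & 0xFFFFFFFF
        rpms.append((index_frequency * 60) / index_time)
    return rpms


# Two indexes roughly 200 ms apart, i.e. close to 300 RPM.
print(rpm_per_revolution([1000, 1000 + 600686]))
```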
To increase reliability, the KryoFlux decoding software performs RPM interpolation when converting timing to absolute values. If the RPM of one index is significantly different from the following index, it may be that the disk drive doing the reading is unreliable, and the drive speed from index to index is not constant. This would affect all cell values across the track, as a simple RPM calculation would assume all cells were taken at the average rate. We can mitigate this effect by converting each cell value using an interpolated RPM value. Various interpolation algorithms are possible, but since this feature is not strictly necessary, a description of it is outside the scope of this document.
After decoding has been performed in the above sections, we are left with cell values and index times. However, these times are actually in terms of clock cycles of the KryoFlux device. They now need to be converted into absolute values to be of practical use.
real cell time = cell value n / sample frequency
The real cell time is now calculated by taking each cell value and dividing it by the sample frequency. This will take the KryoFlux hardware-specific cell time, and convert it to a time in units of seconds (again using double-precision).
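In code, the conversion is a single division per cell. The sketch below uses the sample frequency of the current hardware as a default; the function name is hypothetical.

```python
SCK = ((18432000 * 73) / 14) / 2 / 2  # sample frequency in Hz


def real_cell_times(cells, sample_frequency=SCK):
    """Convert cell values in sample clock cycles to seconds."""
    return [cell / sample_frequency for cell in cells]


# 48 sample clock cycles is close to 2 microseconds.
print(real_cell_times([48]))
```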
Since the drive doing the reading may not run at the same RPM as the target system’s drive, this real cell time should be adjusted to its platform-specific target time. This allows KryoFlux to support reading of disks for drastically different system RPM (360 RPM system disks read on a normal 300 RPM drive, CLV systems that vary the RPM using different zones across the disk), as well as supporting minor variations in the speed of the drive doing the reading. Also, in the activities of the Software Preservation Society, we have seen that it is very common to have drives that run a bit faster or slower (usually in the range 297 to 303 RPM), and so cells read on such drives should be adjusted to the “perfect” system RPM, even if you are reading a disk for a system that uses the same RPM as the drive you are reading it in.
target-adjusted real cell time = real cell time * (actual rpm / target system rpm)
Where ‘actual rpm’ is the current interpolated RPM value for this cell and revolution, or the average for this revolution if no interpolation is being performed.
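As a small worked example of the adjustment, the sketch below scales a cell read on a drive running slightly fast back to a nominal 300 RPM. The function name and the numbers are illustrative only.

```python
def target_adjusted_cell_time(real_cell_time, actual_rpm, target_rpm):
    """Scale a cell time in seconds from the reading drive's RPM to the target RPM."""
    return real_cell_time * (actual_rpm / target_rpm)


# A 2 microsecond cell read at 303 RPM corresponds to 2.02 microseconds
# on a system that expects exactly 300 RPM.
adjusted = target_adjusted_cell_time(2e-6, actual_rpm=303.0, target_rpm=300.0)
print(f"{adjusted * 1e6:.2f} us")  # 2.02 us
```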