We have tried quite a few iterations of various algorithms now, researching each one's strengths and weaknesses. Ultimately we will probably end up with an improved version of the band analyser, plus a bit of pattern-based learning, with cell tracking.
As far as we can see (after over a week of analysing data), flux transition intervals behave the same way if they have the same neighbourhood. That is, they are likely to shift in the same direction every time.
The band analyser would:
- Pick up all used major cell timings on a track.
- Pick the most likely median of the bands. This is not an average; it's based on the observation that similar-sized cells next to one another have shifts that distribute evenly (except for the first and last cell in such a sequence). So this should be an average of such cells where possible, falling back to medians if no such data can be found.
- Find bands that are related to each other, that is, more or less multiples of one another. Finding their common base gives the cell size used.
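The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual analyser: the function names, the fixed tolerance, and the residual-based search for the common base are all our own simplifications.

```python
from statistics import median

def find_bands(intervals, tolerance=0.1):
    """Group flux transition intervals into bands of similar duration.

    An interval within `tolerance` (as a fraction) of the current band's
    median joins that band; otherwise it starts a new band. Returns the
    median of each band as its centre.
    """
    bands = []
    for t in sorted(intervals):
        if bands and abs(t - median(bands[-1])) <= tolerance * median(bands[-1]):
            bands[-1].append(t)
        else:
            bands.append([t])
    return [median(b) for b in bands]

def common_base(band_centres):
    """Estimate the base cell size: the value whose integer multiples
    best explain every band centre (an approximate real-valued GCD).
    """
    best = None
    smallest = min(band_centres)
    for k in range(1, 5):  # try smallest/1, smallest/2, ...
        cand = smallest / k
        # Residual: distance of each centre from its nearest multiple.
        err = sum(abs(c - round(c / cand) * cand) for c in band_centres)
        if best is None or err < best[0]:
            best = (err, cand)
    return best[1]
```

For example, jittery intervals clustering around 4, 6 and 8 microseconds would yield band centres of 4.0, 6.0 and 8.0, whose common base is a 2.0 microsecond cell.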
Patterns of cells could be derived that should behave the same.
Such patterns could be tracked in safer zones where the outcome is likely to be correct, and the next time the same pattern is encountered, its effect could be applied to correct an ambiguous cell size. This is basically post-compensation, but without needing to know whether pre-compensation was applied to the data at all, or the exact parameters of the pre-compensation used.
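One way to picture this pattern-based correction is a lookup keyed by a cell's neighbourhood, trained on confidently decoded regions and consulted for borderline cells elsewhere. Everything here (the class, the pattern representation as a pair of neighbouring cell counts, the averaging) is a hypothetical sketch of the idea, not the real implementation.

```python
from collections import defaultdict

class PatternTracker:
    """Learn how a cell's timing shifts given its neighbourhood,
    then undo that learned shift when classifying ambiguous cells."""

    def __init__(self):
        # (prev_cell_count, next_cell_count) -> observed timing deviations
        self.shifts = defaultdict(list)

    def learn(self, pattern, deviation):
        """Record a deviation seen in a safe zone for this pattern."""
        self.shifts[pattern].append(deviation)

    def correction(self, pattern):
        """Average deviation for a pattern, or 0.0 if never seen."""
        obs = self.shifts.get(pattern)
        return sum(obs) / len(obs) if obs else 0.0

    def classify(self, interval, cell_size, pattern):
        """Subtract the learned shift, then round to a cell count."""
        return round((interval - self.correction(pattern)) / cell_size)
```

With a cell size of 2.0, an interval of 5.1 naively rounds to 3 cells; if the same neighbourhood was previously seen pushing cells about +0.2 late, the corrected classification becomes 2 cells instead.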
As a final step, the current jitter of the cells can be tracked to alter the timings of the patterns found and decode the resulting cell. A low-pass filter is also helpful for tracking jitter, to prevent a few rogue cells from tripping up the whole decoding.
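The jitter tracking with a low-pass filter might look something like this first-order exponential filter; the clamp threshold and smoothing factor are illustrative assumptions, not values from the actual decoder.

```python
def track_jitter(deviations, alpha=0.1, clamp=0.5):
    """Exponentially smoothed jitter estimate.

    Each deviation is clamped before entering the filter, so a single
    rogue cell cannot drag the running estimate far off course.
    Returns the estimate after each input, for inspection.
    """
    jitter = 0.0
    history = []
    for d in deviations:
        d = max(-clamp, min(clamp, d))   # reject rogue outliers
        jitter += alpha * (d - jitter)   # first-order low-pass
        history.append(jitter)
    return history
```

A wildly wrong deviation of 10.0 is clamped to 0.5 and then only nudges the estimate by a small step, while a steady stream of consistent deviations converges the filter towards their true value.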
It’s not exactly trivial to get all of these features working together, and we have quite a few other things to do, so we are not sure it will make a first release. We are likely to implement a subset of these features to begin with.
One very simple alternative to all this work is to simply guess values when ambiguous data is encountered. This would certainly save on development effort, and if you tried for long enough, it might work. However, due to the nature of the problem, it is very possible to end up deadlocked in otherwise readable data if you never guess the right values, or if the required values are needed at multiple connected places. For example, you have two ambiguous cells in a block, and one would be correct at one iteration but the other one would not be, and the opposite happens at the next iteration - you are statistically unlikely to ever get it right.
A proper solution (and one that we said we would have to consider as a long-term project) is to build a decision tree for all ambiguous cells, decode the track with all the possible (finite) combinations by traversing every node of the tree, and then let the analyser find which one is correct. If we can get the cell processing super-reliable, having a system like this would allow very good data recovery.
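In outline, that exhaustive search amounts to enumerating every assignment of candidate values to the ambiguous positions and keeping the ones the analyser accepts. The interface below (placeholder cells, an `is_valid` callback standing in for the analyser's checks, e.g. a CRC) is our own hypothetical framing of the idea.

```python
from itertools import product

def resolve_ambiguous(cells, ambiguous, candidates, is_valid):
    """Try every combination of candidate values for the ambiguous
    cell positions; return the fully resolved decodings that pass
    the analyser's validity check.

    cells:      decoded cell counts, with placeholders at ambiguous spots
    ambiguous:  indices of the ambiguous cells
    candidates: dict mapping each ambiguous index to its plausible values
    is_valid:   callback judging a fully resolved track (e.g. CRC check)
    """
    solutions = []
    for combo in product(*(candidates[i] for i in ambiguous)):
        trial = list(cells)
        for idx, value in zip(ambiguous, combo):
            trial[idx] = value
        if is_valid(trial):
            solutions.append(trial)
    return solutions
```

Note that several combinations can pass the check, as in the two-ambiguous-cell example above; this is exactly the case where only the analyser's deeper knowledge of the track format can pick the correct one.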