So RollCaster has had a long-standing desync issue. This is a pretty minor thing as far as the Western player base is concerned, since the community doesn’t have a lot of players and it happens infrequently enough that it’s more a nuisance than anything else. They probably thought I was mad for attempting to fix it.
If I wasn’t before, I sure went mad trying!
I was moderately successful, and I’m not sure it’s even possible to get this one truly 100% fixed, so I’m not going to worry about it too much. Instead I’ll discuss the origin of the problem, the strategy I used for finding it, the results, and a ridiculous case of coincidence that popped up.
Floating-Point Unit (FPU) Synchronization
Yep, it’s this again.
All x86-based processors use an IEEE-standard Floating-Point Unit for non-integer math, which obeys a specific set of rules. Unfortunately, those rules only cover the inputs and outputs; how things are handled internally is still defined by the implementation. This means that slightly different units can produce slightly different results, sometimes differing on the order of 0.000001. That doesn’t sound like much, but it can snowball into a larger difference rapidly, or make a simple less-than or greater-than check return a different answer.
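To make the snowballing concrete, here is a minimal sketch in Python. It is not IaMP or RollCaster code. It uses two rounding precisions (53-bit and 24-bit) as a stand-in for "two units that round slightly differently", and the helper names `f32` and `accumulate` are made up for this example.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step, count, round_fn):
    """Add `step` to a running total `count` times, rounding after every add."""
    total = round_fn(0.0)
    step = round_fn(step)
    for _ in range(count):
        total = round_fn(total + step)
    return total

# Same ten additions, two rounding behaviours (assumed stand-ins for two FPUs).
total_53bit = accumulate(0.1, 10, float)  # plain double arithmetic
total_24bit = accumulate(0.1, 10, f32)    # rounded to float after each step

print(total_53bit, total_24bit)
print(total_53bit >= 1.0, total_24bit >= 1.0)
```

The two totals differ by roughly one ten-millionth, yet they land on opposite sides of 1.0, so the same greater-than check gives a different answer on each "machine".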
The early desync fix for Caster, back in 2006 or 2007 or thereabouts, I forget exactly when, involved lowering the precision setting in the FPU’s control word from double precision (53-bit) to float precision (24-bit), to match the size of the data structures used by IaMP. In th075Caster’s case, that was enough to fix nearly everything, but there were further issues involved in RollCaster due to the more invasive nature of the shenanigans it does.
Now, Intel processors use 80-bit precision internally, regardless of the FPUCW’s precision setting. All that setting does is round off to the nearest value. So even if you are using the correct setting for your data type, it will still simply round off to the given precision before handing the value back to you. Not only that, it will still leave minor bits of garbage in the floating point state. When RollCaster rewinds time, it may have reset all the game data, but it didn’t reset the state of the FPU. This would create small accumulated errors of 0.00000001 and so forth in the parts of the data that do not get overwritten.
If you are a programmer, learn that you should never, ever, ever directly equate floating point values. It is entirely possible that ((2.0f != 2.0f) == true) is valid.
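A quick illustration of the point, as a sketch in Python rather than anything from the game: two expressions that "should" be equal are not, and a tolerance-based comparison is the usual workaround. The tolerance value here is an arbitrary choice for the example.

```python
import math

a = 0.1 + 0.2
b = 0.3

exactly_equal = (a == b)                          # direct comparison
close_enough = math.isclose(a, b, rel_tol=1e-9)   # comparison with a tolerance

print(exactly_equal, close_enough)
```

Note that a tolerance only helps ordinary comparisons inside one program. It does not make two machines agree bit for bit, which is what synchronization actually needs.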
How I verified this
I was not 100% sure that the FPU was at fault. “Pretty sure” wasn’t good enough for me, so I had to come up with a strategy to determine what exactly was going on in there.
My eventual goal was to create a version that would synchronize with the replay watcher, so what I did was create versions of RollCaster and th075tool that would output a log containing the data within every frame’s state. It would also record rollbacks and frame timings while it was doing it. These log files were frustratingly large, generally in the 200MB to 500MB range.
So I created a standalone tool to parse and compare the data within the logfiles. There was data that would never match up between sides, like pointers to memory, so I had to make sure none of that was compared when it went through. This was done with an initial pass over the first frame’s data: assuming that synchronization was valid at that time, anything that didn’t match there was not worth checking.
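The masking idea can be sketched roughly like this. This is not the actual tool, and the log format is an assumption made for illustration: each log is a list of per-frame byte strings, and the names `build_mask` and `first_mismatch` are hypothetical.

```python
def build_mask(first_a, first_b):
    """Offsets that agree on the first frame; only these get compared later."""
    return [i for i, (a, b) in enumerate(zip(first_a, first_b)) if a == b]

def first_mismatch(log_a, log_b):
    """Return (frame, offset, byte_a, byte_b) for the first masked difference."""
    mask = build_mask(log_a[0], log_b[0])
    for frame_no, (frame_a, frame_b) in enumerate(zip(log_a, log_b)):
        for offset in mask:
            if frame_a[offset] != frame_b[offset]:
                return frame_no, offset, frame_a[offset], frame_b[offset]
    return None  # the logs agree on everything worth checking

# Toy data: byte 0 plays the role of a pointer that never matches between sides.
side_a = [bytes([0x10, 0x42, 0xFF, 0x78, 0xC0]),
          bytes([0x11, 0x42, 0xFF, 0x78, 0xC2]),
          bytes([0x12, 0x42, 0xFF, 0x78, 0xC4])]
side_b = [bytes([0x90, 0x42, 0xFF, 0x78, 0xC0]),
          bytes([0x91, 0x42, 0xFF, 0x78, 0xC2]),
          bytes([0x92, 0x42, 0xFF, 0x78, 0xC5])]

print(first_mismatch(side_a, side_b))
```

The pointer-like byte differs on every frame but is never reported, because it already differed on the first frame and was dropped from the mask.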
This led me to wonderful output like this:
?? 12 | 03 : 42ff78c4 != 42ff78c5 [127.736 != 127.736]
Yup, that’s an off-by-one: the two values differ only in the very last bit. It occasionally snowballs into something much larger. One example I ran into was that one of Yuyuko’s butterflies bounced off the wall on the wrong frame, causing the RNG to fall out of sync and weird things to happen for the rest of the match.
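For anyone who wants to check the numbers in that log line, here is a small Python sketch that decodes the two hex words as 32-bit floats. The helper names are made up for this example, and the bit-distance shortcut assumes two finite floats of the same sign.

```python
import struct

def bits_to_float(hex_word):
    """Interpret an 8-digit hex string as a big-endian IEEE 754 32-bit float."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]

def bit_distance(hex_a, hex_b):
    """Number of representable floats between two same-sign finite values."""
    return abs(int(hex_a, 16) - int(hex_b, 16))

left = bits_to_float("42ff78c4")
right = bits_to_float("42ff78c5")

print(format(left, ".6g"), format(right, ".6g"))  # both print as 127.736
print(left == right, bit_distance("42ff78c4", "42ff78c5"))
```

Both values print identically at six significant digits, which is why the log shows `127.736 != 127.736`, but they are adjacent floats and not the same one.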
There was also another kind of desync: when two players were about to overlap, the game would determine the wrong order and place the players on different sides. A simple less-than/greater-than bug.
In the end it verified that I needed to do something about the way RollCaster handles the floating point data, and I needed to figure out how.
There are three instructions that let you save and restore the whole floating point state: FSAVE, FNSAVE, and FRSTOR. FSAVE stores the full state to a 108-byte chunk of memory, FNSAVE does the same without first checking for pending floating point exceptions, and FRSTOR restores the full state from a 108-byte chunk of memory. As FSAVE can sometimes fail because of the exception handling for certain types of data, I want to use FNSAVE instead.
As a fairly obnoxious bonus, FNSAVE also implicitly calls FINIT, which resets the FPU to a default state. This is in Do Not Want territory as that means the data is completely emptied and the current precision mode is lost. So every time FNSAVE is called, FRSTOR has to be called immediately afterwards.
So now the question is: Where do they go? The obvious answer is on the state save and restoration code. But that’s not enough, because RollCaster does a few other things.
When it’s catching up from an old frame, it disables rendering, using the frame skip from the original IaMP’s 30fps mode. But there’s a bug in the original game with how it renders dash shadows, so Roll has some code to override that and run the shadow code manually. Normally this code runs in the middle of rendering, where it doesn’t affect the resulting state of the FPU much. Where it runs now, we need to prevent it from leaving any leftover data, so it gets wrapped.
The other issue is the camera. Roll has a new camera which limits the amount of movement that can happen each frame, so that a Remilia mashing left and right doesn’t cause you to have a seizure. As such, it modifies the FPU in a way that leaves garbage data behind after the rendering occurs. To prevent this from being a problem, the block of code affected by the camera is also wrapped by the storage functions.
In the end, this improves the situation, but doesn’t fully fix it.
Actually fixing it is futile.
I found that out by total accident.
The Completely Accidental Linux Desync Fix
So after my initial implementation of FNSAVE, the desyncs went haywire since I didn’t know about the implicit FINIT. This completely baffled me. I eventually checked the state of the FPUCW and found that the precision mode was wrong. Clearly, this is a bug in my initialization code since FNSAVE totally doesn’t affect anything.
There is some Wine-specific code that sets the control word, since the normal method does not work. It does this the instant that the application starts up. I felt this was probably behaving incorrectly, so I replaced it with some code that would apply it later on in time.
When I noticed this didn’t do the trick, I found out about the implicit FINIT call and hit my head on my desk pretty hard.
But after I removed the FNSAVE code to check what was going on, something else happened: I was suddenly synchronizing perfectly with replays almost all of the time, when before I was getting small rounding errors even without the FPU calls.
Therefore: There are floating point values initialized on startup before the desync fix is applied! And these values are affecting what happens afterwards!
I don’t know where or how or even what it’s setting, it’s just undeniably there. These are either cached and vary slightly after the startup sequence, or leave leftover data in the FPU for when the control word is set.
Both replay watchers set it at replay loading time, in order to ensure platform synchronization. I am not exactly sure when th075Caster sets it but it seems fairly late. RollCaster’s Linux code was setting it instantly, and thus was causing different results down the line than the replay watcher got. What this means is that the desyncs can’t be fixed without breaking compatibility, because there is some code that is not affected by the current desync fix that absolutely cannot be changed without also causing desyncs for both players and the replay watcher.
The current solution is adequate but imperfect. Occasional desyncs are simply inevitable without also breaking compatibility, unfortunately.
Very frustrating, but also some very important lessons to be learned here regarding how computers work at a (very) low level.