Investigating RollCaster’s Timing Issues

After finishing the desync fix I thought I was finally done. Shortly after, a few Japanese players tried it out and reported back that the timing was unstable, causing the frame rate to jitter unnecessarily.

Sure, I thought, it can’t be that hard to fix.

This started a very long process that led me all over the codebase, analyzing every last part of the timing path, for over a month. I’m sure nearly everyone who put up with me begging for testing and putting out updates is sick of me by now!

The reasoning behind the new timing system

Way back when I started design work on RollCaster, the original game was running into timing problems, as it would not run at a consistent speed from one operating system to the next. Win2K/WinXP ran at a different speed from Vista/Win7 which, of course, ran differently from Linux+Wine and so forth. Very frustrating for the players involved. So from the very start of the design, I decided to replace the game’s main timing loop with a new one, to make it consistent across all platforms.

But that’s not the only reason that the new timing was needed.

Rollbacks take a variable amount of time depending on how much processing the game needs to do in order to get caught up. This is anywhere from 1ms to 12ms or so. Practically, this means that if you run the rollback immediately before the normal game update, the actual processing and rendering of the current frame gets delayed by a variable amount.

This causes a substantial amount of visual jitter, as it’s no longer consistently synchronized with the monitor refresh rate. In addition, since the game does not use vsync and the standard is to play with vsync off, this causes a fair amount of tearing too. Obviously, this isn’t acceptable.

So how did I resolve this?

Rather than doing the game update immediately, the frame is split into two portions: the Caster update loop, which handles inputs and calculates rollbacks, comes first, followed by a short wait, and then the game portion runs. This makes both the inputs and the visible graphics happen at very consistent intervals, which is a great way to keep the game’s updates synchronized.

The amount of wait time is calculated dynamically after each rollback, running under the assumption that each one is going to take roughly the same amount of time. I also added a lower bound of 2ms and an upper bound of 12ms for this, so it doesn’t try to wait too little or too much.
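As a rough sketch of that calculation, assuming the wait is expressed as an offset from the start of the frame at which the game portion begins: the helper names below are hypothetical and this is not RollCaster's actual code. Only the 2ms and 12ms bounds come from the description above.

```cpp
#include <algorithm>

// Bounds from the text: the game portion never starts less than 2ms
// or more than 12ms after the start of the frame.
constexpr int kMinOffsetMs = 2;
constexpr int kMaxOffsetMs = 12;

// Hypothetical helper: choose the offset for the next frame, assuming the
// next rollback takes about as long as the last one did.
int next_game_offset_ms(int last_rollback_ms) {
    return std::max(kMinOffsetMs, std::min(last_rollback_ms, kMaxOffsetMs));
}

// Hypothetical helper: how much longer to wait before running the game
// portion, given how much of the frame the Caster update already used.
int remaining_wait_ms(int game_offset_ms, int elapsed_ms) {
    return std::max(0, game_offset_ms - elapsed_ms);
}
```

If the rollback finishes early, the remaining wait pads the frame out to the chosen offset. If it overruns, the wait collapses to zero and the game portion simply starts late for that frame.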

In an ideal situation, the input should be delayed to just before the actual game part itself is run, but Caster’s code is a real hassle to work with and I just decided it wasn’t worth the effort to rewrite the way the loop works just for this. Most modern computers can run the rollback period in less than 4-5 milliseconds, and I’m pretty sure there are extremely few players that can notice a 0.25 frame input delay.
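The quarter-frame figure can be checked with a line of arithmetic. The sketch below assumes the game runs at 60 frames per second, which the text does not state outright, and the function name is made up for illustration.

```cpp
// Assumption: 60 frames per second, so one frame lasts about 16.67ms.
constexpr double kFrameMs = 1000.0 / 60.0;

// Express a delay in milliseconds as a fraction of one frame.
double delay_in_frames(double delay_ms) {
    return delay_ms / kFrameMs;
}
```

Under that assumption a 4ms rollback period is 0.24 of a frame and a 5ms one is 0.30, which is where the rough 0.25 figure comes from.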

Had I done everything else correctly, this would have worked great.

Unfortunately, I didn’t.

Goddammit Windows

I’m not a Windows programmer primarily. I do Linux stuff mainly, and most Windows development I do is done through Wine. It’s not an uncommon thing that this comes back to bite me in the ass. In this case, I made two very critical mistakes when I rewrote the timing loop, that I flat out just never noticed for, well, years. Fortunately not many other people did, but a Japanese player brought it up to me.

The first issue is that I used GetTickCount. It turns out that its value is updated quite coarsely, often only at intervals of anywhere between 14 and 17ms. This leads to a very unstable update frequency, so it obviously shouldn’t be used for anything that needs to be reliable. On Wine it is, of course, perfectly accurate, so I didn’t notice.
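To see why a coarse clock wrecks a frame timer, here is a small simulation. It does not call GetTickCount. It models a clock that only advances once per tick, and the 15.625ms tick length is an assumed value within the 14 to 17ms range mentioned above. All names are illustrative.

```cpp
#include <cstdint>

// Assumed tick length for the simulated coarse clock, in microseconds.
constexpr std::int64_t kTickUs = 15625;

// Simulated coarse clock: the true time rounded down to the last tick.
// Expects non-negative times.
std::int64_t coarse_clock_us(std::int64_t true_us) {
    return (true_us / kTickUs) * kTickUs;
}

// Elapsed time as a program reading the coarse clock would measure it.
std::int64_t measured_elapsed_us(std::int64_t start_us, std::int64_t end_us) {
    return coarse_clock_us(end_us) - coarse_clock_us(start_us);
}
```

In this model, 14ms of real time between 1ms and 15ms measures as zero, while a single millisecond between 15ms and 16ms measures as a full 15.625ms. A loop that decides when to run the next frame from readings like these cannot hold a steady pace.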

The answer to that one is simple enough: timeGetTime from the winmm library should be used instead. On Windows this updates at a much higher frequency and is a much more accurate base for a timing loop.

The other is that I killed the game’s call to timeBeginPeriod, which sets the resolution at which the winmm timer is updated. With the call in place this is normally around 1-2ms, but without it, on systems where the default is poor, the timer only updates every 10 to 16ms. I fixed this by simply reinstating the call on the first run of the main loop.
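A minimal sketch of the two winmm calls working together. The wrapper class and function names are hypothetical, and this wraps the request in a scoped object rather than making the call on the first pass of the main loop as described above. The non-Windows branch is only a stand-in so the sketch builds elsewhere.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>              // timeGetTime, timeBeginPeriod, timeEndPeriod
#pragma comment(lib, "winmm.lib")  // MSVC only; with MinGW link using -lwinmm
#else
#include <chrono>
#endif

// Hypothetical wrapper: requests 1ms timer resolution while it is alive.
class TimerResolution {
public:
    TimerResolution() {
#ifdef _WIN32
        active_ = (timeBeginPeriod(1) == TIMERR_NOERROR);
#endif
    }
    ~TimerResolution() {
#ifdef _WIN32
        if (active_) timeEndPeriod(1);  // every successful begin needs a matching end
#endif
    }
    TimerResolution(const TimerResolution&) = delete;
    TimerResolution& operator=(const TimerResolution&) = delete;
    bool active() const { return active_; }

private:
    bool active_ = false;
};

// Millisecond clock: timeGetTime on Windows, a steady clock elsewhere.
std::uint32_t now_ms() {
#ifdef _WIN32
    return timeGetTime();
#else
    auto t = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint32_t>(
        std::chrono::duration_cast<std::chrono::milliseconds>(t).count());
#endif
}

// Unsigned subtraction stays correct when the 32-bit counter wraps around.
std::uint32_t elapsed_ms(std::uint32_t start, std::uint32_t end) {
    return end - start;
}
```

The wraparound handling matters because timeGetTime returns a 32-bit millisecond count, which rolls over after roughly 49.7 days of uptime.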

For completeness’ sake, I’ll go over a third option that I could have used but chose not to. QueryPerformanceCounter and QueryPerformanceFrequency are functions used to query a high performance timer. Its frequency is not exactly the same as the speed of the CPU, but is related to it. The downside is that on multicore systems the counter can vary based on which core you’re running on, which would force you to manually set the core used by the main thread. It can also fail or not work correctly on certain types of CPUs.
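For reference, this is roughly what reading that timer looks like. It is only a sketch of the option that was not used: the function names are made up, it does nothing about the core-affinity problem described above, and the Win32 part is compiled only on Windows.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Convert raw counter ticks to milliseconds. Whole seconds and the remainder
// are converted separately so the multiplication by 1000 cannot overflow
// for large tick counts.
std::int64_t ticks_to_ms(std::int64_t ticks, std::int64_t ticks_per_second) {
    std::int64_t whole_seconds = ticks / ticks_per_second;
    std::int64_t remainder = ticks % ticks_per_second;
    return whole_seconds * 1000 + (remainder * 1000) / ticks_per_second;
}

#ifdef _WIN32
// Hypothetical helper: current high performance counter value in milliseconds.
std::int64_t qpc_now_ms() {
    LARGE_INTEGER frequency;
    LARGE_INTEGER counter;
    QueryPerformanceFrequency(&frequency);  // ticks per second
    QueryPerformanceCounter(&counter);      // current tick count
    return ticks_to_ms(counter.QuadPart, frequency.QuadPart);
}
#endif
```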

My testing showed that Sleep(1)’s accuracy is roughly around 1-2ms, so highly accurate timing’s not important enough to worry about and timeGetTime is plenty sufficient, without needing to do any wonkery with threading. Since I am not aiming for 100% CPU usage and the absolutely perfect timing it would give, this is plenty good enough.

The part I forgot about

When I started working on Roll, I had all kinds of crazy ideas for improving the overall performance of Caster, now that I had direct control of the game’s timing.

One of these ideas was to change how it handles waiting for network input. Even with rollbacks you still need to synchronize to an opponent’s timing, just from a few frames behind. I decided that it would be for the best if I changed it to wait only 1ms instead of one frame. At the time, this seemed like a really good idea and would increase overall stability.

The reality of the matter is less pretty.

There are tiny little timing differences from system to system. This is inevitable with PCs, so it has to be taken into consideration. It means that while the two clients will start off fully synchronized, they will slowly fall out of sync with each other over time. This is something which can’t be avoided.

What happens here, with 1ms wait periods, is that the faster client will be waiting longer and longer for input from the other side, until it eventually starts pushing beyond single frame boundaries. This destroys the stability of the partial wait system, and generally the stability of the updates entirely as it’ll also push the start of the frame around.
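A simplified model of that drift, assuming each client runs at a fixed frame period and the difference simply accumulates frame after frame. The real interaction with the 1ms waits is messier, and the function name and the periods used below are made up for illustration.

```cpp
#include <cstdint>

// Returns the first frame at which the accumulated drift between two clients
// reaches one full frame of the faster client, or -1 if the first client is
// not actually faster. Periods are in microseconds.
std::int64_t frames_until_one_frame_of_drift(std::int64_t fast_period_us,
                                             std::int64_t slow_period_us) {
    std::int64_t drift_per_frame_us = slow_period_us - fast_period_us;
    if (drift_per_frame_us <= 0) {
        return -1;
    }
    // Ceiling division: smallest n with n * drift_per_frame_us >= fast_period_us.
    return (fast_period_us + drift_per_frame_us - 1) / drift_per_frame_us;
}
```

With a made-up 16667 microsecond period on one side and 16670 on the other, a difference of only 3 microseconds per frame, the faster client is a full frame ahead after 5556 frames, which is about a minute and a half of play.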

I stared right at logging data proving that this was happening and didn’t recognize it. For, like, two weeks. This is mostly because I forgot that I even had the 1ms wait period in there, despite looking at the function that does it every day. Sometimes, I’m just not very bright.

Returning this to the original behavior, which delays a single frame when inputs are missing, effectively resolved this issue, and things no longer get jammed up against the tail end of a frame. Losing one frame once in a while is far preferable to screwing up the main timer, and far less visible.

Other fixes and stuff

Another major fix was to some RollCaster-specific code that handles packet loss by allowing the game to continue running normally for one extra frame, and then rolling it back afterwards. The idea is sound, and GGPO does something similar, but I had written it incorrectly. Because I messed it up, the game was effectively running with one more frame of input buffer than necessary, which caused the clients to desynchronize on the number of rewinds that would occur, often up to 2 higher than the required value.

To fix a Linux bug a long time ago, I changed the main recvfrom loop from blocking I/O to a short Sleep() loop, and didn’t think much of it. This had the downside of increasing the player’s ping by a few milliseconds and making packet handling more unstable. I reverted this to a blocking call on Windows.

After fixing all of these I think I am finally done with this client. The current build runs fantastic and seems to have no real major issues at time of writing. Knowing my luck someone will get back to me with a bunch of bugs, but oh well!

Either way, I will be happy to have it out of my hair! That’s not to say I regret making this, as it was a great learning experience and it did a hell of a lot for the game’s community, but let’s just say I don’t ever want to do it The Caster Way ever again.

