Twitter  YouTube  E-Mail  RSS
The One Man MMO Project
The story of a lone developer's quest to build an online world :: MMO programming, design, and industry commentary
Optimization - The (not) Lost Week
By Robert Basler on 2011-01-14 17:21:54
Homepage: email:one at onemanmmo dot com

I've been getting anxious about having a playable to show people. It is getting really close now, so this week I expected to be getting the code running to move the camera and synchronize it with the simulation server. But as I was getting that going, I was shocked to discover that a round trip from the client to the simulation server took



That number should be some small fraction of a second. What the heck was going on to make it soooooooo slow?

Turning on network traces, I discovered that some packets were taking up to 9 seconds to arrive. Over the loopback interface -- the packets weren't even going over a network! Adding a display of the time between trace lines, I noticed a number of several-second breaks in the program execution at random spots. The program was evidently starving for CPU cycles. My laptop is a 1.5GHz Core 2 Duo. Not really quick, but it should be quick enough to run a game plus server for one player. Looking at the CPU gauge, both cores were hitting 100% fairly frequently, but they weren't pegged at 100%.

The first thing I tried was upping the program priority of the server so it would respond more quickly. That certainly worked at making it respond quickly, but it didn't effect my late-packet problems.

I thought maybe UDP packets were getting dropped, and it was waiting for packet send retries, but adding some counters to the network protocol, I learned that Windows localhost doesn't seem to drop UDP packets at all. Good to know.

One of my long-time gripes with Visual Studio is that it doesn't include a profiler in the Professional Edition. Apparently professionals don't need profilers. So I went over to Stack Overflow and discovered AMD Code Analyst, a nice for-free profiler which integrates with Visual Studio (or runs standalone.) (Getting the app for free from AMD was a nice win for me considering the loss I've taken on my AMD stock the last few years.)

Running Code Analyst I discovered a few interesting things: that my memory tracking system takes about 15% of the program runtime, that a 16-bit CRC routine takes another 10%, and totally shocking, the empty constructor for QuantizedInts also takes about 10% - that must be getting called a lot. The first one is no surprise, the second two a little more so. But it still didn't give me a smoking gun for my 7-1/2 second delay.

Windows Vista and 7 have a resource manager built in (access it by pressing Ctrl-Alt-Delete, select Task Manager, then click on Resource Manager on the Performance tab.) This cool application shows graphs of CPU, network and disk use in real-time.

Running the resource manager, I was able to see that network use was somewhat higher than I would have expected. That got me digging into my networking code, where I found a pretty horrendous bug. I was resending packets that had already been received and acked by the other end of the connection - ack!

Fixed that bug, but it was still resending long strings of packets all the time. Drat. Comparing packet sending and receiving traces, the ack packets were being delayed greatly waiting for other packets to be processed. So I added an output queue to hold packets before sending them to the program to be processed. Those packets are now processed later in the update and I limit the amount of time spent on packet processing. With that change, ack packets are processed immediately, eliminating a lot of the resends.

With these changes, resends are virtually eliminated, and the round trip time has dropped to about 450ms in a debug build. That's slow, but should be fast enough, and I know where to look if I want to make that number smaller. Running a release build over a wireless network, it is below 200ms which is just peachy.

To make sure I don't get surprised like this again in the future, I've added some canaries to my code, one for packet processing time over 30ms, one for the whole update loop over 100ms. They print out in the traces so I can see right away when something new is way too slow.

So the week didn't go the way I expected, but the improvements I made and things I learned were surprisingly worthwhile.

By Robert Basler on 2011-01-19 18:41:14
Homepage: email:one at onemanmmo dot com
I found mention that a single hop should be <1ms, so that got me wondering what my problem was. I made a second high-priority thread that calls my old select code with a 500 microsecond wait, and sure enough, I was able to reduce the total round trip time to only 50ms. Basically at this point the entire delay is how often I pump my packet sending code. Receives are handled immediately. This makes the code more complicated since it is now multithreaded, but I think the performance improvement is worth the tradeoff. With some additional tweaking, I should be able to shrink that number further, but not right now.

New Comment

Cookie Warning

We were unable to retrieve our cookie from your web browser. If pressing F5 once to reload this page does not get rid of this message, please read this to learn more.

You will not be able to post until you resolve this problem.

Comment (You can use HTML, but please double-check web link URLs and HTML tags!)
Your Name
Homepage (optional, don't include http://)
Email (optional, but automatically spam protected so please do)
What color is the ocean? (What's this?)

  Admin Log In

[Home] [Blog] [Video] [Shop] [Press Kit] [About]
Terms Of Use & Privacy Policy