:: The story of a lone developer's quest to build an online world
:: MMO programming, design, and industry commentary
[The Imperial Realm :: Miranda] [Blog] [Gallery] [About]

The One Man MMO Project: Looking for a Fast Square Root

By Robert Basler on 2012-03-29 11:19:56
Homepage: www.onemanmmo.com email:one at onemanmmo dot com

Last night I needed a square root function that would work on 64-bit unsigned integers. I have been using the standard library's sqrt, but my first thought was that an integer square root could be faster.

I looked around and found what was said to be a "fast" integer square root on Stack Overflow by Craig McQueen. I used the code as-is except that I changed it to use 64-bit unsigned ints because that was what I needed.

It seemed like a reasonably performance efficient algorithm, but I had always heard that sqrt, even in hardware, was slow. Because sqrt gets used a lot, I thought better safe than sorry, so this morning I decided to benchmark the integer algorithm against sqrt just to be sure.

  • Int Elapsed 1506419us
  • Float Elapsed 159330us
  • Float+Cast Elapsed 182160us

It turns out that the hardware-based floating point sqrt is 9.45x faster than the "fast" integer square root. If I add a call to static_cast to get the float value into a uint64, it is still 8.27x faster.

I'll keep the integer square root, but this post is going in the comments for it.

By Robert Basler on 2012-03-29 13:36:42
Homepage: www.onemanmmo.com email:one at onemanmmo dot com
The original testing was done on a Core I7 in a debug build. I thought it might not be fair measuring in debug, so I tried it again in release:

Int Elapsed 571950us
Float Elapsed 38746us
Float + Cast Elapsed 56759us

Float is 14.76x faster, 10.08x with the static_cast.

Also, this was not the timing for one sqrt, but 4,000,000.
By omeg on 2012-03-30 05:40:30
Homepage: omeg.pl email:
Got similar results on my i7 laptop:
Sqrt float: 12583.508ms
Sqrt double: 12371.292ms
Fast uint sqrt: 24388.667ms
Fast ulong sqrt: 96830.309ms

That's for 1e9 loops with adding all sqrts together to prevent optimizer from ignoring the values. Also did the same test in C# for fun:
Sqrt double: 13547ms
Fast uint sqrt: 25701ms
Fast ulong sqrt: 121126ms

Not much difference. :)
By omeg on 2012-03-31 04:30:28
Homepage: omeg.pl email:
Did some more research, here are my results.
By Robert Basler on 2012-03-31 11:06:01
Homepage: www.onemanmmo.com email:one at onemanmmo dot com
Wow! Check out this graph - the performance spikes for 64-bit fast ulong are quite shocking. 32-bit apps don't seem to handle 64-bit values very well at all. That's really interesting, thanks!


Add New Comment to this Topic

Admin Log In