:: The story of a lone developer's quest to build an online world
:: MMO programming, design, and industry commentary
[The Imperial Realm :: Miranda] [Blog] [Gallery] [About]

The One Man MMO Project: Looking for a Fast Square Root

By Robert Basler on 2012-03-29 11:19:56
Homepage: www.onemanmmo.com email:one at onemanmmo dot com

Last night I needed a square root function that would work on 64-bit unsigned integers. I have been using the standard library's sqrt, but my first thought was that an integer square root could be faster.

I looked around and found what was said to be a "fast" integer square root on Stack Overflow by Craig McQueen. I used the code as-is except that I changed it to use 64-bit unsigned ints because that was what I needed.

It seemed like a reasonably performance efficient algorithm, but I had always heard that sqrt, even in hardware, was slow. Because sqrt gets used a lot, I thought better safe than sorry, so this morning I decided to benchmark the integer algorithm against sqrt just to be sure.

  • Int Elapsed 1506419us
  • Float Elapsed 159330us
  • Float+Cast Elapsed 182160us


It turns out that the hardware-based floating point sqrt is 9.45x faster than the "fast" integer square root. If I add a call to static_cast to get the float value into a uint64, it is still 8.27x faster.

I'll keep the integer square root, but this post is going in the comments for it.

By Robert Basler on 2012-03-29 13:36:42
Homepage: www.onemanmmo.com email:one at onemanmmo dot com
The original testing was done on a Core I7 in a debug build. I thought it might not be fair measuring in debug, so I tried it again in release:

Int Elapsed 571950us
Float Elapsed 38746us
Float + Cast Elapsed 56759us

Float is 14.76x faster, 10.08x with the static_cast.

Also, this was not the timing for one sqrt, but 4,000,000.
By omeg on 2012-03-30 05:40:30
Homepage: omeg.pl email:
Got similar results on my i7 laptop:
Sqrt float: 12583.508ms
Sqrt double: 12371.292ms
Fast uint sqrt: 24388.667ms
Fast ulong sqrt: 96830.309ms

That's for 1e9 loops with adding all sqrts together to prevent optimizer from ignoring the values. Also did the same test in C# for fun:
Sqrt double: 13547ms
Fast uint sqrt: 25701ms
Fast ulong sqrt: 121126ms

Not much difference. :)
By omeg on 2012-03-31 04:30:28
Homepage: omeg.pl email:
Did some more research, here are my results.
By Robert Basler on 2012-03-31 11:06:01
Homepage: www.onemanmmo.com email:one at onemanmmo dot com
Wow! Check out this graph - the performance spikes for 64-bit fast ulong are quite shocking. 32-bit apps don't seem to handle 64-bit values very well at all. That's really interesting, thanks!

sqrt-cpp1.png

New Comment

Cookie Warning

We were unable to retrieve our session cookie from your web browser. If reloading this page does not get rid of this message, please read this to learn more.

You will not be able to post until you resolve this problem.

Comment (You can use HTML, but please double-check web link URLs and HTML tags!)
Your Name
Homepage (optional, don't include http://)
Email (optional, but automatically spam protected so please do)
What color is an orange? (What's this?)

Admin Log In