@saurabh: Thanks! I had trouble understanding the paper at first, writing down my thoughts helped make the ideas

[…] Reference ] Subscribe to comments Posted on 06.03.10 to Game Development by Eddie Post Tags: […]

Good article, but one thing bugged me; no CPU that I know of can do anything “zillions of times” per second. We are limited to clock cycles in the billions, not quadrillions (and especially not “zillions”), and multiple cores don’t get us there either.

Whoever wrote that code is a true gangster.

Using “Try the Demo” and plugging in values of 3, 4, 5 for n, the result blows up.

[…] is a truly awesome magic number that seems to come out of no where, 0x5f3759d5. Eco World Content From Across The Internet. Featured on EcoPressed A sand box for low […]

[…] This has been sitting in my drafts folder, waiting for me to read the article, learn about it and summarize it here. […]

Come on, this blog cannot be a better explanation because

it is NOT 0x5f3759d5

IT IS 0x5f3759df

If you shift the entire value right, why doesn’t the value stored in the left size of the value bleed into the part stored in the right side?

8-digit example in decimal:

0913|0020

Let’s say the right half is the exponent and we shift it all right by 1 column, effectively dividing it by 10 (without rounding):

0091|3002

^ Why doesn’t this happen? It’s clear that it doesn’t, but I don’t know why. I’d think we’d end up with 91 for the value with the right-most column of the value bleeding into the exponent.

Thanks Khalid,

well explained!

I also further reffered the derivation of magic number from:

http://blog.quenta.org/2012/09/0x5f3759df.html

Whoops, I had the wrong constant copied in (not sure how that happened), updated now!

Hi!

Please forgive my poor English at first.

I use lua to compute the sqrt inverse,and my_sqrt_inverse

is my conduction form the form:

x = 1/sqrt(i)

x^2 = 1/i;

f(x) = x^2 - 1/i;

then use the Newton’ method:

gn+1 = gn - f(gn)/f’(gn)

gn+1 = gn - (gn^2 - 1/i) / ( 2gn)

the answer is correct.

but when i use your method,I got the

my_sqrt_inverse2,and then output is wrong,and it seems go to far and far for each step.

By the way, when I input some value into your calculator ,such as 3, then I get:

76,714,239,181,881.8 fourth

this seems looks like my where my_sqrt_inverse2 goes wrong.

Maybe I did something wrong in my code or I misunderstanding your explaination?

Thank you very much!

function my_sqrt_inverse(x)

local g = 1;

for i=1,5 do

local gnew = (g*g + 1/x)/(2*g)

g = gnew;

end

return g;

end

function my_sqrt_inverse2(x)

local g = 1;

for i=1,5 do

local gnew = g*(1.5 - 0.5*x*g*g);

g = gnew;

end

return g;

end

for i=1,10 do

print(1/math.sqrt(i),my_sqrt_inverse2(i));

end

the output:

1 1

0.70710678118655 0.70710644469591

0.57735026918963 0

0.5 -0.5

0.44721359549996 -1

0.40824829046386 -2.3426102130054e+030

0.37796447300923 -6.5250474437501e+044

0.35355339059327 -3.8865809594061e+055

0.33333333333333 -2.151224660005e+064

0.31622776601684 -5.4644191047306e+071

Good article for a start, before diving into the papers, but I agree with some people, some things are not phrased correctly, normalization is not a fancy term for division, there is division into it but it’s dividing each component of a vector with it’s length. And zillion of times, yes it might be just a figure of speech.

Also, I think if you just shift bits in a float (which is just an integer pointer to the floating point bits, not really converted to int) things might go bad, it doesn’t work the same as with integers. But what I guess is, that magic number does all the work and allows things to not go awry, I just need to study more why it works. I remember there was another problem with performance, when converting floats to integer, but a way to solve it was another strange piece of code I found which did something similar with a magic number applied on the bits of the floating point number. It worked like a charm, much faster than conventional FP to Int conversion, but I didn’t know why it worked.

Hi, Kalid.

I am always looking for ways to make my code execute faster, so found this article interesting. However, a few questions come to mind.

Does the input to the “Inverse Square Root Demo” have to be a number within a particular range? I entered 7.917 and it diverged *away* from the result.

Is this algorithm applicable to specific CPUs/compilers/operating systems only? AMD chips used to behave slightly different than Intel chips for some operations. I wonder if that is a factor for this routine. Or perhaps the operating system. If this routine is written in JavaScript for posting on a web site, would its performance improvement still apply, since a programmer has no idea ahead of time what kind of computer, or OS, or browser a user might be running?

Most importantly, are these results still valid? Since this routine was first developed, several improvements have been made and the difference in execution time may no longer be as significant. Do you know if this approach to computing the Fast Inverse of a Square Root still delivers the same performance difference versus the straightforward computation of the inverse of a square root? Or has this routine been rendered moot by developments in other aspects of computing?