setting the speed for the future of games programming
vectorc


3DNOW!

3DNow! is a set of processor instructions that AMD introduced with the K6-2 processor. All AMD processors since the K6-2 support 3DNow!. The instructions perform single-precision floating-point operations on values held in MMX registers. Because MMX registers are used, all the restrictions that apply to MMX registers also apply to 3DNow! code. In particular, it is not possible to mix 3DNow! code and normal FPU floating-point code, because the MMX registers share storage with the FPU registers. Because 3DNow! supplies only a limited set of floating-point operations, code that is meant to use these instructions should not use any unsupported floating-point operations in the same functions or loops.

In addition to the normal integer vectors, MMX registers can now also be treated as the type:

float [2];

It is also possible to use 3DNow! instructions to operate on normal float types.

Floating-point operations carried out with 3DNow! instructions will not produce exactly the same results as normal floating-point code in all circumstances. If two versions of your program must produce exactly the same results, you should enable consistent precision.

3DNow! is ideal for multiplying vectors and matrices as well as 3D projection. It can also be used with MMX for image and sound processing using floating-point intermediates.
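As a sketch of the 3D projection case, the function below projects a point onto a view plane by computing one reciprocal and then doing two multiplies, which are all operations covered in this section. The names POINT3F, project and the parameter d are hypothetical and do not come from VectorC. The code is plain C; under VectorC the function would presumably carry the __declspec (codeplay_3dnow) attribute used in the examples on this page.

```c
/* VECTOR2F matches the typedef used in this document.
   POINT3F is a hypothetical type introduced for this sketch. */
typedef struct {float f [2];} VECTOR2F;
typedef struct {float x, y, z;} POINT3F;

/* Perspective projection onto the plane z = d: scale x and y by d / z.
   One reciprocal is shared by both multiplies. Assumes p.z is not zero. */
VECTOR2F project (POINT3F p, float d)
   {
   VECTOR2F r;
   float scale = d * (1 / p.z);
   r.f [0] = p.x * scale;
   r.f [1] = p.y * scale;
   return r;
   }
```

Computing the reciprocal once and reusing it keeps the function to reciprocal and multiply operations only, with no general division.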

Operations Supported by 3DNow!



Arithmetic - addition, subtraction, multiplication on floats or float vectors

e.g.

Source code:

typedef struct {float f [2];} VECTOR2F;

VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
   {
   VECTOR2F r;
   r.f [0] = a.f [0] * b.f [0];
   r.f [1] = a.f [1] * b.f [1];
   return r;
   }

Compiled for 3DNow!:

@example@3DN_16:
   pfmul mm0,mm1

   ret



Reciprocal 12-bit precision - floats

VectorC can also perform division to 12-bit precision by calculating the reciprocal of the right-hand side and multiplying it by the left-hand side.

e.g.

Source code:

float __declspec (codeplay_3dnow) example (float a)
   {
   return 1 __hint__ ((precision (12))) / a;
   }

Compiled for 3DNow!:

@example@3DN_4:
   pfrcp mm0,mm0

   ret



Reciprocal full precision - floats

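VectorC will also do full-precision division by calculating the reciprocal of the right-hand side and multiplying it by the left-hand side. The initial pfrcp estimate is refined by the pfrcpit1 and pfrcpit2 instructions, which together carry out one Newton-Raphson iteration.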

e.g.

Source code:

float __declspec (codeplay_3dnow) example (float a)
   {
   return 1 / a;
   }

Compiled for 3DNow!:

@example@3DN_4:
   movq mm2,mm0
   pfrcp mm1,mm2
   pfrcpit1 mm2,mm1
   pfrcpit2 mm2,mm1
   movq mm0,mm2
   ret
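The relationship between the low-precision and full-precision reciprocals can be sketched in portable C. This is an illustration of Newton-Raphson refinement, not the code VectorC generates and not a bit-exact model of the instructions. The function names are hypothetical, and rcp_estimate only imitates a low-precision estimate by discarding mantissa bits from an exact reciprocal. It assumes IEEE 754 single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision estimate such as pfrcp:
   take the exact reciprocal and clear the low 12 mantissa bits,
   leaving roughly 12 significant bits. Assumes a is finite and non-zero. */
float rcp_estimate (float a)
   {
   float r = 1.0f / a;
   uint32_t bits;
   memcpy (&bits, &r, sizeof bits);
   bits &= 0xFFFFF000u;
   memcpy (&r, &bits, sizeof r);
   return r;
   }

/* One Newton-Raphson step for the reciprocal: x1 = x0 * (2 - a * x0).
   Each step roughly squares the relative error of the estimate. */
float rcp_refine (float a, float x0)
   {
   return x0 * (2.0f - a * x0);
   }

/* Estimate followed by one refinement step. */
float rcp_full (float a)
   {
   return rcp_refine (a, rcp_estimate (a));
   }
```

Because the error is roughly squared, one step takes an estimate with about 12 good bits close to the full 24-bit precision of a float, which is why a single iteration is enough.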



Reciprocal square root 12-bit precision - floats

The "sqrt" function in "math.h" is declared to take and return doubles. Doubles cannot be processed with 3DNow!, so you can either declare your own "float" version of "sqrt", or use the command-line switch "/vec:single" or "/single". With either switch, a "float" version of "sqrt" is used whenever the argument is a float.

e.g.

Source code:

float sqrt (float);

float __declspec (codeplay_3dnow) example (float a)
   {
   return 1 / __hint__ ((precision (12))) sqrt (a);
   }

Compiled for 3DNow!:

@example@3DN_4:
   pfrsqrt mm0,mm0
   ret



Conversion to or from 32-bit signed integers or 32-bit signed integer vectors

e.g.

Source code:

typedef struct {float f [2];} VECTOR2F;
typedef struct {int i [2];} VECTOR2SD;

VECTOR2SD __declspec (codeplay_3dnow) example (VECTOR2F a)
   {
   VECTOR2SD r;
   r.i [0] = a.f [0];
   r.i [1] = a.f [1];
   return r;
   }

Compiled for 3DNow!:

@example@3DN_8:
   pf2id mm0,mm0

   ret
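The example above covers the float-to-integer direction only. The sketch below shows both directions in plain C, using the same struct types. The function names from_int and to_int are hypothetical, and the instruction VectorC would pick for the integer-to-float direction is not shown in the source (3DNow! provides pi2fd for that conversion).

```c
typedef struct {float f [2];} VECTOR2F;
typedef struct {int i [2];} VECTOR2SD;

/* Integer-to-float direction, element by element. */
VECTOR2F from_int (VECTOR2SD a)
   {
   VECTOR2F r;
   r.f [0] = (float) a.i [0];
   r.f [1] = (float) a.i [1];
   return r;
   }

/* Float-to-integer direction. A C cast truncates toward zero,
   so 2.7f becomes 2 and -2.7f becomes -2. */
VECTOR2SD to_int (VECTOR2F a)
   {
   VECTOR2SD r;
   r.i [0] = (int) a.f [0];
   r.i [1] = (int) a.f [1];
   return r;
   }
```

Truncation toward zero is what the C language requires for a float-to-int cast, so code that needs rounding to nearest has to add it explicitly before converting.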



Minimum and Maximum - floats or float vectors

e.g.

Source code:

typedef struct {float f [2];} VECTOR2F;

float __inline min (float a, float b)
   {
   if (b < a) a = b;
   return a;
   }

VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
   {
   VECTOR2F r;
   r.f [0] = min (a.f [0], b.f [0]);
   r.f [1] = min (a.f [1], b.f [1]);
   return r;
   }

Compiled for 3DNow!:

@example@3DN_16:
   pfmin mm0,mm1

   ret
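Minimum and maximum are often combined to clamp values to a range. The following is a minimal sketch in plain C, written in the same element-by-element form as the example above. The names min2, max2 and clamp2 are hypothetical, and whether VectorC turns this exact code into a pfmax and pfmin pair is an assumption, not something the source states.

```c
typedef struct {float f [2];} VECTOR2F;

/* Scalar helpers in the same compare-and-assign form as "min" above. */
float min2 (float a, float b)
   {
   if (b < a) a = b;
   return a;
   }

float max2 (float a, float b)
   {
   if (b > a) a = b;
   return a;
   }

/* Clamp each element of v to the range [lo, hi]. Assumes lo <= hi. */
VECTOR2F clamp2 (VECTOR2F v, VECTOR2F lo, VECTOR2F hi)
   {
   VECTOR2F r;
   r.f [0] = min2 (max2 (v.f [0], lo.f [0]), hi.f [0]);
   r.f [1] = min2 (max2 (v.f [1], lo.f [1]), hi.f [1]);
   return r;
   }
```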



Absolute and Negate - floats and float vectors

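The MMX logical instructions can be used to negate floats and to calculate their absolute (positive) values: flipping the sign bit (bit 31) of a float negates it, and clearing the sign bit gives its absolute value.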
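The sign-bit technique can be sketched in portable C, assuming IEEE 754 single-precision floats. The function names are hypothetical; the XOR and AND here stand for the MMX logical instructions that would act on the register contents.

```c
#include <stdint.h>
#include <string.h>

/* Negate by flipping the sign bit (bit 31) with XOR. */
float negate_bits (float x)
   {
   uint32_t u;
   memcpy (&u, &x, sizeof u);
   u ^= 0x80000000u;
   memcpy (&x, &u, sizeof x);
   return x;
   }

/* Absolute value by clearing the sign bit with AND. */
float abs_bits (float x)
   {
   uint32_t u;
   memcpy (&u, &x, sizeof u);
   u &= 0x7FFFFFFFu;
   memcpy (&x, &u, sizeof x);
   return x;
   }
```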

Conditional Move - float, float vectors

It is possible to perform conditional assignments on vectors using a sequence of instructions without branching. This can be much faster than branching, because mispredicted branches are very slow on modern processors.

e.g.

Source code:

typedef struct {float f [2];} VECTOR2F;

float __inline cond (float a, float b)
   {
   if (a == 5) a = b;
   return a;
   }

VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
   {
   VECTOR2F r;
   r.f [0] = cond (a.f [0], b.f [0]);
   r.f [1] = cond (a.f [1], b.f [1]);
   return r;
   }

Compiled for 3DNow!:

@example@3DN_16:
   movq mm3,const
   movq mm4,mm0
   movq mm5,mm1
   pcmpeqd mm4,mm3
   movq mm2,mm0
   pand mm5,mm4
   pandn mm4,mm2
   por mm4,mm5
   movq mm2,mm4
   movq mm0,mm2
   ret

const dd 40A00000H,40A00000H   ; bit pattern of 5.0 in both elements
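The compare-and-mask sequence in the listing can be followed more easily in scalar C. The sketch below mirrors it for a single float, assuming IEEE 754 single-precision floats. The helper names are hypothetical. The comparison is done on the bit patterns, as pcmpeqd does, which matches a == 5 because 5.0 has only one bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Helpers to move between a float and its 32-bit pattern. */
uint32_t float_bits (float x)
   {
   uint32_t u;
   memcpy (&u, &x, sizeof u);
   return u;
   }

float bits_float (uint32_t u)
   {
   float x;
   memcpy (&x, &u, sizeof x);
   return x;
   }

/* Branch-free form of: if (a == 5) a = b; return a; */
float select_if_five (float a, float b)
   {
   uint32_t ua = float_bits (a);
   uint32_t ub = float_bits (b);
   uint32_t five = 0x40A00000u;                  /* same constant as "const" */
   uint32_t mask = 0u - (uint32_t) (ua == five); /* all ones if equal (pcmpeqd) */
   uint32_t ur = (ub & mask) | (ua & ~mask);     /* pand, pandn, por */
   return bits_float (ur);
   }
```

The mask is either all ones or all zeros, so exactly one of the two AND terms survives and the OR simply picks it.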



Prefetching

Prefetching is available on processors with 3DNow! support, through the prefetch and prefetchw instructions.
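The source does not describe how VectorC exposes prefetching, so the sketch below uses the GCC and Clang builtin __builtin_prefetch as a stand-in to show the general pattern: request data a fixed distance ahead of the element currently being processed. The macro PREFETCH_READ, the function name and the distance of 16 elements are all hypothetical choices for illustration.

```c
#include <stddef.h>

/* PREFETCH_READ is a hypothetical macro for this sketch. On GCC and Clang
   it uses __builtin_prefetch; on other compilers it expands to nothing. */
#if defined (__GNUC__)
#define PREFETCH_READ(p) __builtin_prefetch ((p), 0)
#else
#define PREFETCH_READ(p) ((void) 0)
#endif

/* Sum an array while hinting at data a fixed distance ahead. */
float sum_with_prefetch (const float *data, size_t n)
   {
   float total = 0.0f;
   size_t i;
   for (i = 0; i < n; i++)
      {
      /* 16 floats ahead is an arbitrary distance chosen for illustration. */
      if (i + 16 < n) PREFETCH_READ (&data [i + 16]);
      total += data [i];
      }
   return total;
   }
```

A prefetch is only a hint and does not change the result. A suitable distance depends on the processor and on how much work is done per element, so it needs to be measured.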
