
3DNOW!

3DNow! is a set of processor instructions that AMD introduced with the K6-2 processor. All AMD processors since the K6-2 support 3DNow!. The new instructions perform single-precision floating-point operations on values in MMX registers. Because MMX registers are used, every restriction that applies to MMX code also applies to 3DNow! code. In particular, it is not possible to mix 3DNow! code and normal FPU floating-point code. Because 3DNow! supplies only a limited set of floating-point operations, code that is meant to use these instructions should not use any unsupported floating-point operations in the same functions or loops.

In addition to the normal integer vectors, MMX registers can now also be treated as the type:

float [2];

It is also possible to use 3DNow! instructions to operate on normal float types.

Floating-point operations carried out in 3DNow! registers will not always produce exactly the same results as normal floating-point code. If two versions of your program must produce exactly the same results, you should enable consistent precision.

3DNow! is ideal for multiplying vectors and matrices as well as 3D projection. It can also be used with MMX for image and sound processing using floating-point intermediates.
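As a rough illustration of why vector and matrix multiplication suits a two-wide float unit, the sketch below transforms a 2D point by a 2x2 matrix using only pairwise multiplies and adds. It is plain portable C, not VectorC output. The helper names `vec2_mul`, `vec2_add` and `mat2_transform`, and the choice to store the matrix as two column vectors, are assumptions made for this example.

```c
/* Two packed floats: the shape that fits in one MMX register. */
typedef struct {float f [2];} VECTOR2F;

/* Element-wise multiply of two float pairs. */
static VECTOR2F vec2_mul(VECTOR2F a, VECTOR2F b)
{
    VECTOR2F r;
    r.f[0] = a.f[0] * b.f[0];
    r.f[1] = a.f[1] * b.f[1];
    return r;
}

/* Element-wise add of two float pairs. */
static VECTOR2F vec2_add(VECTOR2F a, VECTOR2F b)
{
    VECTOR2F r;
    r.f[0] = a.f[0] + b.f[0];
    r.f[1] = a.f[1] + b.f[1];
    return r;
}

/* Transform p by the matrix whose columns are col0 and col1
   (hypothetical layout): result = col0 * p.x + col1 * p.y. */
static VECTOR2F mat2_transform(VECTOR2F col0, VECTOR2F col1, VECTOR2F p)
{
    VECTOR2F xx = { { p.f[0], p.f[0] } };  /* p.x in both lanes */
    VECTOR2F yy = { { p.f[1], p.f[1] } };  /* p.y in both lanes */
    return vec2_add(vec2_mul(col0, xx), vec2_mul(col1, yy));
}
```

Every step works on both lanes at once, so no scalar floating-point operation is needed inside the transform.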

Operations Supported by 3DNow!

Arithmetic - addition, subtraction, multiplication on floats or float vectors

e.g.

Source code:

    typedef struct {float f [2];} VECTOR2F;

    VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
    {
        VECTOR2F r;
        r.f [0] = a.f [0] * b.f [0];
        r.f [1] = a.f [1] * b.f [1];
        return r;
    }

Compiled for 3DNow!:

    @example@3DN_16:
        pfmul mm0,mm1
        ret

Reciprocal 12-bit precision - floats

VectorC also performs 12-bit precision division by calculating the reciprocal of the right-hand side and multiplying it by the left-hand side.

e.g.

Source code:

    float __declspec (codeplay_3dnow) example (float a)
    {
        return 1 __hint__ ((precision (12))) / a;
    }

Compiled for 3DNow!:

    @example@3DN_4:
        pfrcp mm0,mm0
        ret

Reciprocal full precision - floats

VectorC also performs full-precision division by calculating the reciprocal of the right-hand side and multiplying it by the left-hand side.

e.g.

Source code:

    float __declspec (codeplay_3dnow) example (float a)
    {
        return 1 / a;
    }

Compiled for 3DNow!:

    @example@3DN_4:
        movq mm2,mm0
        pfrcp mm1,mm2
        pfrcpit1 mm2,mm1
        pfrcpit2 mm2,mm1
        movq mm0,mm2
        ret
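The full-precision sequence above starts from the low-precision `pfrcp` estimate and then applies two iteration instructions. The portable C sketch below illustrates the general idea of refining a coarse reciprocal with a Newton-Raphson step. It does not reproduce the exact arithmetic of `pfrcpit1` and `pfrcpit2`. The helper names `coarse_recip`, `refine_recip` and `recip_error` are hypothetical, the coarse estimate is emulated by truncating the mantissa of an ordinary division, and IEEE 754 single-precision floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulate a low-precision reciprocal estimate by
   keeping only the top 12 of the 23 mantissa bits of 1/a. */
static float coarse_recip(float a)
{
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF800u;              /* clear the low 11 mantissa bits */
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* One Newton-Raphson step for the reciprocal of a:
   x_next = x * (2 - a * x). Each step roughly doubles the
   number of correct bits in the estimate. */
static float refine_recip(float a, float x)
{
    return x * (2.0f - a * x);
}

/* How far a * x is from 1, as a simple measure of the error in x. */
static float recip_error(float a, float x)
{
    float e = a * x - 1.0f;
    return e < 0.0f ? -e : e;
}
```

Starting from an estimate that is good to about 12 bits, a single step of this kind is already close to the limit of single precision.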

Reciprocal square root 12-bit precision - floats

The "sqrt" function in "math.h" is declared with doubles. Doubles cannot be processed with 3DNow!, so you can either declare your own version of "sqrt" that takes and returns a float, or use the command-line switch "/vec:single" or "/single". With the switch, the compiler uses a "float" version of "sqrt" if the argument is a float.

e.g.

Source code:

    float sqrt (float);

    float __declspec (codeplay_3dnow) example (float a)
    {
        return 1 / __hint__ ((precision (12))) sqrt (a);
    }

Compiled for 3DNow!:

    @example@3DN_4:
        pfrsqrt mm0,mm0
        ret

Conversion to or from 32-bit signed integer or signed 32-bit integer vector

e.g.

Source code:

    typedef struct {float f [2];} VECTOR2F;
    typedef struct {int i [2];} VECTOR2SD;

    VECTOR2SD __declspec (codeplay_3dnow) example (VECTOR2F a)
    {
        VECTOR2SD r;
        r.i [0] = a.f [0];
        r.i [1] = a.f [1];
        return r;
    }

Compiled for 3DNow!:

    @example@3DN_8:
        pf2id mm0,mm0
        ret
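The example above covers only the float-to-integer direction. The sketch below shows both directions in plain portable C, mainly to make the rounding behaviour of the source-level conversion explicit: a C float-to-int conversion discards the fractional part. The function names `to_int` and `to_float` are hypothetical and the code is not VectorC output.

```c
typedef struct {float f [2];} VECTOR2F;
typedef struct {int i [2];} VECTOR2SD;

/* Float pair to int pair. In C the conversion rounds toward zero,
   so 2.7f becomes 2 and -2.7f becomes -2. */
static VECTOR2SD to_int(VECTOR2F a)
{
    VECTOR2SD r;
    r.i[0] = (int)a.f[0];
    r.i[1] = (int)a.f[1];
    return r;
}

/* Int pair back to float pair. Small integers convert exactly. */
static VECTOR2F to_float(VECTOR2SD a)
{
    VECTOR2F r;
    r.f[0] = (float)a.i[0];
    r.f[1] = (float)a.i[1];
    return r;
}
```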

Minimum and Maximum - floats or float vectors

e.g.

Source code:

    typedef struct {float f [2];} VECTOR2F;

    float __inline min (float a, float b)
    {
        if (b < a) a = b;
        return a;
    }

    VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
    {
        VECTOR2F r;
        r.f [0] = min (a.f [0], b.f [0]);
        r.f [1] = min (a.f [1], b.f [1]);
        return r;
    }

Compiled for 3DNow!:

    @example@3DN_16:
        pfmin mm0,mm1
        ret

Absolute and Negate - floats and float vectors

The MMX logical instructions can be used to negate floats and calculate the absolute (positive) values.
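The trick relies on the sign of a single-precision float being stored in the top bit of its 32-bit pattern. The following portable C sketch shows the same bit operations on one float at a time. It assumes IEEE 754 single-precision floats, and the helper names (`float_bits`, `bits_float`, `abs_by_mask`, `negate_by_mask`) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT 0x80000000u   /* top bit of a 32-bit float pattern */

/* Reinterpret a float as its 32-bit pattern, and back. */
static uint32_t float_bits(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

static float bits_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Absolute value: AND with a mask that clears the sign bit. */
static float abs_by_mask(float x)
{
    return bits_float(float_bits(x) & ~SIGN_BIT);
}

/* Negate: XOR with the sign bit to flip it. */
static float negate_by_mask(float x)
{
    return bits_float(float_bits(x) ^ SIGN_BIT);
}
```

Applied to a 64-bit register holding two floats, the same AND or XOR with a doubled mask handles both elements in one logical instruction.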

Conditional Move - float, float vectors

It is possible to perform conditional assignments on vectors using a sequence of instructions without branching. This can be much faster than branching, which can be very slow on modern processors.

e.g.

Source code:

    typedef struct {float f [2];} VECTOR2F;

    float __inline cond (float a, float b)
    {
        if (a == 5) a = b;
        return a;
    }

    VECTOR2F __declspec (codeplay_3dnow) example (VECTOR2F a, VECTOR2F b)
    {
        VECTOR2F r;
        r.f [0] = cond (a.f [0], b.f [0]);
        r.f [1] = cond (a.f [1], b.f [1]);
        return r;
    }

Compiled for 3DNow!:

    @example@3DN_16:
        movq mm3,const
        movq mm4,mm0
        movq mm5,mm1
        pcmpeqd mm4,mm3
        movq mm2,mm0
        pand mm5,mm4
        pandn mm4,mm2
        por mm4,mm5
        movq mm2,mm4
        movq mm0,mm2
        ret

    const dd 40A00000H,40A00000H
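The compare, AND, AND-NOT, OR pattern in the listing above can be read as a mask-based select. The sketch below does the same thing for a single float in portable C, so the role of each step is easier to follow. It assumes IEEE 754 single-precision floats, compares bit patterns the way `pcmpeqd` does, and uses hypothetical helper names (`float_bits`, `bits_float`, `cond_select`).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern, and back. */
static uint32_t float_bits(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

static float bits_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Branch-free equivalent of: if (a == 5) a = b; return a; */
static float cond_select(float a, float b)
{
    uint32_t abits = float_bits(a);
    uint32_t bbits = float_bits(b);
    uint32_t five = float_bits(5.0f);   /* 0x40A00000, the constant in the listing */

    /* All ones when the patterns match, all zeros otherwise. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(abits == five);

    /* Take b where the mask is set, a where it is clear, then merge. */
    return bits_float((bbits & mask) | (abits & ~mask));
}
```

Because the comparison result is a mask rather than a jump target, both elements of a vector can be handled in the same instructions even when only one of them meets the condition.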

Prefetching

Prefetching is available on processors with 3DNow! support.
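As a general illustration of what prefetching looks like at source level, the sketch below requests data a fixed distance ahead of the element currently being processed. It uses `__builtin_prefetch`, which is a GCC and Clang builtin and not a VectorC feature; how VectorC exposes prefetching is not covered here. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical choices for this example.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16   /* hypothetical distance, in elements */

/* Sum an array while hinting that later elements will be read soon. */
static float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 1);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, and whether it helps depends on the processor and the access pattern.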