setting the speed for the future of games programming
vectorc


ALIGNMENT

Until recently, alignment did not seem to be a serious problem on PCs. PC processors read from unaligned addresses without complaint, and most compilers align variables and fields on boundaries equal to the size of their base type. However, unaligned accesses can carry a serious performance penalty (it may even be faster to perform 2 aligned memory accesses and combine the results). Also, if the compiler is to vectorize code, the values must be aligned to the size of the combined vector, not to their own individual size. For example:
void fn_add (short *a, short *b, short *c)
    {
    int i;

    /* add 100 pairs of 16-bit values, one at a time */
    for (i=0; i<100; i++)
        {
        a[i] = b[i] + c[i];
        }
    }
With MMX it is possible to perform 4 16-bit additions at once, which could speed up this loop considerably. However, 'a', 'b' and 'c' would have to be aligned on 8-byte boundaries, not the 2-byte boundaries their type implies. Because it cannot assume that alignment, VectorC may not vectorize this loop (if it did and the data turned out to be misaligned, the result might be significantly slower).
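To make the 8-byte requirement concrete, here is a minimal plain-C sketch, assuming C99 and no MMX intrinsics. The hypothetical 'fn_add_checked' tests whether all three pointers are 8-byte aligned and only then works through the arrays in blocks of 4 shorts (the 8 bytes one MMX register holds); otherwise it falls back to a scalar loop. It illustrates the constraint only; it is not the code VectorC generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of n bytes (n must be a power of two). */
static int is_aligned (const void *p, size_t n)
    {
    return ((uintptr_t)p & (uintptr_t)(n - 1)) == 0;
    }

/* Hypothetical variant of fn_add: takes the 4-wide path only when all
   three pointers are 8-byte aligned. Plain C stands in for one MMX add
   of 4 shorts per block. Returns how many elements the 4-wide path did. */
int fn_add_checked (short *a, const short *b, const short *c, int n)
    {
    int i = 0;

    if (is_aligned(a, 8) && is_aligned(b, 8) && is_aligned(c, 8))
        {
        for (; i + 4 <= n; i += 4)
            {
            /* one 8-byte block: 4 x 16-bit additions */
            a[i]     = (short)(b[i]     + c[i]);
            a[i + 1] = (short)(b[i + 1] + c[i + 1]);
            a[i + 2] = (short)(b[i + 2] + c[i + 2]);
            a[i + 3] = (short)(b[i + 3] + c[i + 3]);
            }
        }

    int vectorized = i;   /* elements handled in 4-wide blocks */

    /* scalar path: the whole loop if misaligned, otherwise the leftovers */
    for (; i < n; i++)
        a[i] = (short)(b[i] + c[i]);

    return vectorized;
    }
```

With 8-byte aligned arrays and n = 10, the first 8 elements go through the 4-wide path and the last 2 through the scalar loop; shifting all three pointers by one element (2 bytes) sends everything down the scalar path.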

You need to tell VectorC that the addresses held in 'a', 'b' and 'c' are multiples of 8. This is done with '__declspec (alignedvalue (n))', e.g.

void fn_add (__declspec (alignedvalue (8)) short *a,
             __declspec (alignedvalue (8)) short *b,
             __declspec (alignedvalue (8)) short *c)
    {
    int i;

    /* a, b and c are declared 8-byte aligned, so VectorC may use MMX here */
    for (i=0; i<100; i++)
        {
        a[i] = b[i] + c[i];
        }
    }
In the above code, VectorC knows that 'a', 'b' and 'c' point to 8-byte aligned memory, so the loop can be converted to use MMX. If you call this function with pointers that are not 8-byte aligned, the code will still work, but may be significantly slower on some machines. The Pentium III's Streaming SIMD Extensions (SSE), however, work on 16-byte values, and their fast aligned reads and writes, unlike MMX accesses, fault on misaligned addresses. If you use alignedvalue (16) and VectorC generates SSE code, passing a pointer that is not actually 16-byte aligned may cause an unaligned access exception, so be careful.
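A caller-side sketch for the SSE case, assuming a C11 runtime that provides aligned_alloc (Microsoft's C runtime, for one, offers _aligned_malloc instead): allocate buffers on a 16-byte boundary and, in debug builds, check each pointer before it reaches a routine declared with alignedvalue (16). The helper names 'alloc_floats16' and 'check_aligned16' are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate count floats starting on a 16-byte boundary. C11
   aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16.
   Release the memory with free(). */
static float *alloc_floats16 (size_t count)
    {
    size_t bytes = (count * sizeof(float) + 15) & ~(size_t)15;
    return aligned_alloc(16, bytes);
    }

/* Hypothetical debug guard: call it on every pointer passed to a
   routine whose parameters are declared alignedvalue (16), so a
   misaligned pointer is caught here instead of faulting in SSE code. */
static void check_aligned16 (const void *p)
    {
    assert(((uintptr_t)p & 15) == 0 && "pointer must be 16-byte aligned");
    }
```

Passing p + 4 (exactly 16 bytes in) is fine, but p + 1 (4 bytes past a 16-byte boundary) would trip the assertion, which is the case that could otherwise fault inside SSE code.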

You may also need to align variables. This can be done with '__declspec (align (n))'. For MMX and 3DNow!, align to 8 bytes; for Streaming SIMD Extensions, align to 16 bytes.
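VectorC predates C11, so the following is not VectorC syntax. It is a sketch of the same idea in standard C11, where 'alignas' from <stdalign.h> plays the role of '__declspec (align (n))'. The array names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* MMX and 3DNow! registers hold 8 bytes: align to 8. */
alignas(8) short mmx_samples[64];

/* SSE registers hold 16 bytes (4 floats): align to 16. */
alignas(16) float sse_vertices[64];

/* Nonzero if p is a multiple of n bytes. */
static int aligned_to (const void *p, uintptr_t n)
    {
    return ((uintptr_t)p % n) == 0;
    }
```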

By default, VectorC aligns a variable according to its total size, not according to the size of its base type. This differs from other compilers. So if an array of floats is declared in code built with another compiler and then accessed by a routine compiled with VectorC, VectorC may assume the array is aligned to 8 or 16 bytes and vectorize accordingly, even though the other compiler may have aligned it to only 4 bytes.
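The mismatch comes from a rule that standard C makes explicit: an array type has the alignment of its element type, however large the array is. A minimal C11 sketch, with a hypothetical 'shared_positions' array, shows the rule and one way to remove the ambiguity by putting the alignment on the definition itself. This assumes the compiler that owns the definition understands 'alignas' or an equivalent such as '__declspec (align (16))'.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Standard C gives an array the alignment of its element type, so a
   400-byte float[100] built by a conventional compiler only promises
   alignof(float) (4 bytes on x86). */
static_assert(alignof(float[100]) == alignof(float),
              "arrays are aligned like their elements");

/* Hypothetical shared array: the explicit 16-byte alignment becomes a
   property of the definition instead of relying on either compiler's
   default placement. */
alignas(16) float shared_positions[100];
```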

When designing structures that contain vectors, you also need to be careful about alignment. Most compilers align fields according to the size of their base type, so an array of ints in a structure will be aligned to 4 bytes. VectorC must follow the same rule for compatibility, so you should align field positions yourself by reordering fields or adding pad bytes.
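A sketch of both fixes, assuming a 4-byte int and 4-byte float as on typical x86 compilers; the struct names are hypothetical. The tail padding in the reordered version keeps the structure size a multiple of 16, so 'pos' stays aligned in every element of an array, not just the first.

```c
#include <assert.h>
#include <stddef.h>

/* 'pos' lands at offset 4: float only needs 4-byte alignment, so it
   is not 16-byte aligned even when the struct itself is. */
struct particle_plain
    {
    int   id;
    float pos[4];
    };

/* Fix 1: put the vector first, and pad the tail so the struct size is
   a multiple of 16 (keeps pos aligned in arrays of particles too). */
struct particle_reordered
    {
    float pos[4];
    int   id;
    char  pad[12];
    };

/* Fix 2: keep the field order and insert explicit pad bytes. */
struct particle_padded
    {
    int   id;
    char  pad[12];
    float pos[4];
    };
```

Either layout only helps if the structure itself starts on a 16-byte boundary, for example by declaring instances with '__declspec (align (16))'.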

VectorC tries to align local variables wherever possible. However, this is not always possible, and it is sometimes slow to do. VectorC therefore provides some new calling conventions which maintain 16-byte stack alignment at all times. These are only effective if most of your functions are defined with them. Unfortunately, code compiled with other compilers cannot directly call functions defined with Codeplay's calling conventions.
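To see what "slow to do" can mean, here is a simplified sketch, not how VectorC actually works: if the stack is only known to be 4-byte aligned (as with the traditional 32-bit x86 conventions), a routine that wants a 16-byte aligned local has to over-allocate and round the address up on every call. The names 'align_up' and 'sum4_aligned_local' are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round p up to the next multiple of n (n must be a power of two). */
static void *align_up (void *p, uintptr_t n)
    {
    return (void *)(((uintptr_t)p + (n - 1)) & ~(n - 1));
    }

/* Hypothetical routine needing a 16-byte aligned local. With only
   4-byte stack alignment, the local is carved out of a larger array
   at run time: 3 spare floats plus an address calculation per call.
   A convention that keeps the stack 16-byte aligned avoids this. */
float sum4_aligned_local (const float *in)
    {
    float raw[4 + 3];              /* 4 floats + up to 12 bytes of slack */
    float *v = align_up(raw, 16);  /* v is raw + 0..3 elements */

    assert(((uintptr_t)v & 15) == 0);
    memcpy(v, in, 4 * sizeof(float));
    return v[0] + v[1] + v[2] + v[3];
    }
```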
