setting the speed for the future of games programming
vectorc

PREFETCH

Prefetching is a technique, introduced in recent processors, for speeding up memory access.  On the Intel Pentium III, the AMD K6-2, and later processors, a prefetch tells the processor to move data from main memory into the caches while it carries on with other work.  The main difference between prefetching and simply reading the data early is that a prefetch does not cause an exception when it touches illegal memory or hardware memory, so you do not need to worry about whether the prefetched memory will actually be used.  Prefetching also uses fewer processor resources (e.g. registers) than executing a read instruction.

Prefetches are a hint to the processor.  The optimizer may remove them if registers are not available, if there are too many prefetches in the same place or to the same area of memory, or if the processor does not support them.

Prefetches are not guaranteed to speed up your program.  You may even get a small slow-down.

There are four ways to generate prefetch instructions in VectorC:

  1. Tell the compiler to prefetch memory accesses to all variables that are larger than the size of the caches.  This is done by adding '/uncachedsizen' (when using vectorc.exe) or '/vec:uncachedsizen' (when using VectorCL.exe for Microsoft compatibility).  This option is easy, but not very precise.

  2. Tell the compiler to prefetch memory accesses within specific variables.  This is done by putting '__hint__((prefetch))' at the start of a variable declaration.

  3. Tell the compiler to prefetch memory accesses via specific pointer variables.  This is done by putting '__hint__((prefetch))' before the '*' in the pointer declaration.

  4. Issue a prefetch directly.  Use '__hint__((prefetch(address)));'.  Remember that the address is of data that will be required soon (perhaps in the next loop iteration).  You should experiment with the prefetch distance to find the best results.  A prefetch reads in an entire cache line, which is usually at least 16 bytes on a 16-byte alignment boundary.

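As a rough illustration of the two declaration-based methods, the sketch below puts the hint where the list describes: at the start of an array declaration, and before the '*' of a pointer declaration.  The macro PREFETCH_HINT, the switch USE_VECTORC_HINTS, and the names samples, fill_samples and sum_samples are all hypothetical, introduced only so that the same file also builds with compilers that do not know __hint__.  The exact grammar VectorC accepts should be checked against the __hint__ reference.

```c
#include <stddef.h>

/* Hypothetical switch: define USE_VECTORC_HINTS when building with VectorC.
   Without it the hint expands to nothing and this is plain C. */
#ifdef USE_VECTORC_HINTS
#define PREFETCH_HINT __hint__((prefetch))
#else
#define PREFETCH_HINT
#endif

#define SAMPLE_COUNT 4096

/* Hint at the start of the variable declaration: accesses within this
   array are candidates for prefetching. */
PREFETCH_HINT float samples[SAMPLE_COUNT];

/* Fill the array with one value so the sum is easy to check. */
void fill_samples(float value)
{
    size_t i;
    for (i = 0; i < SAMPLE_COUNT; i++)
        samples[i] = value;
}

/* Sum the array through a hinted pointer. */
float sum_samples(void)
{
    /* Hint before the '*' in the pointer declaration: accesses made
       through p are candidates for prefetching. */
    const float PREFETCH_HINT *p = samples;
    float total = 0.0f;
    size_t i;
    for (i = 0; i < SAMPLE_COUNT; i++)
        total += p[i];
    return total;
}
```

Because the hints are only requests, the results of fill_samples and sum_samples are the same whether or not the compiler honours them; only the timing may change.
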
The __hint__ built-in function is defined here.
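
A direct prefetch inside a loop might look like the sketch below.  It is an illustration under stated assumptions, not VectorC's documented usage: PREFETCH, USE_VECTORC_HINTS, PREFETCH_DISTANCE and sum_with_prefetch are hypothetical names, and on GCC or Clang the __builtin_prefetch built-in stands in for the '__hint__((prefetch(address)));' statement so that the example can be built without VectorC.

```c
#include <stddef.h>

/* PREFETCH(addr) is a hypothetical wrapper.  With VectorC it expands to the
   statement form given in the text; with GCC or Clang it uses
   __builtin_prefetch as a stand-in; elsewhere it does nothing. */
#if defined(USE_VECTORC_HINTS)
#define PREFETCH(addr) __hint__((prefetch(addr)))
#elif defined(__GNUC__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)
#endif

/* How many elements ahead to prefetch.  This is a starting guess to tune
   by measurement, not a recommended value. */
#define PREFETCH_DISTANCE 16

float sum_with_prefetch(const float *data, size_t count)
{
    float total = 0.0f;
    size_t i;
    for (i = 0; i < count; i++) {
        /* Request data that a later iteration will need.  The bounds check
           only keeps the pointer arithmetic inside the array. */
        if (i + PREFETCH_DISTANCE < count)
            PREFETCH(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

Changing PREFETCH_DISTANCE and timing the loop is one way to run the experiment suggested above.  Since a prefetch fetches a whole cache line, issuing one per element is more than strictly needed, and the optimizer may drop the extras.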
