I investigated the use of a branchless absolute value function. This technique avoids branch mispredictions at the expense of increased register pressure. The first implementation is for the 386 architecture, where eax holds the value to operate on:
cdq
xor eax,edx
sub eax,edx
It takes some thought to see exactly why this technique works. The cdq instruction sign-extends eax into edx:eax, so edx becomes all ones (0xFFFFFFFF) if eax is negative and zero otherwise. The xor that follows can be thought of as a selective invert: the number is ones'-complemented, but only if it is negative. The final sub, somewhat paradoxically, adds one to the result if the number is negative, because subtracting 0xFFFFFFFF (that is, -1) is the same as adding one; together with the inversion, this completes a two's complement negation. If the number is non-negative, edx is zero, so both the xor and the sub leave it unchanged.
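As an illustration, here is a minimal C sketch of the same three steps. The function name `abs_branchless` is hypothetical, and the arithmetic is done on `uint32_t` (an assumption of this sketch, not part of the original code) so that the wrap-around the assembly relies on is well defined in C, where signed overflow is undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical C rendering of cdq / xor / sub.
   Works on uint32_t so every wrap-around is well defined. */
uint32_t abs_branchless(int32_t x)
{
    uint32_t ux = (uint32_t)x;        /* eax (conversion is modulo 2^32) */

    /* cdq: mask is 0xFFFFFFFF if x is negative, 0 otherwise. */
    uint32_t mask = 0u - (ux >> 31);

    /* xor: ones' complement, but only when mask is all ones. */
    ux ^= mask;

    /* sub: subtracting 0xFFFFFFFF adds 1 modulo 2^32, completing the
       two's complement; subtracting 0 leaves the value unchanged. */
    return ux - mask;
}
```

For example, `abs_branchless(-5)` takes 0xFFFFFFFB, inverts it to 0x00000004, then adds one to give 5. Returning an unsigned value also lets `INT32_MIN` map to 2147483648 instead of overflowing.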
For clarity, cdq can be replaced with:
mov edx, eax
sar edx, 31
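The mask that cdq (or mov plus sar) leaves in edx can be sketched in C in two ways. Both function names are hypothetical. The first mirrors sar directly, but right-shifting a negative signed value is implementation-defined in C; GCC and Clang perform an arithmetic shift. The second is a portable alternative that produces the same mask using only unsigned arithmetic.

```c
#include <stdint.h>

/* Mirrors "mov edx, eax / sar edx, 31": the sign bit is copied into
   all 32 bits. Relies on the compiler doing an arithmetic right shift
   for negative values (implementation-defined in C). */
uint32_t sign_mask_sar(int32_t x)
{
    return (uint32_t)(x >> 31);
}

/* Portable equivalent: isolate the sign bit (0 or 1), then negate it
   in unsigned arithmetic (0 -> 0, 1 -> 0xFFFFFFFF). */
uint32_t sign_mask_portable(int32_t x)
{
    return 0u - ((uint32_t)x >> 31);
}
```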
Both of these implementations, of course, clobber register edx. On architectures with high register pressure, such as the x86, this may be undesirable.
Perhaps for this reason, the WATCOM and x86 gcc compilers use the classical implementation, (x < 0) ? -x : x, as the following generated x86 code shows:
mov eax, +4[esp]
test eax, eax
jge L1
neg eax
L1: ret
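For comparison, a C sketch of the classical form is below; the name `abs_branching` is hypothetical. The comments map each path to the generated instructions above, though whether a particular compiler emits a branch for this source depends on its version and optimization flags. As in the earlier sketch, the negation is done in unsigned arithmetic (an assumption of the sketch) so that `INT32_MIN` does not overflow.

```c
#include <stdint.h>

/* Hypothetical C form of (x < 0) ? -x : x, matching test / jge / neg. */
uint32_t abs_branching(int32_t x)
{
    if (x < 0)
        return 0u - (uint32_t)x;  /* neg eax: two's complement negation */
    return (uint32_t)x;           /* jge taken: value returned unchanged */
}
```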
The compilers evidently consider the occasional branch misprediction penalty an acceptable price for leaving edx free. On a RISC architecture, by contrast, registers are plentiful and therefore relatively inexpensive to use, so the branchless implementation is preferred there. The gist is this: don’t try to outwit the compiler. It’s smarter than you are.