To address these attacks for OpenSSL, Matthew Wood (Intel) submitted a patch that first appeared in version 0.9.7h. The key change is to the modular exponentiation function in openssl/crypto/bn/bn_exp.c.
This patch stripes the pre-computed m2…x values across cache lines instead of storing them sequentially in the array. That is, the first byte of m2 would be at address 0, the first byte of m3 at 1, etc. A memory dump of this region with w = 3 would look like this:
Thus, the access pattern for reading any pre-computed value is exactly the same as any other: 256 sequential cache line reads. This is a clever way of removing the timing leak. I think it is commendable that Intel spent the time developing this code and contributing it to OpenSSL, especially after the widespread criticism that surrounded this problem was directed primarily at them.
There are two problems regarding this approach. The first is the limitation that it can’t fix problems with AES cache timing (see also this detailed analysis of Bernstein’s work and this proposed cache architecture).
"The second may only affect certain hardware. The patch is configurable for machines with a 32 or 64-byte cache line size. The default is set to 64. If an older machine or a non-x86 CPU with a smaller cache line size runs the default compile of OpenSSL, it will still have a small leak if a large window is used (>= actual cache line size). For example, consider a machine with an 8-byte cache line size and a 12-bit window. OpenSSL would store m2…9[0] in the first cache line and m10…11[0] in the second, allowing an attacker to determine if a given set of exponent bits was less than 210 or not. This might be exploitable in certain situations."