A Truly Random() Heisenbug

A Heisenbug is a type of bug that disappears or alters its behavior when you attempt to debug it.

A coworker came to me today with some odd behavior, which he suspected was the fault of a compiler optimization bug. Basically, the code worked through nested loops to generate a derived image from a base image.

Imagine something like:

for (...) {
  do {
    p1 = baseImage.getPixel(x1, y1);
    p2 = baseImage.getPixel(x2, y2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

This code worked completely correctly, but was a little slow. Getting and setting pixels via the Image API is the slow part, so he wanted to introduce a cache (basically, read all the pixels into a 2-D array up front). This seemed straightforward.

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

However, when run, the cached version produced plausible but visibly incorrect output: it contained “block”-like visual artifacts, while the output from the original version was smooth.

He fiddled with it for a while and then asked for advice. He told me up front that he suspected the do-while loop was either incorrectly being optimized or itself had a bug.

My initial thought, of course, was that the cache wasn’t accurate. I identified three ways the cache could be inaccurate:

  1. The cache did not actually contain the correct initial values in the correct ordering
  2. An essential part of the algorithm involved modifying the base image as the work progressed, and the use of cache was disrupting this process
  3. The cache was inadvertently being modified during the process

I wanted to verify that baseImage and newImage were entirely unrelated buffers, that no writing to baseImage occurred, and no writing to cache occurred. This was all verified with code inspection (he had already done this, and I did it too, confirming his result).

Then my coworker showed me something disturbing. He simply reintroduced one call to the Image API, discarding the result:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    unusedGarbage = baseImage.getPixel(x1, y1);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

This immediately resolved the problem, even though the variable “unusedGarbage” was never used and NO other changes were made. As extra proof, he showed me an assertion comparing the cache against the original image:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    op1 = baseImage.getPixel(x1, y1);
    op2 = baseImage.getPixel(x2, y2);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];

    Debug.Assert(p1 == op1);
    Debug.Assert(p2 == op2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

These asserts all passed, proving that the cache and the image held identical values.

Based on this research, he believed that the presence (or absence) of the Image API call changed the way the compiler optimized the loop, and that the optimized loop using the cache must be incorrect.

I was totally baffled at this point. It took a minute of just thinking:
Computers Do Not Work That Way

After clearing my head of disbelief, I decided to see what else I could come up with. Writing each p1 and p2 to the console would have produced far too much output to inspect, but adding them up into a running sum made for a quick poor man’s checksum.

double[,] cache = ... // get all pixel intensities into 2-D array
double sum = 0;
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}
Console.WriteLine(sum);

We tried this with both the cache and the Image API versions. In the output images, we could see the distinct difference between the two. However, the calculated sums were identical in both cases! This suggested that the loop itself was running the same and producing the same results.

At this point, I started to wonder about the setting of the pixel on the new image. The … above is an over-simplification; the code was something more like:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  Random r = new Random();
  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
  else
     newImage.setPixel(x, y, specificColor)
}

We had ignored this section of code as not relevant to the output. After all, the behavior is simply based on the output of the do-while, and if the do-while is working the same, then certainly this code must behave the same. Right?

Noting that Random should not be re-instantiated on every iteration, I hoisted the variable above the loops and picked a fixed Random seed so that we could compare the outputs without the differences in randomness.

Random r = new Random(1); // FIXED seed
double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
  else
     newImage.setPixel(x, y, specificColor)
}

In this case, the output of the cache and non-cache versions was identical! The problem appeared to go away. I removed the fixed seed, and the problem remained solved.

How so?

When a Random() instance is created in .NET, it is seeded with the system time by default. In fact, the MSDN docs warn:

The default seed value is derived from the system clock and has finite resolution. As a result, different Random objects that are created in close succession by a call to the default constructor will have identical default seed values and, therefore, will produce identical sets of random numbers.
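This is easy to demonstrate outside of .NET as well. The Java sketch below is a hypothetical simulation: it seeds both generators explicitly from one millisecond clock reading, mimicking what .NET's default constructor does when two instances are created within the same clock tick.

```java
import java.util.Random;

public class SameSeedDemo {
    public static void main(String[] args) {
        // Two Random instances created in the same clock tick get the
        // same time-based seed (simulated here by reusing one reading).
        long tick = System.currentTimeMillis();
        Random r1 = new Random(tick);
        Random r2 = new Random(tick);

        // Identical seeds produce identical "random" sequences.
        for (int i = 0; i < 5; i++) {
            System.out.println(r1.nextInt(256) + " == " + r2.nextInt(256));
        }
    }
}
```

The two columns printed are always equal: seeded PRNGs are fully deterministic, so "random" only holds across instances if their seeds differ.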

Fair enough, but how does this explain the difference in behavior? Why did the program work correctly originally, given that it was re-instantiating Random all along?

In the original version of the program, the Image API calls in the inner loop were slow enough that the system clock advanced between instantiations of Random, so each instance received a DIFFERENT seed and produced a plausibly random-looking sequence. With the cache, the loop ran much faster, so repeated instantiations of Random fell within the SAME clock tick and received the same seed, generating runs of identical values that appeared as the visual “blocks” in the output image.
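The timing effect itself can be sketched in Java (again a hypothetical simulation: each iteration seeds a fresh generator from the millisecond clock, mimicking .NET's default constructor, and a Thread.sleep stands in for the slow Image API call):

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class TimingDemo {
    // Count how many distinct first values we get from `n` generators,
    // each seeded from the millisecond clock, pausing `delayMs` between.
    static int distinctValues(int n, long delayMs) throws InterruptedException {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            // Re-created every iteration, like the buggy inner loop.
            Random r = new Random(System.currentTimeMillis());
            seen.add(r.nextInt());
            if (delayMs > 0) Thread.sleep(delayMs);
        }
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Fast loop (cached version): many iterations share one clock
        // tick, so seeds and values repeat -- the "blocks".
        System.out.println("fast: " + distinctValues(50, 0) + " distinct of 50");
        // Slow loop (Image API version): the clock advances between
        // iterations, so nearly every seed differs.
        System.out.println("slow: " + distinctValues(50, 5) + " distinct of 50");
    }
}
```

The fast loop yields only a handful of distinct values, while the slowed loop yields close to fifty. The real fix is the one above: create a single Random once, outside the loops, and reuse it.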