A Truly Random() Heisenbug

A Heisenbug is a type of bug that disappears or alters its behavior when you attempt to debug it.

A coworker came to me today with an apparent odd behavior, which he suspected was the fault of a compiler optimization bug. Basically, the code worked through nested loops to generate a derived image from a base image.

Imagine something like:

for (...) {
  do {
    p1 = baseImage.getPixel(x1, y1);
    p2 = baseImage.getPixel(x2, y2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

This code worked completely correctly, but was a little slow. Getting and setting pixels via the Image API is the slow part, so he wanted to introduce a cache (basically, read all the pixels into a 2-D array upfront). This seems straightforward.

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

However, when run, this produced plausible but visibly incorrect output. The output made using the cache version contained “block”-like visual artifacts, while the output from the original version was smooth.

He fiddled with it for a while and then asked for advice. He told me up front that he suspected the do-while loop was either incorrectly being optimized or itself had a bug.

My initial thought, of course, was that the cache wasn’t accurate. There were three ways the cache could be inaccurate that I identified:

  1. The cache did not actually contain the correct initial values in the correct ordering
  2. An essential part of the algorithm involved modifying the base image as the work progressed, and the use of cache was disrupting this process
  3. The cache was inadvertently being modified during the process

I wanted to verify that baseImage and newImage were entirely unrelated buffers, that no writing to baseImage occurred, and no writing to cache occurred. This was all verified with code inspection (he had already done this, and I did it too, confirming his result).

Then my coworker showed me something disturbing. He simply reintroduced one call to the Image API, discarding the result:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    unusedGarbage = baseImage.getPixel(x1, y1);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

This immediately resolved the problem, even though the variable “unusedGarbage” was in fact never used, and NO other changes were made. Additionally, as extra proof, he showed an assertion against both the cache and original image:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    op1 = baseImage.getPixel(x1, y1);
    op2 = baseImage.getPixel(x2, y2);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];

    Debug.Assert(p1 == op1);
    Debug.Assert(p2 == op2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}

These Asserts succeeded completely, proving the cache and image were identical.

Based on this research, he believed that the presence of the Image API call, or its absence, changed the way the compiler optimized the loop; and that the loop optimization using the cache must be incorrect.

I was totally baffled at this point. It took a minute of just thinking:
Computers Do Not Work That Way

After clearing my head of disbelief, I decided to see what else I could come up with. Console writing each p1 and p2 was far too numerous to view, but adding them up into a sum made for a quick poor-man’s check.

double[,] cache = ... // get all pixel intensities into 2-D array
int sum = 0;
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result
}
Console.WriteLine(sum);

We tried this with both the cache and Image API call. In the output image, we could see the distinct difference between the two. However, the calculated sum was identical in both cases! This suggested that the actual loop was running the same and producing the same results.

At this point, I started to wonder about the setting of the pixel on the new image. The … above is an over-simplification; the code was something more like:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  Random r = new Random();
  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
  else
     newImage.setPixel(x, y, specificColor)
}

We had ignored this section of code as not relevant to the output. After all, the behavior is simply based on the output of the do-while, and if the do-while is working the same, then certainly this code must behave the same. Right?

Noting that Random should not be re-instantiated during program run, I promoted the variable to before the loops and picked a fixed Random seed so that we could compare the output without the differences in Randomness.

Random r = new Random(1); // FIXED seed
double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
  else
     newImage.setPixel(x, y, specificColor)
}

In this case, both the cache and non-cache versions output was identical! The problem appeared to go away. I removed the fixed seed, and the problem remained solved.

How so?

When a Random() instant is created in .NET, it is seeded with the system time by default. In fact, the MSDN docs warn:

The default seed value is derived from the system clock and has finite resolution. As a result, different Random objects that are created in close succession by a call to the default constructor will have identical default seed values and, therefore, will produce identical sets of random numbers.

Fair enough, but how does this explain the difference in behavior? Why was the program working correctly originally given that it was performing this re-instantiation of Random all along?

In the original version of the program, the Image API calls in the inner loop, being quite slow, were sufficient to cause the initial seed (system time) to be DIFFERENT for each instantiation of Random, thus producing a plausible Random-ish sequence. However, when the cache was used, it was much much faster, causing repeated Random instantiations with the SAME system time, thus generating a series of identical values, appearing as visual “blocks” in the output image.

Moving to HTTPS

In the early days of the web, HTML documents were served over HTTP. The original HTTP protocol was a simple human-readable ASCII implementation. Over time, two major changes came to HTTP: In 1994, Netscape introduced an encryption and verification layer known as HTTPS; and in 2015, the W3C finalized the HTTP/2 specification, which includes data compression and server push (bidirectional) capabilities.

HTTPS was originally envisioned only for high-security situations, like banking and e-commerce, but with en-route content injection and other security concerns, vendors began pushing for increased use of encryption and verification. In 2010, a Firefox extension “HTTPS Everywhere” was released, one of the first hints that all traffic would eventually be transferred over an encrypted channel.

Fast forward to 2016. Browser vendors have increased the urgency of moving sites to HTTPS. Google and Firefox have joined efforts to deprecate non-secure HTTP. W3C has joined in, saying that “We recommend that such legacy functionality begin requiring a privileged context as quickly as is reasonably possible.” Current versions of Chrome and Firefox already issue console warnings for data submission over HTTP, and within a year, in one of the first major breaks in backwards compatibility for the web, many existing capabilities will be rejected over HTTP.

Adoption of HTTPS has been held back by the cost of certification (a TLS certificate can easily more than double the cost of web hosting) and, for those running their own servers, the complexity of installation. To act as the carrot corresponding to the aforementioned stick, major web players have created the Extended Verification certificates.

Domain operators who have not yet done so should move to HTTPS by:

  1. Acquiring and installing a TLS certificate from Let’s Encrypt or another reputable source.
  2. Verify and correct any failing/warning components (in particular, any absolute URLs involving http:// should be identified and replaced with https://, as these will cease to work in the future)
  3. Adding a server rewrite rule to redirect all HTTP requests to HTTPs.
  4. Ensuring that subsequent client connections can’t be hijacked via HTTP using HSTS to tell the client to never attempt to connect over non-secure HTTP again. An additional server rewrite rule enables HSTS.

You can check the certificate of your site by verifying the green lock icon:

https

You can also check that HSTS is forcing the HTTPS version of your site by attempting to go to the HTTP only URL and observing an “internal redirect”:
hsts

.NET Decimal Precision

I was confronted by a very unusual claim by a coworker: two .NET Decimals, containing the same value, running through the same block of code, were rendering differently (one as, say, “120” and one as “120.00”). One of the big advantages of Decimal is that, unlike Float or Double, a Decimal represents an exact base-10 value whereas Float and Double are base-2, and so can only approximate many common fractional base-10 values. Therefore, I don’t normally expect problems of this kind with Decimal.

I initially assumed that the coworker was confused; some other input must be coming in, or a different section of code executing, or a different string formatting code used. However, the problem clearly reproduced. Further, it occurred even when no fractional component existed. But stripping away the extraneous, the essence of the code was fairly straightforward:

decimal v = ...;
return v.ToString();

Detailed inspection revealed that in one case, the decimal value was loaded from the user input, and in another case, it was loaded from a database table. In order to test the significance of this, I made a test case which performed both side-by-side. At first, everything seems routine:
decimals

What happened next was anything but routine:
decimals2

What would cause one decimal to render with 2 decimal places and another, the exact same value, to render without them? It turns out that the .NET Decimal implementation intentionally includes the concept of significant trailing zeros, which have no impact on calculations but can carry information about the value’s precision.

Although there is no direct method to expose the precision, the details of the significant trailing zeros arrangement are discussed in the Decimal.GetBits method documentation. In this case, it is clear that the same logical value can be represented with different exponents. In the case above, we can have a value of 120 with an exponent of 0, and a value of 12000 with an exponent of 2 (the exponent “indicates the power of 10 to divide the integer number”), so 12000 * 10-2 = 120.00.

This is indeed confirmed by analysis. The first three bytes contain the value, while the exponent is defined as “Bits 16 to 23” of the fourth byte.

Decimal.GetBits(vFromUser) = [120, 0, 0, 0]
Decimal.GetBits(vFromDb) = [12000, 0, 0, 131072]
131072 >> 16 = 2

This confirms the vFromDb value is represented as 12000 * 10-2 while vFromUser is represented as 120 * 100. Although these values are logically equal, the default implementation of Decimal.ToString() outputs the value with all significant zeros, including trailing zeros.

Although the Decimal class does not expose properties for the precision nor the scale, it is possible to take advantage of the helper SqlDecimal class to access these values. In this context, precision means the total number of digits and scale means the number of digits to the right of the decimal place.

var sFromDb = new System.Data.SqlTypes.SqlDecimal(vFromDb);
Console.WriteLine(sFromDb.Precision);
Console.WriteLine(sFromDb.Scale);

For vFromDb, this outputs a precision of 5 and a scale of 2; while for vFromUser this outputs a precision of 3 and a scale of 0.

Outlook, Unicode, and Copy-Paste

It seemed straightforward: the customer told me they wanted the system to generate an email, and that they would send me the content (a parameterized one-liner). I copy/pasted it into the application source code as a string and parameterized it as specified. I tested the email, and it worked correctly.

Aside: At this point, someone will be saying the email content should have been placed as a resource or template file, and not a string in the source code. Such a claim is correct, but the email content is very short, so it wasn’t worth the extra effort. Also, it would have had no impact on this particular situation.

Having tested it myself, I commit’d the change and sent it to the user for testing. What happened next was a bit … unexpected.

Hi Steve,

It appears we are getting some special characters instead of the – for the Pending Timesheets Subject text. Not sure if we could encode this?

IMPORTANT – Timesheet Not Created – will effect pay

The customer interpreted the issue as a failure to HTML encode the content, except that:

  1. The dash doesn’t need to be HTML-encoded.
  2. I did HTML-encode the message and subject line, although the subject line doesn’t support HTML, so a failed encoding would show up as an explicit HTML entity not a random-seeming sequence of characters.

What it does look like is a Unicode representation issue. Was the dash somehow coming across as a multi-byte Unicode character to a client that didn’t support Unicode?

I tested again, and I wasn’t able to reproduce the problem, but I suspected I knew why it happened. A little investigation confirms that during typing, Outlook autoreplaces hyphen with en-dash when used in a stand-alone way.

Although the hyphen is a normal ASCII symbol (0x2D), the en-dash is represented by multi-byte Unicode code-point (0x2013, to be precise), so if anything non-Unicode compliant receives it, it will render as a sequence of characters.

When someone (the customer or one of his associates) typed the desired phrase “IMPORTANT – Timesheet Not Created … ” into Outlook, Outlook automatically replaced the hyphen with an en-dash, then I just copy/pasted that directly into my code (rather than retyping it, as Visual Studio does no such replacement).

How could I confirm my suspicion, since I couldn’t reproduce the issue myself? I used a hex editor (the 80s are back, baby!) to examine the actual content of the subject line from my source code. You can see in this hex screen the difference between what I copied from Outlook (top) and manually using the hyphen (bottom):

Hex comparison

Anyways the fix is easy: manually replace the en-dash with a regular hyphen.

I sent the email to myself, first with the original subject line (bottom) and then again with the hyphens. If you look closely, you can see that the dash looks different.

Subject line comparison

The hyphens should work fine in any situation.

What I never did figure out is … in this highly homogeneous Outlook-centric corporate environment, who was using a non-Unicode compliant email system and for what purpose?

Performance of array filter in JavaScript

Today a coworker remarked to me that he had been using the JavaScript native array method .filter, but found that writing a custom for loop to do the filtering was much faster. “How can this be?” we wondered, “Native is always faster!”

Since I’m skeptical by nature, my first move was to confirm my coworker’s claim in a controlled environment.

I created an array of a few million random numbers, and a function to filter then based on even or odd.

var vals = [];
for (var i = 0; i < 5000000; i++) {
    vals.push(Math.floor((Math.random() * 100) + 1));
}

var f = function(x) { return x % 2 === 0; };

Then I applied a performance measurement:

function measure(f) {
    var start = performance.now();
    f();
    var end = performance.now();
    var diff = end - start
    document.write(f + " took " + diff + " ms.<br />");    
}

I also wrote a completely naive for-loop based filter method which I attached to the Array prototype.

Array.prototype.naiveForFilter = function(f)   {    
    'use strict';
    var results = [];
    var len = this.length;
    for (var i = 0; i < len; i++) {        
        if (f(this[i])) {
            results.push(this[i]);
        }
    }
    return results;  
};

Finally, I compared the execution time of the two:

measure(function() { vals.filter(f) });
measure(function() { vals.naiveForFilter(f) });

The outcome (in Chrome) was shocking: the naive hand-rolled for loop ran in 1/5 the time of the native filter method. This seemed opposed to all possible common-sense. I asked my coworker if he had done any search, and he said there was an article from 2011, but he figured since it was so many years ago, it wouldn't still apply.

The article claims the slow-down is due to the additional safeties offered by the native method:

  1. It ignores deleted values and gaps in the array
  2. It optionally sets the execution context of the predicate function
  3. It prevents the predicate function from mutating the data

This still seems a little suspect, given that the native method was 5x slower. It also raises the question (just a curiosity) of how much impact each of those safeties costs. Is there one in particular that incurs a higher cost? To dig further, I pulled the Mozilla polyfill reference implementation for filter.

Adding this entry to my profiling, I found it about five times slower than the native method. Wow! So native, doing the same thing, is definitely faster. But still, the question is: where is the expense? I identified two possible cost centers visually in the polyfill: membership checking in the array using "in" and binding the context of the filter function using "call":

if (i in this) {
  if (f.call(thisArg, val, i, this)) {

I guessed that the context binding was the slow part. But to find out, I made multiple versions of the filter function, beginning the naive implementation (known to be very fast), and incrementally adding checks until it approached the complexity of the reference polyfill (known to be very slow). I tested all implementations on both browsers on my machine (Chrome and IE 11). I also compared to an implementation using ES6 foreach.

filter speed

The performance differences are by far most significant in Chrome, where the use of "in" to validate if an element is present in the array is an extremely expensive operation. Introducing this check consumes the vast majority of time in the implementation. Note: this is not the same as an element being undefined, which is much cheaper check.

Interestingly, in IE 11 the performance is much more normalized, and it appears that the function binding (.call) is the most expensive incremental addition to the function, with the in being fairly cheap. IE 11 is also exposed as having poor JavaScript optimization capability, as even a tight naive JS loop is much much slower than a more involved native implementation, whereas in Chrome the opposite is true. I'm assuming Edge is much more performant but haven't tested.

So, even today, if performance is critical, it might be faster to hand roll a custom-tailored function as opposed to using the built-in natives, as the built-in natives may have special support (like filter does for sparse arrays) that are known to be not needed in your specific case. But as Knuth says, "premature optimization is the root of all evil", so always default to using the native functions unless you're sure you need something different.

Unit Testing: Scenario

I was forwarded the following question:

Good afternoon! I was hoping that you might have some advice or guidance on unit test coding in C#? I have read a lot on this and researched it on the internet. The internship has me doing a lot of it right now. I am still confused about how to exactly to approach it, what would be a good strategy? I believe it is similar to validation. Any help would be appreciated.

To show a little bit about how unit tests can be applied in C#, consider this actual customer request which I handled today.

We would like to modify how the “Description/Keywords” field functions as follows:

Search by “Description/Keywords”

  1. A search for 1 keyword will display all firms that have the keyword in description field (as it currently functions)
  2. A search for 2 or more keywords will display all firms that have ANY or ALL keywords in description field, depending on user selection.
  3. We will be removing ”Starts with” and ”Contains” radio buttons for this item only.

The field search used to be based on a contains/starts with option that matched entire substrings. Now the user wants to match keywords in any order. Can we implement this change with the assistance of unit testing?

Start by declaring a method signature to accomplish the task desired:

public enum SEARCH_KEYWORDS { All, Any };
bool SearchKeywords(string haystack, 
    string needle, 
    SEARCH_KEYWORDS stype) { ... }

The body of the method may simply throw a NotImplementedException, or return false.

Next, we can create some unit tests describing functionality that is desired.

For example, it should match or not a single word, regardless of any/all.

[TestMethod()]
public void SearchKeywordsSingleWord()
{
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("word", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(true, target.SearchKeywords("word", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("words", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(false, target.SearchKeywords("words", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("word", "words", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(false, target.SearchKeywords("word", "words", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
}

It should match or not a phrase, depending on the phrase and whether any/all is selected. (Note: you might instead wish to divide up the tests as “tests for any” and separately, “tests for all”)

[TestMethod()]
public void SearchKeywordsPhrase()
{
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
            
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "lazy fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "lazy fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(true, target.SearchKeywords("lazy fox goat jumping bridge", "fox jumping lazy  bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox goat jumping bridge", "fox jumping lazy  bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "goat rubber ducky", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "goat rubber ducky", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

}

You can think about other kinds of special cases that might be encountered. Should the search be case sensitive? Probably not, so better add tests to make sure it isn’t. Should the search consider punctuation? Definitely not, since it’s keywords. What about blank spaces? Blanks should never match (that’s my opinion), so add tests for it.

[TestMethod()]
public void SearchKeywordsPhraseCap()
{
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

}

[TestMethod()]
public void SearchKeywordsPunc()
{
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("lazy, fox: jumping-bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy, fox: jumping-bridge", "fox-jumping. lazy, bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
            

}


[TestMethod()]
public void SearchKeywordsBlanks()
{
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("  ", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "    ", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("", "lazy fox jumping bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "lazy fox jumping bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
            
}

Now run the tests. They should all (or almost, depending on the naiveness of the original implementation) fail. That’s fine. Next, go ahead and implement the method, and run the tests again.

no blank

Looks like a test is failing. Turns out my initial implementation didn’t handle blanks the way I expected. So I went ahead and fixed it, and ran the tests again.

fixed blanks

This is where the tests really start to shine. I fixed the blank issue, but, in the progress, accidentally introduced another bug! Once all the bugs are resolved, all tests should pass.

This is a good time to refactor the implementation and make it cleaner, if needed. The tests should be used to ensure functionality isn’t damaged. This is a pattern known as Red-Green-Refactor.

Let’s say you then send the implementation to QA, and a bug is reported. The first step is to add a new failing test reproducing the bug. Then, proceed to fix the bug. You’ll have confidence that the bug is fixed and no regression has occurred when all tests are again showing green.

Enforcing Single Application Instances in WPF with Named Pipes

In some cases, we may wish to prevent the user from starting multiple instances of applications. Instead, the existing (single) instance should be focused. The “single instance” part is pretty easy, but notifying the first instance to come to the front is more tricky. There are many attempts to demonstrate techniques for this around the net, but I found them difficult and unreliable in WPF. In particular, they mostly all depend on using WndProc broadcast messages which are only received when the target window is not minimized, and even then not reliably. Further, there would be no way to actually send any data (like a command line) from the new instance to the original instance.

Rather than continue to struggle with this unproductive path, I took a step back and thought about what I wanted to do: I want to send a message from one instance of an application to another. Isn’t that what named pipes are for?

I created an abstraction based on mutexs (to enforce single instancing) and named pipes (to communicate between instances) that easily allows an application to ensure only a single instance is used. The abstraction could easily be expanded to allow for passing data to the single instance.

Get it on GitHub

To use this abstraction, omit a StartupUri from App.xaml and override OnStartip in the App.xaml.cs:

public partial class App : Application
{
  protected override void OnStartup(StartupEventArgs e)
  {
    // Set a unique application ID
    Guid id = Guid.Parse("DC2A927C-AC89-4512-BB29-7AB0A18DE105");

    // Instantiate an SIA
    SingleInstanceApplication sia = new SingleInstanceApplication(id);            
    // Handle the ApplicationStarts event
    // When this event fires, initialize the application
    sia.ApplicationStarts += (sender, earg) =>
    {
      var mw = new MainWindow();
      mw.Show();
      // If another instance attempts to start, 
      // bring our window to the front
      sia.AnotherInstanceAttemptsToStart += 
        SingleInstanceApplication.BringToFrontWhenCalled(mw);
    };
    // Optionally handle AnotherInstanceAttemptsToStart, for example, 
    // to log other attempts
    sia.AnotherInstanceAttemptsToStart += (sender, earg) =>
    {
      Logger.LogInfo("### Captured another instance trying to start");
    };
    // Run the application and single instance protection
    sia.RunSingleInstance();            
  }

JavaScript scoping strikes again

We have a for loop that populates an observable array in knockout. For some reason, only the first item started being loaded, even though we hadn’t made any changes to the loop, and all the data was still being sent down.

The loop starts off:

for (i = 0; i < ts.Weeks.length; i++) {

I stepped through the loop using Chrome debugger and found that i=0 consistently, all the way until the very end of the body, when we added the data object to the observable array:

self.Weeks.push(week);

At this point, it jumped to i=10 right before going back to the top. Of course, with only a half-dozen data items, the loop then exited. But, nothing else in the loop modifies i, and where was 10 coming from anyways??

Since the array was observable, when it was updated with push, knockout also called all the various “computed” functions. Also since the for loop variable “i” was declared without var, it was actually in the global scope! One of those computed (I didn’t figure out which one), or a library that they used, also had an “i” that was in the global scope, and the two variables were clobbering each other.

Although I didn’t determine the other perpetrator code, simply updating the loop, by adding var, to read

for (var i = 0; i < ts.Weeks.length; i++) {

solved the issue by ensuring the variable i was function scoped.

It’s easy to forget in languages like C# that don’t do global scope by default… JavaScript does do global scope by default!

Arbitrary Clustering in SQL: Tumbling down the rabbit hole

Subtitle: “Things are not as they appear, nor are they otherwise.”

For a demo (in other words, nothing matters), I wanted to take a set of rows of sample data and cluster them into a few groups approximately the same size. I figured “I’ll use a random number”. My first flaw, of course, was to decided on a solution before I fully considered the meat of the problem. But anyways…

So I fool around with RAND() in SQL and I’m not getting good results. I look online and find this Microsoft article: Selecting Rows Randomly from a Large Table. Sounds like it might have something good. They advise a syntax like this for random [see aside, at bottom]:

SELECT * FROM Table1
  WHERE (ABS(CAST(
  (BINARY_CHECKSUM(NEWID()) *
  RAND()) as int)) % 100)

Of course replacing 100 with whatever upper bound is desired.

I’ll try it out with my table:

SELECT CustomerID, 
  (ABS(CAST((BINARY_CHECKSUM(NEWID())) as int)) % 3) 
  FROM dbo.Customers

Looks good! Eyeballing indicates an approximately equal distribute of 0, 1, and 2. No other values.

Now I want to substitute a named value for each possible random value, so I put this expression into a case statement, like:

SELECT CustomerID, 
  CASE (ABS(CAST((BINARY_CHECKSUM(NEWID())) as int)) % 3) 
    WHEN 0 THEN 'Alice'
    WHEN 1 THEN 'Bob'
    WHEN 2 THEN 'Charlie'	
   END Value,
 FROM dbo.Customers

Now things get interesting. Some of the values returned are NULL! How is that possible. Furthermore, if I investigate the distribution, e.g. with:

SELECT TBL.Value, COUNT(*) FROM
(
  SELECT CustomerID, 
    CASE (ABS(CAST((BINARY_CHECKSUM(NEWID())) as int)) % 3) 
      WHEN 0 THEN 'Alice'
      WHEN 1 THEN 'Bob'
      WHEN 2 THEN 'Charlie'	
    END Value	
  FROM dbo.Customers
) TBL
GROUP BY TBL.Value

I get a very non-uniform distribution. I tried it again on a table with more rows, to get a better feel for the distribution, and found an experimental result of:

Value Count Approx % of Rows
Alice 696 32%
NULL 621 29%
Bob 527 24%
Charlie 311 14%

Repeated runs returned similar results. Removing the case statement and just having the inner expression resulting in an approximately uniform distribution of 0, 1, and 2.

How can this be?

I thought about it for a while and concluded that the expression must be being re-evaluated for each “when” of the case statement. This is the only thing I could come up with that would allow for the NULL possibility. But would it generate the distribution being observed? If so, that would confirm the effect.

Since we observed the generator alone was approximately normal, we’ll call it f() and say that the range of f() is {0, 1, 2} with uniform probability.

If f() is being re-evaluated at each WHEN, we end up with a probability distribution as follow:
Probability Distribution

Grouping and summing the probabilities gives us:

Value Theoretical Probability Observed Approximate Probability
Alice (0) 1/3 = 33.3% 32%
NULL 8/27 = 29.6% 29%
Bob (1) 2/9 = 22.2% 24%
Charlie (2) 4/27 = 14.8% 14%

Theory confirms observation: f() is being re-evaluated at each WHEN.

But you already know what I’m going to tell you

Remember the whole intention of this, originally, was to arbitrarily (not necessarily randomly!) partition a set of records into several roughly equal sets. There is a much more straightforward approach if randomness is not required: simply use ROW_NUMBER().

SELECT CustomerID,
  CASE (CAST(ROW_NUMBER() OVER(ORDER BY CustomerID) AS int) % 3) 
    WHEN 0 THEN 'Alice'
    WHEN 1 THEN 'Bob'
    WHEN 2 THEN 'Charlie'	
  END Value
FROM dbo.Customers

Slight Aside

I went back and reviewed the original article and found that NEWID() is actually only advised in conjunction with other named columns. Otherwise, BINARY_CHECKSUM(*) is advised. If BINARY_CHECKSUM(*) is used, everything seems to work out fine! So, to be more specific, NEWID() (and thus, BINARY_CHECKSUM) is being re-evaluated at each row, but RAND() is not. Further, the article notes that RAND() is not needed when NEWID() or a column is specified. So, this whole problem originated in my skimming the article too quickly, and conflating two examples into one!

Posted in SQL

finally: the oft-neglected control structure

We’re all familiar with the try-catch idiom for handling exceptions. And of course, every primer on try-catch also introduces the finally block. It’s often unclear, however, when finally should be used. In other words, if I have a try-catch like so:

try {
  // do action
} catch {
 // handle error
} finally {
 // clean up
}

Why can’t I just

try {
  // do action
} catch {
 // handle error
}

// clean up

If this were an interview, we might answer mechanically: the finally will run even if the catch doesn’t catch the specific exception (if the catch is narrow), or if the catch rethrows the exception, or an exception occurs in the catch block. Putting the cleanup outside the finally will not be run in those cases.

But those cases are not too common, and then the finally is forgotten.

Consider the following case of a maintenance routine that requires a listening host be temporarily suspended. The maintenance routine may throw an exception, however, a higher-level exception handler (catch) already exists which implements the desired logging response:

StopHost();

DoMaintenance();

StartHost();

The problem, of course, is that when DoMaintenance throws an exception, it will bubble up to the catch handler and StartHost will never happen. This was an actual bug I encountered. The exception handler worked correctly, logging the fault, but the host was never restarted resulting in a system down until the next maintenance window.

I don’t really want to handle the exception at this level, but I do want to make sure that the host gets started again regardless of the outcome of the maintenance. Solution? finally!

StopHost();

try {
  DoMaintenance();
} finally {
  StartHost();
}

You don’t need a catch block to take advantage of finally.

MSDN Article on try-finally