Category Archives: .NET

Uncovering a Memory Leak using WinDbg

The Crash…

Recently, we’ve been having a Windows service hosting a WCF endpoint crashing increasingly frequently due to out of memory errors. Although the usage of the service had increased, it didn’t seem likely that the service would actually consume its entire address space in the course of legitimate operation, so I naturally assumed a memory leak.

First, I perused all static collections and looked for any possible leaks. Only one small leak was found through inspection, which I strongly doubted could be the actual cause. Another staff recommended that the SQL server might be consuming all available memory, and that an upper limit should be set. We did indeed set the limit. However, these two changes did not resolve the issue. One of my coworkers suggested investigating the crash memory dump using WinDbg.

Using WinDbg

I hadn’t used WinDbg before and getting started was, frankly, daunting. After several hours, however, I was able to get an effective flow. The first mistake I made was loading the wrong bitness of WinDbg. The application in question is 32-bit (due to its dependency on several legacy DLLs), but I was trying to use the 64-bit WinDbg. No dice (but some strange and confusing errors that slowed me down). Finally I got the right WinDbg and the right memory dump and I was ready to go.

After opening the dump, the first step is to load the reference symbols for Microsoft DLLs:

.sympath srv*

Next, in order to enable debugging of .NET memory dumps, a specific .NET plugin, psscor, needs to be loaded. Since this is a .NET 4 app, version 4 of psscor is needed.
I heavily leaned on the article Intro to WinDbg for .NET Developers. Note in the subsequent examples references to “k:\windbg” represent the working directory where I accumulated the dump and helper files.

.load k:\windbg\psscor4.dll
.symfix c:mylocalsymcache 
.cordll -ve -u -l

These commands will be sufficient for basic .NET debugging. But as I thrashed around for several hours exploring the dump, I found another plugin that extends psscor with additional capability. The SOSEX extension specifically includes reference tracking, which I found very helpful.

.load k:\windbg\sosex.dll

A helpful cheat sheet clearly describes all available commands at this point.

Another short article on debugging memory leaks got me on the right track. I also referenced a source describing how to access object values.

An overall view of memory usage, sorted from least total usage to most, is the first point of analysis:

!dumpheap -stat
      MT    Count    TotalSize Class Name
...[many small entries omitted]
71148b08    10520      1346560 System.Data.DataColumn
735ceb9c    17895      1431600 System.Runtime.Remoting.ServerIdentity
735f0764    13218      1506528 System.Int32[]
735f2388    10785      4252704 System.Collections.Hashtable+bucket[]
735d6a0c     3313      7084876 System.Decimal[]
711493a0       52     17129584 System.Data.DataRow[]
735ef46c    32056     21923408 System.String[]
735eed0c    12115     36322560 System.Object[]
735f35fc    59441     54614724 System.Byte[]
735ee918  3079311     64983264 System.String
71149898      932     90479536 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
735eba00  5610857    134660568 System.Guid
71148a30  2821656    191872608 System.Data.DataRow

I immediately noticed that the top entry was DataRow, and several other of the top entries seemed to be related (RBTree of DataRow, DataRow[], and DataColumn). Guid is also related, since all the rows in the database (and, thus, all DataRow in memory) are keyed on Guids. The application in fact uses DataTable heavily to bridge SQL and WCF. In no case, however, should the application be holding several million rows in active memory. According to this dump, more than 2 million DataRows are accumulated in process memory.

I suspected a leak where DataRows were being retained. However, no instance of DataTable, DataColumn, nor DataRow are being retained in any obvious or explicit cache or static table.

Next step is to investigate the details. Since DataRow is the largest, I selected it to investigate further. Note the “71148a30” refers to the MT (method table) from the previous output.

!dumpheap -mt 71148a30

This run outputs thousands of entries, but visually as they flow by, you can see that they are “clustered”, suggesting a relatively small number of collections with many many rows each. Supporting this visual analysis is the “52” instances of DataRow[] in the first listing.

Address       MT     Size
015540a8 71148a30       68     
0155412c 71148a30       68     
015541b0 71148a30       68     
01554234 71148a30       68     
015542b8 71148a30       68     
...[thousands of rows omitted]
3ad0c110 71148a30       68     
3ad0c17c 71148a30       68     
3ad0fd78 71148a30       68 

So we’ve confirmed that we have a variety (probably 52) DataTables in memory each with many, many rows. This latter part is unusual, since there are few places in the application which expect many, many rows. Most DataTables have 1 to 100 rows, not thousands+.

It might be useful, instead, to follow the chain of the 52 DataRow[] entries. Note the MT reference for them from the first table: 711493a0

!dumpheap -mt 711493a0
 Address       MT     Size
015ee00c 711493a0       12     
015f16e4 711493a0      524     
01a6506c 711493a0       12     
01c60bb4 711493a0      524     
...[couple dozen rows omitted, all size = 524]   
2c276c28 711493a0      524     
2c6a5278 711493a0      524     
3996c578 711493a0    65548     
40a98ed8 711493a0   262156     
53071010 711493a0 16777228     

Whoa what’s this. A single DataRow[] (that is, a single table) is taking a HUGE amount of memory. Naturally that’s a target of interest. However, the many smaller entries could be interesting as well. We can investigate both.

The first leak – many smaller objects

First, we’ll investigate the smaller, many instance.

We can use the gcroot command to find the path (or paths) from the object to the active, live object at the root of the application.

!gcroot 2c276c28 
    00c613ec (pinned handle)
    -> 024c34c8 System.Object[]
    -> 01bff620 System.Collections.Generic.Dictionary`2[[System.Object, mscorlib],[System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]], mscorlib]]
    -> 01bff968 System.Collections.Generic.Dictionary`2+Entry[[System.Object, mscorlib],[System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]], mscorlib]][]
    -> 01c0d9f0 System.Collections.Generic.List`1[[Microsoft.Win32.SystemEvents+SystemEventInvokeInfo, System]]
    -> 0dee8adc Microsoft.Win32.SystemEvents+SystemEventInvokeInfo[]
    -> 2c2214ac Microsoft.Win32.SystemEvents+SystemEventInvokeInfo
    -> 2c22148c Microsoft.Win32.UserPreferenceChangedEventHandler
    -> 2c27b3ac System.Windows.Forms.ContextMenuStrip
    -> 2c27b650 System.ComponentModel.EventHandlerList
    -> 2c27c650 System.ComponentModel.EventHandlerList+ListEntry
    -> 2c27c388 System.ComponentModel.EventHandlerList+ListEntry
    -> 2c27c354 System.ComponentModel.EventHandlerList+ListEntry
    -> 2c27c320 System.ComponentModel.EventHandlerList+ListEntry
    -> 2c27c300 System.EventHandler
    -> 2c27b178 Microsoft.Reporting.WinForms.ReportPanel
    -> 2c27acc8 Microsoft.Reporting.WinForms.WinRSviewer
    -> 2c279e64 System.Windows.Forms.Panel
    -> 2c279af0 System.Windows.Forms.TableLayoutPanel
    -> 2c2798a0 Microsoft.Reporting.WinForms.RVSplitContainer
    -> 2c277a3c System.Windows.Forms.Panel
    -> 2c2776d4 System.Windows.Forms.TableLayoutPanel
    -> 2c277484 Microsoft.Reporting.WinForms.RVSplitContainer
    -> 2c276e84 Microsoft.Reporting.WinForms.ReportViewer
    -> 2c277114 Microsoft.Reporting.WinForms.ReportHierarchy
    -> 2c277124 System.Collections.Generic.Stack`1[[Microsoft.Reporting.WinForms.ReportInfo, Microsoft.ReportViewer.WinForms]]
    -> 2c21f0ac Microsoft.Reporting.WinForms.ReportInfo[]
    -> 2c21efec Microsoft.Reporting.WinForms.ReportInfo
    -> 2c220fc4 Microsoft.Reporting.WinForms.LocalReport
    -> 2c28de40 Microsoft.Reporting.WinForms.SubreportProcessingEventHandler
    -> 2c218b40 <error>
    -> 2c22ce00 System.Data.DataSet
    -> 2c22ce6c System.Data.DataTableCollection
    -> 2c22ce90 System.Collections.ArrayList
    -> 2c2398f8 System.Object[]
    -> 2c22f884 System.Data.DataTable
    -> 2c22fa0c System.Data.RecordManager
    -> 2c276c28 System.Data.DataRow[]

That’s a lot to take in. The list goes from the bottom (the instance in question) to the top. On the way, it tells us this table is owned by a WinForm SSRS reporting component which in turn is held alive by an event handler.

You might recall this application is not a Desktop app; it is a headless Windows service. Nevertheless, we embed the Winforms SSRS component to be able to convert RDL into PDF as part of an automated process. This happens occasionally, and once the PDF is done no further work is done on the data. There is absolutely NO REASON multiple alive references should exist to this component. It’s a leak!

But why?

Looking into the code exposes the reason:

var sia = new WsbisReports.BridgeInventory.SI.SIA();
sia.PrepareReport(result, p);
// ... find destFilePath

Here SIA inherits from a class which encapsulates WinForms LocalReport. What’s missing? Dispose. LocalReport is a disposable class, but my consumers never call Dispose on it. Corrective action? Make the SIA (more correctly, the base RDLReport class in the application) implement IDisposable, have it Dispose the LocalReport object, and wrap all instances in a using statement to ensure disposal.

A discussion online indicates that, in additional to disposal, several other actions might be helpful:

  • report.ReleaseSandboxAppDomain();
  • Unsubscribe to any SubreportProcessing events
  • Clear Data Sources
  • Dispose the report

The first three actions can be completed in the new Dispose method once the encapsulating class implements IDisposable.

A few large objects

There are three larger DataTable objects in this dump. We’ll look at each one. We’ll start with the smallest of the three.

!gcroot 3996c578 
Thread 9e78:
    0b97ef3c 7072cf80 System.ServiceModel.Channels.ServerReliableChannelBinder`1+DuplexServerReliableChannelBinder`1[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].OnSend(System.__Canon, System.ServiceModel.Channels.Message, System.TimeSpan)
        ebp-10: 0b97ef4c
            ->  39c11654 System.ServiceModel.Security.SecurityAppliedMessage
            ->  39c11158 System.ServiceModel.Dispatcher.OperationFormatter+OperationFormatterMessage
            ->  39c11178 System.ServiceModel.Dispatcher.OperationFormatter+OperationFormatterMessage+OperationFormatterBodyWriter
            ->  390f6220 System.Data.DataSet
            ->  390f628c System.Data.DataTableCollection
            ->  390f62b0 System.Collections.ArrayList
            ->  3922fa2c System.Object[]
            ->  3922e600 System.Data.DataTable
            ->  3922e788 System.Data.RecordManager
            ->  3996c578 System.Data.DataRow[]

Conclusion: This is an “in-flow” object being processed by WCF. Not a leak.

How about the medium-sized object?

!do 40a98ed8 
Name:        System.Data.DataRow[]
MethodTable: 711493a0
EEClass:     7322f4e4
Size:        262156(0x4000c) bytes
Array:       Rank 1, Number of elements 65536, Type CLASS

First glance, it appears that the table has allocated an array for up to 65,536 rows. That’s a lot of rows compared to the in-memory tables normally processed. We can trace the objects life through the garbage collector.

!gcroot 40a98ed8 
Thread 2668:
    1531daf0 07887608 WsbisWcfService.dll!Unknown
        ebp+e8: 1531dbdc
            ->  3a09b9d0 System.Data.DataTable
            ->  3a09bb58 System.Data.RecordManager
            ->  40a98ed8 System.Data.DataRow[]

This object is alive, with a reference being held by our service.

We can first use the owning DataTable to find out the actual number of rows (65,536 is just the allocated array size).

!mdt 3a09b9d0 
3a09b9d0 (System.Data.DataTable)
...[omitted rows]
    nextRowID:0xad77 (System.Int64)
...[omitted rows]

0xad77 = 44,407 in decimal. So the table has about 44 thousand rows. Still a lot of rows for what the app normally does, but not inconceivably huge.

What’s the location of the hold? Since I’m new to WinDbg, I wasn’t able to figure out how to get the method name from the address, but I was able to decompile the method:

!u 07887608
Normal JIT generated code
Begin 07886ef8, size 9bf
07886ef8 55              push    ebp
07886ef9 8bec            mov     ebp,esp
...[omitted rows]
07886f95 ff35b4dd6002    push    dword ptr ds:[260DDB4h] (" request received from ")
...[omitted rows]
07887603 e8f89149f9      call    00d20800 (WsbisClient.dll!Unknown, mdToken: 0600004d)
>>> 07887608 e8704ec86c      call    clr!IL_Rethrow (7450c47d)
0788760d c745e400000000  mov     dword ptr [ebp-1Ch],0
...[omitted rows]
0788769d 8b15dcdd6002    mov     edx,dword ptr ds:[260DDDCh] ("SYS_CHANGE_OPERATION")
...[omitted rows]

So the key thing I picked up in the giant code dump is the string constants, which let me identify the method in question. Further, I note that the instruction is a “rethrow”, so I tracked down the try-catch with a rethrow between the two string constants in the method. This lets me precisely identify where the execution pointer was in method was when the dump was created.

The method is called “GetChangeTable” and is used by the client to request a table of rows that have changed for local database synchronization purposes. However, 44,000 rows in a single request is unusually high for synchronization, which happens rather continually. Even when the client is disconnected for several days, a single batch of 44,000 is more than expected.

Let’s find out more before moving on.

Finding the data in a DataTable

In normal .NET code, we can access the data in a DataTable via an indexer on the Rows property, e.g. t.Rows[0][“column”]

However, looking at the memory dump for the DataTable, it is more difficult to find the actual data. I ended up referencing the actual .NET source code for DataRow to find out where the data is stored:

/// <devdoc>
///    <para>Gets
///       or sets all of the values for this row through an array.</para>
/// </devdoc>
public object[] ItemArray {
  get {
    int record = GetDefaultRecord();
    _table.recordManager.VerifyRecord(record, this);
    object[] values = new object[_columns.Count];
    for (int i = 0; i < values.Length; i++) {
      DataColumn column = _columns[i];
      VerifyValueFromStorage(column, DataRowVersion.Default, column[record]);
      values[i] = column[record];
    return values;

So the row's item array isn't real at all! The data is actually stored on the column.

!mdt 3a09b9d0 
3a09b9d0 (System.Data.DataTable)
...[omitted rows]
    columnCollection:3a09bb90 (System.Data.DataColumnCollection)
...[omitted rows]

!mdt 3a09bb90
3a09bb90 (System.Data.DataColumnCollection)
...[omitted rows]
_list:3a09bbc8 (System.Collections.ArrayList)
...[omitted rows]

!mdt -e:2 3a09bbc8
3a09bbc8 (System.Collections.ArrayList)
[0] 3a0dc77c (System.Data.DataColumn)
...[omitted rows]
    _columnName:3a0dc6ec (System.String) Length=20, String="SYS_CHANGE_OPERATION"
...[omitted rows]
    _storage:3a0dc908 (System.Data.Common.StringStorage)
...[omitted rows]

!mdt 3a0dc908
3a0dc908 (System.Data.Common.StringStorage)
    Column:3a0dc77c (System.Data.DataColumn)
...[omitted rows]
    values:40a18e98 (System.String[], Elements: 65536)

Let's stop here for a minute. We've found that the first column is named "SYS_CHANGE_OPERATION", which confirms that this table is a GetChangeRows synchronizer table (because that's the hard-coded column name in the SQL procedure that generates the change table result). We've further found the backing array of up to 65,536 values for the column. This column indicates a command code which describes what kind of operation will be done (update, insert, delete, or re-sync).

!mdt -e:2 -count:5 40a18e98
40a18e98 (System.String[], Elements: 65536)
[0] 3a0dc8e0 "R"
[1] 3a0dc9a8 "R"
[2] 3a0dca14 "R"
[3] 3a0dca80 "R"
[4] 3a0dcaec "R"

The elements are command code "R", which is used when the client needs to resync the entire table. That explains why there are so many rows! The server already has a mechanism that allows it to tell the client to request a subset of rows. Maybe to save memory, the server should ALWAYS require the client to request a subset of rows when a resync is performed? That will mean smaller DataTable allocations.

Looking at the source code reveals a rather silly "bug": The server has a maximum row limit. If the row limit is exceeded, it will refuse to send the table to the client and send a message telling the client to request fewer rows. However, the server still loads the ENTIRE table from the SQL procedure, and then counts the rows using Rows.Count. This means, at least temporarily, an unnecessary very large table will exist in memory! The SQL procedure should be modified to take the max rows as a parameter, and simply return a message to the service if there are too many rows (rather than returning the whole table, having the service instantiate it in memory, and then count the rows).

The "final" twist

We've uncovered a lot so far, but there's still that monster giant DataTable to investigate.

!do 53071010 
Name:        System.Data.DataRow[]
MethodTable: 711493a0
EEClass:     7322f4e4
Size:        16777228(0x100000c) bytes
Array:       Rank 1, Number of elements 4194304, Type CLASS

Wow! That's a huge table! Note that the number of rows FAR exceeds the maximum row limit (described previously) that should ever be sent! We can trace the objects life through the garbage collector.

!gcroot 53071010 
Finalizer Queue:
    -> 19d71aac System.Data.DataColumn
    -> 19d6c49c System.Data.DataTable
    -> 19d6c624 System.Data.RecordManager
    -> 53071010 System.Data.DataRow[]

Warning: These roots are from finalizable objects that are not yet ready for finalization.

Wait, what? I had always assumed objects had two states: alive (active reference) or ready for garbage collection. It turns out there is a third state: in the finalizer queue. Objects which have a finalizer are placed on the finalizer queue. The object must be finalized before it is eligible for garbage collection, even if no references exist. The finalizer, in turn, runs through the queue on a single thread in the background. Discussion on SO.

Generally, finalization queue is avoided by having the Dispose method call SuppressFinalize, exempting the object from the finalizer queue.

But disposing is only used when an object incurs unmanaged resources. Why is it happening here, for a DataTable?

According to one discussion:

The Dispose method in DataSet exists ONLY because of side effect of inheritance-- in other words, it doesn't actually do anything useful in the finalization. The class DataSet inherits from "System.ComponentModel.MarshalByValueCompenent" which implements the IDisposable interface because it is a component.

Great. So DataTable's have a finalizer, which means they get stuck in the finalization queue for a while before the garbage collector can dispose of them. But wait! When we return a DataTable via WCF, WCF will call dispose on both parameters that are passed in and returned parameters. So how can this be explained?

Looking at the service code, we already learned that when a too-large table is created, it is instantiated in memory and the rows counted. Then, if there are too many rows, the table is NOT returned via WCF to the client; instead, a message telling the client to "request fewer rows" is sent. What happens to the DataTable (the one with TONS of rows)? It simply goes out of scope. No Dispose is ever called. That's why the large table is waiting on the finalizer queue, blocking lots of memory.

This problem could be avoided by implementing the SQL-side row counting (as described above), and also by manually calling Dispose() if the service decides it will not be returning the table to the client.


Garbage collection isn't magic. Some actions should be taken to ensure long-running applications and services don't leak memory. WinDbg is an awesome tool for uncovering memory leaks.

Planned changes for this particular application. Some of these might seem obvious in hindsight, but at the time, I just assumed the overhead wouldn't be a problem.

  • In RDL handling
    • report.ReleaseSandboxAppDomain();
    • Unsubscribe to any SubreportProcessing events
    • Clear Data Sources
    • Dispose the report
  • In GetChangeTable
    • The server should ALWAYS require the client to request a subset of rows when a resync is performed
    • The SQL procedure should be modified to take the max rows as a parameter, and simply return a message to the service if there are too many rows (rather than returning the whole table, having the service instantiate it in memory, and then count the rows)
    • If the service generates a DataTable that it decides not to return to the client, Dispose() should be called
    • When DataTable is placed into a larger object for WCF serialization, that object should be marked with IDisposable and should, on Dispose, dispose the DataTable

A Truly Random() Heisenbug

A Heisenbug is a type of bug that disappears or alters its behavior when you attempt to debug it.

A coworker came to me today with an apparent odd behavior, which he suspected was the fault of a compiler optimization bug. Basically, the code worked through nested loops to generate a derived image from a base image.

Imagine something like:

for (...) {
  do {
    p1 = baseImage.getPixel(x1, y1);
    p2 = baseImage.getPixel(x2, y2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result

This code worked completely correctly, but was a little slow. Getting and setting pixels via the Image API is the slow part, so he wanted to introduce a cache (basically, read all the pixels into a 2-D array upfront). This seems straightforward.

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result

However, when run, this produced plausible but visibly incorrect output. The output made using the cache version contained “block”-like visual artifacts, while the output from the original version was smooth.

He fiddled with it for a while and then asked for advice. He told me up front that he suspected the do-while loop was either incorrectly being optimized or itself had a bug.

My initial thought, of course, was that the cache wasn’t accurate. There were three ways the cache could be inaccurate that I identified:

  1. The cache did not actually contain the correct initial values in the correct ordering
  2. An essential part of the algorithm involved modifying the base image as the work progressed, and the use of cache was disrupting this process
  3. The cache was inadvertently being modified during the process

I wanted to verify that baseImage and newImage were entirely unrelated buffers, that no writing to baseImage occurred, and no writing to cache occurred. This was all verified with code inspection (he had already done this, and I did it too, confirming his result).

Then my coworker showed me something disturbing. He simply reintroduced one call to the Image API, discarding the result:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    unusedGarbage = baseImage.getPixel(x1, y1);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result

This immediately resolved the problem, even though the variable “unusedGarbage” was in fact never used, and NO other changes were made. Additionally, as extra proof, he showed an assertion against both the cache and original image:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    op1 = baseImage.getPixel(x1, y1);
    op2 = baseImage.getPixel(x2, y2);
    p1 = cache[x1, y1];
    p2 = cache[x2, y2];

    Debug.Assert(p1 == op1);
    Debug.Assert(p2 == op2);
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result

These Asserts succeeded completely, proving the cache and image were identical.

Based on this research, he believed that the presence of the Image API call, or its absence, changed the way the compiler optimized the loop; and that the loop optimization using the cache must be incorrect.

I was totally baffled at this point. It took a minute of just thinking:
Computers Do Not Work That Way

After clearing my head of disbelief, I decided to see what else I could come up with. Console writing each p1 and p2 was far too numerous to view, but adding them up into a sum made for a quick poor-man’s check.

double[,] cache = ... // get all pixel intensities into 2-D array
int sum = 0;
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  newImage.setPixel(x, y, ...) // color based on p1 and p2 result

We tried this with both the cache and Image API call. In the output image, we could see the distinct difference between the two. However, the calculated sum was identical in both cases! This suggested that the actual loop was running the same and producing the same results.

At this point, I started to wonder about the setting of the pixel on the new image. The … above is an over-simplification; the code was something more like:

double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  Random r = new Random();
  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
     newImage.setPixel(x, y, specificColor)

We had ignored this section of code as not relevant to the output. After all, the behavior is simply based on the output of the do-while, and if the do-while is working the same, then certainly this code must behave the same. Right?

Noting that Random should not be re-instantiated during program run, I promoted the variable to before the loops and picked a fixed Random seed so that we could compare the output without the differences in Randomness.

Random r = new Random(1); // FIXED seed
double[,] cache = ... // get all pixel intensities into 2-D array
for (...) {
  do {
    p1 = cache[x1, y1]; //  baseImage.getPixel(x1, y1);
    p2 = cache[x2, y2]; // baseImage.getPixel(x2, y2);
    sum += p1;
    sum += p2;
    ... // do stuff with p1 and p2
  } while (...) // condition on p1 and p2

  if (p1 == p2)
     newImage.setPixel(x, y, r.randomColor())
     newImage.setPixel(x, y, specificColor)

In this case, both the cache and non-cache versions output was identical! The problem appeared to go away. I removed the fixed seed, and the problem remained solved.

How so?

When a Random() instant is created in .NET, it is seeded with the system time by default. In fact, the MSDN docs warn:

The default seed value is derived from the system clock and has finite resolution. As a result, different Random objects that are created in close succession by a call to the default constructor will have identical default seed values and, therefore, will produce identical sets of random numbers.

Fair enough, but how does this explain the difference in behavior? Why was the program working correctly originally given that it was performing this re-instantiation of Random all along?

In the original version of the program, the Image API calls in the inner loop, being quite slow, were sufficient to cause the initial seed (system time) to be DIFFERENT for each instantiation of Random, thus producing a plausible Random-ish sequence. However, when the cache was used, it was much much faster, causing repeated Random instantiations with the SAME system time, thus generating a series of identical values, appearing as visual “blocks” in the output image.

.NET Decimal Precision

I was confronted by a very unusual claim by a coworker: two .NET Decimals, containing the same value, running through the same block of code, were rendering differently (one as, say, “120” and one as “120.00”). One of the big advantages of Decimal is that, unlike Float or Double, a Decimal represents an exact base-10 value whereas Float and Double are base-2, and so can only approximate many common fractional base-10 values. Therefore, I don’t normally expect problems of this kind with Decimal.

I initially assumed that the coworker was confused; some other input must be coming in, or a different section of code executing, or a different string formatting code used. However, the problem clearly reproduced. Further, it occurred even when no fractional component existed. But stripping away the extraneous, the essence of the code was fairly straightforward:

decimal v = ...;
return v.ToString();

Detailed inspection revealed that in one case, the decimal value was loaded from the user input, and in another case, it was loaded from a database table. In order to test the significance of this, I made a test case which performed both side-by-side. At first, everything seems routine:

What happened next was anything but routine:

What would cause one decimal to render with 2 decimal places and another, the exact same value, to render without them? It turns out that the .NET Decimal implementation intentionally includes the concept of significant trailing zeros, which have no impact on calculations but can carry information about the value’s precision.

Although there is no direct method to expose the precision, the details of the significant trailing zeros arrangement are discussed in the Decimal.GetBits method documentation. In this case, it is clear that the same logical value can be represented with different exponents. In the case above, we can have a value of 120 with an exponent of 0, and a value of 12000 with an exponent of 2 (the exponent “indicates the power of 10 to divide the integer number”), so 12000 * 10-2 = 120.00.

This is indeed confirmed by analysis. The first three bytes contain the value, while the exponent is defined as “Bits 16 to 23” of the fourth byte.

Decimal.GetBits(vFromUser) = [120, 0, 0, 0]
Decimal.GetBits(vFromDb) = [12000, 0, 0, 131072]
131072 >> 16 = 2

This confirms the vFromDb value is represented as 12000 * 10-2 while vFromUser is represented as 120 * 100. Although these values are logically equal, the default implementation of Decimal.ToString() outputs the value with all significant zeros, including trailing zeros.

Although the Decimal class does not expose properties for the precision nor the scale, it is possible to take advantage of the helper SqlDecimal class to access these values. In this context, precision means the total number of digits and scale means the number of digits to the right of the decimal place.

var sFromDb = new System.Data.SqlTypes.SqlDecimal(vFromDb);

For vFromDb, this outputs a precision of 5 and a scale of 2; while for vFromUser this outputs a precision of 3 and a scale of 0.

Outlook, Unicode, and Copy-Paste

It seemed straightforward: the customer told me they wanted the system to generate an email, and that they would send me the content (a parameterized one-liner). I copy/pasted it into the application source code as a string and parameterized it as specified. I tested the email, and it worked correctly.

Aside: At this point, someone will be saying the email content should have been placed as a resource or template file, and not a string in the source code. Such a claim is correct, but the email content is very short, so it wasn’t worth the extra effort. Also, it would have had no impact on this particular situation.

Having tested it myself, I commit’d the change and sent it to the user for testing. What happened next was a bit … unexpected.

Hi Steve,

It appears we are getting some special characters instead of the – for the Pending Timesheets Subject text. Not sure if we could encode this?

IMPORTANT – Timesheet Not Created – will effect pay

The customer interpreted the issue as a failure to HTML encode the content, except that:

  1. The dash doesn’t need to be HTML-encoded.
  2. I did HTML-encode the message and subject line, although the subject line doesn’t support HTML, so a failed encoding would show up as an explicit HTML entity not a random-seeming sequence of characters.

What it does look like is a Unicode representation issue. Was the dash somehow coming across as a multi-byte Unicode character to a client that didn’t support Unicode?

I tested again, and I wasn’t able to reproduce the problem, but I suspected I knew why it happened. A little investigation confirms that during typing, Outlook autoreplaces hyphen with en-dash when used in a stand-alone way.

Although the hyphen is a normal ASCII symbol (0x2D), the en-dash is represented by multi-byte Unicode code-point (0x2013, to be precise), so if anything non-Unicode compliant receives it, it will render as a sequence of characters.

When someone (the customer or one of his associates) typed the desired phrase “IMPORTANT – Timesheet Not Created … ” into Outlook, Outlook automatically replaced the hyphen with an en-dash, then I just copy/pasted that directly into my code (rather than retyping it, as Visual Studio does no such replacement).

How could I confirm my suspicion, since I couldn’t reproduce the issue myself? I used a hex editor (the 80s are back, baby!) to examine the actual content of the subject line from my source code. You can see in this hex screen the difference between what I copied from Outlook (top) and manually using the hyphen (bottom):

Hex comparison

Anyways the fix is easy: manually replace the en-dash with a regular hyphen.

I sent the email to myself, first with the original subject line (bottom) and then again with the hyphens. If you look closely, you can see that the dash looks different.

Subject line comparison

The hyphens should work fine in any situation.

What I never did figure out is … in this highly homogeneous Outlook-centric corporate environment, who was using a non-Unicode compliant email system and for what purpose?

Unit Testing: Scenario

I was forwarded the following question:

Good afternoon! I was hoping that you might have some advice or guidance on unit test coding in C#? I have read a lot on this and researched it on the internet. The internship has me doing a lot of it right now. I am still confused about how to exactly to approach it, what would be a good strategy? I believe it is similar to validation. Any help would be appreciated.

To show a little bit about how unit tests can be applied in C#, consider this actual customer request which I handled today.

We would like to modify how the “Description/Keywords” field functions as follows:

Search by “Description/Keywords”

  1. A search for 1 keyword will display all firms that have the keyword in description field (as it currently functions)
  2. A search for 2 or more keywords will display all firms that have ANY or ALL keywords in description field, depending on user selection.
  3. We will be removing ”Starts with” and ”Contains” radio buttons for this item only.

The field search used to be based on a contains/starts with option that matched entire substrings. Now the user wants to match keywords in any order. Can we implement this change with the assistance of unit testing?

Start by declaring a method signature to accomplish the task desired:

public enum SEARCH_KEYWORDS { All, Any };
bool SearchKeywords(string haystack, 
    string needle, 
    SEARCH_KEYWORDS stype) { ... }

The body of the method may simply throw a NotImplementedException, or return false.

Next, we can create some unit tests describing functionality that is desired.

For example, it should match or not a single word, regardless of any/all.

public void SearchKeywordsSingleWord()
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("word", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(true, target.SearchKeywords("word", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("words", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(false, target.SearchKeywords("words", "word", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("word", "words", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(false, target.SearchKeywords("word", "words", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));

It should match or not a phrase, depending on the phrase and whether any/all is selected. (Note: you might instead wish to divide up the tests as “tests for any” and separately, “tests for all”)

public void SearchKeywordsPhrase()
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "lazy fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "lazy fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox jumping bridge", "fox jumping lazy goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(true, target.SearchKeywords("lazy fox goat jumping bridge", "fox jumping lazy  bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy fox goat jumping bridge", "fox jumping lazy  bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "goat rubber ducky", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "goat rubber ducky", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));


You can think about other kinds of special cases that might be encountered. Should the search be case sensitive? Probably not, so better add tests to make sure it isn’t. Should the search consider punctuation? Definitely not, since it’s keywords. What about blank spaces? Blanks should never match (that’s my opinion), so add tests for it.

public void SearchKeywordsPhraseCap()
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("LAZY FOX jUMPING Bridge", "fox jumping goat bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));


public void SearchKeywordsPunc()
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(true, target.SearchKeywords("lazy, fox: jumping-bridge", "fox jumping lazy bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(true, target.SearchKeywords("lazy, fox: jumping-bridge", "fox-jumping. lazy, bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));


public void SearchKeywordsBlanks()
    PublicSearchQueryBC target = new PublicSearchQueryBC();
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("lazy fox jumping bridge", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("  ", "", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "    ", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

    Assert.AreEqual(false, target.SearchKeywords("", "lazy fox jumping bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.All));
    Assert.AreEqual(false, target.SearchKeywords("", "lazy fox jumping bridge", PublicSearchQueryDTO.SEARCH_KEYWORDS.Any));

Now run the tests. They should all (or almost, depending on the naiveness of the original implementation) fail. That’s fine. Next, go ahead and implement the method, and run the tests again.

no blank

Looks like a test is failing. Turns out my initial implementation didn’t handle blanks the way I expected. So I went ahead and fixed it, and ran the tests again.

fixed blanks

This is where the tests really start to shine. I fixed the blank issue, but, in the progress, accidentally introduced another bug! Once all the bugs are resolved, all tests should pass.

This is a good time to refactor the implementation and make it cleaner, if needed. The tests should be used to ensure functionality isn’t damaged. This is a pattern known as Red-Green-Refactor.

Let’s say you then send the implementation to QA, and a bug is reported. The first step is to add a new failing test reproducing the bug. Then, proceed to fix the bug. You’ll have confidence that the bug is fixed and no regression has occurred when all tests are again showing green.

Enforcing Single Application Instances in WPF with Named Pipes

In some cases, we may wish to prevent the user from starting multiple instances of applications. Instead, the existing (single) instance should be focused. The “single instance” part is pretty easy, but notifying the first instance to come to the front is more tricky. There are many attempts to demonstrate techniques for this around the net, but I found them difficult and unreliable in WPF. In particular, they mostly all depend on using WndProc broadcast messages which are only received when the target window is not minimized, and even then not reliably. Further, there would be no way to actually send any data (like a command line) from the new instance to the original instance.

Rather than continue to struggle with this unproductive path, I took a step back and thought about what I wanted to do: I want to send a message from one instance of an application to another. Isn’t that what named pipes are for?

I created an abstraction based on mutexs (to enforce single instancing) and named pipes (to communicate between instances) that easily allows an application to ensure only a single instance is used. The abstraction could easily be expanded to allow for passing data to the single instance.

Get it on GitHub

To use this abstraction, omit a StartupUri from App.xaml and override OnStartip in the App.xaml.cs:

public partial class App : Application
  protected override void OnStartup(StartupEventArgs e)
    // Set a unique application ID
    Guid id = Guid.Parse("DC2A927C-AC89-4512-BB29-7AB0A18DE105");

    // Instantiate an SIA
    SingleInstanceApplication sia = new SingleInstanceApplication(id);            
    // Handle the ApplicationStarts event
    // When this event fires, initialize the application
    sia.ApplicationStarts += (sender, earg) =>
      var mw = new MainWindow();
      // If another instance attempts to start, 
      // bring our window to the front
      sia.AnotherInstanceAttemptsToStart += 
    // Optionally handle AnotherInstanceAttemptsToStart, for example, 
    // to log other attempts
    sia.AnotherInstanceAttemptsToStart += (sender, earg) =>
      Logger.LogInfo("### Captured another instance trying to start");
    // Run the application and single instance protection

Monitoring for File Changes

The mission seems simple: In a desktop application, start an external app (such as Word, Excel, Acrobat, etc) to view or edit a certain file, and wait for the application to exit or modify the file, and then load the modified file back into the application.

This turned out to be more complicated than expected.

Attempt 1: Process WaitForExit

The initial, obvious solution appears to be to start the file using Process.Start, which returns a Process object. In turn, the Process object has a WaitForExit method. However, this idea ran into a significant impediment: Both Windows and applications re-use processes. In some cases, Process.Start returns null. Some proposed solutions recommend attaching to the existing process, but, this requires knowing what specific process it is and then is still dependent on that process itself not being further reused.

Even in those cases where Process.Start returned a value, sometimes the process exited almost immediately when it found another instance of the correct application running and simply handed the file off to it. Thus, the WaitForExit method call would complete very quickly even though the application was still in use.

Attempt 2: Wait for Application Window Re-Activation

Since file editing is expected to open in another window, and continue until that file is closed, it makes some sense to attach an Activated event listener to the calling application’s window. When the event is called, the handler can be de-registered and the application may assume work on the file has completed.

This, however, also doesn’t work, as some applications go through phases on start up and cause the activated event to be fired.

I didn’t pursue further “hacks” (such as waiting a few seconds before enabling the activated event) as the whole idea seems pretty unreliable, since other applications are not modal.

New Approach

Ok, forget the idea of waiting for an application to close. Instead, continue to monitor the file in the background watching for changes. If any changes are detected, load the file back into the application.

Attempt 3: FileSystemWatcher

.NET has a class apparently just for this kind of purpose, the FileSystemWatcher. Provide a directory and/or filename to the watcher, and it will trigger an event when anything happens to the file.

Seems easy and fun, but there’s a few problems. It can be difficult to use, and when notifications come in, sometimes the file would still be locked by the editing application, causing exceptions when the application tried to load the file back in. In order to handle this, a timer and queue must be manually created. Even worse, the FileSystemWatcher sometimes fails to notify of changes all together.

There are various projects which wrap or extend the FileSystemWatcher to make it usable. Some of these might be sufficient to address these issues.

[My] Solution: FileChangeMonitor

Given that significant custom infrastructure was needed regardless (timer, queue), I decided to complete a custom file change monitoring solution. The monitoring system allows individual files to be added to an internal list. Using a timer, the list is monitored for file changes using a hash. If a change is detected in a readable file (not locked), one event per file is generated.

Additionally, the class supports an arbitrary data “tag” to accompany each file which will be provided when a change event occurs, and it also supports a synchronizing object for performing change events on the UI thread, if desired.

Complete Class Source File

The critical signatures are:

/// <summary>
/// When a change occurs, the event call will be synchronized to this 
/// object's thread, if given.
/// </summary>
public ISynchronizeInvoke SynchronizingObject { get; set; }

/// <summary>
/// Fired when one of the monitored files changes content
/// </summary>
public event EventHandler<FileChangedEventArgs> FileChanged;

/// <summary>
/// Adds this file to be monitored for changes.  If the file is already 
/// being monitored, does nothing.
/// If data is specifed, that object is stored and will be provided in 
/// the event args when the file is updated.
/// </summary>
public void Add(string filename, object data = null);

/// <summary>
/// Remove this file from monitoring
/// </summary>        
public void Remove(string filename);

/// <summary>
/// Remove all files from monitorings
/// </summary>
public void Clear();

Console.WriteLine vs System.Diagnostics.Debug.WriteLine considerations

Using console output to debug or trace programs is a common technique for debugging and analysis. In desktop apps (such as WPF), the developer can use Console.WriteLine or System.Diagnostics.Debug.WriteLine to generate the output to the debug window. Since they both perform the same result, it might seem like they are interchangable. Not at all!


Consider the following test case: One thousand entries will be written, either with Console.WriteLine or Debug.WriteLine. The amount of time the thousand lines takes will be measured. All times are in milliseconds.

Scenario Console.WriteLine time Debug.WriteLine time
Debug, in Visual Studio 3,229 280
Release, outside Visual Studio 1 0

In both cases, outside of Visual Studio in release mode, there is almost no performance impact for using either operation. Within Visual Studio in debug mode, we see that Console.WriteLine is much slower than Debug.WriteLine, by a little more than an order of magnitude.

The reason that Debug.WriteLine is 0 milliseconds in this test is that Debug.WriteLine is actually removed by the compiler in release mode, and therefore, the compiler was also able to determine the 1000-iteration for loop was now no-op and remove it as well.


Console.WriteLine is difficult to extend (for example, to send the messages to a database or multiple outputs).

Debug.WriteLine can be extended easily by adding Listeners which can handle, log, or transmit debug messages.

One Case For Console.WriteLine

Higher-order functions.

Consider the case of a method that takes an Action (which seems to match the signature or either Console or Debug WriteLine methods).

Specifically, consider the method:

void DoOutput(Action<string> outputFunc) { ... }

Passing in Console.WriteLine is no problem for this method. However, try to pass in Debug.WriteLine:


Since the compiler removes the Debug.WriteLine call in release mode, there will be no value to provide for this parameter.

This is easily worked around by wrapping the Debug.WriteLine call in a lambda:

DoOutput(x => { System.Diagnostics.Debug.WriteLine(x); });

The difference here is that in both debug and release mode, there is a method to pass as the argument to DoOutput. In debug, that method will call Debug.WriteLine. In release, that method will be empty (like writing x => { }).

Is the performance and capability improvement of Debug.WriteLine worth the occasional need to lambda-wrap it? Probably.


Leave Console.WriteLine in console applications where it belongs, and use Debug.WriteLine everywhere else.

Generating compiler warnings on NotImplementedException

In .NET, whenever an interface is generated, the method stubs are filled in with “NotImplementedException”s. We might also use these exceptions elsewhere, for example, constructing pre-test class methods, or deferring implementation of some functionality. For actual circumstances where some behavior cannot be performed, use NotSupportedException instead.

It’s easy to do a search for NotImplementedException, or use the VS Code Analysis tool to detect uses, but these both require a special effort, and NotImplementedExceptions may be lost or accidentally slip into production builds.

To avert this, here is an easy way to cause the compiler to generate a warning everytime it encounters a NotImplementedException. Add this code to a file anywhere in your project.

namespace System
    [Obsolete("Replace NotImplementedException with NotSupportedException, or implement")]
    public class NotImplementedException : Exception
        public NotImplementedException() : base() { }
        public NotImplementedException(string message) : base(message) { }

We use the System namespace to “override” the default implementation of NotImplementedException. We use the Obsolete attribute because it generates a compiler warning automatically. In other words, these are bad things to do in general, but serve a pragmatic purpose here.

Unfortunately, we cannot make partial classes across assemblies, so there will be an additional warning that the NotImplementedException defined here conflicts with the mscorlib.dll’s implementation. However, since the overall goal is to generate warnings, this is acceptable. These warnings will disappear once all NotImplementedException references are removed.

Every case where a NotImplementedException is used, whether by an auto-generated stub or some other reason, will produce this compiler warning:

Warning	27	'System.NotImplementedException' is obsolete: 'Replace NotImplementedException with NotSupportedException, or implement'

In addition, when in a code file using NotImplementedException, a warning marker will be shown. Mousing over suggests to change the exception.


Implementing the feature, or changing the exception to NotSupportedException, removes the warning.


In the official MSDN documentation for .NET DataTable, it says: Do not throw a NullReferenceException within the RowChanged event handler. If a NullReferenceException is thrown within the RowChanged event of a DataTable, then the DataTable will be corrupted. I found, … Continue reading