.NET Decimal Precision

I was confronted by a very unusual claim by a coworker: two .NET Decimals, containing the same value, running through the same block of code, were rendering differently (one as, say, “120” and one as “120.00”). One of the big advantages of Decimal is that, unlike Float or Double, a Decimal represents an exact base-10 value whereas Float and Double are base-2, and so can only approximate many common fractional base-10 values. Therefore, I don’t normally expect problems of this kind with Decimal.

I initially assumed that the coworker was confused; some other input must be coming in, or a different section of code executing, or a different string formatting code used. However, the problem clearly reproduced. Further, it occurred even when no fractional component existed. But stripping away the extraneous, the essence of the code was fairly straightforward:

decimal v = ...;
return v.ToString();

Detailed inspection revealed that in one case, the decimal value was loaded from the user input, and in another case, it was loaded from a database table. In order to test the significance of this, I made a test case which performed both side-by-side. At first, everything seems routine:
decimals

What happened next was anything but routine:
decimals2

What would cause one decimal to render with 2 decimal places and another, the exact same value, to render without them? It turns out that the .NET Decimal implementation intentionally includes the concept of significant trailing zeros, which have no impact on calculations but can carry information about the value’s precision.

Although there is no direct method to expose the precision, the details of the significant trailing zeros arrangement are discussed in the Decimal.GetBits method documentation. In this case, it is clear that the same logical value can be represented with different exponents. In the case above, we can have a value of 120 with an exponent of 0, and a value of 12000 with an exponent of 2 (the exponent “indicates the power of 10 to divide the integer number”), so 12000 * 10-2 = 120.00.

This is indeed confirmed by analysis. The first three bytes contain the value, while the exponent is defined as “Bits 16 to 23” of the fourth byte.

Decimal.GetBits(vFromUser) = [120, 0, 0, 0]
Decimal.GetBits(vFromDb) = [12000, 0, 0, 131072]
131072 >> 16 = 2

This confirms the vFromDb value is represented as 12000 * 10-2 while vFromUser is represented as 120 * 100. Although these values are logically equal, the default implementation of Decimal.ToString() outputs the value with all significant zeros, including trailing zeros.

Although the Decimal class does not expose properties for the precision nor the scale, it is possible to take advantage of the helper SqlDecimal class to access these values. In this context, precision means the total number of digits and scale means the number of digits to the right of the decimal place.

var sFromDb = new System.Data.SqlTypes.SqlDecimal(vFromDb);
Console.WriteLine(sFromDb.Precision);
Console.WriteLine(sFromDb.Scale);

For vFromDb, this outputs a precision of 5 and a scale of 2; while for vFromUser this outputs a precision of 3 and a scale of 0.

Outlook, Unicode, and Copy-Paste

It seemed straightforward: the customer told me they wanted the system to generate an email, and that they would send me the content (a parameterized one-liner). I copy/pasted it into the application source code as a string and parameterized it as specified. I tested the email, and it worked correctly.

Aside: At this point, someone will be saying the email content should have been placed as a resource or template file, and not a string in the source code. Such a claim is correct, but the email content is very short, so it wasn’t worth the extra effort. Also, it would have had no impact on this particular situation.

Having tested it myself, I commit’d the change and sent it to the user for testing. What happened next was a bit … unexpected.

Hi Steve,

It appears we are getting some special characters instead of the – for the Pending Timesheets Subject text. Not sure if we could encode this?

IMPORTANT – Timesheet Not Created – will effect pay

The customer interpreted the issue as a failure to HTML encode the content, except that:

  1. The dash doesn’t need to be HTML-encoded.
  2. I did HTML-encode the message and subject line, although the subject line doesn’t support HTML, so a failed encoding would show up as an explicit HTML entity not a random-seeming sequence of characters.

What it does look like is a Unicode representation issue. Was the dash somehow coming across as a multi-byte Unicode character to a client that didn’t support Unicode?

I tested again, and I wasn’t able to reproduce the problem, but I suspected I knew why it happened. A little investigation confirms that during typing, Outlook autoreplaces hyphen with en-dash when used in a stand-alone way.

Although the hyphen is a normal ASCII symbol (0x2D), the en-dash is represented by multi-byte Unicode code-point (0x2013, to be precise), so if anything non-Unicode compliant receives it, it will render as a sequence of characters.

When someone (the customer or one of his associates) typed the desired phrase “IMPORTANT – Timesheet Not Created … ” into Outlook, Outlook automatically replaced the hyphen with an en-dash, then I just copy/pasted that directly into my code (rather than retyping it, as Visual Studio does no such replacement).

How could I confirm my suspicion, since I couldn’t reproduce the issue myself? I used a hex editor (the 80s are back, baby!) to examine the actual content of the subject line from my source code. You can see in this hex screen the difference between what I copied from Outlook (top) and manually using the hyphen (bottom):

Hex comparison

Anyways the fix is easy: manually replace the en-dash with a regular hyphen.

I sent the email to myself, first with the original subject line (bottom) and then again with the hyphens. If you look closely, you can see that the dash looks different.

Subject line comparison

The hyphens should work fine in any situation.

What I never did figure out is … in this highly homogeneous Outlook-centric corporate environment, who was using a non-Unicode compliant email system and for what purpose?