From C# in Depth, Fourth Edition by Jon Skeet

This article gives a quick recap on string formatting in .NET before going into depth on using the new interpolated string literal feature of C# 6.




A recap on string formatting in .NET

You may well have been using strings for many years – almost certainly as long as you’ve used C#. Still, to understand how the new interpolated string literal feature in C# 6 works, it’s best to have all that knowledge uppermost in your mind. Please bear with me as we go over the basics of how .NET handles string formatting. I promise we’ll get onto the new stuff soon.

Simple string formatting

If you’re like me, you like experimenting with new languages by writing trivial console applications which do nothing useful, but give the confidence and firm foundation to move on to more impressive feats.

As such, I can’t remember how many languages I’ve used to implement the functionality shown below: ask the user their name, and then say hello to them.

  Console.Write("What's your name? ");
  string name = Console.ReadLine();
  Console.WriteLine("Hello, {0}!", name);

The last line is the most relevant one for this article. It uses an overload of Console.WriteLine that accepts a composite format string which includes format items, and then arguments to replace those format items. In the example above, there’s one format item – {0} – which is replaced by the value of the name variable. The number in the format item specifies the index of the argument you want to fill the “hole” (where 0 represents the first of the values, 1 represents the second, and so on).

This pattern is used in various APIs, and the canonical example is the static Format method in the string class, which does nothing but format the string appropriately.
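As a minimal sketch of how the indexes work with string.Format, the following example uses one argument twice and refers to the arguments out of order. The class and method names (FormatItemsDemo, Describe) are invented for illustration and aren't from the original text.

```csharp
using System;

public static class FormatItemsDemo
{
    // Index 0 appears twice and the items are not in argument order:
    // each format item simply looks up the argument at its index.
    public static string Describe(string name, int age)
    {
        return string.Format("{1} years ago, {0} was born. Hello, {0}!", name, age);
    }

    public static void Main()
    {
        // Prints: 40 years ago, Jon was born. Hello, Jon!
        Console.WriteLine(Describe("Jon", 40));
    }
}
```

Unlike Console.WriteLine, string.Format only builds the string; what you do with the result is up to you.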

Custom formatting with format strings

To be clear, my motivation for including this in the article is as much for my future self as for you, dear reader. If MSDN displayed the number of times I’ve visited any given page, the number for the page on composite format strings would be frightening. I keep forgetting exactly what goes where and what terms to use, and I figure if I’ve always got it to hand in hard copy, maybe I’ll start remembering it better. I hope you find it helpful in the same way.

Each format item in a composite format string specifies the index of the argument to be formatted, but it can also specify more options for how to format the value:

  • An alignment, which specifies a minimum width and whether the value should be left or right aligned. Right-alignment is indicated by a positive value; left-alignment is indicated by a negative value.
  • A format string for the value. This is probably used most often for date and time values or numbers. For example, to format a date according to ISO-8601, you could use a format string of yyyy-MM-dd. To format a number as a currency value, you could use a format string of C. The meaning of the format string depends on the type of value being formatted, and you need to look up the relevant documentation to choose the right format string.

Figure 1 shows all the parts of a composite format string you could use to display a price.


Figure 1 A composite format string with a format item to display a price


The alignment and the format string are independently optional: you can specify either, both, or neither. A comma in the format item indicates an alignment, and a colon indicates a format string. If you need a comma in the format string, it’s fine; there’s no concept of a second alignment value.
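To make the "either, both, or neither" rule concrete, here's a small sketch that formats the same value with each combination, plus a format string that itself contains a comma. It uses the invariant culture only so that the results are the same on every machine; the names FormatItemPartsDemo and Show are hypothetical helpers, not part of any library.

```csharp
using System;
using System.Globalization;

public static class FormatItemPartsDemo
{
    // Formats one value with the given composite format string, using the
    // invariant culture so the result doesn't depend on the machine's settings.
    public static string Show(string format, decimal value)
    {
        return string.Format(CultureInfo.InvariantCulture, format, value);
    }

    public static void Main()
    {
        decimal value = 1234.5m;
        // Brackets make the padding visible.
        Console.WriteLine("[" + Show("{0}", value) + "]");              // [1234.5]      neither
        Console.WriteLine("[" + Show("{0,10}", value) + "]");           // [    1234.5]  alignment only, right-aligned
        Console.WriteLine("[" + Show("{0,-10}", value) + "]");          // [1234.5    ]  alignment only, left-aligned
        Console.WriteLine("[" + Show("{0:F2}", value) + "]");           // [1234.50]     format string only
        Console.WriteLine("[" + Show("{0,10:F2}", value) + "]");        // [   1234.50]  both
        Console.WriteLine("[" + Show("{0,10:#,##0.00}", value) + "]");  // [  1,234.50]  comma inside the format string
    }
}
```

In the last call, everything after the colon is treated as the format string, so the comma there is a group separator rather than a second alignment.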

As a concrete example to expand on later, let’s see the code from figure 1 in a broader context, showing different lengths of results to demonstrate the point of alignment. Listing 1 displays a price ($95.25), tip ($19.05), and total ($114.30), lining up the labels on the left and the values on the right.

The output – on a machine in the US English culture – would look like this:

  Price:    $95.25
  Tip:      $19.05
  Total:   $114.30

To make the values line up, right-aligned (or left-padded with spaces, to look at it the other way round), the code uses an alignment value of 9. If we had a huge bill (a million dollars, for example) the alignment would have no effect; it only specifies a minimum width. If you wanted to write code which right-aligned every possible set of values, you would have to work out how wide the biggest one would be first. It’s unpleasant code, and I’m afraid nothing in C# 6 makes it easier.
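For the curious, here's one possible sketch of that "work out the widest value first" approach. It's only an illustration under simple assumptions (all values use the same currency format, and string length is a good enough measure of display width); RightAlignDemo and AlignRight are made-up names.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class RightAlignDemo
{
    // Formats every value first, finds the widest result, then left-pads
    // each formatted value with spaces to that width.
    public static string[] AlignRight(decimal[] values, IFormatProvider provider)
    {
        string[] formatted = values
            .Select(value => value.ToString("C", provider))
            .ToArray();
        int width = formatted.Length == 0 ? 0 : formatted.Max(text => text.Length);
        return formatted.Select(text => text.PadLeft(width)).ToArray();
    }

    public static void Main()
    {
        var usEnglish = CultureInfo.GetCultureInfo("en-US");
        decimal[] values = { 95.25m, 19.05m, 1000000m };
        foreach (string line in AlignRight(values, usEnglish))
        {
            Console.WriteLine(line);
        }
    }
}
```

The two passes over the data are the price of not knowing the width up front, which is why a fixed alignment value is usually good enough in practice.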


Listing 1 Displaying a price, tip and total with values aligned

  decimal price = 95.25m;
  decimal tip = price * 0.2m;                        // ❶
  Console.WriteLine("Price: {0,9:C}", price);
  Console.WriteLine("Tip:   {0,9:C}", tip);
  Console.WriteLine("Total: {0,9:C}", price + tip);

❶   20% tip


When I said it would be the output of listing 1 on a machine in the US English culture, the part about the culture was important. On a machine using a UK English culture, it’d use £ signs instead. On a machine in the French culture, the decimal separator would become a comma, the currency sign would become a Euro symbol, and it’d be at the end of the string instead of the start! Such are the joys of localization, which we’ll look at next.
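If you'd like to see those differences for yourself, a sketch along these lines formats the same price for the three cultures mentioned above. The exact output depends on the culture data installed on your machine (and on whether your console can display the symbols), so no expected output is shown here; CurrencyCultureDemo and FormatPrice are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class CurrencyCultureDemo
{
    // Formats a price as currency using the named culture.
    public static string FormatPrice(decimal price, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return string.Format(culture, "{0:C}", price);
    }

    public static void Main()
    {
        foreach (string name in new[] { "en-US", "en-GB", "fr-FR" })
        {
            // Culture name left-aligned, formatted price right-aligned.
            Console.WriteLine("{0,-6}{1,12}", name, FormatPrice(95.25m, name));
        }
    }
}
```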


Localization

In broad terms, localization[1] is the task of making sure your code does the right thing for all users, no matter where they are in the world. Anyone who claims that localization is simple is either much more experienced at it than I, or they haven’t done enough of it to see how painful it can be. For a round world, it certainly seems to have a lot of nasty corner cases to handle. Localization is a pain in all programming languages, but each has a slightly different way of addressing the problems.

 

In .NET, the most important type to know about for localization purposes is CultureInfo. This is responsible for the cultural preferences of a language (such as English), or a language in a location (such as “French in Canada”) or a variant of a language in a location (such as “simplified Chinese as used in Taiwan”). These cultural preferences include various translations (the words used for the days of the week, for example), how text is sorted, how numbers are formatted (whether to use a period or comma as the decimal separator) and much more.

Often you won’t see CultureInfo in a method signature, but instead the IFormatProvider interface, which CultureInfo implements. Most formatting methods have overloads with an IFormatProvider as the first parameter, before the format string itself. For example, consider these two signatures from string.Format:

 

  static string Format(IFormatProvider provider, string format, params object[] args)
  static string Format(string format, params object[] args)

Usually if you provide overloads which differ only by a single parameter, that parameter’s the last one… and you might expect the provider parameter to come after args. That wouldn’t work because args is a parameter array (it uses the params modifier), and a parameter array must be the last parameter in the list.

Even though the parameter is of type IFormatProvider, the value you pass in as an argument is almost always a CultureInfo. For example, if you want to format my date of birth for US English—“June 19, 1976”—you could use this code:

 

  var usEnglish = CultureInfo.GetCultureInfo("en-US");
  var birthDate = new DateTime(1976, 6, 19);
  string formatted = string.Format(usEnglish, "Jon was born on {0:d}", birthDate);

 

Here d is the standard date/time format specifier for “short date”, which in US English corresponds to “month/day/year”. My date of birth would be formatted as “6/19/1976” for example. In British English, the short date format is “day/month/year”, and the same date would be formatted as “19/06/1976”. Notice how it’s not only the ordering which is different: the month is 0-padded to two digits in the British formatting too.

Other cultures can use entirely different formatting – it can be instructive to see how different they are. For example, you could format the same date in every culture .NET knows about like this:

 

  var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
  var birthDate = new DateTime(1976, 6, 19);
  foreach (var culture in cultures)
  {
      string text = string.Format(culture, "{0,-15} {1,12:d}", culture.Name, birthDate);
      Console.WriteLine(text);
  }

 

The output for Thailand shows that I was born in 2519 in the Thai Buddhist calendar, and the output for Afghanistan shows that I was born in 1355 in the Islamic calendar. This example also uses a negative alignment value to left-align the culture name, while the date stays right-aligned.

 

Formatting with the default culture

If you don’t specify a format provider, or if you pass null as the argument corresponding to an IFormatProvider parameter, CultureInfo.CurrentCulture is used as a default. What that means depends on your context: it can be set on a per-thread basis, and some web frameworks set it before processing a request on a thread.

All I can advise about using the default is to be careful: make sure you know that the value in your specific thread is appropriate. It’s particularly worth checking the exact behavior if you start parallelizing operations across multiple threads, for example. If you don’t want to rely on the default culture, you’ll need to know the culture of the end-user you need to format the text for, and do it explicitly.
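Here's a small sketch of the default culture in action: the formatting call is identical each time, and only CultureInfo.CurrentCulture changes. It assumes a runtime where CultureInfo.CurrentCulture has a setter (.NET Framework 4.6 or later, or .NET Core); DefaultCultureDemo and its methods are invented names for illustration.

```csharp
using System;
using System.Globalization;

public static class DefaultCultureDemo
{
    // No format provider is specified, so CultureInfo.CurrentCulture is used.
    public static string FormatWithDefault(DateTime date)
    {
        return string.Format("{0:d}", date);
    }

    // Temporarily changes the current thread's culture, formats the date,
    // and then restores the original culture.
    public static string FormatUnder(string cultureName, DateTime date)
    {
        CultureInfo original = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo(cultureName);
            return FormatWithDefault(date);
        }
        finally
        {
            CultureInfo.CurrentCulture = original;
        }
    }

    public static void Main()
    {
        var birthDate = new DateTime(1976, 6, 19);
        Console.WriteLine(FormatUnder("en-US", birthDate));
        Console.WriteLine(FormatUnder("en-GB", birthDate));
    }
}
```

With typical culture data, the two lines should match the US and British short dates quoted earlier. Restoring the original culture in a finally block is a defensive habit rather than a requirement; it stops a temporary change from leaking into unrelated code on the same thread.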

Formatting for machines

We’ve assumed that you’re trying to format the text for an end-user. That’s often not the case. For machine-to-machine communication (such as in URL query parameters to be parsed by a web service) you should use the invariant culture, which is obtained via the static CultureInfo.InvariantCulture property.

For example, suppose you were using a web service to fetch the list of best sellers from a publisher. The web service might use a URL of https://manning.com/webservices/bestsellers, but allow a query parameter called date to allow you to find out the best-selling books on a date[2]. I’d expect that query parameter to use an ISO-8601 format (year-first, using dashes between the year, month and day) for the date. For example, if you want to retrieve the best-selling books as of the start of March 20th 2017, you want to use a URL of https://manning.com/webservices/bestsellers?date=2017-03-20. To construct that URL in code, in an application allowing the user to pick a specific date, you might write something like this:

 

  string url = string.Format(
      CultureInfo.InvariantCulture,
      "{0}?date={1:yyyy-MM-dd}",
      webServiceBaseUrl,
      searchDate);
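To see why the choice of culture matters even with an explicit yyyy-MM-dd format string, the following sketch builds the same URL twice: once with the invariant culture, and once with a culture whose default calendar isn't Gregorian. The base URL is the fictional one from the text, and InvariantCultureDemo and BuildUrl are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class InvariantCultureDemo
{
    // Builds the query URL, formatting the date with the given provider.
    public static string BuildUrl(IFormatProvider provider, string baseUrl, DateTime date)
    {
        return string.Format(provider, "{0}?date={1:yyyy-MM-dd}", baseUrl, date);
    }

    public static void Main()
    {
        // Fictional web service URL, as in the surrounding text.
        string baseUrl = "https://manning.com/webservices/bestsellers";
        var searchDate = new DateTime(2017, 3, 20);

        // Prints: https://manning.com/webservices/bestsellers?date=2017-03-20
        Console.WriteLine(BuildUrl(CultureInfo.InvariantCulture, baseUrl, searchDate));

        // With a culture that defaults to a non-Gregorian calendar, the year
        // in the query parameter may differ, even though the format string
        // is unchanged.
        Console.WriteLine(BuildUrl(CultureInfo.GetCultureInfo("th-TH"), baseUrl, searchDate));
    }
}
```

On a machine with full culture data, the second URL is likely to contain a Thai Buddhist year rather than 2017, which is exactly the kind of surprise a web service wouldn't appreciate.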

 

Most of the time, you shouldn’t be directly formatting data for machine-to-machine communication. I advise you to avoid string conversions wherever you can; they’re often a code smell which shows you’re either not using a library or framework properly or you have data design issues (such as storing dates in a database as text instead of as a “native” date/time type). Having said that, you may well find yourself building strings manually like this more often than you’d like; pay attention to which culture you should be using.

Okay, that was a long introduction. But with all this formatting information buzzing around your brain, and somewhat-ugly examples niggling at you, you’re in the right frame of mind to welcome interpolated string literals in C# 6. All those calls to string.Format look unnecessarily long-winded, and it’s annoying having to look between the format string and the argument list to see what goes where. Surely, we can make our code clearer than that…


Introducing interpolated string literals

Interpolated string literals in C# 6 allow you to format values in a much simpler way. The concepts of a format string and arguments still apply, but with interpolated string literals you specify the values and their formatting information inline, leading to code which is much easier to read. If you look through your code and find a lot of calls to string.Format using hard-coded format strings, you’ll love interpolated string literals.

String interpolation isn’t a new idea. It’s been in many programming languages for a long time, but I’ve personally never felt it was as neatly integrated as it is in C#. It’s particularly remarkable when you consider that adding a feature into a mature language is harder than building it into the first version.

Now we’ll look at some examples before exploring interpolated verbatim string literals, how localization can be applied using FormattableString, and a closer look at how the compiler handles interpolated string literals.

Simple interpolation

The simplest way to demonstrate interpolated string literals in C# 6 is to show you the equivalent to the first example from earlier, where we asked the user for the name. The code doesn’t look hugely different – in particular, only the last line has changed at all:

 

C# 5 – old-style formatting

  Console.Write("What's your name? ");
  string name = Console.ReadLine();
  Console.WriteLine("Hello, {0}!", name);

C# 6 – interpolated string literal

  Console.Write("What's your name? ");
  string name = Console.ReadLine();
  Console.WriteLine($"Hello, {name}!");

The interpolated string literal is the argument to WriteLine in the last line of the C# 6 version. It starts with a $ before the opening double quote – this makes it an interpolated string literal rather than a regular one as far as the compiler is concerned. It contains {name} instead of {0} for the format item. The text in the braces is an expression which is evaluated and formatted within the string. As we’ve now provided all the information we need, the second argument to WriteLine isn’t required any more.

Not quite equivalent…

As with expression-bodied members, this doesn’t look like a huge improvement. For a single format item, there’s not a lot to be confused by in the original code. The first couple of times you see this it might even take you a little longer to read an interpolated string literal than a string formatting call. I was skeptical about how much I’d like them… but now I often find myself converting pieces of old code to use them almost automatically, and I find the readability improvement is often significant.

Tip: think about your commit history

It’s easy to get carried away when updating old code to take advantage of new features. You’re diving into a bug, and suddenly you find a whole file which feels like it was written in 2005. At that point, you should decide: are you going to fix the bug first then update the code, or the other way around? Try to keep the two activities separate, to create a clear purpose for each in your commit history. It’s like the single responsibility principle, but applied to version control.

Now that we’ve seen the simplest example, let’s do something a bit more complex. We’ll follow the same sequence as before, first looking at controlling the formatting of values more carefully, and then considering localization.

Format strings in interpolated string literals

Good news! There’s nothing new to learn here. If you want to provide an alignment or a format string with an interpolated string literal, you do it the same way as with a normal composite format string: you add a comma before the alignment, and a colon before the format string.

Our earlier composite formatting example changes in the obvious way, as shown in listing 2.


Listing 2 Aligned values using interpolated string literals

  decimal price = 95.25m;
  decimal tip = price * 0.2m;                        // ❶
  Console.WriteLine($"Price: {price,9:C}");
  Console.WriteLine($"Tip:   {tip,9:C}");
  Console.WriteLine($"Total: {price + tip,9:C}");

❶  20% tip


Note how in the last line, the interpolated string doesn’t contain a variable for the argument – it performs the addition of the tip to the price. The expression can be any expression that computes a value. (You can’t call a method with a void return type, for example.) If the type of the expression isn’t already string, then either the ToString method inherited from System.Object is called, or the IFormattable.ToString method is called if the execution-time type of the value implements the IFormattable interface.
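To illustrate the IFormattable route, here's a sketch with a made-up Temperature type whose ToString(string, IFormatProvider) method receives the format string from the format item. Everything about this type (its name, its "F" format string, and the use of whole degrees to keep the output culture-independent) is an assumption for the sake of the example.

```csharp
using System;

// Hypothetical type for illustration only.
public struct Temperature : IFormattable
{
    private readonly int celsius;

    public Temperature(int celsius)
    {
        this.celsius = celsius;
    }

    // Used when the value is converted without any formatting information.
    public override string ToString()
    {
        return ToString(null, null);
    }

    // Called during formatting, with the format string taken from the
    // format item (or null if the item doesn't have one).
    public string ToString(string format, IFormatProvider provider)
    {
        // In this sketch "F" means Fahrenheit; anything else means Celsius.
        if (format == "F")
        {
            // Integer arithmetic keeps the example simple.
            int fahrenheit = celsius * 9 / 5 + 32;
            return fahrenheit.ToString(provider) + "F";
        }
        return celsius.ToString(provider) + "C";
    }
}

public static class FormattableDemo
{
    public static string Describe(Temperature temperature)
    {
        // The first item has no format string; the second passes "F".
        return $"{temperature} is {temperature:F}";
    }

    public static void Main()
    {
        // Prints: 20C is 68F
        Console.WriteLine(Describe(new Temperature(20)));
    }
}
```

The meaning of the format string is entirely up to the type being formatted, which is why "F" can mean fixed-point for a number and something else here.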

Interpolated verbatim string literals

You’ve no doubt seen verbatim string literals before: they start with @ before the double quote. Within a verbatim string literal, backslashes and line breaks are included in the string. For example, in the verbatim string literal @"c:\Windows" the backslash is a backslash – it isn’t the start of an escape sequence. The literal is only terminated by a double quote[3]. Verbatim string literals are typically used for:

  • Strings breaking over multiple lines[4]
  • Regular expressions (which use backslashes for escaping, quite separate from the escaping the C# compiler uses in regular string literals)
  • Hard-coded Windows file names

A quick example of each of these:

 

  string sql = @"
    SELECT City, ZipCode
    FROM Address
    WHERE Country = 'US'";                               // ❶
  Regex lettersDotDigits = new Regex(@"[a-z]+\.\d+");    // ❷
  string file = @"c:\users\skeet\Test\Test.cs";          // ❸

❶   SQL is easier to read when split over multiple lines

❷   Backslashes are common in regular expressions

❸   Windows file name

 

Verbatim string literals can be interpolated as well – you put a $ in front of them, like you would to interpolate a regular string literal. Our earlier multi-line output could be written using a single interpolated verbatim string literal using the code in listing 3.


Listing 3 Aligned values using a single interpolated verbatim string literal

  decimal price = 95.25m;
  decimal tip = price * 0.2m;                        // ❶
  Console.WriteLine($@"Price: {price,9:C}
  Tip:   {tip,9:C}
  Total: {price + tip,9:C}");

❶  20% tip


I probably wouldn’t do this, personally – it’s just not as clean as using three separate statements. I’m only using it here as a simple example of what’s possible. Consider it for places where you’re already using verbatim string literals sensibly.

Note: The order of the symbols matters in C# 6. $@"Text" is a valid interpolated verbatim string literal, but @$"Text" isn’t (C# 8 and later accept either order). I admit I haven’t found a good mnemonic device to remember this – try it whichever way you think is right, and change it if the compiler complains!

This is convenient, but I’ve only shown the surface level of what’s going on.

Compiler handling of interpolated string literals (part 1)

The compiler transformation here is simple. It converts the interpolated string literal into a call to string.Format; it extracts the expressions from the format items and passes them as arguments after the composite format string. Each expression is replaced with the appropriate index: the first format item becomes {0}, the second becomes {1}, and so on.

To make this clearer, let’s consider a trivial example – this time separating the formatting from the output, for clarity.

  int x = 10;
  int y = 20;
  string text = $"x={x}, y={y}";
  Console.WriteLine(text);

This is handled by the compiler as if you had written the following code instead:

  int x = 10;
  int y = 20;
  string text = string.Format("x={0}, y={1}", x, y);
  Console.WriteLine(text);

The transformation is that simple. If you want to go deeper and verify it for yourself, you could use a tool such as ildasm to look at the IL that the compiler generated.
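If you'd rather check the equivalence in code than in IL, a sketch like this compares an interpolated string literal with a hand-written string.Format call, including alignment and format strings. Newer compilers may generate different code for some interpolated strings, but the resulting text should be the same; TransformationDemo and its method names are hypothetical.

```csharp
using System;

public static class TransformationDemo
{
    // The interpolated form: expressions, alignment, and format strings inline.
    public static string Interpolated(decimal price, decimal tip)
    {
        return $"Total: {price + tip,9:F2} ({tip / price:P0} tip)";
    }

    // A hand-written equivalent: each expression becomes an argument,
    // and its place in the string is taken by the argument's index.
    public static string Explicit(decimal price, decimal tip)
    {
        return string.Format("Total: {0,9:F2} ({1:P0} tip)", price + tip, tip / price);
    }

    public static void Main()
    {
        decimal price = 95.25m;
        decimal tip = price * 0.2m;
        Console.WriteLine(Interpolated(price, tip));
        Console.WriteLine(Explicit(price, tip));
        // Prints: True
        Console.WriteLine(Interpolated(price, tip) == Explicit(price, tip));
    }
}
```

Both methods rely on the current culture, so the exact text varies between machines, but the two results should always agree with each other.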

One side-effect of this transformation is that unlike regular or verbatim string literals, interpolated string literals don’t count as constant expressions. While there are cases where the compiler could reasonably consider them to be constant (if they don’t have any format items, or if all the format items are string constants without any alignment or format strings) these are corner cases which complicate the language for little benefit.

That’s all for this article.




[1] Or globalization. Microsoft uses the two terms in a slightly different way to other industry bodies, and the difference is subtle. Experts, please forgive the hand-waving here: the big picture is more important than the fine details of terminology this once.

[2] This is a fictional web service as far as I’m aware.

[3] If you want to include a double quote in the verbatim string literal, you must double it. It gets hard to read quickly.

[4] Be careful about which exact characters end up in your string though. While the difference between “carriage-return” and “carriage-return line-feed” separators is irrelevant in most code, it’s significant in verbatim string literals.