Data types

Value and reference types

The .NET Framework uses two kinds of memory to store data: the Stack and the Managed Heap.

The Stack

The Stack is used for saving data of simple variables such as a number, letter or a logical state (true/false).

The Stack uses a LIFO (Last In, First Out) method, also described as FILO (First In, Last Out). This means that the first variable stored on the Stack is the last one to be released. In practice, once a method finishes, all of its local variables are released from memory together.

Allocating and releasing memory on the Stack amounts to moving a stack pointer, which the CPU supports directly. This makes the Stack very fast.

There is unfortunately one limitation with the Stack. In order to save a new variable on the Stack, the .NET Framework needs to know exactly how much space to allocate for it. There is no way to expand the space once it has been allocated. All variables stored on the Stack therefore share one limitation: a predefined maximum range of their contents.

For example, an integer is a variable that stores a whole number. When you declare a new integer variable, the .NET Framework allocates exactly 32 bits of memory on the Stack for its contents:

int number = 5;

Inside the number variable we chose to store the number 5. 32 bits for a signed whole number translates to a range from -2,147,483,648 to 2,147,483,647. These are all the numbers that we could have saved as the content of the new variable. Anything outside this range would not fit into the pre-allocated space.
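The size and range quoted above can be checked from code. The following is a minimal sketch; the class and method names (IntRangeDemo, IntSizeInBits) are made up for this illustration, while sizeof, int.MinValue and int.MaxValue are part of C# and the .NET Framework.

```csharp
using System;

static class IntRangeDemo
{
    // sizeof(int) returns the size in bytes; multiply by 8 to get bits.
    public static int IntSizeInBits()
    {
        return sizeof(int) * 8;
    }

    static void Main()
    {
        Console.WriteLine("Bits: {0}", IntSizeInBits());
        Console.WriteLine("Min:  {0}", int.MinValue);
        Console.WriteLine("Max:  {0}", int.MaxValue);
    }
}
```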

[Figure: The Stack]

An integer is just one of many types that can be stored on the Stack. There are many more, all commonly known as Value types.

The following table lists all of the C# Value types that are used to store integral numbers.

Shortcut  .NET type      Range
sbyte     System.SByte   -128 -> 127
byte      System.Byte    0 -> 255
short     System.Int16   -32,768 -> 32,767
ushort    System.UInt16  0 -> 65,535
int       System.Int32   -2,147,483,648 -> 2,147,483,647
uint      System.UInt32  0 -> 4,294,967,295
long      System.Int64   -9,223,372,036,854,775,808 -> 9,223,372,036,854,775,807
ulong     System.UInt64  0 -> 18,446,744,073,709,551,615
char      System.Char    Unicode 16-bit character (U+0000 -> U+FFFF)

The table shows the C# keyword used to declare the variable as well as its corresponding .NET Framework type. For variable declaration you can use the C# keyword and the .NET Framework type interchangeably. In fact, the C# keyword is just a shortcut that gets translated to the whole .NET Framework type during the compilation. For their simplicity, developers usually use these shortcuts, rather than spelling out the whole .NET Framework type.
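That the keyword and the full type name denote the same type can be verified with a short check. This is only a sketch; KeywordAliasDemo and SameType are names invented for the example.

```csharp
using System;

static class KeywordAliasDemo
{
    // Declares one variable with the C# keyword and one with the
    // .NET Framework type name, then compares their run-time types.
    public static bool SameType()
    {
        int viaKeyword = 5;
        System.Int32 viaTypeName = 5;
        return viaKeyword.GetType() == viaTypeName.GetType()
            && typeof(int) == typeof(System.Int32);
    }

    static void Main()
    {
        Console.WriteLine(typeof(int).FullName); // System.Int32
        Console.WriteLine(SameType());           // True
    }
}
```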

The last column of the table also shows what exactly you can store within each of these variable types. This all follows from the amount of space allocated on the Stack.

Similarly, there are several types that can be used for storing numbers with a floating decimal point. The following table lists them by their C# keyword and .NET Framework type, together with their range and precision.

Shortcut  .NET type       Range                  Precision
float     System.Single   ±1.5e-45 to ±3.4e38    7 digits
double    System.Double   ±5.0e-324 to ±1.7e308  15 - 16 digits
decimal   System.Decimal  ±1.0e-28 to ±7.9e28    28 - 29 digits (optimized for financials)

float and double store numbers in a binary floating-point format. To cover such a large range in only 32 or 64 bits, they give up precision: many decimal fractions, such as 0.1, cannot be represented exactly. This is true for most other languages as well, and you can find more details elsewhere. Wikipedia, for example, has good articles on both float and double.

As the range of float and double is quite large, yet their precision is limited, they are best used for storing numbers from physics, chemistry or anything else coming from nature. These numbers usually do not require exact precision. Hardly anything in nature measures EXACTLY 0.25 on any scale.

For financials and other numbers requiring exact precision on our human-made decimal scales, a separate decimal type has been created. The space allocated for decimal in memory is two times the space allocated for double. It also has a different internal structure: it stores the number as a whole number scaled by a power of ten, so decimal fractions such as 0.1 are represented exactly. Use this type for any human-made numerical data such as prices, salaries, stock indexes, and everything else that has a finite number of decimals.
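The difference between double and decimal shows up in a classic check: adding 0.1 and 0.2. The sketch below uses invented names (PrecisionDemo and its two methods); the m suffix marks a decimal literal.

```csharp
using System;

static class PrecisionDemo
{
    // In binary floating point, 0.1 and 0.2 are only approximations,
    // so their sum is not exactly equal to 0.3.
    public static bool DoubleSumIsExact()
    {
        double a = 0.1, b = 0.2;
        return a + b == 0.3;
    }

    // decimal stores base-10 fractions exactly, so the sum matches.
    public static bool DecimalSumIsExact()
    {
        decimal a = 0.1m, b = 0.2m;
        return a + b == 0.3m;
    }

    static void Main()
    {
        Console.WriteLine("double exact:  {0}", DoubleSumIsExact());  // False
        Console.WriteLine("decimal exact: {0}", DecimalSumIsExact()); // True
    }
}
```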

There are still a couple more Value types we have not touched on. The following table shows two of them.

Shortcut  .NET type        Range
bool      System.Boolean   true, false
(none)    System.DateTime  1 January 0001, 00:00:00 -> 31 December 9999, 23:59:59

DateTime has no C# keyword of its own; it is always written by its type name.

There are even more Value types in the .NET Framework. You can also create your own types. Anything declared by “struct” or “enum” will be treated as a Value type.
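As a rough illustration of a custom Value type, the sketch below declares a struct and an enum and shows that assigning a struct copies its contents. SamplePoint, SampleColor and CustomValueTypeDemo are hypothetical names chosen for this example.

```csharp
using System;

// A custom Value type declared with "struct".
struct SamplePoint
{
    public int X;
    public int Y;
}

// A custom Value type declared with "enum".
enum SampleColor { Red, Green }

static class CustomValueTypeDemo
{
    // Assigning a struct copies all of its data, so changing the copy
    // leaves the original untouched.
    public static int CopyThenModify()
    {
        SamplePoint a = new SamplePoint { X = 1, Y = 2 };
        SamplePoint b = a;
        b.X = 99;
        return a.X; // still 1
    }

    static void Main()
    {
        Console.WriteLine(typeof(SamplePoint).IsValueType); // True
        Console.WriteLine(typeof(SampleColor).IsValueType); // True
        Console.WriteLine(CopyThenModify());                // 1
    }
}
```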

The Managed Heap

The Managed Heap does not have the fixed-size limitation of the Stack. Memory on it is allocated at run time, when the data is created, so the size does not have to be known beforehand, and it can grow to as much memory as the computer allows. This makes it well suited for storing data of undetermined size, such as objects, collections of objects, strings, etc.

While the data itself gets stored on the Managed Heap, a local variable holding it keeps a corresponding reference on the Stack. The purpose of the reference is to point to the data on the Managed Heap.

The following example shows the creation of a new customer object on the Managed Heap and saving an object reference on the Stack in the form of a “customer” variable.

object customer = new Customer("John", 30);

It is possible to keep more references to the same data on the Managed Heap. The following example creates another reference variable, called “cust”, on the Stack. The reference points to exactly the same customer object:

object cust = customer;
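The effect of two references sharing one object can be sketched as follows. The Customer class is not shown in the text, so the version below (with a name and an age) is an assumption made for illustration, and the variables are typed as Customer rather than object so that the Name field is accessible.

```csharp
using System;

// Hypothetical Customer class; the original text does not define it.
class Customer
{
    public string Name;
    public int Age;

    public Customer(string name, int age)
    {
        Name = name;
        Age = age;
    }
}

static class SharedReferenceDemo
{
    // Changes the object through the second reference and reads it
    // back through the first one.
    public static string RenameThroughSecondReference()
    {
        Customer customer = new Customer("John", 30);
        Customer cust = customer;   // copies the reference, not the object
        cust.Name = "Jane";
        return customer.Name;       // "Jane": both references see the change
    }

    static void Main()
    {
        Customer customer = new Customer("John", 30);
        Customer cust = customer;
        Console.WriteLine(object.ReferenceEquals(customer, cust)); // True
        Console.WriteLine(RenameThroughSecondReference());         // Jane
    }
}
```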

[Figure: The Stack and the Managed Heap]

Once memory has been allocated on the Managed Heap, it stays allocated for as long as something still references it. Unlike C/C++, where the developer frees such memory explicitly, the .NET Framework includes a tool for automatic memory management: the Garbage Collector. The Garbage Collector periodically scans the Managed Heap, finds data that no reference points to any more, and releases it. Developers in the .NET Framework therefore do not need to release memory on the Managed Heap themselves. The only thing they need to remember is not to keep a reference alive once the data on the Managed Heap is no longer required.

The reference itself gets cleared automatically when it goes out of scope, for example when the method in which it was declared as a local variable finishes. It can also be cleared manually by assigning null to the reference variable:

customer = null;
cust = null;

Since all the variables that get stored on the Managed Heap require a reference, they are called Reference types.

The .NET Framework types System.Object (C# shortcut “object”) and System.String (C# shortcut “string”) are perhaps the most commonly used Reference types.

There are again many more Reference types in the .NET Framework. Basically anything that is declared by the keyword “class”, “interface” or “delegate” is a Reference type. We will see how to create our own classes later.

Arrays (structures built from items of the same datatype, organized in rows and columns) are also Reference types in the .NET Framework. This is true even if the array itself holds items of a Value type. We will touch on arrays later.

String – the hybrid

Before we leave the topic of Value and Reference types, let us take a closer look at the string type. We already know the string is a Reference type. There is no way to say in advance how long a text we will try to save inside it. The text itself then gets stored on the Managed Heap and a reference is placed on the Stack.

We learned that we can have more than one reference pointing to the same data on the Managed Heap. The following example should therefore create two references to the same string data:

string text1 = "First text";
string text2 = text1;

It does. Both variables refer to the same string data on the Managed Heap; nothing is duplicated.

What makes string special is that it is immutable. Once a string has been created, its contents can never be changed. Every operation that appears to modify a string actually creates a new string on the Managed Heap and returns a reference to it. Because nobody can change the data behind a shared reference, a string behaves much as if it were a Value type.

Working with Value and Reference types

A picture is worth 1000 words and an example is worth even more (in fact we already have something more than 1300 words typed here on the difference between Value and Reference types). Let us see the difference in using Value and Reference types in code.

The first example works with a Value type – integer:

int number1 = 1;
int number2 = number1;

number2 = 2;

Console.WriteLine("Number1: {0}", number1);
Console.WriteLine("Number2: {0}", number2);

Console.ReadLine();

The example creates two integers on the Stack. Integer number1 keeps its original value of 1. Integer number2 gets its initial value 1 changed to value 2. The console writes the following output:

Number1: 1
Number2: 2

For the second example we will use a Reference type: StringBuilder, a .NET Framework class for building strings (it lives in the System.Text namespace, so the code needs “using System.Text;”):

StringBuilder stringBuilder1 = new StringBuilder();
stringBuilder1.AppendLine("First line in StringBuilder");

StringBuilder stringBuilder2 = stringBuilder1;
stringBuilder2.AppendLine("Second line in StringBuilder");

Console.WriteLine("StringBuilder1");
Console.WriteLine(stringBuilder1);

Console.WriteLine();

Console.WriteLine("StringBuilder2");
Console.WriteLine(stringBuilder2);

Console.ReadLine();

The example first creates a new instance of the StringBuilder class. The instance itself is stored on the Managed Heap and a reference “stringBuilder1” is stored on the Stack. The AppendLine method adds a new string line to the object data: “First line in StringBuilder”.

A new reference “stringBuilder2”, for the same StringBuilder object, is then created. The AppendLine method gets called again through this new reference to add yet another new string line to the object data: “Second line in StringBuilder”.

The console writes the following output:

StringBuilder1
First line in StringBuilder
Second line in StringBuilder

StringBuilder2
First line in StringBuilder
Second line in StringBuilder

You can see that even though this example is very similar to the first one, with Reference types we were always working with just one object stored on the Managed Heap. With integers, we were working with two separate data containers and that is why the console in the first example wrote out two different values.

The third example uses string. String is a Reference type but, because it is immutable, it behaves much like a Value type. The following example demonstrates this:

string text1 = "First text";
string text2 = text1;

text2 = "Second text";

Console.WriteLine("Text1: {0}", text1);
Console.WriteLine("Text2: {0}", text2);

Console.ReadLine();

If assigning to “text2” changed the string data that both references share, the way AppendLine changed the shared StringBuilder in the previous example, the data would start as “First text” and then become “Second text” for both references. With this logic we would expect the console to write out the following:

Text1: Second text
Text2: Second text

This is not correct though! Assigning a new value to “text2” does not modify the existing string. It makes “text2” refer to a different string object, while “text1” still refers to the original one. Strings are immutable, so there is no way to change “First text” through either reference. The real output of the example above is therefore:

Text1: First text
Text2: Second text

After the assignment, text1 and text2 point to different data on the Managed Heap.
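To check what the two string variables refer to at each step, object.ReferenceEquals can be used. The sketch below is only an illustration; StringReferenceDemo and its method names are invented for it.

```csharp
using System;

static class StringReferenceDemo
{
    // Right after "text2 = text1" both variables refer to the same string object.
    public static bool SharedAfterAssignment()
    {
        string text1 = "First text";
        string text2 = text1;
        return object.ReferenceEquals(text1, text2);
    }

    // Assigning a different string to text2 repoints text2 only;
    // the original string object is not modified.
    public static bool SharedAfterReassignment()
    {
        string text1 = "First text";
        string text2 = text1;
        text2 = "Second text";
        return object.ReferenceEquals(text1, text2);
    }

    static void Main()
    {
        Console.WriteLine(SharedAfterAssignment());   // True
        Console.WriteLine(SharedAfterReassignment()); // False
    }
}
```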

Casting

Suppose we have declared an integer variable. At that point the .NET Framework has already reserved the necessary space on the Stack to hold the data for this new variable. Let us assume now that we need to store a larger number than the integer allows into this variable. We would like to change its type to long.

Changing the type of a variable, once it has been declared, is not possible in the .NET Framework. The necessary space on the Stack has already been allocated and cannot be changed. To achieve a similar result we can declare a new variable of a sufficient size and copy the contents of the original variable into it. Converting a value from one type to another in this way is called casting.

There are two types of casting – implicit and explicit.

Implicit casting is done by the .NET Framework if the cast itself is considered safe. Safe in this context means that the contents of the original variable are sure to fit into the space declared by the new variable. For example, an integer will always fit inside a long, because long has a much greater range:

int i = 5;
long l = i;

The cast itself happens on the second line of the code above. There is nothing special you have to do to initiate the cast. It is done automatically for you – that is why it is called an implicit cast.

If you need to copy the contents of a variable that has a larger size reserved on the Stack into a variable of a smaller size, an explicit cast has to be used. The following example shows explicitly casting a long variable into an integer variable:

long l = 5;
int i = (int)l;

Again, the cast happens on the second line. This time the developer needs to specify the cast explicitly.

Even though there are times when a long would not fit into an integer, this example is not one of them. The number 5 can be represented by an integer safely and no data loss will occur.
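What happens when the long does not fit can be sketched with the C# checked and unchecked keywords. NarrowingCastDemo, Wrap and TryNarrow are names made up for this example; the value 3,000,000,000 is simply one that is larger than int.MaxValue.

```csharp
using System;

static class NarrowingCastDemo
{
    // In an unchecked context the upper bits are silently discarded.
    public static int Wrap(long value)
    {
        return unchecked((int)value);
    }

    // In a checked context an out-of-range cast throws OverflowException.
    public static bool TryNarrow(long value, out int result)
    {
        try
        {
            result = checked((int)value);
            return true;
        }
        catch (OverflowException)
        {
            result = 0;
            return false;
        }
    }

    static void Main()
    {
        long big = 3000000000L;
        Console.WriteLine(Wrap(big)); // -1294967296

        int narrowed;
        Console.WriteLine(TryNarrow(5L, out narrowed));  // True
        Console.WriteLine(TryNarrow(big, out narrowed)); // False
    }
}
```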

Sometimes however it is not possible to do a safe cast. Consider the following example:

double d = 1.25;
int i = (int)d;

The double variable “d” consists of an integral number 1 and then 25 behind the decimal point. An integer supports only whole numbers. If we explicitly cast a double of 1.25 into an integer variable, data loss will occur. The new integer variable will only contain the number 1. The rest will be cut off by the cast.

Keep this behavior in mind. Do not use an explicit cast unless you are sure that your data can be safely represented in the new datatype, or unless the data loss is actually what you want. The last example could, for instance, be used intentionally to drop the fractional part of a number. Note that the cast truncates toward zero: for positive numbers this is the same as rounding down (flooring), but a negative number such as -1.25 becomes -1, not -2.
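The difference between casting and flooring only shows for negative numbers. A small sketch, with invented names (TruncationDemo, Truncate, Floor), compares the cast with Math.Floor:

```csharp
using System;

static class TruncationDemo
{
    // An explicit cast drops the fractional part (rounds toward zero).
    public static int Truncate(double value)
    {
        return (int)value;
    }

    // Math.Floor rounds toward negative infinity.
    public static int Floor(double value)
    {
        return (int)Math.Floor(value);
    }

    static void Main()
    {
        Console.WriteLine(Truncate(1.25));  // 1
        Console.WriteLine(Floor(1.25));     // 1
        Console.WriteLine(Truncate(-1.25)); // -1
        Console.WriteLine(Floor(-1.25));    // -2
    }
}
```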

Reference types can use casting as well. A reference variable pointing to a particular object can be cast to a reference variable of a different type. If the type of the target reference variable is a base type (an ancestor) of the type used to declare the source reference variable, the cast is implicit. In any other situation, an explicit cast needs to be used. If an explicit cast turns out to be invalid at run time, an InvalidCastException is thrown.

Let us look at this in a simple example. Suppose we have a reference type called Person. We then have two other reference types, Male and Female. Both Male and Female inherit from Person. If we create an instance of Male, we can safely use a Person-typed variable to reference it. This cast will be implicit:

Male male = new Male();
Person person = male;

If we were then to cast the original Male object (now referenced by a Person-typed variable) back to a variable of type Male, we would need to use an explicit cast. This is because Male is not a parent of Person; it is the other way around. In our example, because the object referenced by the Person-typed variable is in fact a Male object, the cast will succeed:

Male male2 = (Male)person;

If however we tried to cast the same Male object, still referenced by a Person-typed variable, into a Female-typed variable, the program would throw an InvalidCastException:

Female female = (Female)person;

Male and Female, even though they share a common predecessor, are in fact completely different types. Many times you find that to be true even outside the world of object oriented programming… :)
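When it is not certain what a Person-typed variable really references, C# also offers the “is” and “as” operators, which test or attempt a cast without throwing. They are not covered in the text above; the sketch below adds them only as an illustration, and the empty Person, Male and Female classes and the Describe method are assumptions.

```csharp
using System;

// Minimal stand-ins for the types discussed in the text.
class Person { }
class Male : Person { }
class Female : Person { }

static class SafeCastDemo
{
    public static string Describe(Person person)
    {
        // "as" returns null instead of throwing when the cast is invalid.
        Female female = person as Female;
        if (female != null)
            return "Female";

        // "is" only tests whether the cast would succeed.
        if (person is Male)
            return "Male";

        return "Person";
    }

    static void Main()
    {
        Person person = new Male();
        Console.WriteLine(Describe(person)); // Male

        try
        {
            Female female = (Female)person; // invalid: the object is a Male
            Console.WriteLine(female);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Invalid cast");
        }
    }
}
```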

Implicit types

To avoid spelling out the type of a variable, we can use the “var” keyword. This creates a variable whose type the compiler determines automatically from whatever we assign to it. This, for example, will create a variable of type StringBuilder as we are using it to hold a reference to a StringBuilder object:

var stringBuilder = new StringBuilder();

As with all other variables in C#, once a variable is declared, its type cannot be changed. This is true even if we use the keyword “var” to declare it.

Because the type of the resulting variable is determined by what we are trying to save inside it, wherever a “var” keyword is used for variable declaration, the variable also has to be initialized. Doing only the declaration would result in a compiler error:

var stringBuilder; // compiler error: the variable is not initialized

The “var” keyword is mostly used (with a small exception of anonymous types) for the sake of convenience.

Implicit typing of literals

A literal is a number, string or other value typed directly into the code. Any time you type a literal in your code, C# treats it as a value of a certain type.

If you type the number 10, for example, it will be treated as an integer. If you write a number that is too large to fit into an integer, C# picks a larger type; 1000000000000 (1 x 10^12), for example, is treated as a long.

If you type a number with a decimal point, it will be treated as a double.

If you type any text inside the quotation marks (“text”), it will be treated as a string.

With this in mind you can use the “var” keyword and know exactly what type your variable will be:

var integerVariable = 10;
var longVariable = 1000000000000;
var doubleVariable = 1.2;
var stringVariable = "Text";

You can influence the resulting type by the use of explicit casting. In the following example number 1 will be saved as float, double and a decimal respectively:

var floatVariable = (float)1;
var doubleVariable = (double)1;
var decimalVariable = (decimal)1;

The same can be achieved by using a simpler syntax – adding one letter shortcut behind the number literal itself:

var floatVariable = 1F;
var doubleVariable = 1D;
var decimalVariable = 1M;

It is important to realize that not only a literal has a type; the result of a computation on literals has a type as well. In the following example the result will be of type integer as both individual numbers are also integers:

var result = 10 * 10;

This brings us to an interesting point where even many seasoned developers make mistakes. Consider the following example:

var result = 5 / 2;

In this example, following the same logic as in the previous one, the result variable will be of type integer. This is true because both 5 and 2 were integers as well.

As integer can only store an integral number, the result of the computation above will be 2 (instead of 2.5 as many developers expect). You can verify the result on the console if you like:

Console.WriteLine(5 / 2);

To force C# to produce a double number instead, at least one of the source numbers has to be double as well. Combining all that we have learned so far, we can achieve this in any of these ways:

Console.WriteLine(5D / 2);
Console.WriteLine((double)5 / 2);
Console.WriteLine(5.0 / 2);
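The type rules from this section can be checked at run time with GetType. In the sketch below, LiteralTypeDemo and TypeNameOf are invented names; passing a value as object boxes it, which does not change the reported type.

```csharp
using System;

static class LiteralTypeDemo
{
    // Returns the short .NET type name of whatever value is passed in.
    public static string TypeNameOf(object value)
    {
        return value.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(TypeNameOf(10));            // Int32
        Console.WriteLine(TypeNameOf(1000000000000)); // Int64
        Console.WriteLine(TypeNameOf(1.2));           // Double
        Console.WriteLine(TypeNameOf("Text"));        // String
        Console.WriteLine(TypeNameOf(1F));            // Single
        Console.WriteLine(TypeNameOf(1M));            // Decimal
        Console.WriteLine(TypeNameOf(5 / 2));         // Int32
        Console.WriteLine(TypeNameOf(5D / 2));        // Double
    }
}
```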

Arrays

Arrays are data structures where items of a single datatype are organized in rows and columns, possibly in even more dimensions.

A one dimensional (containing only rows) string array can be declared and initialized like this:

string[] array = new string[] { "One", "Two", "Three" };

To read a value of each particular row an indexer can be used. The following example shows how to write out the content of the first row to the console:

Console.WriteLine(array[0]);

All arrays in C# are zero based. That is why we use number 0 to access the first row. Number 1 would correspond to the second row.

Indexers can also be used to set a value of a particular array item. The following example changes the content of the second row from “Two” to “Test”:

array[1] = "Test";

Even though arrays are Reference types, the .NET Framework requires you to specify the number of items when the array is created, and that number cannot change afterwards. With this information, the .NET Framework can allocate a sufficient block of memory on the Managed Heap so that all the array items stay close together. This gives much better performance than relocating the entire array in memory every time you decide to add another item to it.

In the first example we not only declared an array but we also initialized it with three values. This is how the .NET Framework knew how big an array it should create. If you need to declare an array without initializing it with concrete values right away, you can create it with a given length and fill it in later. Its items then start out with default values (null for strings). You still have to specify the number of items you plan to use your array for though:

string[] anotherArray = new string[2];

The last example shows how to create and work with two dimensional arrays:

string[,] twoDimensional = new string[5, 2];
twoDimensional[0, 0] = "Test";
Console.WriteLine(twoDimensional[0, 0]);
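A few properties of arrays are worth checking in code: their length, the default value of unset items, and what happens when an index is out of range. The sketch below uses invented names (ArrayBasicsDemo, TotalCells); Length, GetLength and IndexOutOfRangeException are part of the .NET Framework.

```csharp
using System;

static class ArrayBasicsDemo
{
    // Multiplies the sizes of both dimensions of a two dimensional array.
    public static int TotalCells(string[,] grid)
    {
        return grid.GetLength(0) * grid.GetLength(1);
    }

    static void Main()
    {
        string[] anotherArray = new string[2];
        Console.WriteLine(anotherArray.Length);     // 2
        Console.WriteLine(anotherArray[0] == null); // True: items start as null

        string[,] grid = new string[5, 2];
        Console.WriteLine(TotalCells(grid));        // 10
        Console.WriteLine(grid.Length);             // 10

        try
        {
            anotherArray[2] = "Too far"; // valid indexes are 0 and 1 only
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("Index out of range");
        }
    }
}
```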

Nullable types

Most .NET Framework Value types do not have a concept of “no value”. Take a decimal for example. It can contain a negative decimal number, zero or a positive decimal number. There is nothing you can set it to that would indicate it does not have any value at all.

In reality you can have a database that contains a decimal column called salary. The column can either contain the salary as a decimal number or it can have a null value to indicate, for example, that the HR department has yet to decide the salary.

The .NET Framework’s System.Decimal type has no concept of null. In fact, in the .NET Framework, as we have discussed before, null is mostly used only for Reference types and means that the reference to an actual object has been cleared.

To solve this, the .NET Framework introduced the Nullable type. This type actually acts as a wrapper and every Value type can be put inside it. The Nullable type itself is also a Value type. There are two ways you can declare a new Nullable type:

decimal? nullableDecimal = null;
Nullable<decimal> anotherNullableDecimal = null;

Both declarations are equivalent; the first one is preferred for its shorter syntax.

Once the type is inside this Nullable wrapper, it gains two read-only properties. The Value property gives you access to the underlying type and its data. The HasValue property indicates whether there is or in fact is not any underlying data.

Assigning a new value to the type inside the Nullable wrapper is as easy as assigning the value to the Nullable wrapper itself. This all works because of implicit casting that we have discussed earlier:

nullableDecimal = 5;

Clearing the value is done by assigning null to the Nullable wrapper:

nullableDecimal = null;

Once a new value (or null) is set, the HasValue and Value properties get updated accordingly.

Trying to access the Value property when in fact there is no underlying data will throw an exception. Because of this it is always beneficial to first test whether the wrapper contains any underlying data. This can be done by testing the HasValue property or comparing the wrapper directly to the null value:

if (nullableDecimal.HasValue)
{
    // Yes it has a value, now you can access the Value property
    Console.WriteLine(nullableDecimal.Value);
}

if (nullableDecimal != null)
{
    // Yes it has a value, now you can access the Value property
    Console.WriteLine(nullableDecimal.Value);
}

The Nullable wrapper is mostly clever enough to act as if you were working with the underlying datatype. As long as the wrapper holds a value, these two examples behave identically:

if (nullableDecimal > 0)
    Console.WriteLine("It is greater than 0.");

if (nullableDecimal.Value > 0)
    Console.WriteLine("It is greater than 0.");

The two examples differ when the wrapper holds no value, which is why many developers prefer to always use the Value property when reading the value. This way they can be sure that they have the underlying value and not null. Consider the following example:

decimal? nullableDecimal = null;

if (nullableDecimal > 0)
    Console.WriteLine("Is greater than 0.");
else
    Console.WriteLine("Is lower than 0.");

Since the developer forgot that there is no value inside nullableDecimal, the code will, perhaps unexpectedly, write out “Is lower than 0.”. A comparison such as “greater than” evaluates to false whenever the wrapper holds no value, so the else branch runs.

If the developer used the Value property, an InvalidOperationException would be thrown. That could alert the developer that something is wrong in perhaps a more informative way:

decimal? nullableDecimal = null;

if (nullableDecimal.Value > 0)
    Console.WriteLine("Is greater than 0.");
else
    Console.WriteLine("Is lower than 0.");
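A third option, not discussed above, is to handle the “no value” case explicitly, either with HasValue or with the ?? operator, which supplies a fallback when the wrapper is empty. The sketch below is only one possible way to do it; NullableReadDemo, SalaryOrZero and Describe are hypothetical names.

```csharp
using System;

static class NullableReadDemo
{
    // The ?? operator returns the left side unless it is null,
    // in which case it returns the right side.
    public static decimal SalaryOrZero(decimal? salary)
    {
        return salary ?? 0m;
    }

    // Treats "no value" as its own case instead of letting it fall
    // into the else branch of a comparison.
    public static string Describe(decimal? value)
    {
        if (!value.HasValue)
            return "No value";

        return value.Value > 0 ? "Greater than 0" : "Not greater than 0";
    }

    static void Main()
    {
        decimal? nullableDecimal = null;
        Console.WriteLine(Describe(nullableDecimal));     // No value
        Console.WriteLine(SalaryOrZero(nullableDecimal)); // 0

        nullableDecimal = 5;
        Console.WriteLine(Describe(nullableDecimal));     // Greater than 0
        Console.WriteLine(SalaryOrZero(nullableDecimal)); // 5
    }
}
```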

Generic list

Even though you can use arrays to store a collection of items, sometimes it is better to use a generic list.

A list, as opposed to an array, does not force you to specify its capacity beforehand. It also makes adding, removing or inserting items much easier, as it provides methods for these kinds of operations.

The word “generic” denotes that you can specify the type of data the list will contain.

The following example shows how to declare and initialize a generic list of strings, how to add, remove and insert new items, how to access the items through an indexer and how to find out the total count of items. The List type lives in the System.Collections.Generic namespace, so the code needs “using System.Collections.Generic;”:

List<string> listOfStrings = new List<string>();

listOfStrings.Add("First string");
listOfStrings.Add("Last string");
listOfStrings.Add("Redundant string");

listOfStrings.Insert(1, "Middle string");

listOfStrings.RemoveAt(3);

Console.WriteLine("Count: {0}", listOfStrings.Count);

Console.WriteLine(listOfStrings[0]);
Console.WriteLine(listOfStrings[1]);
Console.WriteLine(listOfStrings[2]);

Console.ReadLine();

The output of this example is as follows:

Count: 3
First string
Middle string
Last string

From the performance point of view it is important to understand that the list is just a wrapper around an array. The list starts with a small internal array. When we add more items than that array can hold, the list creates a larger internal array and copies everything it has so far into it. If you know the number of items in advance, you can pass it to the list's constructor as the initial capacity and avoid these copies. While the list is much more flexible to use in ordinary parts of the code, in places where performance is a key issue, arrays can be the more efficient choice.
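The growth of the internal array can be observed through the Capacity property. The sketch below prints the capacity each time it changes; the exact numbers depend on the .NET version, so none are assumed here. ListCapacityDemo and CreateWithCapacity are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class ListCapacityDemo
{
    // Passing the expected number of items to the constructor lets the
    // list allocate its internal array once, up front.
    public static List<string> CreateWithCapacity(int expectedCount)
    {
        return new List<string>(expectedCount);
    }

    static void Main()
    {
        List<string> list = new List<string>();
        int lastCapacity = list.Capacity;

        for (int i = 0; i < 20; i++)
        {
            list.Add("Item " + i);

            // Report every reallocation of the internal array.
            if (list.Capacity != lastCapacity)
            {
                Console.WriteLine("Count {0}: capacity grew from {1} to {2}",
                    list.Count, lastCapacity, list.Capacity);
                lastCapacity = list.Capacity;
            }
        }

        Console.WriteLine(CreateWithCapacity(100).Capacity); // 100
    }
}
```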
