In this fourth post of the .NET Internals series, we’re going to meet a new friend called the Garbage Collector, discuss its main responsibilities, and see what memory allocation in .NET applications is and how it works.
Ready? Let’s start then! 😉
What (or who) is Garbage Collection (Collector)?
As you could read in the previous posts on the internal details of .NET, there are a lot of conditions and assumptions behind memory organization in the framework. Data is stored in the proper structures, sometimes boxed and unboxed, but the best thing about it all is that it’s almost completely invisible to us, the developers.
In fact, providing this invisibility is the goal of the Garbage Collector (GC). It should become our close friend (it’s not as scary as you may have heard – we’ll see that soon 🙂 ), because this automatic memory manager does a lot of stuff for us, including:
- allowing us to write code without having to release memory (still remember the meme about C++?),
- performing heap allocations in an efficient way,
- reclaiming the objects that are no longer used and clearing their memory,
- ensuring that every new managed object gets clean content to begin with, so there is no work to be done in an object’s constructor to allocate the memory for all its members,
- providing memory safety, so one object cannot use the memory reserved for another object.
The term ‘garbage collection’ was first used in the LISP programming language in 1959, and since then it has represented the concept of automatic memory management in programming languages and frameworks.
Does GC allocate or deallocate the memory?
There’s often confusion between allocating and deallocating memory: what is actually responsible for each? In principle, the goal of the garbage collector is to deallocate memory. As its name suggests, it collects (frees, reclaims) the garbage (objects that are no longer needed).
On the other hand, when an application sends a memory allocation request, that request is passed to the CLR, which is actually responsible for allocating the memory. However, as we’ll see in the coming posts, deallocating memory (a much more complex process than allocating it) can be triggered by various conditions, and it can happen that an allocation requires a deallocation to be performed first. This doesn’t mean there is a fixed sequence of events (i.e. a deallocation before every allocation), but that the number of allocations done by the CLR can affect how, and how often, garbage collections run (we’d like to have memory available when an allocation is to be made).
That’s how the CLR ‘asks’ the GC for help (or ‘communicates with the GC’ by notifying it about allocations) in actually allocating the memory (for instance, by making the GC run a collection or heap compaction more frequently). For that reason, it’s usually simplified to say that the garbage collector handles both allocation and deallocation of memory.
As we’ll see below, allocation of memory is as simple as 1 + 2 = 3 🙂 – continue reading to get to know why!
Memory allocation in .NET
As we read in the first post of the series, each process has its own virtual address space. The garbage collector is responsible for allocating memory on the managed heaps (more about heaps here) created in this process’s space. It means that the garbage collector only takes care of objects living on the managed heap, that is, instances of reference types (including boxed value types).
Value types declared as local variables are typically stored on the stack in LIFO order, and their memory is reclaimed as soon as the method in which they are defined exits. It would make no sense if the GC had to manage these theoretically simple and (in a perfect world) locally declared variables.
Next object pointer
To put it simply, as soon as a managed heap is created, it contains a pointer to the memory address where the next object will be allocated. This pointer is simply a number and is referred to as the next object pointer. Initially, its value is set to the managed heap’s base address.
As soon as the first-ever reference object is to be allocated on this particular heap, the memory is allocated at the place the next object pointer currently points to.
After the allocation, the next object pointer is moved to the address just after the allocated object. This ensures that there are no unnecessary memory gaps on the heap.
Complexity of memory allocation
In reality, the next object pointer’s value is just a memory address, usually written in hexadecimal format (e.g. 0xF7279). When a new allocation is to be made, essentially the only operation to perform is an addition. It is known how much space is needed for the new object (more details in the next section), so in order to reserve the proper memory chunk, the number of bytes necessary for the new object is added to the next object pointer’s current value.
This is how the reservation of managed memory works. Because of that, it can be said that allocating memory for reference types is almost as fast as allocating variables on the stack.
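The pointer arithmetic described above can be sketched as a toy model. This is not the CLR’s real allocator: the `ToyHeap` class, its members, and the object sizes used in `Main` are hypothetical names and values made up for illustration, and the "segment full" branch only hints at the point where a real runtime might need a collection first.

```csharp
using System;

// Toy model of a managed heap segment. NOT the CLR's actual allocator.
public sealed class ToyHeap
{
    private readonly long _limit; // first address past the end of the segment

    public long BaseAddress { get; }
    public long NextObjectPointer { get; private set; }

    public ToyHeap(long baseAddress, long sizeInBytes)
    {
        BaseAddress = baseAddress;
        _limit = baseAddress + sizeInBytes;
        NextObjectPointer = baseAddress; // initially points at the heap's base address
    }

    // Reserves `size` bytes and returns the address of the new object.
    public long Allocate(long size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));
        if (NextObjectPointer + size > _limit)
            throw new InvalidOperationException(
                "Segment full; a real runtime might run a collection at this point.");

        long address = NextObjectPointer; // the object starts where the pointer is now
        NextObjectPointer += size;        // the allocation itself is this one addition
        return address;
    }
}

public static class Program
{
    public static void Main()
    {
        var heap = new ToyHeap(0xF7279, 1024); // example address taken from the text
        long first = heap.Allocate(24);        // hypothetical object sizes
        long second = heap.Allocate(40);

        Console.WriteLine($"first  = 0x{first:X}");
        Console.WriteLine($"second = 0x{second:X}");
        Console.WriteLine($"next   = 0x{heap.NextObjectPointer:X}");
    }
}
```

Each object starts exactly where the previous one ended, which is why this scheme leaves no gaps between consecutive allocations.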
If you’d like to see how you can check the memory address of an object, I recommend checking this article.
How is an object’s size calculated?
In the comments under the previous post, Jelena asked for explanation on how the size of an object is calculated. Here it comes 😉
As we all know from the second post, an object is allocated either on the Small Object Heap (SOH) or on the Large Object Heap (LOH), depending on its size. For that reason, the object’s size must be determined before the allocation is made. How is this size calculated then?
It would make things easy to assume that the size of an object is everything it contains (including other objects’ sizes). In reality, it’s not counted this way. Objects contained in other objects are allocated separately on the heap, so their actual sizes are not included in the parent object’s size.
A very good example is given in Chris Farrell’s and Nick Harrison’s book, Under the Hood of .NET Memory Management. Consider the following code:
class MyClass
{
    string Test = "Hello world Wazzup!"; // 19 characters
    byte[] data = new byte[86000]; // 86000 bytes (>85K – goes on LOH)
}
Consider the state of the heaps as soon as an instance of MyClass is created.
The size of the MyClass instance will include:
- general class stuff (e.g. some metadata – you can check it yourself in IL code with ILSpy as we did with boxing and unboxing),
- memory required to store the pointers (memory addresses) to the string and the byte array.
string Test will be allocated on the SOH, and byte[] data will be allocated on the LOH (it’s larger than the 85K threshold).
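One way to observe this from code is `GC.GetGeneration`. As far as I know, current .NET runtimes report LOH objects as belonging to the oldest generation (`GC.MaxGeneration`), while a freshly allocated small object normally starts in generation 0. The sketch below relies on that behavior. `MyClassSketch` is a hypothetical variant of the class above with public fields, used only so the demo can reach the array.

```csharp
using System;

// Hypothetical variant of MyClass: fields are public only so the demo can inspect them.
public class MyClassSketch
{
    public string Test = "Hello world Wazzup!";
    public byte[] Data = new byte[86000];
}

public static class LohDemo
{
    // True when the runtime reports the object in its oldest generation,
    // which is how LOH objects are expected to be reported.
    public static bool ReportedInOldestGeneration(object obj) =>
        GC.GetGeneration(obj) == GC.MaxGeneration;

    public static void Main()
    {
        var instance = new MyClassSketch();

        // The instance itself is small: it only holds two references.
        Console.WriteLine($"instance:   generation {GC.GetGeneration(instance)}");

        // The array is a separate, large object.
        Console.WriteLine($"byte array: generation {GC.GetGeneration(instance.Data)}");
        Console.WriteLine($"array reported in oldest generation: {ReportedInOldestGeneration(instance.Data)}");
    }
}
```

The exact generation printed for the small instance can vary if a collection happens to run in between, so treat the output as an observation rather than a guarantee.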
In general, the size of an object is calculated by adding sizes of:
- general class data,
- pointers (for reference type members), which contain only memory addresses,
- the values themselves (for value type members) – more details here.
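The sum above can be turned into a rough estimate. The sketch below is a simplification under stated assumptions, not the runtime's actual layout algorithm. It assumes the general per-object data is an object header plus a type pointer (one pointer-sized word each), a minimum object size of three pointer-sized words, and the documented LOH threshold of 85,000 bytes. It also ignores field padding and reordering. `ObjectSizeSketch` and its members are hypothetical names.

```csharp
using System;

public static class ObjectSizeSketch
{
    // Objects of this size or larger are placed on the LOH.
    public const int LohThresholdBytes = 85_000;

    // Rough shallow size: per-object overhead + one pointer per reference field
    // + the raw bytes of value type fields, rounded up to a multiple of the pointer size.
    public static long ShallowSize(int pointerSize, int referenceFields, int valueFieldBytes)
    {
        long overhead = 2L * pointerSize; // assumed: object header + type pointer
        long raw = overhead + (long)referenceFields * pointerSize + valueFieldBytes;
        long aligned = (raw + pointerSize - 1) / pointerSize * pointerSize;
        long minimum = 3L * pointerSize;  // assumed minimum object size
        return Math.Max(aligned, minimum);
    }

    public static bool GoesOnLoh(long sizeInBytes) => sizeInBytes >= LohThresholdBytes;

    public static void Main()
    {
        // MyClass: two reference fields (string Test, byte[] data), no value type fields.
        long myClass = ShallowSize(pointerSize: 8, referenceFields: 2, valueFieldBytes: 0);

        Console.WriteLine($"MyClass estimate on 64-bit: {myClass} bytes");
        Console.WriteLine($"MyClass goes on LOH: {GoesOnLoh(myClass)}");
        Console.WriteLine($"86000-byte array payload goes on LOH: {GoesOnLoh(86_000)}");
    }
}
```

Under these assumptions the MyClass instance itself stays tiny no matter how large the array it points to is, because only the two pointers count towards its size.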
Summary
I hope this post makes the role of the garbage collector a bit clearer for you. We’ve also seen how easy memory allocation is and the general way an object’s size is calculated.
Even though we don’t perform the actual allocations manually, we may think like this guy:
We’ll see if that’s true next week, when we dig into a more complex topic, the real purpose of the GC: releasing memory!
Stay tuned!