Garbage Collection in Data Structures
Garbage collection (GC) is a form of automatic memory management for heap-allocated storage: it examines the heap, identifies dead memory blocks, and reclaims their storage for reuse. Garbage collection's primary goal is to prevent memory leaks, and it frees the programmer from having to deallocate objects and return them to the memory system manually. Garbage collection can account for a considerable share of a program's total processing time and can therefore have a significant impact on performance. Related approaches include stack allocation, region inference, memory ownership, and combinations of these techniques.
The basic principle of garbage collection is to find data objects in a program that cannot be accessed in the future and to reclaim the resources those objects use. Garbage collection does not usually handle resources other than memory, such as network sockets, database handles, user-interaction windows, files, and device descriptors. Methods for managing such resources, particularly destructors, may be sufficient for them without requiring GC. In some GC systems, other resources can be associated with a region of memory that, when collected, triggers the reclamation of those resources.
Many programming languages, such as RPL, Java, C#, Go, and most scripting languages, require garbage collection either as part of the language specification or effectively for practical implementation (for example, formal languages like lambda calculus); these are referred to as garbage-collected languages. Other languages, such as C and C++, were designed for use with manual memory management but included garbage-collected implementations. Some languages, such as Ada, Modula-3, and C++/CLI, allow for both garbage collection and manual memory management in the same application by using separate heaps for collected and manually managed objects; others, such as D, are garbage-collected but allow the user to delete objects manually and completely disable garbage collection when speed is required.
Garbage collection's dynamic approach to automatic heap management addresses common and costly faults that, if left undetected, can lead to real-world program failures.
Allocation errors are costly because they are difficult to detect and correct. As a result, many programmers regard garbage collection as an essential language feature that simplifies the programmer's job by reducing manual heap allocation management.
Now let us look at two of the most widely known and commonly implemented garbage collection techniques.
Mark and Sweep
The mark-sweep algorithm is as straightforward as its name suggests. It consists of two phases: a mark phase and a sweep phase. During the mark phase, the collector traverses all the roots (global variables, local variables, stack frames, virtual and hardware registers, and so on) and marks every object it reaches by setting a bit in or near that object. During the sweep phase, it walks the heap and reclaims memory from all unmarked objects.
The fundamental algorithm is outlined in pseudo-code in Python below. The collector is assumed to be single-threaded in this example, although there might be several mutators. While the collector is running, all mutator threads are paused. This stop-the-world technique may appear inefficient, but it vastly simplifies the collector implementation because mutators cannot affect the state beneath it.
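The fundamental algorithm can be sketched in Python roughly as follows. The object model here (a `marked` bit and a `children` list per object, plus explicit `roots` and `heap` collections) is an illustrative assumption for the sketch, not a real runtime's layout:

```python
class Obj:
    """A toy heap object: a mark bit plus outgoing references."""
    def __init__(self):
        self.marked = False
        self.children = []   # references to other objects

def mark(roots):
    # Trace from the roots using an explicit stack rather than
    # recursion (the tradeoff discussed in the text below).
    candidates = list(roots)
    while candidates:
        obj = candidates.pop()
        if not obj.marked:
            obj.marked = True
            candidates.extend(obj.children)

def sweep(heap):
    # Linearly traverse the heap: unmarked objects are garbage;
    # survivors have their mark bit cleared for the next cycle.
    live = []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            live.append(obj)
    return live

def collect(roots, heap):
    mark(roots)
    return sweep(heap)
```

In a real collector, `sweep` would return freed cells to the allocator rather than build a Python list; the list stands in for the surviving heap here.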
It is evident from the pseudo-code that mark-sweep does not directly identify garbage. Instead, it first identifies everything that is not garbage, i.e. the live objects, and then concludes that everything else is garbage. Marking is a transitive process: after finding a live reference, we descend into that object's child fields, and so on. Because of the time cost and the risk of stack overflow, recursive procedure calls are not a suitable way to mark; that is why we use an explicitly defined stack. This technique also makes the space and time overhead of the marking phase explicit. The maximum depth of the candidate stack is determined by the length of the longest path that must be traced through the object graph.
Theoretically, the worst case equals the number of nodes on the heap, although most real-world applications produce rather shallow stacks. Even so, a robust GC must handle the unusual cases. In our implementation, we call mark() right after adding a new object to the candidate stack to keep the stack size under control. The catch is that GC is needed precisely when memory is scarce, yet the auxiliary stack demands more space; a large application could drive the garbage collector itself out of memory.
There are several ways to detect overflow. One advantage of an explicit stack is that an overflow can be detected immediately and a recovery procedure initiated. A straightforward approach is an inline is-full check on each push. A somewhat more efficient alternative is to use a guard page and trigger recovery after trapping the guard-violation exception. The tradeoffs of both techniques must be weighed against the underlying operating system and hardware: the is-full test costs a few instructions (a test followed by a branch) but runs on every push, whereas the guard-page technique requires catching access-violation exceptions, which are costly but rare.
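The first technique, the inline is-full check on each push, might be sketched as follows. `MAX_DEPTH` and the overflow exception are illustrative placeholders, not values from a real collector:

```python
MAX_DEPTH = 1024  # illustrative stack limit

class MarkStackOverflow(Exception):
    """Raised to start the recovery procedure when the mark stack fills."""
    pass

def push_candidate(candidates, obj):
    # The per-push test-and-branch: a few instructions, but paid on
    # every push, unlike the guard-page alternative.
    if len(candidates) >= MAX_DEPTH:
        raise MarkStackOverflow("mark stack full; begin recovery")
    candidates.append(obj)
```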
sweep() is a simple function with a straightforward implementation: it traverses the heap linearly, freeing every object that is not marked. This does impose a parseability requirement on our heap layout, since the next_object(address) implementation must be able to return the next object in the heap. In most cases, the heap only has to be parseable in one direction. In most GC-enabled language runtimes, an object's data is tagged with an object header, which records details about the object such as its type, size, hashcode, mark bits, sync block, and so on.
The header of an object is usually placed before the object's data. As a result, the object's reference points to the middle of the allocated heap cell immediately after the object header, rather than the first byte. This makes it easier to parse the heap from the top down. In most cases, free(address) will fill the freed cell with a predetermined filler pattern that the heap parsing algorithm recognizes.
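As a rough illustration of this linear traversal, here is a sketch in which each heap cell is modeled as a (size, mark-bit, payload) header tuple and freed cells are overwritten with a recognizable filler value. The cell layout and filler are assumptions for illustration only:

```python
FREE_FILLER = None  # stand-in for the predetermined filler pattern

def sweep_heap(heap_cells):
    """heap_cells: an ordered list of (size, marked, payload) cells.

    Walks the heap in one direction, freeing unmarked cells and
    clearing the mark bit on survivors for the next collection.
    """
    for i, cell in enumerate(heap_cells):
        if cell is FREE_FILLER:
            continue                                # skip freed cells
        size, marked, payload = cell
        if marked:
            heap_cells[i] = (size, False, payload)  # unmark survivor
        else:
            heap_cells[i] = FREE_FILLER             # free: write filler
    return heap_cells
```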
Reference Counting
Reference counting is conceptually simple: it counts how many pointer references each allocated object has. It is a straightforward, naturally incremental approach, because the program's memory-management overhead is distributed across its execution. Beyond memory management, reference counting is widely used in operating systems as a resource-management tool for system resources such as files and sockets.
Each allocated object in the reference counting technique has a reference count field. The memory manager is in charge of ensuring that the reference count of each object is equal to the number of direct pointer references to that object at all times. Below is a simplified version of the algorithm.
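The simplified scheme might look like the following sketch. The `rc` field, the `write_reference` barrier, and the helper names are illustrative assumptions rather than a specific runtime's API:

```python
class RCObject:
    """A toy object with a reference count and named outgoing references."""
    def __init__(self):
        self.rc = 0
        self.fields = {}

def inc_ref(obj):
    if obj is not None:
        obj.rc += 1

def dec_ref(obj):
    if obj is not None:
        obj.rc -= 1
        if obj.rc == 0:
            # Last reference gone: release children, then reclaim obj.
            for child in obj.fields.values():
                dec_ref(child)
            obj.fields.clear()

def write_reference(obj, field, new_target):
    # Barrier run on every pointer store, keeping each count equal
    # to the number of direct references to the object.
    inc_ref(new_target)
    dec_ref(obj.fields.get(field))
    obj.fields[field] = new_target
```

Note that the memory manager must run this barrier on every pointer write, which is where reference counting's distributed overhead comes from.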
The inability to recover cyclic storage is the most significant disadvantage of reference counting. Cyclic data structures such as doubly-linked lists and non-basic graphs cannot be successfully recovered using a simple reference counting technique and will leak memory.
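A minimal self-contained demo of the cycle problem, using an illustrative `rc` field: two nodes that reference each other keep nonzero counts even after the last external reference is dropped, so naive reference counting never reclaims them:

```python
class Node:
    def __init__(self):
        self.rc = 0
        self.next = None

def link(src, dst):
    # Store a reference src -> dst and bump dst's count.
    dst.rc += 1
    src.next = dst

a, b = Node(), Node()
a.rc += 1          # external (root) reference to a
link(a, b)         # a -> b
link(b, a)         # b -> a: a cycle is formed

a.rc -= 1          # the root drops its reference
# Neither count ever reaches zero, so neither node is reclaimed:
assert a.rc == 1 and b.rc == 1
```

A tracing collector such as mark-sweep has no such problem, because the cycle is simply unreachable from the roots.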
Conclusion
In this article, we looked at garbage collection in data structures and its importance in making programs that use those structures more efficient and reliable. We also examined the two major garbage collection algorithms, mark-sweep and reference counting, how each of them works, and the main tradeoffs between them.