Extract bits in C

Bit manipulation is a fundamental aspect of programming, particularly in systems programming, embedded systems, and performance-critical applications. One common operation in bit manipulation is extracting specific bits from data. In the C programming language, extracting bits is efficiently handled using bitwise operators, allowing programmers to interact directly with the binary representation of data. This operation is crucial for tasks such as parsing binary file formats, managing hardware registers, optimizing data storage, and implementing encryption algorithms.

At its core, extracting bits involves isolating specific bits within an integer or a similar data type. The most basic form of this operation is extracting a single bit, which can be achieved using a combination of the bitwise right shift ('>>') and bitwise AND ('&') operators. For instance, to determine the value of a particular bit in a number, the shift operator moves the target bit to the least significant position, and an AND with '1' then isolates that bit's value.

More complex scenarios involve extracting multiple bits, often contiguous, to obtain a subfield from a data word. This is common in networking, where headers in data packets contain various fields packed into a binary structure. Here, a mask is used to clear unwanted bits, leaving only the bits of interest. The data is first shifted right so that the field starts at the least significant bit; the mask, which has the required number of low-order bits set to '1', is then applied with a bitwise AND.

The manipulation of bits in C is not merely about accessing or modifying data at a low level; it also involves considerations of efficiency and portability. For example, endianness (the byte order in which data is stored) does not change the result of shifts and masks applied to an integer value, but it does matter as soon as the same data is read or written as raw bytes, for instance when parsing a file or network buffer on systems with different hardware architectures. Thus, a thorough understanding of the underlying hardware and careful handling of data representation are essential.

Approach-1: Bitwise Operations

Extracting a Single Bit

To extract a single bit from a specific position within an integer, we use a combination of the bitwise right shift (>>) and bitwise AND (&) operations.

Program:

Output:

 
Extracting single Bit:
Number: 172 (Binary: 10101100), Position: 3
Shifted Number: 21 (Binary: 00010101)
Extracted Bit: 1
Extracting single Bit:
Number: 218 (Binary: 11011010), Position: 5
Shifted Number: 6 (Binary: 00000110)
Extracted Bit: 0
Extracting multiple bits:
Number: 172 (Binary: 10101100), Start: 2, Number of Bits: 4
Mask: 15 (Binary: 00001111)
Shifted Number: 43 (Binary: 00101011)
Extracted Bits: 11 (Binary: 1011)
Extracting multiple bits:
Number: 218 (Binary: 11011010), Start: 1, Number of Bits: 3
Mask: 7 (Binary: 00000111)
Shifted Number: 109 (Binary: 01101101)
Extracted Bits: 5 (Binary: 101)
Results:
Single Bit from number1 at position 3: 1
Single Bit from number2 at position 5: 0
4 Bits from number1 starting at position 2: 11
3 Bits from number2 starting at position 1: 5

Explanation:

The provided C code demonstrates the extraction of single and multiple bits from an integer using bitwise operations. It includes two primary functions: one for extracting a single bit and another for extracting multiple bits, along with logging-like output to show the internal state and results of these operations.

  • Function to Extract a Single Bit
    The function extract_single_bit(int number, int position) takes two arguments: number (the integer from which a bit is to be extracted) and position (the bit position to extract, counted from 0 at the least significant end). The function uses bitwise operations to isolate and return the value of the bit at the specified position.
    Right Shift (>>): The input number is right-shifted by position places, moving the target bit to the least significant bit (LSB) position. This operation effectively discards all bits to the right of the target bit.
    Bitwise AND (&): The result of the right shift is then ANDed with 1 (binary 0001). This operation clears all bits except the LSB, isolating the bit of interest. The extracted bit is then logged and returned by the function.
    The logging statements in the function print the original number in both decimal and binary formats, the shifted number, and the extracted bit. This provides a clear view of the bit manipulation process and the intermediate values.
  • Function to Extract Multiple Bits
    The function extract_multiple_bits(int number, int start, int num_bits) extracts a range of bits from the input integer. The arguments are number (the integer to extract bits from), start (the starting bit position), and num_bits (the number of bits to extract).
    Mask Creation: The function creates a mask with num_bits consecutive 1s using the expression (1 << num_bits) - 1. This mask will later be used to isolate the desired bits from the shifted number.
    Right Shift (>>): Similar to the single-bit extraction, the function shifts the input number right by start positions. This operation aligns the target bit range with the LSB.
    Bitwise AND (&): The mask is applied to the shifted number using the AND operation. This step zeros out all bits except those corresponding to the mask, effectively extracting the desired bit range.
    The function logs the original number, the mask, the shifted number, and the extracted bits in both decimal and binary forms. This helps illustrate how the bitwise operations isolate the specific bits.
  • Main Function and Test Cases
    The main function sets up two example integers (number1 and number2) and tests the extraction functions. It demonstrates extracting a single bit from different positions and extracting multiple bits. The results, including the values of the extracted bits, are printed out, providing an end-to-end example of the bit extraction process.

Complexity Analysis:

Time Complexity

Time complexity measures the number of operations the code performs relative to the size of the input. In this case, the input size relates to the number of bits in the integer.

Extracting a Single Bit

The function extract_single_bit(int number, int position) involves:

Right Shift Operation (>>): This operation shifts the bits of the integer to the right. The time complexity of a right shift operation is O(1) because it is a basic operation performed by the processor in constant time, regardless of the number of bits shifted.

Bitwise AND Operation (&): This operation also has a time complexity of O(1), as it simply combines corresponding bits of the number and the mask (which, in this case, is 1) in a single step.

The overall time complexity of extracting a single bit is O(1) because both the right shift and bitwise AND operations are performed in constant time.

Extracting Multiple Bits

The function extract_multiple_bits(int number, int start, int num_bits) involves:

Mask Creation: The mask is created using (1 << num_bits) - 1. This involves a bit shift and a subtraction operation. The bit shift's complexity is O(1) because it depends on the processor's word size and not on the actual value of num_bits.

Right Shift Operation (>>): Similar to the single-bit extraction, this operation shifts the integer by a constant number of positions, resulting in O(1) complexity.

Bitwise AND Operation (&): Applying the mask to the shifted number to isolate the bits of interest is also an O(1) operation.

Thus, the time complexity for extracting multiple bits is also O(1), as all operations (bitwise shifts, mask creation, and bitwise AND) are constant time.

Space Complexity

Space complexity refers to the amount of memory the algorithm requires relative to the input size.

Extracting a Single Bit

The space complexity for extracting a single bit is O(1) because only a few additional integer variables are used (number, position, and extracted_bit). These variables occupy a constant amount of space, regardless of the input size.

Extracting Multiple Bits

Similarly, the space complexity for extracting multiple bits is O(1). The function uses a few integer variables (number, start, num_bits, mask, and extracted_bits), all of which consume constant memory space. The size of these variables does not depend on the number of bits extracted or the size of the integer input.

Main Function

The main function initializes a few integers and calls the bit extraction functions. The memory used for these integers and function calls is also O(1). The functions themselves do not create significant additional data structures or allocate dynamic memory, ensuring that the overall space complexity remains constant.

Approach-2: Unions for Type Punning in C

Unions in C are a special data structure that allows different data types to share the same memory location. This means that a union can store different types of data, but only one type at a time. A union is at least as large as its largest member (the compiler may add padding for alignment), and any member can be accessed at any time, with the underlying bytes of memory interpreted according to that member's data type.

This feature of unions can be used for type punning, a technique where the same data is accessed or interpreted in different ways through different types. Type punning can be particularly useful in low-level programming, such as systems programming, embedded systems, and when working with hardware interfaces, where direct manipulation of the underlying data representation is required.

How do unions work?

In a union, all members start at the same memory address, which means they overlap. Accessing one member after writing to another can change the interpretation of the data, providing a view into the raw bits that compose the value.

Program:

Output:

 
Using union to interpret float as int:
Float value: 3.141590
Interpreted as integer: 1078530000
Raw bytes: d0 0f 49 40 
Using union to interpret int as float:
Integer value: 1078530016
Interpreted as float: 3.141594
Raw bytes: e0 0f 49 40 
Byte array from float 3.141590:
d0 0f 49 40 
Float value obtained from a byte array:
Float value: 3.141590   

Explanation:

  • Union Definition and Purpose
    The union DataUnion is defined with three members: an integer (i), a float (f), and a character array (bytes). All three members occupy the same memory, which allows different interpretations of the same binary data. On typical platforms int and float are 4 bytes each, the same size as the 4-byte bytes array, so the union itself occupies 4 bytes. This setup lets us view and manipulate the same data through different formats.
  • Printing a Float as an Integer
    The printFloatAsInt function demonstrates how a floating-point number is stored in memory and how it can be interpreted as an integer. It stores the float value in the union and prints it. Because the members share memory, the float's bit pattern can also be read as an integer. This function outputs:
    1. The float value.
    2. The integer representation of the float's bit pattern, which shows how the same bytes are interpreted differently.
    3. The raw byte values, which provide a hexadecimal view of the float's bit pattern.
    This is useful for understanding how floating-point numbers are represented in binary and how their bit patterns can be viewed as integers.
  • Printing an Integer as a Float
    The printIntAsFloat function reverses the process by storing an integer value in the union and then interpreting it as a float. This function:
    1. Displays the integer value.
    2. Shows the float value that results from interpreting the same bit pattern as a float.
    3. Prints the raw byte representation of the integer.
    This function illustrates how an integer's bit pattern can be reinterpreted as a floating-point number, which helps in understanding data encoding and conversions between types.
  • Byte Array Conversion Functions
    The convertBytesToFloat and convertFloatToBytes functions facilitate the conversion between float values and byte arrays. These functions use memcpy to transfer data between the union's bytes array and other data types:
    1. The convertFloatToBytes function copies the bytes representing a float value into a byte array. It is useful for tasks such as serialization or data transmission where raw byte formats are needed.
    2. The convertBytesToFloat function copies a byte array into the union and retrieves the corresponding float value. This function is helpful for reconstructing data from a byte stream or interpreting data read from binary files.
  • Main Function
    The main function serves as a practical demonstration of the previously defined functions:
    First, it uses printFloatAsInt to show how the float 3.14159 is stored, including its integer interpretation and raw byte representation.
    After that, it uses printIntAsFloat to demonstrate how an integer (1078530016) can be viewed as a float, including its raw byte representation. Next, it converts a float to a byte array and prints the byte values.
    Finally, it converts the byte array back to a float and prints the result, showcasing the round-trip conversion between data types and byte representations.

Complexity Analysis:

Time Complexity

Time complexity measures how an algorithm or program's runtime scales with the size of its inputs.

Union Operations:

Setting and Accessing Union Members: Operations such as assigning a value to a union member (data.f = value) and accessing a union member (data.i or data.bytes) are performed in constant time, O(1). This is because accessing any member of the union involves direct memory access without iterative or complex operations.

Printing Operations:

Printing Values: The printf function is used to output values. The time complexity of printf is dependent on the complexity of formatting and output operations. However, in the context of this code, each printf statement is executed in constant time, O(1), given that the operations involved (printing integers, floats, and bytes) are simple and do not depend on the size of the data.

Byte Manipulation Functions:

memcpy Function: The memcpy function is used to copy bytes between the union's bytes array and other variables. Its time complexity is O(n), where n is the number of bytes copied. In this code, memcpy always operates on a fixed size (4 bytes), so the complexity can be considered O(1) in practice.

All operations performed by the union manipulation, printf, and memcpy functions are constant time operations due to the fixed size of the data involved (4 bytes for the union members). Therefore, the overall time complexity of the code is O(1), meaning the execution time is constant and does not scale with input size, as the operations are performed on fixed-size data.

Space Complexity

Space complexity measures the amount of memory the program uses relative to the size of its inputs.

Union Storage:

The DataUnion union contains three members (int, float, and char[4]), but they all share the same memory. The size of the union is determined by its largest member, which in this case is 4 bytes (the size of the char[4] array). Thus, the space complexity for the union is O(1) because the memory required is constant and fixed.

Temporary Variables:

Functions like printFloatAsInt and printIntAsFloat use a constant amount of memory for local variables. The space required for these variables does not depend on the size of the input, so their space complexity is O(1).

Byte Arrays:

The byte array used in the conversion functions (char byteArray[4]) is of fixed size (4 bytes), so the space required for these arrays is constant, O(1).

The fixed-size union and temporary variables dominate the program's space complexity. Since no dynamic memory allocation is performed and the space used is constant regardless of input size, the overall space complexity of the code is O(1).