# Advanced Table Manipulation in MATLAB

## Introduction:

In the realm of data analysis and manipulation, MATLAB offers a treasure trove of tools and functionalities to work with tables efficiently. As we delve deeper into advanced topics, we uncover the versatility and power that MATLAB provides for sorting, filtering, grouping, aggregating, and joining tables. In this comprehensive guide, we will explore these advanced topics, empowering you to master the art of table manipulation in MATLAB.

Topic Description
Sorting and Filtering
Sorting Data Use the sortrows function to sort a table based on one or more columns. Specify column(s) and order (ascending or descending).
Filtering Data Utilize logical indexing to filter rows based on specific conditions. Use comparison operators for creating logical conditions.
Grouping and Aggregating
Grouping Data Use findgroups to create unique identifiers for groups based on columns.
Aggregating Data Employ split applies to apply functions (e.g., mean, sum, median) to groups of data. Obtain aggregate values per group.
Joining Tables
Inner Join Use innerjoin to combine two tables based on matching values in specified columns, retaining only matching rows.
Outer Join Employ outerjoin to merge tables while keeping all rows from both tables, filling missing values with NaN or specified values.
Removing Duplicates
Identifying Duplicates Use find groups followed by splitapply to find rows with duplicate values, creating groups based on identical rows.
Removing Duplicates Utilize a unique function to remove duplicate rows while retaining the first occurrence, providing a clean dataset.
Concatenation
Vertical Concatenation Stack tables on top of each other to combine rows. Use [T1; T2] for vertical concatenation with the same variables.
Horizontal Concatenation Combine tables side by side to add more columns. Use [T1, T2] for horizontal concatenation with different rows but same vars.
Missing Data Handling
Detecting Missing Data Use ismissing to find NaN, undefined, or other missing values. Provides a logical array indicating missing values.
Handling Missing Data Fill missing values with fill missing using methods like 'constant', 'linear', or 'previous'. Interpolate or replace.

## Sorting and Filtering Tables

Sorting and filtering tables are essential operations when you need to organize and extract specific subsets of your data. MATLAB provides a range of functions to sort tables based on column values and filter rows based on specific conditions.

## Sorting Tables

Sorting allows you to arrange the rows of a table in a specified order based on the values in one or more columns. MATLAB's sortrows function is a powerful tool for this purpose.

### Sorting by a Single Column

Sorting the T table by the 'Age' column in ascending order organizes the rows from the youngest to the oldest individuals. This operation is particularly useful for gaining insights into trends or patterns based on age.

Example:

Output:

Explanation:

Creating Sample Table T:

• We create a sample table T with columns 'Name', 'Age', and 'Height'.
• The values for each column are provided in the cell arrays Name, Age, and Height.

Displaying the Original Table T:

• The script displays the original table T using disp(T), showing the initial data.

Sorting Table T by 'Age' Column:

We use the sortrows(T, 'Age') function to sort table T by the 'Age' column in ascending order.

The result is stored in the variable sorted_age_asc.

Displaying the Sorted Table:

• We display the sorted table sorted_age_asc using disp(sorted_age_asc), showing the table sorted by 'Age' in ascending order.

### Sorting by Multiple Columns

Sorting by multiple columns allows for more nuanced organization of the data. In the example above, the T table is sorted first by the 'Group' column in ascending order and then by the 'Score' column in descending order. This arrangement helps identify the highest-scoring individuals within each group.

Example:

Output:

Explanation:

Creating a Sample Table:

• First, we create a sample table T with columns 'Name', 'Group', and 'Score'. Each row represents a person with their name, group, and a score.

Displaying the Original Table:

• We display the original table T to see the data before sorting.

Sorting the Table:

Next, we use the sortrows function to sort the table T:

• We specify 'Group' as the first column to sort by, in ascending order ('ascend').
• We specify 'Score' as the second column to sort by, in descending order ('descend').

This means that MATLAB will first sort the rows based on the 'Group' column in alphabetical order ('A', 'B', 'C', etc.), and within each group, it will sort the rows based on the 'Score' column in descending order (highest score first).

Displaying the Sorted Table: Finally, we display the sorted table sorted_group_score to see the result of the sorting.

## Filtering Tables

Filtering tables allows you to extract rows that meet specific criteria, such as selecting individuals of a certain age range or isolating data from a particular group.

### Simple Filtering

Simple filtering operations, as demonstrated above, enable the extraction of subsets of data based on straightforward conditions. Filtering by 'Age' greater than 30 isolates individuals who fall within a specific age group for further analysis.

Example:

Output:

Explanation:

Creating a Sample Table T:

• We create a table T with columns 'Name', 'Age', and 'Height'.
• Each row represents a person with their name, age, and height.

Displaying the Original Table T:

• We display the original table T to see the data before filtering.

Filtering the Table by Age:

• We filter the table T to include only individuals with an 'Age' greater than 30.
• The result is stored in the variable filtered_older_than_30.

Displaying the Filtered Table:

• We display the filtered table filtered_older_than_30 to see the individuals whose age is greater than 30.

### Complex Filtering

Complex filtering operations allow for the combination of multiple conditions using logical operators such as & (AND) and | (OR). In the example above, individuals aged between 25 and 35 with a 'Score' greater than 80 are filtered from the T table, providing a focused subset of the data for analysis.

Example:

Output:

Explanation:

Creating a Sample Table T:

• We create a table T with columns 'Name', 'Age', and 'Height'.
• Each row represents a person with their name, age, and height.

Displaying the Original Table T:

• We display the original table T to see the data before filtering.

Filtering the Table by Age and Height:

• We filter the table T to include individuals whose:
• 'Age' is between 25 and 35, and
• 'Height' is greater than 160.
• The result is stored in the variable filtered_complex.

Displaying the Filtered Table:

• We display the filtered table filtered_complex to see the individuals who meet these conditions.

## Grouping and Aggregating Data

Grouping and aggregating data allows you to summarize information based on specific categories or groups within your dataset. This is particularly useful for generating insights from large datasets or performing statistical analyses.

### Grouping Data

Grouping divides the table into smaller subsets based on the values in one or more columns. MATLAB's findgroups function is often used to create group indices, which can then be used for various operations.

### Creating Group Indices

Creating group indices provides a way to identify and separate distinct categories within a dataset. These group indices are then utilized for subsequent aggregation operations.

Example:

Output:

Explanation:

Creating Group Indices based on 'Category':

• It uses the findgroups function to create group indices based on the values in the 'Category' column.
• The result is stored in the variable group_indices.

Displaying the Group Indices:

• Finally, the script displays the group indices group_indices to show the grouping of rows based on the 'Category' column.

## Aggregating Data

Aggregating data involves applying functions to the groups created, such as calculating sums, averages, counts, or other statistical measures. MATLAB's splitapply and group summary functions are valuable tools for aggregation.

### Using splitapply

The splitapply function applies a specified function (@mean in this case) to each group of data defined by the group_indices. This results in the calculation of the average 'Score' for each distinct 'Category' in the dataset.

Output:

Explanation:

Creating a Sample Table T:

• The script creates a table T with columns 'Name', 'Category', and 'Score'.
• Each row represents a person with their name, category, and score.

Creating Group Indices based on 'Category':

• It uses the findgroups function to create group indices based on the values in the 'Category' column.
• These group indices identify distinct categories within the 'Category' column.

Calculating Average Scores for Each Category using splitapply:

• The splitapply function applies the @mean function (which calculates the mean) to the 'Score' column of table T.
• It groups the scores based on the previously created group_indices for each category.
• The result is an array avg_scores containing the average score for each category.

Displaying the Average Scores:

• The script displays the average scores for each category, showing the mean score for each unique category in the 'Category' column of the table T.

## Joining Tables: Merging and Concatenating

Joining tables involves combining information from multiple tables based on common variables or keys. MATLAB offers functions to merge, join, or concatenate tables, enabling you to create comprehensive datasets from disparate sources.

### Inner Join

An inner join combines rows from two tables based on a common key, retaining only the rows that have matching values in both tables.

Example:

Output:

Explanation:

Creating Sample Tables T1 and T2:

• Two sample tables T1 and T2 are created with columns 'ID', 'Name', and 'Score'.
• These tables represent some hypothetical data with IDs, names, and scores.

Inner Join Operation:

• The innerjoin function combines data from T1 and T2 based on matching 'ID' values.
• It creates a new table, inner_join_table, with rows in which the 'ID' values exist in both T1 and T2.

Displaying the Result:

• The script displays the resulting inner join table, inner_join_table, which shows the combined data from T1 and T2 for the common 'ID' values.

### Outer Join

An outer join combines rows from two tables, keeping all rows from both tables and filling in missing values with NaNs (for numeric columns) or empty strings (for character columns) where there are no matches.

Example:

Output:

Explanation:

Creating Sample Tables T1 and T2:

• Two sample tables T1 and T2 are created with columns 'ID', 'Name', and 'Score'.
• These tables represent hypothetical data with IDs, names, and scores.

Displaying the Original Tables T1 and T2:

• The script displays the original tables T1 and T2 to show the data before the outer join operation.

Outer Join Operation:

• The outerjoin function combines data from T1 and T2 based on matching 'ID' values.
• It creates a new table, outer_join_table, with rows in which the 'ID' values exist in either T1, T2, or both.

## Concatenating Tables

Concatenating tables allows you to combine tables vertically (stacking rows) or horizontally (concatenating columns).

### Vertical Concatenation

Vertical concatenation stacks the rows of T1 on top of T2, creating a larger table with combined data from both sources.

Example:

Output:

Explanation:

Creating Sample Tables T1 and T2:

• Two sample tables T1 and T2 are created with columns 'ID', 'Name', and 'Score'.
• These tables represent hypothetical data with IDs, names, and scores.

Renaming Variables in T2:

• We rename the variables in T2 to match the variables in T1 using T2.Properties.VariableNames.
• This ensures that both tables have the same variable names for vertical concatenation.

Displaying the Original Tables T1 and T2:

• The script displays the original tables T1 and T2 to show the data before the vertical concatenation.

Vertical Concatenation:

• The vertical concatenation [T1; T2] now works because both tables have the same variable names.
• It stacks the rows of T1 on top of T2.

### Horizontal Concatenation

Horizontal concatenation combines the columns of T1 and T2 side by side, creating a wider table with additional variables or attributes.

Example:

Output:

Explanation:

Horizontal Concatenation:

• The horizontal concatenation [T1, T2] now works because both tables have unique variable names.
• It combines the columns of T1 with the columns of T2 side by side.

## Working with Missing Data

Dealing with missing data is a common challenge in data analysis. MATLAB provides functions to handle missing values, such as detecting, removing, or filling in missing data points.

### Detecting Missing Data

The missing function identifies the locations of missing values in the T table and marks them as true in the resulting logical array.

Example:

Output:

Explanation:

• We create a sample table T with columns 'ID', 'Name', and 'Score', intentionally including missing values (NaN).
• The ismissing(T) function detects these missing values and creates a logical matrix where 1 indicates a missing value, and 0 indicates a non-missing value.
• The resulting missing_values matrix helps to identify and handle missing data in the table.

### Removing Rows with Missing Data

The rmmissing function removes entire rows from the T table if they contain any missing values, providing a cleaned dataset with complete observations.

Example:

Output:

Explanation:

• We create a sample table T with columns 'ID', 'Name', and 'Score', intentionally including missing values (NaN).
• The rmmissing(T) function removes rows with any missing values from T and creates a new table T_cleaned.
• The resulting T_cleaned table contains only the rows without any missing values, effectively cleaning the data.

## Handling Duplicate Data

Duplicate data can skew analysis results and lead to inaccuracies. MATLAB provides functions to identify and remove duplicate rows from tables.

### Identifying Duplicate Rows

The duplicated rows function flags rows in the T table that are exact duplicates of other rows, providing insights into potential data redundancy.

Example:

Output:

Explanation:

Creating Sample Table T with Potential Duplicate Rows:

• We create a sample table T with columns 'ID', 'Name', and 'Score'.
• Some rows are intentionally duplicated to demonstrate the identification of duplicate rows.

Displaying the Original Table T:

• The script displays the original table T to show the data with potential duplicate rows.

Identifying Duplicate Rows:

• We first use the findgroups function to create group indices based on the rows of table T.
• Next, we use splitapply to calculate the count of each group, which corresponds to the number of occurrences of each ID.
• Rows with ID values occurring more than once are considered duplicates.

Displaying the Indices of Duplicate Rows:

• The script displays the indices of the duplicate rows by finding the indices where the count of ID is greater than 1.

### Removing Duplicate Rows

The unique function removes duplicate rows from the T table, retaining only the first occurrence of each unique row. This ensures a clean dataset with no duplicate entries.

Example:

Output:

Explanation:

Creating Sample Table T with Potential Duplicate Rows:

• We create a sample table T with columns 'ID', 'Name', and 'Score'.
• Some rows are intentionally duplicated to demonstrate the removal of duplicate rows.

Displaying the Original Table T:

• The script displays the original table T to show the data with potential duplicate rows.

Removing Duplicate Rows:

• The unique(T) function removes duplicate rows from table T, keeping the first occurrence of each unique row.
• This function returns a new table T_unique with only the unique rows.

## Summary:

Mastering advanced table manipulation techniques in MATLAB opens up a world of possibilities for data analysis and exploration. Sorting, filtering, grouping, aggregating, joining, and handling missing or duplicate data are crucial skills for extracting valuable insights from your datasets.

• With the diverse range of functions and operations provided by MATLAB, you have the tools to manipulate, transform, and analyze tables of any size and complexity effectively.
• Whether you are working with experimental results, survey data, or financial records, the capabilities of MATLAB's table functionalities empower you to gain deeper insights and make informed decisions.

Incorporate these advanced topics into your MATLAB workflow to streamline your data analysis processes, discover meaningful patterns, and unlock your data's full potential.

Next TopicMatlab Smoothing