MySQL ROLLUP

The ROLLUP in MySQL is a modifier used to produce the summary output, including extra rows that represent super-aggregate (higher-level) summary operations. It enables us to sum-up the output at multiple levels of analysis using a single query. It is mainly used to provide support for OLAP (Online Analytical Processing) operations.

The ROLLUP modifier can only be used with the GROUP BY query in MySQL.

Syntax:

The following are the syntax to use the ROLLUP modifier:

SELECT 
    column1, column2, column3, ...
FROM 
    table_name
GROUP BY
    column1, column2,... WITH ROLLUP; 

In this syntax, we need to specify the column names displayed in the query result in the SELECT clause. Next, we will mention the table name. After that, we have specified the GROUP BY clause, including the column names based on which we want to aggregate data. Finally, we specify the WITH ROLLUP modifier to get the super-aggregate output in an additional row.

We have already learned that the GROUP BY query is applied with aggregate functions such as MAX, MIN, SUM, COUNT, AVG, etc. that groups the output rows by single or multiple columns. The ROLLUP modifier is an option to use GROUP BY query that includes extra fields for representing the subtotals. These additional rows are called super-aggregate rows, which is the combination of the grand-total rows. Thus, the ROLLUP modifier allows us to create multiple grouping of set rows within a single query based on the columns specified in the GROUP BY clause in MySQL.

MySQL ROLLUP Explanation

If we want to understand the ROLLUP modifier, we must have to know what is grouping set. A grouping set is a set of columns that we want to group to get the resultant output. For example, suppose we have a table "sales" that contains the following data:

If we want to summarize the result per year, we will use the simple GROUP BY clause as below:

SELECT Year, SUM(Sale) AS Total_Sales 
FROM sales 
GROUP BY Year;

It will give the below output that shows the total (aggregate) sale for each year:

In the above query, the grouping set is denoted by the column name Year. If we need to generate more than one grouping sets together in a single query, we may use the UNION ALL operator as follows:

SELECT Year, SUM(Sale) AS Total_Sales 
FROM sales 
GROUP BY Year
UNION ALL
SELECT NULL, SUM(Sale) AS Total_Sales 
FROM sales;

In this query, we can see the NULL column. It is because the UNION ALL clause requires all queries to have the same number of columns. So to fulfill this requirement, we have added NULL in the select list of the second query.

When we execute the query, we get the following output:

The NULL in the output of the Year column represents the grand-total super-aggregate value. Since this query is able to generate the total sales in each year and also grand-total aggregate sales, however, it has two problems:

It makes the query quite lengthy.
It decreases the query's performance because the database engine internally executes two separate queries and combines the resultant sets into a single output.

To solve these issues, MySQL allows us to use the ROLLUP clause, which provides both levels of analysis in one query. The ROLLUP clause is an extension of the GROUP BY clause that produces another row and shows the grand-total (super-aggregate) values.

Let us see the result after adding a WITH ROLLUP modifier to the GROUP BY clause that shows the grand total over all year values:

SELECT Year, SUM(Sale) AS Total_Sales 
FROM sales 
GROUP BY Year WITH ROLLUP;

When we execute the command, we will get the output as follows:

In this output, we can see the NULL value in the Year column that identifies the supper-aggregate line. It clearly shows that the ROLLUP clause not only generates the subtotals but also gives the grand total of the total sales in all year.

If the GROUP BY clause has more than one column, the ROLLUP modifier has a more complex effect. In this case, the ROLLUP modifier assumes a hierarchy among the columns specified in the GROUP BY clause. Each time there is a change in column value, the query generates an extra super-aggregate summary row at the end of the result.

For example, assumes we have specified the three columns in the GROUP BY clause as below:

The ROLLUP modifier assumes the hierarchy as below:

And generates the following grouping set:

(c1, c2, c3)
(c1, c2)
(c1)
()

See the below query to explain it more clearly:

SELECT Year, Country, Product, SUM(Sale) AS Total_Sales
FROM sales
GROUP BY Year, Country, Product;

Without ROLLUP, a summary of the sales table based on multiple columns specified in the GROUP BY clause look like the below output. Here we will get the summary values only at the Year/Country/Product level of analysis.

With ROLLUP added, the query produces several extra rows:

SELECT Year, Country, Product, SUM(Sale) AS Total_Sales
FROM sales
GROUP BY Year, Country, Product WITH ROLLUP;

See the below output:

The above output generates information at four levels of analysis, which are explained below:

First, each set of product rows for a given year and country generates an extra super-aggregate summary row that shows the total for all products. It will set the product column to NULL.
Next, each set of rows for a given year generates an extra super-aggregate summary row that shows the total for all countries and products. It will set the Country and Products columns to NULL.
Finally, for all other rows, it generates an extra super-aggregate summary row that shows the grand total for all columns. It will set the Year, Country, and Products columns to NULL.

If we change the order of the columns specified in the GROUP BY column, we will get a different result:

SELECT Year, Country, Product, SUM(Sale) AS Total_Sales
FROM sales
GROUP BY Country, Year, Product WITH ROLLUP;

See the following output:

GROUPING() Function

The GROUPING() function is used to check whether NULL in the result set represents the regular grouped value, a super-aggregate value, or grand totals. When NULL occurs in a supper-aggregate row, it returns one. Otherwise, it returns 0.

We can use the GROUPING() function in the select list, HAVING clause, and ORDER BY clause.

See the below query:

SELECT Year, Country, Product, SUM(Sale) AS Total_Sales,
         GROUPING(Year),
         GROUPING(Country),
         GROUPING(Product)
       FROM sales
       GROUP BY Year, Country, Product WITH ROLLUP; 

We will get the following output where the GROUPING(Year) returns one when NULL in the Year column occurs in a super-aggregate row. Otherwise, it will return zero.

Similarly, the GROUPING(Country) returns one when NULL in the Country column occurs in a super-aggregate row. Otherwise, it will return zero.

Also, the GROUPING(Product) returns one when NULL in the Product column occurs in a super-aggregate row. Otherwise, it will return zero.

We can also use the GROUPING() function to substitute the meaningful labels for super-aggregate NULL values instead of displaying it directly.

The below query explains how to combine the IF() function with the GROUPING() function to substitute labels for the super-aggregate NULL values in Year, Country, and Product columns:

SELECT
         IF(GROUPING(Year), 'All years', year) AS year,
         IF(GROUPING(Country), 'All countries', country) AS country,
         IF(GROUPING(Product), 'All products', product) AS product,
         SUM(Sale) AS Total_Sales
       FROM sales
       GROUP BY Year, Country, Product WITH ROLLUP;

We will get the output as below:

If we have multiple arguments in the GROUPING() function, it will return output representing a bitmask that combines the results for each expression. Here, the lowest-order bit produces the result for the right-most argument. The below example will be evaluated like this:

Example: GROUPING (Year, Country, Product)

  result for GROUPING(Product)
+ result for GROUPING(Country) << 1
+ result for GROUPING(Year) << 2

If any arguments have a super-aggregate NULL value, the result of such GROUPING() is non-zero. In this case, it will return only the super-aggregate rows and filter the regular grouped rows using the following query:

mysql> SELECT Year, Country, Product, SUM(Sale) AS Total_Sale
       FROM sales
       GROUP BY Year, Country, Product WITH ROLLUP
       HAVING GROUPING(Year, Country, Product) <> 0;

It will give the below output: