agg() function in Python
Because of the fantastic ecosystem for the data-centric Python packages, Python is considered as one of the great programming languages for doing data analysis. Panda is one of such packages that is provided to us in Python, and it makes it very easy to import and analyze the data.
In this tutorial, we are going to discuss the agg() function given to us in the panda series, and we will use it with series data given to us.
Introduction: Pandas agg() function
We use the agg() function of Pandas to pass a single function or list of functions that are to be applied on a given data series or sometimes even separately to each element of the data series. In the case where we pass a list of functions inside the agg() function, multiple results will be returned by it.
In this section, we are going to look at the syntax and parameters that we have to use inside the agg() method and return type of the function.
The seriesGiven is the data series given to us in the program.
We have to provide the following parameters inside the agg() method.
The return type of agg() method is not fixed, and it always depends upon the return type of the function that we have passed as a parameter inside the agg() method.
Working with agg() function
Till now, we have learned about the introduction and syntax for using agg() function provided to us in Pandas. To learn about and understand the working of the agg() method, we will use this function in below examples.
Passing a single function inside agg() method
In this example, we will create a random array by numpy module, and then we will make it a data series using the Pandas function. After that, we will use the agg() function and pass a lambda function, as an argument inside it, and thus it will add 3 to each value given in the series. As we are applying the function to a series, the return type we get through the agg() function is also series. Now, let's understand this implementation through the following example.
Example 1: Look at the following Python program:
Data series of elements before operation: 0 -0.510111 1 -0.732670 2 -0.451550 3 -0.435085 4 0.082848 5 -1.051242 6 0.203565 7 -1.014079 8 -0.232350 9 -0.325640 10 0.528320 11 -1.472293 12 -0.639487 13 -2.490666 14 -0.242837 15 0.854955 16 1.076247 17 1.491347 18 -1.767788 19 -0.205003 dtype: float64 Data series of elements after operation: 0 2.489889 1 2.267330 2 2.548450 3 2.564915 4 3.082848 5 1.948758 6 3.203565 7 1.985921 8 2.767650 9 2.674360 10 3.528320 11 1.527707 12 2.360513 13 0.509334 14 2.757163 15 3.854955 16 4.076247 17 4.491347 18 1.232212 19 2.794997 dtype: float64
First, we have imported pandas and numpy modules inside our program to use its functions.
Then, we have a created an array of 20 elements that we have randomly generated using the randn() function of the numpy module. After that, we made the array into the series form using the series() function of the panda module.
Then, we have used the agg() function on the series and passed the lambda function as an argument in it. We have passed an argument in the agg() method to add 3 to each value of the series. In last, we printed the data series in the output (both before the operation performed on it and after the operation performed on it).
As we can see in the output, 3 is added to each value of the series after we have performed an operation on it.
Passing a list of functions inside agg() method:
In this example, after creating the data series, we will pass a list of functions as the arguments inside the agg() function instead of passing a single function parameter in it. When we pass a list of Python's default functions as arguments in the agg() method, multiple results are returned by it into multiple variables. Let's understand the implementation of this method through the following example.
Example 2: Look at the following Python program:
Data Series before operation: 0 1.324659 1 -1.632943 2 -0.451046 3 -0.119475 4 -1.476469 5 1.550481 6 -0.345283 7 -0.391220 8 1.183295 9 0.945834 10 0.426908 11 -1.373141 12 -1.360714 13 1.029160 14 -0.305868 15 0.520776 16 0.519891 17 0.581810 18 -0.200537 19 2.175055 dtype: float64 Minimum value in the data series = -1.6329428122607905 Maximum value in the data series = 2.175055294872539, Sorted data series after operation: [-1.6329428122607905, -1.476468968840359, -1.3731412602339488, -1.3607141137838996, -0.45104603430414114, -0.3912204479169106, -0.34528253055365704, -0.3058683242351637, -0.20053665016862435, -0.1194753076622943, 0.4269084920204909, 0.519891496565306, 0.5207757216248261, 0.5818098237803292, 0.9458337130436504, 1.02915996695176, 1.1832945335240084, 1.324659481096391, 1.5504805147479754, 2.175055294872539]
After creating a data series as we did in the previous example, we have created a list where we have multiple functions' names in it. In this example, instead of giving a single function as an argument, we have passed multiple default functions inside the agg() function. After passing these functions as arguments, we have printed the data series before operation and after operation in the output.
As we look at the output, we can see that multiple results have been returned by agg() function. This is because we have passed multiple functions as parameters inside it. The max(), min() and sorted() have been returned into different variables i.e., seriesResult1, seriesResult2 and seriesResult3 respectively.