Python Pandas Data operations

Pandas provides several useful data operations for DataFrame objects, which are as follows:

Row and column selection

We can select any row or column of a DataFrame by passing its label. Selecting a single row or column returns a one-dimensional object, which Pandas represents as a Series.
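As a minimal sketch of the selection described above, the example below builds a small DataFrame and selects one column and one row. The column names, index labels, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the labels are illustrative only.
df = pd.DataFrame(
    {"one": [1, 2, 3], "two": [10, 20, 30]},
    index=["a", "b", "c"],
)

col = df["one"]      # select a column by its label
row = df.loc["b"]    # select a row by its index label
first = df.iloc[0]   # select a row by its integer position

print(type(col).__name__)  # Series
print(row["two"])          # 20
```

Both the selected column and the selected row are Series objects, so they can be indexed by label in the same way.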

Filter Data

We can filter the data by applying a boolean expression to the DataFrame. The expression is evaluated for each element and yields True or False.

Note: If we pass the boolean result back into the DataFrame as an index, it returns only the entries for which the condition is True.
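The filtering step can be sketched as follows. The DataFrame, its column names, and the threshold are assumptions chosen only to show the pattern.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [25, 41, 33]})

# The comparison produces a boolean Series, one value per row.
mask = df["age"] > 30

# Passing the boolean Series back into the DataFrame keeps only the True rows.
older = df[mask]

# Conditions can be combined with & (and) and | (or); each needs parentheses.
older_not_bob = df[(df["age"] > 30) & (df["name"] != "Bob")]

print(older)
```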

Null values

A null value occurs when no data is provided for an item. Columns may contain missing values, which are usually represented as NaN. Pandas provides several useful functions for detecting, removing, and replacing null values in a DataFrame. These functions are as follows:

isnull(): It returns a boolean object of the same shape as the input, with True wherever a value is null.

notnull(): It is the opposite of isnull() and returns True wherever a value is not null.

dropna(): This method drops the rows or columns that contain null values.

fillna(): It allows the user to replace the NaN values with some other values.

replace(): It is a flexible function that replaces values matched by a string, regex, list, dictionary, or Series with other values.

interpolate(): It fills null values in the DataFrame or Series with values estimated from the neighboring data points (linear interpolation by default).
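The functions listed above can be tried on a small example. This is only a sketch on made-up data; the column names and replacement values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

missing = df.isnull()             # True where a value is NaN
present = df.notnull()            # True where a value is not NaN
dropped = df.dropna()             # drops every row that contains a NaN
filled = df.fillna(0)             # replaces each NaN with 0
replaced = df.replace(3.0, 30.0)  # replaces the value 3.0 with 30.0
interp = df["a"].interpolate()    # linear interpolation between neighbors

print(dropped)
print(interp.tolist())  # [1.0, 2.0, 3.0]
```

None of these calls modify df in place; each one returns a new object.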

String operations

A set of string functions is available in Pandas to operate on string data while ignoring missing/NaN values. These string operations are performed through the .str accessor of a Series or Index. The functions are as follows:

lower(): It converts the strings of the Series or Index into lowercase letters.

upper(): It converts the strings of the Series or Index into uppercase letters.

strip(): It strips leading and trailing whitespace, including newlines, from each string in the Series/Index.

split(' '): It splits each string on the given pattern.

cat(sep=' '): It concatenates series/index elements with a given separator.

contains(pattern): It returns True for each element that contains the pattern, else False.

replace(a,b): It replaces the value a with the value b.

repeat(value): It repeats each element the specified number of times.

count(pattern): It returns the count of the appearance of a pattern in each element.

startswith(pattern): It returns True for each element in the Series that starts with the pattern.

endswith(pattern): It returns True for each element in the Series that ends with the pattern.

find(pattern): It returns the lowest index at which the pattern occurs in each element, or -1 if the pattern is not found.

findall(pattern): It returns a list of all occurrences of the pattern in each element.

swapcase(): It swaps the case of each character, from lower to upper and vice versa.

islower(): It returns True for each string in the Series/Index whose cased characters are all lowercase. Otherwise, it returns False.

isupper(): It returns True for each string in the Series/Index whose cased characters are all uppercase. Otherwise, it returns False.

isnumeric(): It returns True for each string in the Series/Index whose characters are all numeric. Otherwise, it returns False.
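A few of the string functions above are sketched below on a made-up Series that contains one missing value. The sample strings and patterns are assumptions for illustration. The na=False and regex=False arguments are passed explicitly so the behavior does not depend on version-specific defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical sample strings, including one missing value.
s = pd.Series(["  Apple pie ", "banana", np.nan, "Cherry 42"])

lowered = s.str.lower()                          # the missing value stays missing
stripped = s.str.strip()                         # remove surrounding whitespace
parts = stripped.str.split(" ")                  # list of pieces per element
has_an = s.str.contains("an", na=False)          # treat missing as "no match"
starts_b = stripped.str.startswith("b")          # evaluated element by element
pos_a = stripped.str.find("a")                   # -1 where "a" does not occur
count_a = stripped.str.count("a")                # occurrences per element
digits = stripped.str.findall(r"\d")             # list of all matches per element
swapped = stripped.str.swapcase()                # flip the case of every character
doubled = stripped.str.repeat(2)                 # repeat each element twice
with_o = stripped.str.replace("a", "o", regex=False)  # literal replacement
joined = stripped.str.cat(sep=", ")              # one string; missing is skipped

# isnumeric() checks each element separately.
numeric = pd.Series(["42", "4a"]).str.isnumeric()

print(joined)            # Apple pie, banana, Cherry 42
print(numeric.tolist())  # [True, False]
```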

Count Values

This operation counts how many times each unique value occurs, using the value_counts() method.
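As a short sketch, value_counts() can be called on a Series of made-up color names. The data below is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data.
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

# Count the occurrences of each unique value, most frequent first.
counts = colors.value_counts()

# normalize=True returns relative frequencies instead of raw counts.
shares = colors.value_counts(normalize=True)

print(counts["red"])    # 3
print(counts.index[0])  # red
```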

Plots

Pandas draws its plots with the matplotlib library. The .plot() method allows you to plot the graph of your data.

By default, .plot() plots every column against the index.

You can also pass arguments to plot() to draw a specific column.
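The plotting calls can be sketched as follows, assuming matplotlib is installed. The data and column names are made up, and the "Agg" backend is chosen here only so the script runs without a display. In an interactive session you would typically call matplotlib.pyplot.show() instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import pandas as pd

# Hypothetical sample data indexed by year.
df = pd.DataFrame(
    {"sales": [3, 5, 4, 7], "costs": [2, 3, 3, 4]},
    index=[2019, 2020, 2021, 2022],
)

# With no arguments, each column is drawn as a line against the index.
ax_all = df.plot()

# Arguments select a specific column and a different kind of plot.
ax_one = df.plot(y="sales", kind="bar")

print(len(ax_all.get_lines()))  # 2
```

Each call returns a matplotlib Axes object, which can be customized further or saved through its figure.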

