15 Statistical Hypothesis Tests in Python

There are hundreds of statistical hypothesis tests. However, only a handful of them are needed in machine learning projects. In this tutorial, we will look at some of the most important hypothesis tests to know for anyone working in fields related to statistical modelling. We will implement these tests in the Python programming language.

Every hypothesis test mentioned below contains the following information related to the test:

What is the test called?
What are we checking in the test?
What are the key assumptions for implementing the test?
How to interpret the test results?
How to implement the test in Python?

Note that these assumptions are very important. If the assumptions like the expected distribution of the data sample or the size of the sample required are violated, the results of the test will not be accurate. The interpretation based on these results will be highly unreliable. Hence, keeping these assumptions in check before applying the tests is very important.

Data samples often need to be sufficiently large to reveal how they are distributed and to be representative of the domain.

In some circumstances, it is possible to adjust the data so that it conforms to the assumptions. To provide just two instances, this may be done by eliminating outliers from a distribution that is almost normal in order to make it more normal or by adjusting the degrees of freedom in a test when the variance of the given data samples is different.

Finally, there may be several tests available for a given concern, such as normality. Statistics does not give exact answers to questions; it gives probabilistic ones. As a result, looking at the same question in different ways can lead to different answers, so more than one test may be needed to answer some questions about the data.

Normality Tests

In this section, we will see the tests that are used to test if the given data sample has Gaussian distribution or not. The assumption that the data follows a Gaussian distribution forms a basic requirement for many statistical modeling techniques. Hence, these tests are very important.

Shapiro-Wilk Test

This test checks whether the given data sample has a Gaussian (normal) distribution.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed. The abbreviation of this assumption is IID.

Interpretation

H0: The sample follows a Gaussian distribution

H1: the given sample does not follow a Gaussian distribution.

Code

# Python program to perform the Shapiro-Wilk Normality Test

# Importing the required modules
from scipy.stats import shapiro

# Creating a dataset
data = [0.863, 2.717, 0.221, -0.965, -0.255, -1.476, 0.560, -1.578, -2.637, -1.969]

# Performing the test
stat, p = shapiro(data)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# H0 (Gaussian) is rejected only if the p-value is below the 0.05 significance level
if p < 0.05:
    print('The data does not follow a Gaussian distribution')
else:
    print('The data follows a Gaussian distribution')

Output:

The statistic value is: 0.9621855020523071, and the p-value is 0.8104783892631531
The data follows a Gaussian distribution

D'Agostino's K^2 Test

This test checks whether the given data sample is Gaussian or not, based on the skewness and kurtosis of the sample.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.

Interpretation

H0: The sample follows a Gaussian distribution

H1: the given sample does not follow a Gaussian distribution.

Code

# Python program to perform the D'Agostino's K^2 Normality Test

# Importing the required modules
from scipy.stats import normaltest

# Creating a dataset
data = [0.863, 2.717, 0.221, -0.965, -0.255, -1.476, 0.560, -1.578, -2.637, -1.969]

# Performing the test
stat, p = normaltest(data)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# H0 (Gaussian) is rejected only if the p-value is below the 0.05 significance level
if p < 0.05:
    print('The data does not follow a Gaussian distribution')
else:
    print('The data follows a Gaussian distribution')

Output:

The statistic value is: 1.0653637027947445, and the p-value is 0.5870285334466323
The data follows a Gaussian distribution
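Because different normality tests can disagree on the same sample, it can help to run the p-value-based ones side by side. The following is a minimal sketch of that idea; the helper name normality_report, its return format, and the 0.05 default are illustrative choices, not part of SciPy.

```python
from scipy.stats import shapiro, normaltest


def normality_report(data, alpha=0.05):
    """Map each test name to (p_value, rejects_gaussian) for the given sample."""
    report = {}
    for name, test in (("shapiro", shapiro), ("dagostino_k2", normaltest)):
        _, p = test(data)
        # H0 is "the sample is Gaussian", so a small p-value rejects normality
        report[name] = (float(p), bool(p < alpha))
    return report


data = [0.863, 2.717, 0.221, -0.965, -0.255, -1.476, 0.560, -1.578, -2.637, -1.969]

for name, (p, rejected) in normality_report(data).items():
    verdict = "evidence against a Gaussian" if rejected else "no evidence against a Gaussian"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```

Note that a large p-value does not prove the data is Gaussian; it only means the test found no evidence against it. With a sample this small, SciPy may also warn that the kurtosis component of normaltest is less reliable.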

Anderson-Darling Test

This test tests whether the given data sample is Gaussian or not.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.

Interpretation

H0: The sample follows a Gaussian distribution

H1: the given sample does not follow a Gaussian distribution.

Code

# Python program to perform the Anderson-Darling Normality Test

# Importing the required modules
from scipy.stats import anderson

# Creating a dataset
data = [0.863, 2.717, 0.221, -0.965, -0.255, -1.476, 0.560, -1.578, -2.637, -1.969]

# Performing the test
res = anderson(data)
print(f'The statistic value is: {res.statistic}')
print()
# For each significance level, checking if the hypothesis holds or not
for i in range(len(res.critical_values)):
  sig_level, critical_value = res.significance_level[i], res.critical_values[i]
  
  # H0 (Gaussian) is rejected only when the statistic exceeds the critical value
  print(f"The critical value at {sig_level}% is {critical_value}")
  if res.statistic > critical_value:
    print(f'The data does not follow a Gaussian distribution at {sig_level}%')
  else:
    print(f'The data follows a Gaussian distribution at {sig_level}%')
  print()

Output:

The statistic value is: 0.20692157645671116

The critical value at 15.0% is 0.501
The data follows a Gaussian distribution at 15.0%

The critical value at 10.0% is 0.57
The data follows a Gaussian distribution at 10.0%

The critical value at 5.0% is 0.684
The data follows a Gaussian distribution at 5.0%

The critical value at 2.5% is 0.798
The data follows a Gaussian distribution at 2.5%

The critical value at 1.0% is 0.95
The data follows a Gaussian distribution at 1.0%

Correlation Tests

Now we will see the tests which compare the two samples and tell if they are related or not.

Pearson's Correlation Coefficient

This test tests whether the given two data samples have a linear relationship or not.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of every sample follow the normal distribution.
The observations in every sample have the same variance.

Interpretation

H0: the given two samples are not dependent, i.e., they are independent.

H1: there is some sort of dependency between the given samples.

Code

# Python program to implement Pearson's Correlation test

# Importing the required classes
from scipy.stats import pearsonr

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Pearson Test
stat, p = pearsonr(sample1, sample2)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('Both data samples are dependent on each other')
else:
 print('The data samples are independent of each other')

Output:

The statistic value is: 0.6135196215696078, and the p-value is 0.05922727627191346
The data samples are independent of each other

Spearman's Rank Correlation

This is a step ahead of the Pearson test. It tests if the given samples have a monotonic relationship. The relationship can be linear or non-linear.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples can be ranked.

Interpretation

H0: the given two samples are not dependent, i.e., they are independent.

H1: there is some sort of dependency between the given samples.

Code

# Python program to implement Spearman's Rank Correlation test

# Importing the required classes
from scipy.stats import spearmanr

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Spearman's Test
stat, p = spearmanr(sample1, sample2)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('Both data samples are dependent on each other')
else:
 print('The data samples are independent of each other')

Output:

The statistic value is: 0.6969696969696969, and the p-value is 0.02509667588225183
Both data samples are dependent on each other
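The difference between a linear and a monotonic relationship is easier to see on data built for the purpose. The sketch below uses a synthetic pair (x and its exponential, an assumption made purely for illustration) where y always increases with x but not along a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data: y is strictly increasing in x, but strongly non-linear
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

# The ranks of x and y agree exactly, so Spearman's coefficient is 1,
# while Pearson's coefficient is lower because the points do not lie on a line
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Spearman's coefficient depends only on the ranks of the observations, which is why it reaches 1 for any strictly increasing relationship.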

Kendall's Rank Correlation

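Like Spearman's test, this test checks whether the given samples have a monotonic relationship. It does so by counting the pairs of observations that are ordered the same way in both samples (concordant pairs) and those that are ordered differently (discordant pairs).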

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples can be ranked.

Interpretation

H0: the given two samples are not dependent.

H1: there is some sort of dependency between the given samples.

Code

# Python program to implement the Kendall's Rank Correlation test

# Importing the required classes
from scipy.stats import kendalltau

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Kendall rank Test
stat, p = kendalltau(sample1, sample2)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('Both data samples are dependent on each other')
else:
 print('The data samples are independent of each other')

Output:

The statistic value is: 0.5111111111111111, and the p-value is 0.04662257495590829
Both data samples are dependent on each other

Chi-Squared Test

Pearson's test can only be used with numerical values. Spearman's and Kendall's rank correlation tests can be used for ordinal data. Ordinal data is categorical data that have a certain order. But for nominal data (categorical data with no order), these tests cannot be used. To test the dependency or the relationship between the nominal data, we use the Chi-Squared test.

Assumptions

The observations which will be used for the calculation of the contingency table should be independent.
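Each cell of the contingency table should contain enough observations. A common rule of thumb is an expected frequency of at least 5 per cell; requiring 25 or more observations per cell is a more conservative choice.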

Interpretation

H0: the given two samples are not dependent.

H1: there is some sort of dependency between the given samples.

Code

# Python program to implement the Chi-Squared test

# Importing the required classes
from scipy.stats import chi2_contingency

# Creating a sample observations table
table = [[30, 25, 34, 26, 31],[25, 29, 31, 34, 32]]

# Implementing the Chi-squared Test
stat, p, dof, expected_freq= chi2_contingency(table)

print("Expected Frequencies are", expected_freq)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The data samples are dependent on each other')
else:
 print('The data samples are independent of each other')

Output:

Expected Frequencies are [[27.03703704 26.54545455 31.95286195 29.49494949 30.96969697]
 [27.96296296 27.45454545 33.04713805 30.50505051 32.03030303]]
The statistic value is: 1.8882030380034551, and the p-value is 0.7563117707680647
The data samples are independent of each other
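The sample-size assumption is usually checked against the expected frequencies that chi2_contingency returns rather than against the raw counts. Below is a small sketch of such a check; check_expected and its min_expected threshold are hypothetical names introduced here, and the threshold is whatever rule of thumb you choose to apply.

```python
import numpy as np
from scipy.stats import chi2_contingency


def check_expected(table, min_expected=5.0):
    """Return (dof, smallest expected frequency, whether it meets the threshold)."""
    table = np.asarray(table)
    _, _, dof, expected = chi2_contingency(table)
    smallest = float(expected.min())
    return dof, smallest, smallest >= min_expected


table = [[30, 25, 34, 26, 31], [25, 29, 31, 34, 32]]
dof, smallest, ok = check_expected(table)

# For an r x c table the degrees of freedom are (r - 1) * (c - 1)
print(f"Degrees of freedom: {dof}")
print(f"Smallest expected frequency: {smallest:.2f} (threshold met: {ok})")
```

For the 2 x 5 table used above, the degrees of freedom are (2 - 1) * (5 - 1) = 4.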

Stationarity Tests

Time series analysis is an important topic. Many time series models require the data to be stationary. Therefore, before applying such a model, we first need to check whether the time series is stationary or not. Now we will see tests to check the stationarity of the data.

Augmented Dickey-Fuller Unit Root Test

Through this test, we check whether the given time series has a unit root. The test fits an autoregressive model to the series and tests whether that model has a unit root. If the time series has a unit root, it is not stationary; rejecting the unit root is evidence that the series is stationary.

Assumptions

The observations should be in a temporal order.

Interpretation

H0: the time series has a unit root (the series is not stationary).

H1: the time series does not have a unit root (the series is stationary).

Code

# Python program to implement the Augmented Dickey-Fuller unit root test

# Importing the required classes
from statsmodels.tsa.stattools import adfuller

# Creating a time series data
time_series = [2, 4, -1, -2, 5, 8, -4, -9, 9, 10]

# Implementing the Augmented Dickey-Fuller unit root test
stat, p, lag, o, c, t = adfuller(time_series)

print("The order of the autoregressive model is", lag)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The given time series is stationary')
else:
 print('The given time series is not stationary')

Output:

The order of the autoregressive model is 1
The statistic value is: -10.232070586545865, and the p-value is 4.998574442108246e-18
The given time series is stationary

Kwiatkowski-Phillips-Schmidt-Shin Test

This test checks whether the given time series is stationary around a constant level or a deterministic trend. Unlike the Augmented Dickey-Fuller test, stationarity is the null hypothesis here.

Assumptions

The observations should be in temporal order.

Interpretation

H0: the given time series is stationary (around a level or a trend).

H1: the given time series is not stationary.

Code

# Python program to implement the Kwiatkowski Phillips Schmidt Shin test

# Importing the required classes
from statsmodels.tsa.stattools import kpss

# Creating a time series data
time_series = [2, 4, -1, -2, 5, 8, -4, -9, 9, 10]

# Implementing the Kwiatkowski Phillips Schmidt Shin Test
stat, p, lag, c = kpss(time_series)

print("The number of lags used is", lag)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# H0 is stationarity, so a p-value below 0.05 is evidence against stationarity
if p < 0.05:
    print('The given time series is not stationary')
else:
    print('The given time series is stationary')

Output:

The number of lags used is 0
The statistic value is: 0.09930151338766009, and the p-value is 0.1
The given time series is stationary
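The two stationarity tests have opposite null hypotheses: the Augmented Dickey-Fuller test assumes a unit root, while the KPSS test assumes stationarity. Reading them together therefore takes some care. The sketch below shows one possible way to combine the two p-values; combine_stationarity is a hypothetical helper, and labelling the mixed cases as inconclusive is a simplifying assumption.

```python
def combine_stationarity(adf_p, kpss_p, alpha=0.05):
    """Combine the p-values of the ADF and KPSS tests into a single verdict."""
    adf_rejects_unit_root = adf_p < alpha        # evidence for stationarity
    kpss_rejects_stationarity = kpss_p < alpha   # evidence against stationarity

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if kpss_rejects_stationarity and not adf_rejects_unit_root:
        return "not stationary"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "inconclusive: both null hypotheses rejected"
    return "inconclusive: neither null hypothesis rejected"


# p-values taken from the two outputs printed in this section
verdict = combine_stationarity(adf_p=4.998574442108246e-18, kpss_p=0.1)
print(f"Combined verdict: {verdict}")
```

With the p-values reported above, both tests point the same way and the combined verdict is that the series is stationary.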

Parametric Statistical Hypothesis Tests

Now we will see the parametric tests. In these tests, we test if a certain parameter of one or more samples is equal to or different from a value or from each other.

Student's t-test

In this test, the parameter is the mean. We check whether the means of two independent samples are significantly different from each other.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples follow the normal distribution.
The observations of both the samples have the same variance.

Interpretation

H0: the mean values of the given samples are equal.

H1: the mean values of the given samples are not equal.

Code

# Python program to implement the Student's t-test

# Importing the required classes
from scipy.stats import ttest_ind

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Student's t-test
stat, p = ttest_ind(sample1, sample2)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The given samples have unequal mean values')
else:
 print('The given samples have equal mean values')

Output:

The statistic value is: 0.6713796580759667, and the p-value is 0.5105037120903526
The given samples have equal mean values
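When the equal-variance assumption is doubtful, the degrees of freedom of the test can be adjusted instead, which is commonly known as Welch's t-test. In SciPy this is done by passing equal_var=False to ttest_ind. The sketch below compares the two variants on the same samples.

```python
from scipy.stats import ttest_ind

sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Student's t-test: assumes both samples share the same variance
stat_student, p_student = ttest_ind(sample1, sample2)

# Welch's t-test: drops that assumption and adjusts the degrees of freedom
stat_welch, p_welch = ttest_ind(sample1, sample2, equal_var=False)

print(f"Student: statistic = {stat_student:.4f}, p-value = {p_student:.4f}")
print(f"Welch:   statistic = {stat_welch:.4f}, p-value = {p_welch:.4f}")
```

Because the two samples have the same size, the statistic is identical in both variants. Only the degrees of freedom, and therefore the p-value, change.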

Paired Student's t-test

In this test too, the parameter is the mean. However, this test is used when the two samples are paired. Two samples are said to be paired if each value in one sample is matched with a value in the other, for example when the same subjects are measured before and after a certain treatment.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples follow the normal distribution.
The observations of both samples have the same variance.
The observations are paired for each sample.

Interpretation

H0: the mean values of the paired samples are equal.

H1: the mean values of the paired samples are not equal.

Code

# Python program to implement the Paired Student's t-test

# Importing the required classes
from scipy.stats import ttest_rel

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Paired Student's t-test
stat, p = ttest_rel(sample1, sample2)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The paired samples have unequal mean values')
else:
 print('The paired samples have equal mean values')

Output:

The statistic value is: 0.9502747511161275, and the p-value is 0.36679175997294733
The paired samples have equal mean values
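One way to see what pairing means is that the paired test works on the differences within each pair. The sketch below, given as an illustration rather than as the library's internal implementation, checks that ttest_rel gives the same result as a one-sample t-test of the pairwise differences against zero.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Difference within each pair of observations
differences = np.array(sample1) - np.array(sample2)

stat_paired, p_paired = ttest_rel(sample1, sample2)
stat_diff, p_diff = ttest_1samp(differences, 0.0)

print(f"Paired t-test:             statistic = {stat_paired:.4f}, p-value = {p_paired:.4f}")
print(f"One-sample on differences: statistic = {stat_diff:.4f}, p-value = {p_diff:.4f}")
```

This is also why the order of the observations matters here: shuffling one of the samples changes the differences and therefore the result.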

Analysis of Variance Test (ANOVA)

In this test, we compare the variance between the samples with the variance within them to determine whether the means of two or more independent samples are significantly different.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of each sample follow the normal distribution.
The observations of all samples have the same variance.

Interpretation

H0: the mean values of the given samples are equal.

H1: at least one of the sample means is not equal to the others.

Code

# Python program to implement the Analysis of Variance Test

# Importing the required classes
from scipy.stats import f_oneway

# Creating three random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]
sample3 = [-0.308, 0.656, 0.918, -2.148, -0.413, 0.329, 0.157, 0.369, -0.850, -1.304]

# Implementing the Analysis of the Variance Test
stat, p = f_oneway(sample1, sample2, sample3)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The samples have unequal mean values')
else:
 print('The samples have equal mean values')

Output:

The statistic value is: 0.3557581063875854, and the p-value is 0.7038772383760818
The samples have equal mean values
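With exactly two samples, one-way ANOVA and the equal-variance Student's t-test are equivalent: the F statistic is the square of the t statistic, and the p-values coincide. The short check below illustrates this on the first two samples used above.

```python
import math
from scipy.stats import f_oneway, ttest_ind

sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

f_stat, p_anova = f_oneway(sample1, sample2)
t_stat, p_ttest = ttest_ind(sample1, sample2)  # equal variances assumed by default

print(f"F = {f_stat:.6f}, t squared = {t_stat ** 2:.6f}")
print(f"ANOVA p-value = {p_anova:.6f}, t-test p-value = {p_ttest:.6f}")
print("Statistics agree:", math.isclose(f_stat, t_stat ** 2, rel_tol=1e-9))
```

ANOVA becomes useful with three or more samples, where running every pairwise t-test would increase the chance of a false positive.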

Nonparametric Statistical Hypothesis Tests

Mann-Whitney U Test

This test checks whether the distributions of two independent samples are equal or not.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples can be ranked.

Interpretation

H0: the distributions underlying the independent samples are equal.

H1: the distributions underlying the independent samples are not equal.

Code

# Python program to implement the Mann-Whitney U Test

# Importing the required classes
from scipy.stats import mannwhitneyu

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Mann-Whitney U Test
stat, p = mannwhitneyu(sample1, sample2)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The samples have different distributions')
else:
 print('The samples have the same distributions')

Output:

The statistic value is: 60.0, and the p-value is 0.47267559351158717
The samples have the same distributions

Wilcoxon Signed-Rank Test

This test checks whether the distributions of two paired samples are equal or not.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of both samples can be ranked.
Observations of each sample are paired.

Interpretation

H0: the distributions underlying the paired samples are equal.

H1: the distributions underlying the paired samples are not equal.

Code

# Python program to implement the Wilcoxon Signed-Rank Test

# Importing the required classes
from scipy.stats import wilcoxon

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Wilcoxon Signed-Rank Test
stat, p = wilcoxon(sample1, sample2)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The samples have different distributions')
else:
 print('The samples have the same distributions')

Output:

The statistic value is: 15.0, and the p-value is 0.232421875
The samples have the same distributions

Kruskal-Wallis H Test

This test tests if the distributions of the given two or more observation samples are equal or not.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of all samples can be ranked.

Interpretation

H0: the distributions underlying the independent samples are equal.

H1: the distributions underlying the independent samples are not equal.

Code

# Python program to implement the Kruskal-Wallis H Test

# Importing the required classes
from scipy.stats import kruskal

# Creating two random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]

# Implementing the Kruskal-Wallis H Test
stat, p = kruskal(sample1, sample2)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The samples have different distributions')
else:
 print('The samples have the same distributions')

Output:

The statistic value is: 0.5714285714285694, and the p-value is 0.4496917979688917
The samples have the same distributions
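The Kruskal-Wallis test is not limited to two samples. As a brief sketch, the same call accepts any number of independent samples as separate arguments; here a third sample (the one used in the ANOVA example) is added.

```python
from scipy.stats import kruskal

sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]
sample3 = [-0.308, 0.656, 0.918, -2.148, -0.413, 0.329, 0.157, 0.369, -0.850, -1.304]

# Each independent sample is passed as its own positional argument
stat, p = kruskal(sample1, sample2, sample3)
print(f'The statistic value is: {stat}, and the p-value is {p}')

# H0 (equal distributions) is rejected only if the p-value is below 0.05
if p < 0.05:
    print('At least one sample has a different distribution')
else:
    print('No evidence that the distributions differ')
```

A significant result only says that at least one sample differs; it does not say which one.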

Friedman Test

This test checks whether the distributions of three or more paired samples are equal or not. SciPy's friedmanchisquare expects at least three samples; for two paired samples, the Wilcoxon signed-rank test can be used.

Assumptions

The observations of every sample are independent in nature, and they are identically distributed.
The observations of all samples can be ranked.
Observations of each sample are paired.

Interpretation

H0: the distributions underlying the paired samples are equal.

H1: the distributions underlying the paired samples are not equal.

Code

# Python program to implement the Friedman Test

# Importing the required classes
from scipy.stats import friedmanchisquare

# Creating three random data samples
sample1 = [0.843, 3.817, 0.221, -0.445, -0.455, -1.236, 0.660, -1.428, -1.337, -1.769]
sample2 = [0.363, 3.317, 0.165, -7.525, -0.565, -1.546, 3.450, -1.558, -3.577, -1.279]
sample3 = [-0.308, 0.656, 0.918, -2.148, -0.413, 0.329, 0.157, 0.369, -0.850, -1.304]

# Implementing the Friedman Test
stat, p = friedmanchisquare(sample1, sample2, sample3)

print(f'The statistic value is: {stat}, and the p-value is {p}')

# Checking if the p-value is less than the level of significance 0.05
if p < 0.05:
 print('The samples have different distributions')
else:
 print('The samples have the same distributions')

Output:

The statistic value is: 2.4000000000000057, and the p-value is 0.3011942119122012
The samples have the same distributions

Summary

You learned about the primary hypothesis tests in this tutorial that you may apply in a machine learning project.

In particular, you discovered:

The types of tests to use in different situations, such as checking normality, measuring the relationship between variables, and comparing independent or paired samples.
The key assumptions underlying each test, as well as how to interpret the results.
How to use the Python API to run each test.
