SciPy Stats

The scipy.stats sub-package contains a large number of statistical functions and probability distributions. The list of statistics functions can be obtained with info(stats). A list of the available random variables can also be obtained from the docstring of the stats sub-package.

Sr. | Function | Description
1. | rv_continuous | A base class used to construct specific distribution classes and instances for continuous random variables.
2. | rv_discrete | A base class used to construct specific distribution classes and instances for discrete random variables.
3. | rv_histogram | Inherits from the rv_continuous class; generates a distribution given by a histogram.

Normal Continuous Random Variable

Two general distribution classes have been implemented to encapsulate continuous random variables and discrete random variables. Here we discuss the continuous random variables:
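The original listing is not shown here, so the following is a minimal sketch that reproduces the output given next; the input values are inferred from that output and should be treated as an assumption.

import numpy as np
from scipy.stats import norm

# Evaluate the CDF of the standard normal distribution at several points.
# The input array is inferred from the output shown below (an assumption).
print(norm.cdf(np.array([3, -1, 0, 1, 2, 4, -2, 5])))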

Output:

[0.9986501  0.15865525 0.5        0.84134475 0.97724987 0.99996833
 0.02275013 0.99999971]

In the above program, we first import norm from scipy.stats and then pass the data as a NumPy array to the cdf() function.

To get the median of the distribution, we can use the Percent Point Function (PPF), which is the inverse of the CDF.
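For example, a minimal sketch; for the standard normal distribution the median is 0:

from scipy.stats import norm

# The PPF (inverse CDF) evaluated at 0.5 gives the median of the distribution.
print(norm.ppf(0.5))   # 0.0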

We can also generate a sequence of random numbers from the distribution; the size argument specifies how many values to draw.
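A sketch that produces four random draws like those shown below (because the data are random, the exact values will differ on every run):

from scipy.stats import norm

# Draw four samples from the standard normal distribution.
print(norm.rvs(size=4))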

Output:

[-0.42700905  1.0110461   0.05316053 -0.45002771]

The output varies every time the program runs. To generate the same random numbers on each run, we can set a seed, for example with numpy.random.seed() or the random_state argument of rvs().
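For instance (a sketch; the seed value 0 is an arbitrary choice):

import numpy as np
from scipy.stats import norm

np.random.seed(0)          # fix the global NumPy random seed
print(norm.rvs(size=4))    # the same four numbers on every run
# Alternatively: norm.rvs(size=4, random_state=0)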

Descriptive Statistics

Descriptive statistics describe the observed values of a variable. There are various statistics, such as the minimum, maximum, and variance, that take a NumPy array as input and return the corresponding result. Some essential functions provided by the scipy.stats package are described in the following table.

Sr. | Function | Description
1. | describe() | Computes several descriptive statistics of the input array.
2. | gmean() | Computes the geometric mean along the specified axis.
3. | hmean() | Calculates the harmonic mean along the specified axis.
4. | kurtosis() | Computes the kurtosis.
5. | mode() | Returns the modal (most common) value.
6. | skew() | Measures the skewness of the data.
7. | zscore() | Calculates the z-score of each value in the sample, relative to the sample mean and standard deviation.

Let us consider the following program:
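The original listing is not shown here; the following is a plausible reconstruction based on the output below. The sample size of 100 and the choice of statistics printed are assumptions, and the numbers will differ on every run because the data are random.

import numpy as np
from scipy import stats

x = stats.norm.rvs(size=100)             # 100 standard-normal samples
print(x.mean())                          # sample mean
print(np.median(x))                      # sample median (assumed second value)
print(np.array([x.min(), x.max()]))      # minimum and maximum
print(stats.describe(x))                 # nobs, minmax, mean, variance, skewness, kurtosis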

Output:

0.006283818005153084
-0.03008382588766136
[-2.1865825   2.47537921]

DescribeResult(nobs=100, minmax=(-2.1865824992721987, 2.475379209985273), mean=0.006283818005153084, variance=1.0933102537156147, skewness=0.027561719919920322, kurtosis=-0.6958272633471831)

T-Test

The t-test compares two averages (means) and tells us whether they differ from each other. It also tells us whether the difference between the groups is statistically significant.

T-score

The t-score is the ratio of the difference between two groups to the variation within the groups. A small t-score indicates that the groups are relatively similar, while a larger t-score indicates a greater difference between the groups.
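One common form of this ratio for two independent samples (Welch's t-statistic), where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ the sample variances, and $n_1, n_2$ the sample sizes, is:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$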

Comparing two samples

Two samples are given, which can come either from the same distribution or from different distributions, and we want to test whether these samples have the same statistical properties.
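A sketch that produces output of the form shown below: two columns of samples are drawn and each column is tested against the population mean with a one-sample t-test. The parameters loc=5, scale=10 and the shape (50, 2) are assumptions, and because the data are random the exact numbers will differ.

from scipy import stats

# Two columns of 50 samples each, drawn from a normal distribution.
rvs = stats.norm.rvs(loc=5, scale=10, size=(50, 2))

# One-sample t-test of each column against the population mean 5.0.
print(stats.ttest_1samp(rvs, 5.0))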

Output:

Ttest_1sampResult(statistic=array([0.42271098, 1.1463823 ]), pvalue=array([0.67435547, 0.25720448]))

In the above output, the p-value is the probability that the results from the sample data occurred by chance. P-values range from 0 to 1 (0% to 100%); the smaller the p-value, the less likely the observed difference is due to chance alone.

SciPy Linear Regression

Linear regression is used to find the relationship between two variables. SciPy provides the linregress() function to perform linear regression. The syntax is given below:
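result = scipy.stats.linregress(x, y)

(The call above follows the scipy.stats API; the returned result object exposes the slope, intercept, rvalue, pvalue, and stderr attributes.)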

Parameters:

x, y: These two parameters should be arrays of the same length.

There are two types of linear regression.

  • Simple regression
  • Multivariable regression

Simple Regression

Simple linear regression is a method of predicting a response using a single feature. It assumes that the two variables are linearly related, so that one variable can be accurately predicted from the other. For example, temperature in degrees Fahrenheit can be predicted exactly from temperature in degrees Celsius.
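A minimal sketch of this example using linregress(); the Celsius values are arbitrary sample data, while the exact Celsius-to-Fahrenheit relationship means the fitted slope is 1.8 and the intercept is 32:

import numpy as np
from scipy import stats

celsius = np.array([0, 10, 20, 30, 40])    # arbitrary sample temperatures
fahrenheit = celsius * 1.8 + 32            # exact linear relationship

result = stats.linregress(celsius, fahrenheit)
print(result.slope, result.intercept)      # 1.8 32.0
print(result.rvalue)                       # 1.0, i.e. a perfect linear fit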

Multivariable Regression

Multiple linear regression models the relationship between one continuous dependent variable and two or more independent variables.

For example, a variable such as price may depend on several other variables at once.
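Since scipy.stats.linregress() handles only a single feature, the following is a minimal sketch of the multivariable case using numpy.linalg.lstsq instead; the feature meanings and data are illustrative assumptions.

import numpy as np

# Illustrative data (assumption): price depends on two features, e.g. size and age.
X = np.array([[50.0, 10.0],
              [80.0,  5.0],
              [120.0, 2.0],
              [60.0, 20.0]])
y = np.array([150.0, 260.0, 400.0, 155.0])   # made-up prices

# Append a column of ones so the least-squares fit also estimates an intercept.
A = np.column_stack([X, np.ones(len(X))])
coefficients, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(coefficients)   # [coefficient_size, coefficient_age, intercept]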





