SciPy Stats

The scipy.stats package contains a large number of statistical functions and probability distributions. The list of statistics functions can be obtained with info(stats) or with Python's built-in help(stats). A list of the available random variables can also be found in the docstring of the stats sub-package.
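As a rough illustration of exploring the package, the sketch below prints the start of the stats docstring and collects the names of the distribution objects by type. The variable names continuous_names and discrete_names are illustrative, and filtering on rv_continuous and rv_discrete is one possible approach, not the only way to list the distributions.

```python
from scipy import stats

# The package docstring lists the available distributions and functions.
# Only the first few hundred characters are printed here.
print((stats.__doc__ or "")[:300])

# Collect distribution objects by type (names are illustrative).
continuous_names = [
    name for name in dir(stats)
    if isinstance(getattr(stats, name), stats.rv_continuous)
]
discrete_names = [
    name for name in dir(stats)
    if isinstance(getattr(stats, name), stats.rv_discrete)
]

print(len(continuous_names), "continuous distributions")
print(len(discrete_names), "discrete distributions")
```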
Normal Continuous Random Variable

Two general distribution classes have been implemented to encapsulate continuous random variables and discrete random variables. Here we discuss continuous random variables, using the normal distribution as the example. Evaluating the cumulative distribution function with norm.cdf(np.array([3, -1, 0, 1, 2, 4, -2, 5])) gives:

Output:
[0.9986501 0.15865525 0.5 0.84134475 0.97724987 0.99996833 0.02275013 0.99999971]

In this program, we first import norm from scipy.stats and then pass the data as a NumPy array to the cdf() function.

To get the median of the distribution, we can use the percent point function (PPF), which is the inverse of the CDF. For the standard normal distribution, norm.ppf(0.5) returns 0.0.

We can also generate a sequence of random numbers with rvs(); its size argument sets how many numbers are drawn. Drawing four values gave, for example:

Output:
[-0.42700905 1.0110461 0.05316053 -0.45002771]

The output varies each time the program is run. To generate the same random numbers on every run, set the seed first with the seed() function.

Descriptive Statistics

Descriptive statistics describe the values of the observations in a variable. Functions such as min, max, and variance take a NumPy array as input and return the corresponding result. Some essential functions provided by the scipy.stats package are described in the following image.
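The calls discussed in the two sections above can be sketched together as follows. This is a minimal sketch, not the original program: the seed value 0, the sample size of 100, and the variable names points, median, sample, and summary are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats
from scipy.stats import norm

# CDF of the standard normal distribution at several points.
points = np.array([3, -1, 0, 1, 2, 4, -2, 5])
print(norm.cdf(points))

# The PPF is the inverse of the CDF, so ppf(0.5) is the median.
median = norm.ppf(0.5)
print(median)

# Seed NumPy's global generator so that rvs() is reproducible.
# The seed value 0 is an arbitrary choice.
np.random.seed(0)
sample = norm.rvs(size=100)

# Basic descriptive statistics computed directly on the array.
print(sample.min(), sample.max(), sample.mean())

# describe() bundles several statistics into one result.
# Its variance uses ddof=1, unlike the NumPy default of ddof=0.
summary = stats.describe(sample)
print(summary)
```

Reseeding with the same value before calling rvs() again should reproduce the same sample, which is useful when results need to be repeatable.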
The output of such a program on a sample of 100 observations looks like this, ending with the result of describe():

Output:
0.006283818005153084
-0.03008382588766136
[-2.1865825 2.47537921]
DescribeResult(nobs=100, minmax=(-2.1865824992721987, 2.475379209985273), mean=0.006283818005153084, variance=1.0933102537156147, skewness=0.027561719919920322, kurtosis=-0.6958272633471831)

T-Test

The t-test compares two averages (means) and tells us whether they differ from each other. It also tells us how significant the difference is.

T-score

The t-score is the ratio of the difference between two groups to the difference within the groups. A smaller t-score indicates that the groups are relatively similar; a larger t-score indicates a greater difference between the groups.

Comparing two samples

Two samples are given that can come either from the same or from different distributions, and we want to test whether they have the same statistical properties. The result below was produced by ttest_1samp(), which tests the mean of each column of a sample against a given value; to compare the means of two independent samples directly, scipy.stats also provides ttest_ind().

Output:
Ttest_1sampResult(statistic=array([0.42271098, 1.1463823 ]), pvalue=array([0.67435547, 0.25720448]))

In the above output, the p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. P-values range from 0% to 100%. Both p-values here are well above the common 5% threshold, so the test gives no evidence that either mean differs from the tested value.

SciPy Linear Regression

Linear regression is used to find the relationship between two variables. SciPy provides the linregress() function to perform linear regression. It is called as linregress(x, y).

Parameters:
x, y: These two parameters should be arrays of the same length.

There are two types of linear regression.
Simple Regression

Simple linear regression is a method for predicting a response using a single feature. It assumes that the two variables are linearly related, which means that one variable can be predicted accurately from the other. For example, a temperature in degrees Celsius accurately predicts the same temperature in degrees Fahrenheit.

Multivariable Regression

Multiple linear regression describes the relationship between one continuous dependent variable and two or more independent variables. For example, the variable price may depend on several other variables.
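The temperature example from the simple regression description can be sketched with linregress(). The sample temperatures and the variable names celsius, fahrenheit, result, and predicted are made up for illustration; because the Fahrenheit values are computed exactly from the Celsius values, the fitted line should recover a slope close to 1.8 and an intercept close to 32.

```python
import numpy as np
from scipy.stats import linregress

# Illustrative data: Fahrenheit is an exact linear function of Celsius.
celsius = np.array([-40.0, 0.0, 20.0, 37.0, 100.0])
fahrenheit = celsius * 9.0 / 5.0 + 32.0

# Fit a straight line fahrenheit = slope * celsius + intercept.
result = linregress(celsius, fahrenheit)
print(result.slope, result.intercept, result.rvalue)

# Use the fitted line to predict the Fahrenheit value for 25 degrees Celsius.
predicted = result.slope * 25.0 + result.intercept
print(predicted)
```

Note that linregress() fits only the simple case with a single predictor; the multivariable case needs a different tool, such as a least-squares solver.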