Python - Poisson Discrete Distribution in Statistics

One of the fundamental concepts in statistics is the study of random variables and their distributions. This tutorial explains the Poisson discrete distribution, a key topic in statistics and probability theory, and shows how to explore its properties and perform calculations with it in Python.

Let us start the discussion by understanding the random variable terms involved:

Random Variable

A random variable is a numerical quantity whose value is determined by the outcome of a random experiment. More formally, it is a function that assigns a number to each possible outcome of the experiment.

For example:

• Coin Toss: Suppose we flip a fair coin three times, and let the random variable "X" represent the number of heads obtained. The possible values of X are 0, 1, 2, or 3.
• Dice Roll: Let's define a random variable "Y" as the sum of the numbers rolled on two fair six-sided dice. The values of Y can range from 2 (rolling two ones) to 12 (rolling two sixes).

The random variable can be categorized into two types:

1. Discrete Random Variable - can take only a countable number of distinct values (finite, or countably infinite such as 0, 1, 2, …). The examples given above are discrete random variables.
2. Continuous Random Variable - can take any value within a range. Examples:
• Height of Individuals: Consider a random variable "H" that represents the height of individuals in a population. Heights can take on any value within a certain range (e.g., between 4 feet and 7 feet), so we can say that "H" is a continuous random variable.
• Temperature Value: Let's define a random variable "T" representing the temperature (in degrees Celsius) at a particular location. Temperature can vary continuously within a range, so "T" is a continuous random variable.

Poisson Discrete Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space.

A discrete probability distribution is the probability distribution of a random variable that can take on only a countable number of values.

- Wikipedia

It's often used to describe rare and random events, such as the number of phone calls received at a call center in a given hour, the number of accidents at an intersection in a day, or the number of emails received per hour.

Key Characteristics of the Poisson Distribution:

• The events are rare and random.
• The events are independent of each other.
• We assume that the average rate of occurrence is constant across the entire interval.
• The probability of more than one event occurring in an infinitesimally small interval is negligible.

Probability Mass Function

The probability mass function assigns a probability to each possible value of a random variable.

The probability mass function (PMF) of the Poisson distribution is given by:

P(X = k) = (λ^k · e^(-λ)) / k!

Where:

• X is the random variable representing the number of events.
• k represents the number of occurrences (a non-negative integer), (k = 0, 1, 2, …)
• λ (lambda) is the average rate of occurrence.
• e ≈ 2.71828 is Euler's number.

You might find different symbols used in the PMF formula in different sources, so don't get confused.

Let's explore the Poisson distribution with a couple of examples:

Example 1: Phone Calls at a Call Center

Suppose a call center receives an average of 5 calls per hour. What is the probability of receiving exactly 3 calls in the next hour?

Here, λ = 5 (average rate of calls per hour) and k = 3 (desired number of calls). Putting these values into the Poisson PMF formula:

P(X = 3) = (5^3 · e^(-5)) / 3! = (125 × 0.0067379) / 6 ≈ 0.14037

So, the probability of receiving exactly 3 calls in the next hour is approximately 0.14037, or about 14.04%.
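The manual calculation can also be checked with plain Python. The following is a minimal sketch using only the standard library; the function name `poisson_pmf` is illustrative and not part of any library.

```python
import math

def poisson_pmf(k, lam):
    """Return P(X = k) for a Poisson distribution with average rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

# Example 1: lam = 5 calls per hour, k = 3 calls
p = poisson_pmf(3, 5)
print(f"P(X = 3) = {p:.5f}")  # prints P(X = 3) = 0.14037
```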

How to Calculate Probabilities using Poisson Distribution in Python?

To calculate Poisson probabilities in Python, we can use the scipy.stats.poisson.pmf function, which is part of the SciPy library. This function calculates the probability of observing a specific value "k" from the distribution.

Syntax:

scipy.stats.poisson.pmf(k, mu, loc=0)

Parameters:

• k - This is the value for which we want to calculate the PMF.
• mu - This is the average rate of occurrence (also known as the lambda parameter) of the Poisson distribution.
• loc - It's an optional parameter and is set to 0 by default. It shifts the distribution by the given value.

In the above example, it is given that the average number of calls = 5 per hour and we need to find the probability of getting exactly three calls in the next hour.
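A minimal sketch of this calculation, assuming SciPy is installed (the variable names are illustrative):

```python
from scipy.stats import poisson

lambda_value = 5  # average number of calls per hour
k = 3             # number of calls we are interested in

# P(X = 3) for a Poisson distribution with rate 5
probability = poisson.pmf(k, lambda_value)
print("Poisson PMF:", probability)
```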

Output:

```Poisson PMF: 0.1403738958142805
```

The above output is equal to what we calculated manually.

Example 2: Accidents at a Crossroad

Suppose a crossroad experiences an average of 2 accidents per day. What is the probability of having at least 4 and at most 6 accidents in a day?

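Here, λ = 2, and we want to find P(4 ≤ X ≤ 6), which is the sum of the probabilities of having 4, 5, and 6 accidents. Calculating each probability and summing them up:

P(X = 4) = (2^4 · e^(-2)) / 4! ≈ 0.09022
P(X = 5) = (2^5 · e^(-2)) / 5! ≈ 0.03609
P(X = 6) = (2^6 · e^(-2)) / 6! ≈ 0.01203

P(4 ≤ X ≤ 6) ≈ 0.09022 + 0.03609 + 0.01203 = 0.13834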

So, the probability of having at least 4 and at most 6 accidents in a day is approximately 0.1383, or about 13.83%.

Python Code:
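One way to compute this sum, sketched under the assumption that SciPy is installed (variable names are illustrative):

```python
from scipy.stats import poisson

lambda_value = 2  # average number of accidents per day

# P(4 <= X <= 6) = P(X = 4) + P(X = 5) + P(X = 6)
probability = sum(poisson.pmf(k, lambda_value) for k in range(4, 7))
print("Poisson PMF:", probability)
```

The same probability can also be obtained as poisson.cdf(6, 2) - poisson.cdf(3, 2).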

Output:

```Poisson PMF: 0.13834273397520408
```

How to Generate Poisson Distribution?

Method 1 - Using NumPy

Let's draw a random sample of 15 values (a 1 x 15 array) from a Poisson distribution with λ = 3.
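A minimal sketch, assuming NumPy is installed. Because the samples are random, the printed values differ on every run.

```python
from numpy import random

# Draw 15 samples from a Poisson distribution with rate lam = 3
x = random.poisson(lam=3, size=15)
print("Poisson Distribution:", x)
```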

Here, we start by importing the random module from NumPy.

The call random.poisson(lam=3, size=15) returns a NumPy array containing 15 random samples from the Poisson distribution. We pass lam=3, meaning the average number of occurrences of an event is 3.

Each number in the array represents the number of events occurring in a fixed interval of time.

When we run the program, we get output such as the following (the values vary from run to run because the samples are random):

```Poisson Distribution: [4 1 7 2 3 4 4 3 3 7 7 5 2 2 0]
```

In this output, each number represents the number of events occurring in a fixed interval, and the distribution reflects the characteristics of the Poisson distribution with an average rate of 3.

Method 2 - Using SciPy

We can use poisson.rvs(mu, size) to draw random samples from a Poisson distribution.
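A minimal sketch, assuming SciPy is installed. The value mu=3 is an assumption chosen to match the NumPy example; the output varies between runs.

```python
from scipy.stats import poisson

# Draw 15 samples from a Poisson distribution with mean mu = 3 (assumed value)
x = poisson.rvs(mu=3, size=15)
print("Poisson Distribution:", x)
```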

Output:

```Poisson Distribution: [8 2 2 4 2 2 3 3 1 3 3 2 0 1 3]
```

How to Plot Poisson Distribution?

To plot the Poisson distribution, we first need to create a sample. Here, we use the scipy.stats.poisson.rvs() method to generate a random sample from the Poisson distribution and the matplotlib library to plot a histogram.
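A minimal sketch of such a histogram, assuming SciPy and matplotlib are installed. The bin edges, labels, and title are illustrative choices, not taken from the original listing.

```python
import matplotlib.pyplot as plt
from scipy.stats import poisson

# 5000 random draws from a Poisson distribution with mean 3
sample = poisson.rvs(mu=3, size=5000)

# One bin per integer value, centered on the integer
bins = [b - 0.5 for b in range(sample.max() + 2)]

plt.hist(sample, bins=bins, edgecolor="black")
plt.xlabel("Number of events")
plt.ylabel("Frequency")
plt.title("Poisson distribution (mean = 3, n = 5000)")
plt.show()
```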

Output:

When we run this code, it generates a histogram representing the distribution of 5000 random numbers drawn from a Poisson distribution with a mean of 3. The histogram shows the frequency of occurrence of different values within the specified range, helping us visualize the shape of the Poisson distribution.

Calculating Probabilities of each sample value:
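A minimal sketch, assuming SciPy, NumPy, and matplotlib are installed. Reducing the sample to its distinct values with np.unique is an assumption made here to keep the bar chart readable.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Random sample from a Poisson distribution with mean 3
sample = poisson.rvs(mu=3, size=5000)

# Distinct values that occur in the sample, in ascending order
x = np.unique(sample)

# PMF of each distinct value
prob_dist = []
for value in x:
    prob_dist.append(poisson.pmf(value, mu=3))

plt.bar(x, prob_dist)
plt.xlabel("Number of events (k)")
plt.ylabel("P(X = k)")
plt.title("Poisson PMF (mean = 3)")
plt.show()
```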

Output:

Explanation:

In the above code, we first created a random sample from the Poisson distribution with mean = 3 and a sample size = 5000.

We create a list `prob_dist` to store the probability of each value from the sample.

We then use a for loop that calculates the probability mass function (PMF) for each value in the array 'x'.

Finally, we display the bar chart, showing the Poisson distribution for the given parameters. The plot shows the probability of each event count k for an average rate of λ = 3.

We can also plot the Poisson Distribution with different mean values.

Plot Poisson Distribution with λ = [2, 3, 4, 5]
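A minimal sketch of the four subplots, assuming SciPy, NumPy, and matplotlib are installed. The range of k shown on the x-axis (0 to 15) and the names `lambdas` and `k_values` are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lambdas = [2, 3, 4, 5]
k_values = np.arange(0, 16)  # event counts on the x-axis (assumed range)

fig, axs = plt.subplots(2, 2, figsize=(10, 8))
axs = axs.flatten()  # turn the 2x2 array of axes into a flat sequence

for j, mu in enumerate(lambdas):
    # PMF for every k at the current rate mu
    prob_dist = poisson.pmf(k_values, mu)
    axs[j].bar(k_values, prob_dist)
    axs[j].set_title(f"mean = {mu}")
    axs[j].set_xlabel("k")
    axs[j].set_ylabel("P(X = k)")

plt.tight_layout()
plt.show()
```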

Output:

Explanation:

In the above code:

• We create a 2x2 grid of subplots using plt.subplots(2, 2, figsize=(10, 8)). This grid layout allows us to have four distinct subplots.
• We use enumerate() on the list of λ values to iterate over both each value of λ and its position in the subplot grid.
• Inside the loop, we calculate the probability distribution for each value of mu and create the corresponding bar chart in the appropriate subplot.
• We use axs[j] to access the current subplot in the grid. Because plt.subplots(2, 2) returns a 2x2 array of axes, this indexing requires flattening the array first (for example with axs.flatten()).
• We use plt.tight_layout() to ensure that the subplots are properly spaced and do not overlap.

It generates four distinct Poisson distribution subplots for different values of λ.

Cumulative Distribution Function (CDF):

The Cumulative Distribution Function (CDF) describes the probability that a random variable takes on a value less than or equal to a specific value.

Mathematically, the Cumulative Distribution Function of a random variable X is defined as:

F(x) = P(X ≤ x)

Where:

• F(x) is the Cumulative Distribution Function (CDF) of the random variable X.
• x is a specific value for which we want to calculate the cumulative probability.
• P(X ≤ x) is the probability that the random variable X takes on a value less than or equal to x.

We can utilize the Poisson CDF function to compute the cumulative probability.

Q. An email server receives an average of 6 emails per hour. What is the probability of receiving fewer than 5 emails in the next hour?

Ans.
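A minimal sketch of both approaches, assuming SciPy is installed (variable names are illustrative):

```python
from scipy.stats import poisson

lambda_value = 6  # average number of emails per hour
k = 5             # we want P(X < 5)

# Method 1: sum the PMF over X = 0, 1, 2, 3, 4
prob_pmf = sum(poisson.pmf(i, lambda_value) for i in range(k))
print("Probability using PMF P(X < 5) =", prob_pmf)

# Method 2: P(X < 5) = P(X <= 4), so evaluate the CDF at k - 1
prob_cdf = poisson.cdf(k - 1, lambda_value)
print("Cumulative Probability of X < 5 =", prob_cdf)
```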

Output:

```Probability using PMF P(X < 5) = 0.2850565003166312
Cumulative Probability of X < 5 = 0.2850565003166312
```

In this code:

We set k to 5 because we want the probability of receiving fewer than 5 emails, that is, P(X < 5) = P(X ≤ 4).

• In Method 1: We used poisson.pmf() to calculate P(X < 5) by summing the probabilities for X = 0, 1, 2, 3, 4.
• In Method 2: We used poisson.cdf(k - 1, lambda_value) to calculate the probability of receiving up to 4 emails, which is the same as P(X < 5).

The final result represents the probability of receiving fewer than 5 emails in the next hour.

Below are some practice problems that you can solve. We encourage you to solve each problem independently before looking at the given solution.

Q. 1 A factory produces an average of 10 defective items per week. What is the probability of having exactly 8 defective items in a week?

Sol.

Average rate of occurrence, λ = 10

Value for which to calculate pmf, k = 8
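A minimal sketch, assuming SciPy is installed:

```python
from scipy.stats import poisson

lambda_value = 10  # average number of defective items per week
k = 8              # exact number of defective items

probability = poisson.pmf(k, lambda_value)
print("Poisson PMF:", probability)
```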

Output:

```Poisson PMF: 0.11259903214902009
```

Q. 2 A website experiences an average of 500 visits per hour. What is the probability of having exactly 600 visits in a randomly selected hour?

Sol.

Average rate of occurrence, λ = 500

Value for which to calculate pmf, k = 600
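A minimal sketch that evaluates the PMF at k = 600 for λ = 500, matching the values listed above (assuming SciPy is installed):

```python
from scipy.stats import poisson

lambda_value = 500  # average number of visits per hour
k = 600             # number of visits at which the PMF is evaluated

probability = poisson.pmf(k, lambda_value)
print("Poisson PMF:", probability)
```

Note that the PMF gives the probability of exactly k events. A tail probability such as P(X > 600) would instead use poisson.sf(600, 500), which is equivalent to 1 - poisson.cdf(600, 500).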

Output:

```Poisson PMF: 1.3566714436562893e-06
```

Q. 3 A restaurant serves an average of 15 vegetarian meals per lunchtime. What is the probability of serving exactly 10 vegetarian meals during lunchtime?

Sol.

Average rate of occurrence, λ = 15

Value for which to calculate pmf, k = 10
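A minimal sketch for this question, using the values it states (an average of 15 meals, exactly 10 served) and assuming SciPy is installed:

```python
from scipy.stats import poisson

lambda_value = 15  # average number of vegetarian meals per lunchtime
k = 10             # exact number of vegetarian meals

probability = poisson.pmf(k, lambda_value)
print("Poisson PMF:", probability)
```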

Output:

```Poisson PMF: 0.04861075082960534
```

Q. 4 A car rental agency rents an average of 4 luxury cars per day. What is the probability of renting fewer than 3 luxury cars on a given day?

Sol.

Average number of cars, λ = 4

We want P(X < 3) = P(X ≤ 2), so we evaluate the CDF at k = 2
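A minimal sketch, assuming SciPy is installed. Since "fewer than 3" means at most 2, the CDF is evaluated at 2.

```python
from scipy.stats import poisson

lambda_value = 4  # average number of luxury cars rented per day

# P(X < 3) = P(X <= 2)
probability = poisson.cdf(2, lambda_value)
print("Poisson CDF:", probability)
```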

Output:

```Poisson CDF: 0.23810330555354436
```

Q. 5 A retail store receives an average of 12 customers per hour. What is the probability of having more than 15 customers in the next hour?

Sol.

Average number of customers, λ = 12

We want P(X > 15) = 1 - P(X ≤ 15), so we evaluate the CDF at k = 15 and subtract the result from 1

Output:

```Poisson CDF of attending customers <= 15: 0.7720245323035447
Poisson CDF of attending customers > 15: 0.22797546769645527
```

To Summarize:

• Random Variable: A random variable is a numerical quantity whose value is determined by the outcome of a random experiment.
• Poisson Discrete Distribution: The Poisson distribution is a discrete probability distribution used to model the probability of a specific number of events occurring in a fixed interval of time or space.
• Probability Mass Function (PMF): The probability mass function assigns a probability to each possible value of a random variable.
• Cumulative Distribution Function (CDF): The CDF describes the probability that a random variable takes on a value less than or equal to a specific value.

Throughout the article, we've provided Python code examples for calculating probabilities using the Poisson PMF and CDF functions. These examples illustrate how to work with Poisson distributions and apply them to real-world scenarios.