# Python - Discrete Hypergeometric Distribution in Statistics

In data analysis and decision-making, statistics is essential, offering insights into uncertainty and variability. From predicting stock market trends to understanding genetic traits, statistical distributions are the building blocks of modeling and analysis. One distribution that plays a crucial role in scenarios involving finite populations and sampling without replacement is the Discrete Hypergeometric Distribution. In this article, we look closely at the Discrete Hypergeometric Distribution and show how Python's scipy.stats module lets us apply it in practice. By the end, you'll have a solid grasp of how the distribution works and where it is relevant in real-world situations.

## What is the Discrete Hypergeometric Distribution in Statistics

The Discrete Hypergeometric Distribution is a probability distribution used to model situations where you are sampling without replacement from a finite population. It describes the probability of getting a certain number of successes in a fixed number of draws from a population containing a known number of successes and failures.

### Key Parameters:

• Population Size (N): The total number of individuals or items in the population.
• Number of Successes in Population (M): The count of successful individuals or items.
• Number of Draws (n): The total number of draws or samples taken from the population.
• Number of Desired Successes (k): The specific number of successful individuals or items you want to obtain.

### Working

Probability Mass Function (PMF):

The PMF of the Discrete Hypergeometric Distribution calculates the probability of getting exactly 'k' successes in 'n' draws from the population. Mathematically, it is represented as:

$$P(X = k) = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$$

Where:

• (M choose k) is the number of ways to choose 'k' successes from the 'M' successful items.
• ((N - M) choose (n - k)) is the number of ways to choose the remaining 'n - k' items from the 'N - M' non-successful items.
• (N choose n) is the total number of ways to choose 'n' items from the 'N' items in the population.
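As a minimal sketch, the three binomial coefficients described above can be combined using only the standard library. The helper name `hypergeom_pmf` and the example numbers (50 items, 10 successes, 5 draws) are assumptions chosen for illustration, not values from this article.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N items that contains M successes."""
    # Outside the support the probability is zero
    if k < max(0, n - (N - M)) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# Hypothetical example: 50 items, 10 successes, 5 draws, exactly 2 successes
p2 = hypergeom_pmf(2, N=50, M=10, n=5)
print(round(p2, 4))  # 45 * 9880 / 2118760, roughly 0.2098

# The probabilities over all possible k add up to 1
total = sum(hypergeom_pmf(k, N=50, M=10, n=5) for k in range(0, 6))
print(total)
```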

Cumulative Distribution Function (CDF):

The CDF of the Discrete Hypergeometric Distribution calculates the probability of getting up to 'k' successes in 'n' draws. It is the sum of the individual probabilities from '0' to 'k':

$$P(X \le k) = \sum_{i=0}^{k} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

Steps to Calculate Probabilities:

• Calculate (M choose k), ((N - M) choose (n - k)), and (N choose n) using combinatorial formulas or functions (e.g., scipy.special.comb() in Python's scipy library).
• Plug these values into the PMF formula to calculate the probability of getting exactly 'k' successes.
• For the CDF, sum up the probabilities of getting up to 'k' successes by iterating from '0' to 'k'.
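The steps above can be sketched with `scipy.special.comb` and checked against `scipy.stats.hypergeom.cdf`. The function names `pmf` and `cdf` and the example numbers are illustrative assumptions. Note that scipy takes its arguments in the order (population size, successes in population, draws).

```python
from scipy.special import comb
from scipy.stats import hypergeom

def pmf(k, N, M, n):
    # exact=True makes comb return exact Python integers
    ways_success = comb(M, k, exact=True)
    ways_failure = comb(N - M, n - k, exact=True)
    ways_total = comb(N, n, exact=True)
    return ways_success * ways_failure / ways_total

def cdf(k, N, M, n):
    # Sum the PMF from 0 up to and including k
    return sum(pmf(i, N, M, n) for i in range(0, k + 1))

# Hypothetical example: 50 items, 10 successes, 5 draws
N_pop, M_succ, n_draws = 50, 10, 5
manual = cdf(2, N_pop, M_succ, n_draws)

# Same quantity from scipy: hypergeom.cdf(k, population, successes, draws)
library = hypergeom.cdf(2, N_pop, M_succ, n_draws)
print(manual, library)
```

The two values should agree up to floating-point rounding, which is a quick check that the parameter order passed to scipy is correct.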

The distribution is "discrete" because it deals with individual counts, not continuous values.

Unlike the Binomial Distribution, which assumes replacement after each draw and therefore a constant probability of success, the Discrete Hypergeometric Distribution accounts for the fact that the probability of success changes with each draw because sampled items are not replaced.
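To make the contrast with the Binomial Distribution concrete, the sketch below compares the two PMFs for the same success fraction, once with a small population and once with a large one. The helper names and the numbers are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    # Exactly k successes in n draws without replacement
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    # Exactly k successes in n draws with a constant success probability p
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_gap(N, M, n):
    """Largest absolute PMF difference between the two models over k = 0..n."""
    p = M / N
    return max(
        abs(hypergeom_pmf(k, N, M, n) - binom_pmf(k, n, p))
        for k in range(n + 1)
    )

small_gap = max_gap(N=20, M=8, n=10)          # the sample is half the population
large_gap = max_gap(N=20_000, M=8_000, n=10)  # the sample is a tiny fraction
print(small_gap, large_gap)
```

When the sample is a large share of the population, the two models disagree noticeably. When the population is much larger than the sample, removing a few items barely changes the success fraction and the binomial becomes a close approximation.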

The Probability Mass Function (PMF) of the Discrete Hypergeometric Distribution gives the probability of getting exactly 'k' successes in 'n' draws. The Cumulative Distribution Function (CDF) provides the probability of getting up to 'k' successes.

## Real-Life Applications

The Discrete Hypergeometric Distribution finds application in various real-world scenarios where we are interested in modeling situations involving finite populations and sampling without replacement. Let's explore some of its detailed applications:

Quality Control in Manufacturing:

In manufacturing processes, ensuring product quality is paramount. The Hypergeometric Distribution can be employed to evaluate the quality of a production lot. A certain number of items are selected from the lot without replacement, and the distribution helps assess the likelihood of obtaining a specific number of defective items. This information aids in deciding whether the lot meets quality standards or requires further inspection.
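A lot inspection of this kind could be sketched as follows. The lot size, number of defectives, and sample size are hypothetical, and the zero-defect acceptance rule is an assumption made for the example. scipy names its shape parameters `M` (population size), `n` (successes in the population), and `N` (draws), which differs from the notation used in this article.

```python
from scipy.stats import hypergeom

lot_size = 100     # N in this article's notation (scipy calls this M)
defectives = 5     # M in this article's notation (scipy calls this n)
sample_size = 10   # n in this article's notation (scipy calls this N)

inspection = hypergeom(M=lot_size, n=defectives, N=sample_size)

# Probability that the sample contains no defective item,
# i.e. the lot passes a zero-defect acceptance rule
p_no_defect = inspection.pmf(0)

# Probability that the sample contains at most one defective item
p_at_most_one = inspection.cdf(1)

print(p_no_defect, p_at_most_one)
```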

Genetics and Population Studies:

Genetic studies often involve analyzing the traits or alleles present in a population. When studying genetic traits, researchers may select a sample of individuals without replacement to understand the distribution of specific traits. The Discrete Hypergeometric Distribution is applicable here, considering the changing probability of success (presence of a specific allele) with each draw. This distribution assists in estimating the likelihood of observing a certain number of individuals with a particular trait in the sample.

Audit Sampling:

Auditors use statistical sampling techniques to assess the accuracy and reliability of financial records. The Hypergeometric Distribution can be employed in audit sampling scenarios where a subset of financial transactions is selected without replacement for examination. By applying the distribution, auditors can estimate the probability of finding a certain number of irregular or fraudulent transactions in the sample, helping them identify potential issues and allocate resources effectively.

Ecology and Environmental Studies:

In ecological studies, researchers often survey species abundance in a particular area. The Discrete Hypergeometric Distribution can be used when sampling a specific number of individuals from a population of organisms to determine the probability of observing a certain number of individuals with specific characteristics (e.g., gender, size, or behavior). This helps ecologists draw conclusions about the population dynamics and biodiversity of the ecosystem.

Lottery and Gaming Analysis:

The Hypergeometric Distribution can even be applied to analyze certain aspects of lotteries, games of chance, and gambling. For example, consider a scenario where a specific number of winning tickets are hidden within a larger pool of tickets, and a player purchases a subset of them. The distribution can estimate the likelihood of the player obtaining a particular number of winning tickets in their purchase.
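The ticket scenario can be sketched with the survival function, which gives the probability of more than a given number of successes. The ticket counts below are hypothetical.

```python
from scipy.stats import hypergeom

total_tickets = 1000   # population size
winning_tickets = 20   # successes in the population
tickets_bought = 50    # number of draws

# scipy's shape parameters: M = population, n = successes, N = draws
purchase = hypergeom(M=total_tickets, n=winning_tickets, N=tickets_bought)

# P(at least one winning ticket) = P(X > 0) = 1 - P(X = 0)
p_at_least_one = purchase.sf(0)

# Expected number of winning tickets: draws * successes / population
expected_winners = purchase.mean()

print(p_at_least_one, expected_winners)
```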

Medical Testing and Clinical Trials:

In medical testing, researchers might be interested in assessing the effectiveness of a treatment on a specific subset of patients. The Hypergeometric Distribution can be used to model the probability of observing a certain number of positive outcomes (successes) in a clinical trial where a limited number of patients are selected without replacement.

Market Research and Surveys:

When conducting market research or surveys, researchers often aim to estimate the prevalence of certain characteristics within a target population. The Discrete Hypergeometric Distribution can help calculate the probability of obtaining a certain number of respondents with specific attributes in a non-replacement sampling scenario, assisting analysts in making inferences about the entire population.

Inventory Management:

Businesses often face decisions related to inventory management, such as restocking items in a store. The Hypergeometric Distribution can be utilized to analyze the probability of selecting a certain number of items with specific features (e.g., defective items) when restocking from a finite supply.

Sports Analytics:

In sports analytics, the Hypergeometric Distribution can be applied whenever a fixed number of items is drawn without replacement from a finite pool. For instance, when a team selects a given number of players from a finite draft pool, the distribution gives the probability that a certain number of the selected players have a particular attribute.

Social Sciences and Demographics:

Researchers studying social phenomena or demographic trends might use the Hypergeometric Distribution to analyze a population subset and estimate the likelihood of observing a certain number of individuals with specific characteristics or behaviors.

Ecotoxicology and Environmental Risk Assessment:

When assessing the impact of pollutants on ecosystems, scientists may sample organisms from polluted areas to estimate the prevalence of certain traits or diseases. The Hypergeometric Distribution aids in determining the probability of encountering a particular number of affected organisms in the sample.

Forensic Science:

Forensic scientists use statistical techniques to analyze evidence and draw conclusions in criminal investigations. The Discrete Hypergeometric Distribution can be employed when selecting items for forensic analysis to estimate the probability of finding a specific number of items with relevant characteristics.

In each of these applications, the Discrete Hypergeometric Distribution is a powerful tool for analyzing situations involving non-replacement sampling from finite populations. Its ability to capture the changing probability of success with each draw makes it particularly useful when the sample makes up a substantial fraction of the population, so that each draw noticeably changes the composition of what remains. By applying this distribution and utilizing Python's statistical libraries, professionals from various fields can gain insights, make informed decisions, and draw meaningful conclusions from their data.

In all these applications, the Discrete Hypergeometric Distribution is a valuable tool for quantifying probabilities and making informed decisions based on non-replacement sampling from finite populations. Python's scipy.stats library simplifies the computational aspects, allowing practitioners to focus on the insights and implications of the analysis. By understanding and leveraging this distribution, professionals across various fields can enhance their decision-making processes and gain deeper insights into the dynamics of real-world scenarios.

## Python Implementation

### Creating a Hypergeometric Discrete Random Variable:

In probability theory and statistics, a random variable is a variable whose value is uncertain and determined by the outcome of a random experiment. A hypergeometric discrete random variable is a type of random variable that follows the Discrete Hypergeometric Distribution. It represents the count of successes obtained when drawing a specific number of items from a finite population without replacement.

To create a hypergeometric discrete random variable, you need to define the three parameters of the distribution: the population size (N), the number of successes in the population (M), and the number of draws (n). The number of desired successes (k) is not a parameter of the distribution; it is the value at which the PMF or CDF is evaluated. Once the parameters are defined, you can use the distribution to generate random variates representing the count of successes in your draws.
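A minimal sketch of creating such a random variable with `scipy.stats.hypergeom` is shown below. The parameter values are assumptions for illustration. scipy's shape parameters use different letters from this article: its `M` is the population size, its `n` is the number of successes in the population, and its `N` is the number of draws.

```python
from scipy.stats import hypergeom

# This article's notation (hypothetical values)
N_pop = 50     # population size
M_succ = 10    # successes in the population
n_draws = 5    # number of draws

# Map onto scipy's names: M = population, n = successes, N = draws
rv = hypergeom(M=N_pop, n=M_succ, N=n_draws)

mean = rv.mean()        # n_draws * M_succ / N_pop
variance = rv.var()
low, high = rv.support()  # smallest and largest possible count of successes

print(mean, variance, low, high)
```

Calling `hypergeom(...)` with fixed parameters returns a "frozen" distribution object, so later calls such as `rv.pmf(k)` or `rv.rvs(size=...)` do not need the parameters repeated.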

### Hypergeometric Discrete Variates and Probability Distribution:

A hypergeometric discrete random variable generates hypergeometric discrete variates. These variates represent the number of successes obtained in a sample drawn from a population without replacement. Each time you generate a random variate, you simulate drawing items from the population and count the number of successes in those draws.

The probability distribution of the hypergeometric discrete random variable describes the likelihood of observing each possible count of successes in your draws. This distribution is defined by the hypergeometric probability mass function (PMF). For each possible value of 'k' (the count of successes), the PMF calculates the probability of obtaining exactly 'k' successes in 'n' draws.
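The link between variates and the PMF can be checked empirically: with enough simulated draws, the observed frequency of each count should approach its PMF value. The sketch below assumes hypothetical parameters and a fixed seed so that the run is repeatable.

```python
import numpy as np
from scipy.stats import hypergeom

# scipy's shape parameters: M = population, n = successes, N = draws
rv = hypergeom(M=50, n=10, N=5)

# Simulate 10,000 experiments; each variate is a count of successes in 5 draws
samples = rv.rvs(size=10_000, random_state=0)

ks = np.arange(0, 6)  # every possible count of successes
empirical = np.array([(samples == k).mean() for k in ks])
theoretical = rv.pmf(ks)

for k, e, t in zip(ks, empirical, theoretical):
    print(f"k={k}: empirical={e:.4f}  pmf={t:.4f}")
```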

### Graphical Representation:

A graphical representation helps in visualizing the probability distribution and gaining insight into the random variable's behavior. In the case of the Discrete Hypergeometric Distribution, you can create a probability mass function (PMF) plot to visualize the probabilities of different counts of successes.

To create a PMF plot, you plot the possible values of 'k' on the x-axis and the corresponding probabilities on the y-axis. This plot displays a series of bars or points, where the height of each bar or point represents the probability of getting that specific count of successes.

Creating and visualizing the PMF graph helps you understand the distribution's characteristics, such as the most likely number of successes and the spread of probabilities across different outcomes. It's a valuable tool for interpreting the implications of the Discrete Hypergeometric Distribution in practical scenarios.
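The following is a sketch of how these pieces might fit together: it creates the random variable, generates 20 variates, computes the PMF, and draws a bar plot. The parameter values (50 items, 20 successes, 10 draws) are assumptions, so the printed samples will not necessarily match any particular run. The `Agg` backend and the output file name are also choices made here so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import hypergeom

# Assumed parameters in this article's notation
N_pop, M_succ, n_draws = 50, 20, 10

# scipy's shape parameters: M = population, n = successes, N = draws
rv = hypergeom(M=N_pop, n=M_succ, N=n_draws)

# Generate 20 random variates (counts of successes in 10 draws)
samples = rv.rvs(size=20)
print("Generated samples:", samples)

# PMF over every possible count of successes
ks = np.arange(0, n_draws + 1)
pmf = rv.pmf(ks)

# Bar plot of the PMF
fig, ax = plt.subplots()
ax.bar(ks, pmf)
ax.set_xlabel("Number of successes (k)")
ax.set_ylabel("Probability")
ax.set_title("Hypergeometric PMF")
fig.savefig("hypergeom_pmf.png")  # with an interactive backend, plt.show() also works
```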

Output:

```
Generated samples: [3 3 3 4 3 5 4 4 3 3 5 5 4 4 4 4 5 4 4 5]
```

#### Note: The output of the above code will vary each time you run it.

This code demonstrates the process of creating a hypergeometric discrete random variable, generating variates, calculating the probability distribution, and creating a bar plot to visualize the distribution. Remember to have the scipy and matplotlib libraries installed (pip install scipy matplotlib).

## Conclusion

The Discrete Hypergeometric Distribution is a statistical tool that models the probability of obtaining a certain number of successes when drawing from a finite population without replacement. It considers changing success probabilities with each draw and finds applications in many fields, including manufacturing, genetics, and auditing.

Using Python's scipy.stats module, we can create a hypergeometric random variable, generate variates, and calculate the probability mass function (PMF) for different outcomes. Visualizing the PMF through bar plots enhances our understanding of the distribution's behavior.

In essence, the Discrete Hypergeometric Distribution empowers us to analyze scenarios involving finite populations and non-replacement sampling. It offers insights into the likelihood of achieving desired outcomes, aiding decision-making and data analysis across diverse domains. By grasping its concepts and utilizing Python's capabilities, we equip ourselves with a versatile tool for making informed decisions and extracting valuable insights from real-world data.