Generalized Linear Models
Introduction:
By accommodating a variety of response distributions and link functions, Generalised Linear Models (GLMs) offer a flexible statistical framework for modelling relationships between responses and predictors. In contrast to classical linear regression, GLMs relax the assumptions of constant variance and normally distributed residuals, which makes them appropriate for non-normal and heteroscedastic data.
A GLM has three main parts: the random component, the systematic component, and the link function. The random component specifies the probability distribution of the response variable, which can be gamma, Poisson, binomial, or another member of the exponential family, so many kinds of data can be modelled. The systematic component is the linear combination of predictor variables. The link function relates this linear combination to the expected value of the response, so that predicted values stay within the valid range for the response variable.
GLMs also give flexibility in choosing the link function that relates the predictors to the response. Common link functions include the identity, logit, log, and inverse functions, each suited to different data types and research questions.
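The common link functions named above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular library's API; the names LINKS, to_mean, and to_linear_predictor are hypothetical helpers introduced here.

```python
import math

# Each entry pairs a link g (mean -> linear predictor scale)
# with its inverse (linear predictor -> mean).
# The dictionary and helper names are illustrative assumptions.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def to_linear_predictor(link_name, mu):
    """Apply the link function to a mean value."""
    link, _ = LINKS[link_name]
    return link(mu)


def to_mean(link_name, eta):
    """Apply the inverse link to a linear-predictor value."""
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)
```

For example, to_mean("logit", 0.0) gives a probability of 0.5, and to_mean("log", eta) is positive for any real eta, which is how the link keeps predictions inside the valid range of the response.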
Assumptions of GLMs:
 Independence: Each observation is assumed to be independent of the others, so the value of one observation does not depend on the value of another. Violations of independence can produce inaccurate standard errors and biased parameter estimates.
 Linearity in Parameters: GLMs assume that the linear predictor, which is the combination of predictors weighted by their coefficients, is linearly related to the link-transformed mean of the response. Nonlinear relationships can be handled with suitable transformations of the predictors, but the model must remain linear in its parameters on the link scale.
 Correct Model Specification: The chosen model should be appropriate for the data at hand. This means choosing a suitable distribution for the response variable and a suitable link function. A misspecified model can lead to biased estimates and incorrect conclusions.
 Appropriate Variance Structure: GLMs do not impose the constant-variance assumption of classical linear regression, but they do require the variance of the response to follow the mean-variance relationship implied by the chosen distribution (for example, variance equal to the mean for the Poisson). Departures from that relationship, such as overdispersion, can lead to inaccurate standard errors and inefficient parameter estimates.
 No Multicollinearity: Collinearity occurs when predictor variables are strongly correlated with one another. GLMs generally tolerate low degrees of collinearity, but high collinearity can produce unstable parameter estimates and make coefficients hard to interpret.
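One common way to examine the variance assumption for a count model is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. The sketch below assumes a Poisson model, where the variance equals the mean; the function name and the small data set are made up for illustration.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square / residual degrees of freedom for a Poisson fit.

    Values near 1 are consistent with the Poisson variance (variance = mean);
    values well above 1 suggest overdispersion.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    # Under the Poisson model the variance of each observation is its mean.
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / dof


# Illustrative counts with an intercept-only fit (every fitted mean = 4.0).
counts = [0, 1, 9, 2, 12, 0]
fitted_means = [4.0] * len(counts)
dispersion = pearson_dispersion(counts, fitted_means, n_params=1)
```

Here the statistic is about 6.7, far above 1, so these made-up counts vary much more than a Poisson model expects. A negative binomial model is one usual alternative in that situation.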
Components of GLMs:
 Random Component: The random component of a GLM specifies the probability distribution of the response variable, which is assumed to belong to the exponential family of distributions. The Gaussian (normal), binomial, Poisson, gamma, and inverse Gaussian distributions are frequently used in GLMs. The best choice depends on the type of response and the research question being addressed.
 Systematic Component: The systematic component of a GLM is the linear predictor, a linear combination of the predictor variables weighted by their coefficients. It can be written as η = Xβ, where η is the linear predictor, X is the design matrix of predictor variables, and β is the vector of coefficients.
 Link Function: The link function in a GLM describes the relationship between the linear predictor and the expected value of the response variable. Its inverse maps the linear predictor back to the mean, which keeps predicted values within the valid range of the response variable. Commonly used link functions are the identity, logit, log, and inverse functions, which suit different kinds of response variable such as continuous, binary, and count data.
 Response Variable: The response variable, also called the dependent variable or outcome variable, is the quantity being modelled in the GLM. It is assumed to follow a distribution from the exponential family, as specified by the random component.
 Predictor Variables: Predictor variables, also called independent variables or covariates, are the variables used to predict or explain variation in the response variable. They can be binary, categorical, or continuous.
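The components listed above fit together in the standard textbook formulation below. The symbols θ (natural parameter), φ (dispersion parameter), a, b, c, μ, and g are conventional notation assumed here; they do not appear in the text above.

```latex
\begin{aligned}
f(y;\theta,\phi) &= \exp\!\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}
  && \text{(random component: exponential family)} \\
\mathbb{E}[Y] &= \mu = b'(\theta), \qquad \operatorname{Var}(Y) = a(\phi)\, b''(\theta)
  && \text{(mean and variance)} \\
\eta &= X\beta
  && \text{(systematic component)} \\
g(\mu) &= \eta
  && \text{(link function)}
\end{aligned}
```

For the Poisson distribution, for instance, θ = log μ, b(θ) = exp(θ), and a(φ) = 1, so the variance equals the mean and the log is the canonical link.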
GLM Types
 Linear Regression: Conventional linear regression can be seen as a special case of a GLM. It is used when the response variable has a Gaussian (normal) distribution and its mean is a linear function of the predictors, which corresponds to the identity link.
 Logistic Regression: Logistic regression is used when the response variable is binary or dichotomous (yes/no, success/failure, etc.). Its link function is the logit; the inverse of the logit, the logistic function, converts the linear predictor to the probability scale and guarantees that predicted probabilities lie between 0 and 1.
 Poisson Regression: Poisson regression is appropriate when the response variable is a count of events (such as the number of phone calls or accidents) that follows a Poisson distribution. It is frequently applied to count data.
 Gamma Regression: Gamma regression is used when the response variable is continuous, positively skewed, and follows a gamma distribution. It is frequently used for strictly positive continuous data, such as waiting times or medical expenditures.
 Binomial Regression: Closely related to logistic regression, binomial regression is used when the response variable is the number or proportion of successes out of a fixed number of trials (that is, binomially distributed data). It is suitable for modelling proportions or rates where the total number of trials is known.
 Negative Binomial Regression: Negative binomial regression is used when the response variable is a count that follows a negative binomial distribution. It is appropriate for overdispersed count data, where the variance exceeds the mean.
 Multinomial Regression: Multinomial regression is used when the response variable has more than two categories and follows a multinomial distribution. It is frequently used in categorical analysis when the outcome has several unordered categories.
 Ordinal Regression: Ordinal regression is used when the response variable is ordinal, that is, it has ordered categories whose spacing is not necessarily equal. It works well for ordinal data such as survey ratings or Likert scale responses.
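To make one of these model types concrete, the sketch below fits a Poisson regression with a log link and a single predictor by iteratively reweighted least squares (IRLS), a standard fitting method for GLMs. It uses only the standard library; the function name and the tiny data set are illustrative assumptions, and real analyses would normally rely on an established statistics package.

```python
import math


def fit_poisson_irls(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    if sum(y) <= 0:
        raise ValueError("counts must not all be zero")
    # Start from an intercept-only model: every fitted mean is the sample mean.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi            # linear predictor
            mu = math.exp(eta)            # inverse log link
            w = mu                        # IRLS weight for Poisson with log link
            z = eta + (yi - mu) / mu      # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares normal equations directly.
        det = s_w * s_wxx - s_wx ** 2
        if det == 0.0:
            raise ValueError("predictor has no variation; slope is not identifiable")
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Illustrative counts that double with each unit increase in x.
intercept, slope = fit_poisson_irls([0, 1, 2, 3], [1, 2, 4, 8])
```

Because these made-up counts lie exactly on the curve exp(x · log 2), the fitted intercept is approximately 0 and the fitted slope approximately log 2 ≈ 0.693. Exponentiating the slope gives the multiplicative change in the expected count per unit of x, here a factor of 2.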
Real-World Applications of GLMs:
 Biomedical Research: In biomedical research, GLMs are widely used to analyse data from clinical trials, epidemiological studies, and patient outcomes. They can model binary outcomes such as the presence or absence of a disease, count data such as the number of hospitalisations, and continuous outcomes such as blood pressure readings.
 Actuarial Science and Insurance: In actuarial science and insurance, GLMs are used to model and forecast claim frequencies and severities. Because they can handle skewed and non-negative data, they are well suited to insurance variables such as claim amounts and claim durations.
 Risk Management and Finance: In finance and risk management, GLMs are used to estimate insurance premiums, predict loan defaults, and model credit risk. Because they can handle skewed and heavy-tailed distributions, they are useful for modelling financial data such as asset prices and stock returns.
 Marketing and Customer Analytics: In marketing and customer analytics, GLMs are used to segment customers by their attributes, model customer behaviour, and predict purchase probability. They can model count data, such as the number of purchases, and binary outcomes, such as buy/no-buy events.
 Environmental Science and Ecology: In ecological and environmental studies, GLMs are used to examine trends in biodiversity, species distributions, and habitat preferences. They can model ecological data such as species richness, abundance, and presence/absence.
 Social Sciences: In the social sciences, GLMs are used for voter behaviour research, educational outcome modelling, and survey data analysis. Because they can handle binary, ordinal, and categorical outcomes, they suit a wide range of social science questions.
Examples & Case Studies
 Insurance Claim Prediction: An insurance provider wants to predict, from policy and demographic data, the probability that a policyholder will file a claim. It collects information such as each customer's age, gender, insurance type, and previous claim status. By fitting a logistic regression model, it can estimate the probability of a claim for each customer, which supports risk management and fair pricing.
 Medical Outcome Analysis: A healthcare researcher is studying what influences patient readmission rates after surgery. They collect information on surgical procedures, postoperative complications, and patient characteristics (age, comorbidities). Using a Poisson or negative binomial regression model, they can identify important determinants of readmission and design strategies to reduce it.
 Market Response Modelling: A marketing company wants to know how different advertising channels affect consumers' purchasing decisions. It gathers data on advertising spend across several media (print, web, and television) and the corresponding sales figures. Using a Poisson or linear regression model, it can measure the effect of each channel on sales and optimise its advertising budget.
 Species Distribution Modelling: An ecologist is studying the distribution of an endangered species in a nature reserve. They gather species occurrence records together with environmental data such as temperature, precipitation, and soil type. Using a logistic or binomial regression model, they can predict the probability of the species occurring under given environmental conditions, which supports habitat management and conservation.
 Sports Performance Analysis: A soccer coach wants to examine the factors affecting the team's performance in matches. They collect data on match results, player characteristics (age, position), and goal and assist totals. By fitting a multinomial or ordinal regression model, they can identify the main factors influencing match results (win, draw, or loss) and make informed decisions to improve team performance.
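As a rough illustration of the insurance case study, the sketch below shows how a fitted logistic regression would turn a customer's attributes into a claim probability. The coefficients are hypothetical values chosen for illustration, not estimates from any data, and the variable names are assumptions made here.

```python
import math

# Hypothetical coefficients on the log-odds scale, for illustration only;
# they are NOT estimated from any real data set.
COEFFICIENTS = {
    "intercept": -2.0,
    "age_decades": -0.1,      # effect per 10 years of age
    "previous_claim": 1.2,    # effect of having filed a claim before
}


def claim_probability(age_years, previous_claim):
    """Predicted probability of a claim under the hypothetical model."""
    eta = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["age_decades"] * (age_years / 10.0)
        + COEFFICIENTS["previous_claim"] * (1.0 if previous_claim else 0.0)
    )
    # The inverse logit maps the linear predictor to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(name):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(COEFFICIENTS[name])


p_new_customer = claim_probability(age_years=40, previous_claim=False)
p_repeat_claimant = claim_probability(age_years=40, previous_claim=True)
```

Under these assumed coefficients, a previous claim multiplies the odds of a new claim by exp(1.2), roughly 3.3, so p_repeat_claimant is larger than p_new_customer. Reading coefficients as odds ratios in this way is what makes logistic regression useful for setting rates.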
