The idea of the classification algorithm is very simple. We predict the target class by analyzing the training dataset. We use training datasets to obtain better boundary conditions that can be used to determine each target class. Once the boundary condition is determined, the next task is to predict the target class. The entire process is known as classification.
There are some important points of classification algorithms:
It is an algorithm that maps the input data to a specific category.
- Classification Model
A classification model tries to draw some conclusions from the input values which are given for training. This conclusion will predict class labels/categories for new data.
It is an individual measurable property of an event being observed.
- Binary classification
It is a classification task that has two possible outcomes. E.g., Gender classification, which has only two possible outcomes, i.e., Male and Female.
- Multi-class classification
It is a classification task in which classification is done with more than two classes. An example of multi-class classification is: an animal can be a dog or cat, but not both at the same time.
- Multi-label classification
It is a classification task in which each sample is mapped with a set of target labels. An example of multi-label classification is: a news article that can be about a person, location, and sports at the same time.
Types of Classification Algorithms
In R, classification algorithms are broadly classified in the following types:
- Linear classifier
In machine learning, the main task of statistical classification is to use an object's characteristics for finding to which class it belongs. This task is achieved by making a classification decision based on the value of a linear combination of the characteristics. In R, there are three linear classification algorithms which are as follows:
- Logistic Regression
- Naive Bayes classifier
- Fisher's linear discriminant
- Support vector machines
A support vector machine is the supervised learning algorithm that analyzes data that are used for classification and regression analysis. In SVM, each data item is plotted as a point in n-dimensional space with the value of each attribute, that is the value of a particular coordinate.
Least squares support vector machines is mostly used classification algorithm in R.
- Quadratic classifiers
Quadratic classification algorithms are based on Bayes theorem. These classifiers algorithms are different in their approach for classification from the logistic regression. In logistic regression, it is possible to derive the probability of observation directly for a class (Y = k) for a particular observation (X = x). But in quadratic classifies, the observation is done in the following two steps:
- In the first step, we identify the distribution for input X for each of the groups or classes.
- After that, we flip the distribution with the help of Bayes theorem to calculate the probability.
- Kernel estimation
Kernel estimation is a non-parametric way of estimating the Probability Density Function (PDF) of the continuous random variable. It is non-parametric because it assumes no implicit distribution for the variable. Essentially, on each datum, a kernel function is created with the datum at its center. It ensures that the kernel is symmetric about the datum. The PDF is then estimated by adding all these kernel functions and dividing it by the number of data to ensure that it satisfies the two properties of the PDF:
In R, the k-nearest neighbor is the most used kernel estimation algorithm for classification.
- Every possible value of the PDF should be non-negative.
- The fixed integral of the PDF on its support set should be equal to 1.
- Decision Trees
Decision Tree is a supervised learning algorithm that is used for classification and regression tasks. In R, the decision tree classifier is implemented with the help of the R machine learning caret package. The random forest algorithm is the mostly used decision tree algorithm used in R.
- Neural Networks
The neural network is another classifier algorithm that is inspired by the human brain for performing a particular task or function. These algorithms are mostly used in image classification in R. To implement neural network algorithms, we have to install the neuralnet package.
- Learning vector quantization
Learning vector quantization is a classification algorithm that is used for binary and multi-class problems. By learning the training dataset, the LVQ model creates codebook vectors that represent class regions. They contain elements which are placed around the respective class according to their matching level. If the element matches, it moves closer to the target class, if it does not match, then it proceeds.