LightGBM Multilabel Classification

Multilabel classification allows each instance to be simultaneously assigned to more than one class instead of just one. For instance, a song may be categorized under several genres in a music recommendation system, such as "rock," "blues," and "jazz." In contrast, in conventional classification tasks, an instance is usually allocated to one class out of a set of categories that are mutually exclusive.

The intrinsic difficulty of multilabel classification is that several labels must be predicted for each instance. Models therefore need to represent the correlations and interdependencies between labels. Conventional techniques may not handle this complexity well, particularly on large datasets with high-dimensional feature spaces.

LightGBM is a gradient-boosting framework created by Microsoft that uses tree-based learning algorithms. It is designed to be scalable, efficient, and able to handle large amounts of data. Several design choices make this possible, including histogram-based decision tree learning and a leaf-wise growth strategy. The histogram-based algorithm speeds up training by discretizing continuous feature values into a fixed number of bins. While many other gradient-boosting implementations grow trees level by level (depth-wise), LightGBM grows them leaf-wise, always splitting the leaf with the largest loss reduction, which often improves accuracy for the same number of leaves. These optimizations make LightGBM well suited to large, high-dimensional datasets.
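To make the histogram idea concrete, the sketch below bins one continuous feature, sums gradients and Hessians per bin, and scans the bin boundaries for the best split. It is a simplified NumPy illustration, not LightGBM's actual implementation: the equal-width bins, the gain formula, and the names `build_histogram` and `best_split` are assumptions made for this example.

```python
import numpy as np

def build_histogram(feature, grad, hess, n_bins=8):
    """Bucket one feature into equal-width bins and sum grad/hess per bin."""
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    # Use interior edges only, so bin ids fall in [0, n_bins - 1].
    bin_ids = np.digitize(feature, edges[1:-1])
    grad_hist = np.bincount(bin_ids, weights=grad, minlength=n_bins)
    hess_hist = np.bincount(bin_ids, weights=hess, minlength=n_bins)
    return edges, grad_hist, hess_hist

def best_split(grad_hist, hess_hist, reg_lambda=1.0):
    """Scan bin boundaries and return (last bin on the left side, gain)."""
    g_total, h_total = grad_hist.sum(), hess_hist.sum()
    parent_score = g_total ** 2 / (h_total + reg_lambda)
    best_bin, best_gain = -1, 0.0
    g_left = h_left = 0.0
    for b in range(len(grad_hist) - 1):
        g_left += grad_hist[b]
        h_left += hess_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + reg_lambda)
                + g_right ** 2 / (h_right + reg_lambda)
                - parent_score)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain

# Toy data: the target flips from 0 to 1 at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
y = (x > 0.5).astype(float)

# Squared-error gradients for a constant prediction of 0.5.
grad = 0.5 - y
hess = np.ones_like(x)

edges, grad_hist, hess_hist = build_histogram(x, grad, hess, n_bins=8)
split_bin, gain = best_split(grad_hist, hess_hist)
threshold = edges[split_bin + 1]
print(f"split after bin {split_bin}, threshold ~ {threshold:.3f}")
```

Once the histogram is built, the split search touches only the 8 bins rather than all 1,000 feature values.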

LightGBM can be adapted to multilabel classification by recasting the task as one or more problems it already handles well. Popular techniques include Binary Relevance (BR), Classifier Chains (CC), Label Powerset (LP), and ensemble methods. Binary Relevance is the simplest: one binary classifier is trained per label, which LightGBM does efficiently. Because each label is learned independently, relationships between labels are not captured. Classifier Chains, by contrast, train the binary classifiers in sequence and feed each classifier's prediction to the next one in the chain as an extra feature. This lets the model pick up correlations and dependencies between labels, which can improve prediction accuracy. Label Powerset treats every distinct combination of labels as its own class, reducing the multilabel problem to a single multiclass problem that a LightGBM multiclass classifier can learn. However, the number of possible combinations grows exponentially with the number of labels, so this method becomes impractical when many different combinations occur.

Code:

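For a better understanding of the concept, we will classify iris plants with the help of LightGBM. Each iris sample belongs to exactly one species, so the task is strictly multi-class, but it serves as a simple test bed.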

Importing Libraries

Reading the Dataset
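The article's data-loading code is not reproduced here. As a stand-in, this sketch loads the copy of the Iris dataset bundled with scikit-learn into a pandas DataFrame. The column name `species` is chosen for this example, and a CSV download read with `pd.read_csv` would serve equally well.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects, so pandas must be installed.
iris = load_iris(as_frame=True)

# Keep the four measurements and replace the numeric target with species names.
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target.to_numpy()]

print(df.head())
print(df["species"].value_counts())
```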



Next, we create a custom multi-class log loss function and an accuracy metric for use with LightGBM.
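A minimal sketch of such a pair of functions follows. It assumes LightGBM's custom-objective convention of returning a gradient and a Hessian for the raw scores, and the custom-metric convention of returning `(name, value, is_higher_better)`. Depending on the LightGBM version, multi-class scores may arrive as a 2-D array or as a 1-D array flattened class by class, so the hypothetical helper `_as_matrix` handles both. The function names and the diagonal Hessian approximation are choices made for this example.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def _as_matrix(preds, n_rows):
    """Return scores as (n_rows, n_classes), accepting 1-D class-major input."""
    preds = np.asarray(preds, dtype=float)
    if preds.ndim == 1:
        return preds.reshape(n_rows, -1, order="F")
    return preds

def multiclass_logloss_objective(preds, train_data):
    """Gradient and Hessian of softmax cross-entropy w.r.t. the raw scores."""
    labels = train_data.get_label().astype(int)
    scores = _as_matrix(preds, len(labels))
    probs = softmax(scores)
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0   # p - onehot(y)
    hess = probs * (1.0 - probs)                  # diagonal approximation
    if np.ndim(preds) == 1:
        # Hand the result back in the same flattened layout it arrived in.
        return grad.ravel(order="F"), hess.ravel(order="F")
    return grad, hess

def accuracy_metric(preds, eval_data):
    """Custom metric: fraction of rows whose top-scoring class is correct."""
    labels = eval_data.get_label().astype(int)
    scores = _as_matrix(preds, len(labels))
    accuracy = float((scores.argmax(axis=1) == labels).mean())
    return "accuracy", accuracy, True

class _FakeDataset:
    """Stand-in exposing get_label(), so the demo runs without LightGBM."""
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)

    def get_label(self):
        return self._label

demo_data = _FakeDataset([0, 2, 1])
raw_scores = np.array([[2.0, 0.0, 0.0],
                       [0.0, 0.0, 3.0],
                       [1.0, 0.0, 0.5]])
grad, hess = multiclass_logloss_objective(raw_scores, demo_data)
metric_name, accuracy, higher_is_better = accuracy_metric(raw_scores, demo_data)
print(metric_name, round(accuracy, 3))
```

In the demo, the first two rows are classified correctly and the third is not, so the metric reports two out of three.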


Encoding Target

We will convert the target value into ordinal values.
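The article's own encoder is not shown; one simple way to get ordinal codes, assuming the species names are held in a NumPy array, is `np.unique` with `return_inverse=True`. scikit-learn's `LabelEncoder` or `OrdinalEncoder` would produce an equivalent mapping. The sample values below are illustrative.

```python
import numpy as np

# A few species labels in arbitrary order (illustrative values only).
species = np.array(["setosa", "virginica", "versicolor", "setosa", "virginica"])

# classes holds the sorted unique names; target holds each row's index into it.
classes, target = np.unique(species, return_inverse=True)

print(classes)  # sorted class names
print(target)   # integer code per row
```

The codes follow the alphabetical order of the class names, so `classes[target]` recovers the original strings.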


Split Data

We will now use two approaches to training a LightGBM model for a multi-class classification task: one using a custom multi-class log loss function and the other using the built-in multi-class objective function provided by LightGBM.


Multi-Task

Now we will introduce a custom dataset class, MultiLabelDatasetForLGBM, tailored for handling multi-label data in LightGBM, along with a custom loss function, MultiMSEForLGBM, for multi-task mean squared error.
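The implementations of `MultiLabelDatasetForLGBM` and `MultiMSEForLGBM` are not reproduced here. As a rough stand-in for the loss, the sketch below builds a multi-task squared-error objective as a closure over a 2-D target matrix, which sidesteps the need for a custom dataset class. The name `make_multi_mse_objective` and the closure design are assumptions for this example, not the article's code.

```python
import numpy as np

def make_multi_mse_objective(label_matrix):
    """Build a LightGBM-style objective for multi-task squared error.

    label_matrix has shape (n_samples, n_tasks). It is captured by the
    closure, so the 1-D label stored in the training Dataset is not used.
    """
    targets = np.asarray(label_matrix, dtype=float)
    n_rows, n_tasks = targets.shape

    def objective(preds, train_data):
        preds = np.asarray(preds, dtype=float)
        flat = preds.ndim == 1  # class-major 1-D layout in older versions
        scores = preds.reshape(n_rows, n_tasks, order="F") if flat else preds
        grad = scores - targets        # derivative of 0.5 * (score - target)^2
        hess = np.ones_like(scores)    # second derivative is constant
        if flat:
            return grad.ravel(order="F"), hess.ravel(order="F")
        return grad, hess

    return objective

def multi_mse(label_matrix, scores):
    """Mean squared error averaged over all samples and tasks."""
    return float(np.mean((np.asarray(scores) - np.asarray(label_matrix)) ** 2))

# Toy check without LightGBM: damped Newton steps should drive the error to 0.
targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
objective = make_multi_mse_objective(targets)
scores = np.zeros_like(targets)
initial_error = multi_mse(targets, scores)
for _ in range(20):
    grad, hess = objective(scores, None)
    scores = scores - 0.5 * grad / hess
final_error = multi_mse(targets, scores)
print(initial_error, final_error)
```

With LightGBM, an objective like this would be used with `num_class` set to the number of target columns, so that one tree per task is grown in each boosting round.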



Encode Target

Here we take a different approach and encode the target as categorical values.
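The exact encoding used is not shown. One plausible reading, assumed here, is a one-hot encoding in which each class becomes its own 0/1 target column, which is the shape a multi-task loss expects. The sample values are illustrative.

```python
import numpy as np

# Illustrative species labels.
species = np.array(["setosa", "virginica", "versicolor", "setosa", "virginica"])

# Integer codes first, then one indicator column per class.
classes, codes = np.unique(species, return_inverse=True)
onehot = np.eye(len(classes))[codes]

print(classes)
print(onehot)
```

Each row of `onehot` contains a single 1 in the column of its class, so `onehot.argmax(axis=1)` recovers the integer codes.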


Split Data

Training

Now we will train the model again.


Comparing Prediction

The softmax function is applied to the logits predicted by the classification models to obtain class probabilities. For the regression tasks, predictions are taken directly from the trained models without any additional transformation.
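The comparison can be sketched with plain NumPy. The two score arrays below are made-up toy values standing in for model outputs, not results from the article; they only show how each kind of prediction is turned into a class label.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(shifted)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Toy raw scores from a classification model trained with a log loss objective.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

# Toy outputs from a multi-task regression model fitted to one-hot targets.
regression_scores = np.array([[0.90, 0.10, 0.00],
                              [0.05, 0.10, 0.85]])

# Classification: logits -> probabilities -> most likely class.
probs = softmax(logits)
class_from_logits = probs.argmax(axis=1)

# Regression: use the predictions as they are and take the largest column.
class_from_regression = regression_scores.argmax(axis=1)

agreement = float((class_from_logits == class_from_regression).mean())
print(probs.round(3))
print(class_from_logits, class_from_regression, agreement)
```

Softmax preserves the ordering of the scores within a row, so it changes the scale of the classification output but not which class is selected.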




