## Micro, Macro, and Weighted Averages of the F1 Score

Two common techniques combine the F1 scores of several classes in a classification task: the macro average and the micro average. The F1 score, the harmonic mean of precision and recall, measures a model's accuracy while accounting for both precision and recall. It ranges from 0 to 1, with 1 representing the best possible score. If the classification problem is multi-class, the F1 score can be computed separately for every class, and the per-class scores can then be combined in three ways:

- **Macro-average F1 score:** the plain average of the per-class F1 scores, without considering the number of samples in each class.
- **Micro-average F1 score:** obtained by adding up the true positives, false negatives, and false positives across all classes and computing a single global F1 score.
- **Weighted-average F1 score:** the average of the per-class F1 scores, weighted by the number of samples in each class.

## Micro F1 Score

The micro F1 score considers the total number of true positives (TP), false negatives (FN), and false positives (FP) across all classes. It computes the F1 score globally by pooling these counts:

Micro Precision = ΣTP / (ΣTP + ΣFP)
Micro Recall = ΣTP / (ΣTP + ΣFN)
Micro F1 = 2 × (Micro Precision × Micro Recall) / (Micro Precision + Micro Recall)

A micro F1 score is appropriate when you wish to give each data point the same weight regardless of its class.

## Example

Let's understand the micro F1 score with an example. Suppose we have the following prediction results for a multi-class problem:
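The article's table of predictions is not reproduced here, so the sketch below uses assumed true and predicted labels for a hypothetical 3-class problem to show how the pooled counts produce a micro F1 score.

```python
from sklearn.metrics import f1_score

# Hypothetical labels for a 3-class problem (assumed for illustration;
# the article's own table is not reproduced here).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

# Pooling over all classes: 7 of 10 predictions are correct, so
# summed TP = 7 and summed FP = summed FN = 3, giving
# micro precision = micro recall = micro F1 = 7/10.
micro_f1 = f1_score(y_true, y_pred, average="micro")
print(micro_f1)  # 0.7
```

Note that for single-label problems like this one, the micro F1 score always equals plain accuracy, since every misclassification counts once as a false positive and once as a false negative.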
For the micro F1 score, we do not average per-class F1 scores. Instead, we pool the true positives, false positives, and false negatives of all classes and compute a single global precision, recall, and F1 score.
## Macro F1 Score

The macro F1 score is the unweighted average of the F1 scores computed independently for each class:

Macro F1 = (F1_class1 + F1_class2 + … + F1_classN) / N

The macro F1 score is appropriate when you want to evaluate the model's performance on each class equally, regardless of class imbalance.
## Example

Let's look at an example of computing the macro F1 score. Use the following prediction results for a multi-class problem:
As you can see, the normal F1 score has been determined for each class. To obtain the macro F1 score, we only need to take the mean of the three per-class F1 scores, as follows:
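Since the article's table is not reproduced here, the sketch below uses the same assumed labels as before to show the per-class F1 scores and their plain (unweighted) mean.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical labels (assumed for illustration).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

per_class_f1 = f1_score(y_true, y_pred, average=None)  # one F1 per class
macro_f1 = f1_score(y_true, y_pred, average="macro")   # their plain mean

print(per_class_f1)  # F1 for classes 0, 1, 2
print(macro_f1)      # equals per_class_f1.mean()
```

For these assumed labels the per-class F1 scores are 2/3, 2/3, and 3/4, so the macro F1 score is their mean, about 0.694.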
## Weighted F1 Score

The weighted F1 score is a metric used in machine learning to evaluate the performance of a model, especially in scenarios where class imbalance exists. Let's break it down:

## F1 Score

The F1 score combines precision and recall into a single value: it is computed as the harmonic mean of the two. Precision represents the accuracy of positive predictions, while recall measures how well the model identifies actual positive cases. The F1 score ranges from 0 to 1, with 1 being the best.

## Weighted-Averaged F1 Score

The weighted-averaged F1 score considers each class's support, i.e., the number of samples of that class in the dataset. It is calculated by taking the mean of all per-class F1 scores, weighted by their support. For example, if there is only one observation with an actual label of Boat, the support value for the Boat class is 1.

## Sample-Weighted F1 Score

The sample-weighted F1 score is ideal for class-imbalanced data distributions. It is a weighted average of the class-wise F1 scores, where the number of samples in each class determines the weights. Remember that the F1 score ranges between 0 and 1 only, and it is a valuable metric for assessing a model's overall performance.

## How to Calculate the Weighted F1 Score?

Calculate the weighted average by assigning each class's F1 score a weight based on the number of instances (the support) of that class:

Weighted F1 = Σ (Support_i × F1_i) / Σ Support_i, for i = 1 … N

where N is the total number of classes and Support_i is the number of true instances of class i.

## Calculation Through Python Code

The following is an example of Python code that calculates the precision and recall scores on a micro- and macro-average basis for a model trained on the scikit-learn IRIS dataset, which comprises three distinct classes: setosa, versicolor, and virginica. The model is trained using a single feature so that the resulting confusion matrix has entries off the diagonal. Observe that the training data X is assigned iris.data[:, [1]].
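The original code listing did not survive extraction, so the following is a minimal sketch of the described setup: a logistic regression classifier (an assumed model choice, since the article does not name one) trained on the single feature iris.data[:, [1]], with the resulting confusion matrix printed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

iris = load_iris()
X = iris.data[:, [1]]   # a single feature (sepal width), as in the text
y = iris.target         # 0 = setosa, 1 = versicolor, 2 = virginica

# Assumed model: logistic regression; the article does not specify one.
clf = LogisticRegression(max_iter=1000).fit(X, y)
cm = confusion_matrix(y, clf.predict(X))
print(cm)  # 3x3 matrix; one weak feature leaves many off-diagonal counts
```

Because a single feature separates the three species poorly, the model misclassifies many samples, which is exactly what makes the micro/macro comparison in the next step informative.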
Trained on the IRIS dataset with a single feature, the model produces a confusion matrix of this kind. The Python code to calculate the micro- and macro-average precision scores sums the true positive predictions (the diagonal of the confusion matrix) across all classes. The scikit-learn precision_score, recall_score, and f1_score functions can compute the same values: to obtain the micro-average, macro-average, and weighted-average scores, pass the `average` parameter with one of the three values "micro", "macro", or "weighted".
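The article's manual calculation did not survive extraction; the sketch below reconstructs it on assumed labels, reusing the variable names mentioned in the text (precisionScore_manual_microavg, precisionScore_manual_macroavg) and checking the results against scikit-learn's `average` parameter.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Hypothetical labels (assumed for illustration).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)

tp = np.diag(cm)            # true positive predictions: the diagonal
fp = cm.sum(axis=0) - tp    # column totals minus the diagonal

# Micro: pool the counts first, then compute one global precision.
precisionScore_manual_microavg = tp.sum() / (tp.sum() + fp.sum())
# Macro: compute per-class precision first, then take the plain mean.
precisionScore_manual_macroavg = (tp / (tp + fp)).mean()
print(precisionScore_manual_microavg, precisionScore_manual_macroavg)

# scikit-learn reproduces these values via the `average` parameter.
for avg in ("micro", "macro", "weighted"):
    print(avg, precision_score(y_true, y_pred, average=avg))
```

The same pattern works for recall_score and f1_score, which accept the identical `average` values.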
The F1 score, also called the F-measure, is a frequently used indicator of how well a classification model performs. When dealing with multi-class classification, we use averaging techniques to compute the F1 score, producing the macro, micro, and weighted average scores shown in the classification report. The following sections describe these average scores, how to calculate them using Python code, and why and how to choose the best one.

## Which is Better for Imbalanced Datasets?

Both micro and macro F1 scores have advantages and considerations for imbalanced datasets:
## Why Does the Scikit-Learn Classification Report Not Have a Micro Average?

The scikit-learn classification_report does compute a micro average for precision, recall, and F1-score; it simply does not always label it "micro avg". For single-label classification, the micro-average precision, recall, and F1-score are all equal to accuracy, so the report shows a single "accuracy" row in their place; a separate "micro avg" row appears only in settings such as multilabel classification, where these values can differ from accuracy.

## How Does it Work?

**Micro Average Precision:** To calculate the micro average precision, add all of the true positives across all classes, then divide that total by the sum of all true positives and false positives across all classes.

**Micro Average Recall:** To calculate the micro average recall, add all of the true positives across all classes, then divide that total by the sum of all true positives and false negatives across all classes.

**Micro Average F1-Score:** The micro average F1-score is the harmonic mean of the micro average precision and recall.
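A small sketch on assumed labels makes the point concrete: in a single-label problem, the micro-average F1 score coincides with accuracy, which is why classification_report prints an "accuracy" row rather than a "micro avg" row.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Hypothetical labels (assumed for illustration).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

# The report shows "accuracy", "macro avg", and "weighted avg" rows,
# but no "micro avg" row for this single-label problem.
print(classification_report(y_true, y_pred))

# The micro-average F1 equals accuracy exactly.
print(accuracy_score(y_true, y_pred))             # 0.7
print(f1_score(y_true, y_pred, average="micro"))  # 0.7
```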
These pooled values represent the aggregated performance metrics across all classes and are presented in the classification_report without being explicitly labelled as micro averages.

## Differences Between Micro & Macro F1
## Which One Should I Go With, Micro F1 or Macro F1?

Selecting between the micro and macro F1 scores depends on the particulars of your classification task as well as your goals:

## When to Use Micro F1?

- Use the micro F1-score when the model's overall performance across all classes is your main concern, especially when there is a class imbalance.
- When the dataset is unbalanced, the micro F1-score can be a more accurate measure of the model's overall performance because it gives larger classes more weight.
- It works well when the classes are noticeably different in size or you wish to prioritise the majority class's performance.
## When to Use Macro F1?

- Use the macro F1-score when you wish to assess the model's performance equally across all classes, regardless of their size.
- The macro F1-score gives information about the model's performance on an individual class basis before averaging the results across all classes.
- It is helpful when you don't want the evaluation skewed in favour of the majority class and want to ensure the model performs well across all classes.
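The trade-off described above can be sketched on toy imbalanced data (assumed for illustration): a model that always predicts the majority class gets a high micro F1 but a low macro F1, because the missed minority class drags the unweighted mean down.

```python
from sklearn.metrics import f1_score

# Assumed toy data: 9 samples of class 0, 1 sample of class 1,
# and a model that always predicts the majority class 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

micro = f1_score(y_true, y_pred, average="micro")  # rewards the majority class
macro = f1_score(y_true, y_pred, average="macro")  # exposes the missed minority
print(micro, macro)  # micro = 0.9, macro ≈ 0.47
```

Here micro F1 equals accuracy (0.9), while macro F1 averages the majority class's F1 (18/19) with the minority class's F1 (0), giving about 0.47; the gap is exactly the signal macro averaging is designed to surface.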
## Conclusion

The choice among the micro, macro, and weighted averages of the F1 score depends on the goals and class distribution of the classification task. While the macro F1-score treats all classes equally, the micro F1-score prioritizes overall performance and favours larger classes, and the weighted average balances class sizes. Knowing these metrics makes choosing the best evaluation technique easier and guarantees a thorough model assessment.