Micro, Macro, and Weighted Averages of the F1 Score

Three techniques combine the F1 scores of several classes in a classification task: macro, micro, and weighted averages. The F1 score measures a model's accuracy by combining precision and recall: it is the harmonic mean of the two, and it ranges from 0 to 1, with 1 representing the best possible score. If the classification problem is multiclass, the F1 score can be computed separately for every class.

The macro-average F1 score is the unweighted mean of the per-class F1 scores, without considering the number of samples in each class. The micro-average F1 score is obtained by summing the true positives, false positives, and false negatives across all classes and computing a single global F1 score. The weighted-average F1 score is the mean of the per-class F1 scores, weighted by the number of samples in each class.

Micro F1 Score:
The micro F1 score considers the total number of true positives, false positives, and false negatives across all classes. By adding up all of the true positives, false positives, and false negatives, it calculates the F1 score globally. The micro F1 score is appropriate when you wish to give each data point the same weight, regardless of its class:

Micro F1 = sum_i TP_i / (sum_i TP_i + 0.5 * (sum_i FP_i + sum_i FN_i))

where:
TP_i: true positives for class i
FP_i: false positives for class i
FN_i: false negatives for class i

Example: Let's understand micro F1 with an example. Suppose we have the following prediction results for a multiclass problem:
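The global calculation above can be sketched in a few lines of Python. The per-class counts here are hypothetical totals chosen to match the worked example that follows in the text:

```python
# Sketch: micro F1 from counts aggregated over all classes.
# These totals are hypothetical, matching the worked example below.
total_tp = 35   # sum of true positives across every class
total_fp = 13   # sum of false positives across every class
total_fn = 16   # sum of false negatives across every class

# Micro F1 = TP / (TP + 0.5 * (FP + FN)), computed globally
micro_f1 = total_tp / (total_tp + 0.5 * (total_fp + total_fn))
print(round(micro_f1, 2))  # 0.71
```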
For the micro F1 score we do not average the per-class F1 scores; instead we sum the true positives, false positives, and false negatives across all three classes (here 35 true positives, 13 false positives, and 16 false negatives in total) and compute a single global score:

Micro F1 Score = 35 / (35 + 0.5 * (13 + 16)) = 0.71

Macro F1 Score:
The macro F1 score is the average of the F1 scores across all classes: it is obtained by taking the mean of the F1 scores independently computed for each class. The macro F1 score is appropriate when you want to evaluate the model's performance on every class equally, regardless of class imbalance:

Macro F1 = (1 / n) * sum_i F1_i

where:
n: total number of classes
F1_i: F1 score of class i

Example: Let's look at an example of computing the macro F1 score. Consider the following prediction results for a multiclass problem:
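The macro average is simply an unweighted mean. A minimal sketch, using the three per-class F1 values from the worked example below:

```python
# Sketch: macro F1 as the unweighted mean of per-class F1 scores.
# The three values come from the worked example in the text.
per_class_f1 = [0.8, 0.6, 0.8]

macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(round(macro_f1, 2))  # 0.73
```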
As you can see, each class's F1 score has been computed individually. To obtain the macro F1 score, we simply take the mean of the three per-class F1 scores:

Macro F1 Score = (0.8 + 0.6 + 0.8) / 3 = 0.73

Weighted F1 Score:
The weighted F1 score is a metric used in machine learning to evaluate the performance of a model, especially in scenarios where class imbalance exists. Let's break it down:

F1 Score: The F1 score combines precision and recall into a single value, computed as their harmonic mean. Precision represents the accuracy of positive predictions, while recall measures how well the model identifies actual positive cases. The F1 score ranges from 0 to 1, with 1 being the best.

Weighted-Average F1 Score: The weighted-average F1 score takes each class's support (i.e., the number of samples belonging to that class in the dataset) into account. It is calculated by taking the mean of all per-class F1 scores, weighted by their supports. For example, if there is only one observation with the actual label "Boat", the support of that class is 1.

Sample-Weighted F1 Score: This variant is well suited to class-imbalanced data distributions. It is a weighted average of the class-wise F1 scores, where the number of samples in each class determines the weights. Remember that the F1 score always lies between 0 and 1, and it is a valuable metric for assessing a model's overall performance.

How to Calculate the Weighted F1 Score?
Assign each class's F1 score a weight proportional to the number of instances in that class:

Weighted F1 = sum_i (Support_i * F1_i) / sum_i Support_i

where:
N: the total number of classes (the sums run over i = 1 ... N)
Support_i: the number of instances in class i

Calculation Through Python Code:
The following is an example of Python code that calculates the precision and recall scores on a micro- and macro-average basis for a model trained on the scikit-learn IRIS dataset, which comprises three distinct classes: setosa, versicolor, and virginica.
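The support-weighted mean can be sketched directly from the formula. The per-class F1 scores and supports below are hypothetical values for illustration only:

```python
# Sketch: weighted F1 with hypothetical per-class F1 scores and
# supports (the number of true instances in each class).
per_class_f1 = [0.8, 0.6, 0.8]
supports     = [50, 30, 20]   # hypothetical class sizes

# Weighted F1 = sum(Support_i * F1_i) / sum(Support_i)
total_support = sum(supports)
weighted_f1 = sum(f1 * s for f1, s in zip(per_class_f1, supports)) / total_support
print(round(weighted_f1, 2))  # 0.74
```

Because the largest class (support 50) also has the highest F1 score here, the weighted average (0.74) comes out slightly above the macro average (0.73).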
The model is trained using a single feature, which produces a confusion matrix with nonzero numbers in every cell. Note that the training data X is assigned iris.data[:, [1]].

Output: with a single feature, the model trained on the IRIS dataset produces this confusion matrix. The micro- and macro-average precision scores can then be calculated manually from the true-positive predictions (the diagonal of the confusion matrix) together with each class's false positives and false negatives.

The same values can also be calculated with scikit-learn's precision_score, recall_score, and f1_score functions. To obtain the micro-average, macro-average, and weighted-average scores, pass the corresponding value to the "average" parameter; these averages also appear in the classification report. The following sections describe the average scores, how to calculate them with Python code, and how to choose the best one.

Which Is Better for Imbalanced Datasets?
Both micro and macro F1 scores have advantages and considerations for imbalanced datasets:
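The workflow the text describes can be sketched as follows, assuming scikit-learn is installed. The train/test split, classifier, and random seed are illustrative choices, not part of the original listing:

```python
# Sketch: train on a single IRIS feature, then compare the micro-,
# macro-, and weighted-average precision scores (assumed setup).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, accuracy_score

iris = load_iris()
X = iris.data[:, [1]]          # single feature, as in the text
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# With only one feature the classes overlap, so the confusion
# matrix tends to have off-diagonal entries as well.
print(confusion_matrix(y_test, y_pred))

for avg in ("micro", "macro", "weighted"):
    print(avg, precision_score(y_test, y_pred, average=avg, zero_division=0))

# In single-label multiclass problems, micro-average precision
# equals plain accuracy.
p_micro = precision_score(y_test, y_pred, average="micro", zero_division=0)
acc = accuracy_score(y_test, y_pred)
```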
Why Does the Scikit-Learn Classification Report Not Show a Micro Average?
In single-label multiclass classification, the micro-averaged precision, recall, and F1 score are all equal to the overall accuracy. For that reason, scikit-learn's classification_report shows a single "accuracy" row rather than a separate row labelled "micro avg"; a "micro avg" row only appears in settings where it can differ from accuracy, such as multilabel problems or when only a subset of the labels is evaluated. How Does It Work?
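A small demonstration, with illustrative label arrays, that the micro-averaged metrics all collapse to accuracy in the single-label multiclass case:

```python
# Sketch: micro-averaged precision, recall, and F1 equal accuracy
# for single-label multiclass data (toy labels for illustration).
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 2, 2, 1, 0, 1, 2, 0]
y_pred = [0, 2, 2, 2, 1, 0, 0, 2, 1]

acc      = accuracy_score(y_true, y_pred)
micro_p  = precision_score(y_true, y_pred, average="micro")
micro_r  = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")

# All four values are identical, so classification_report
# collapses them into the single "accuracy" row.
print(acc, micro_p, micro_r, micro_f1)
```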
The report aggregates the true positives, false positives, and false negatives across all classes to compute these global metrics; because they coincide with accuracy, they are presented in classification_report without being explicitly labelled as micro averages.

Differences Between Micro & Macro F1
The micro F1 score gives every sample the same weight, so classes with many samples dominate the result; the macro F1 score gives every class the same weight, regardless of how many samples it contains.
Which One Should I Go With, Micro F1 or Macro F1?
Choosing between the micro and macro F1 scores depends on the particulars of your classification task and your goals:

When to Use Micro F1?
Use the micro F1 score when you want every data point to count equally and overall performance across the whole dataset matters most, even if that lets the larger classes dominate the metric.
When to Use Macro F1?
Use the macro F1 score when every class matters equally, for example when performance on small classes is just as important as performance on large ones despite class imbalance.
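The trade-off described above can be made concrete with a deliberately imbalanced toy example: a degenerate model that always predicts the majority class looks strong under micro F1 but weak under macro F1.

```python
# Sketch: how micro and macro F1 diverge on imbalanced data.
# Toy labels: 90% majority class, and a model that always predicts it.
from sklearn.metrics import f1_score

y_true = [0] * 90 + [1] * 10   # imbalanced ground truth
y_pred = [0] * 100             # degenerate model: always predicts class 0

micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(micro)  # 0.9  -- looks strong, driven by the majority class
print(macro)  # ~0.47 -- exposes that class 1 is never predicted
```

The macro score drops because the minority class contributes an F1 of 0, which the micro score largely hides.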
Conclusion
The choice among the micro, macro, and weighted averages of the F1 score depends on the goals and class distribution of the classification task. The micro F1 score prioritizes overall performance and favours larger classes, the macro F1 score treats all classes equally, and the weighted average balances class sizes. Knowing these metrics makes it easier to choose the right evaluation technique and guarantees a thorough model assessment.
