Like many other researchers, I have long chased the holy grail of representing a confusion matrix with a single value. Sometimes comparing two confusion matrices is easy: for example, you can say that confusion matrix 2 is better than confusion matrix 1, below.
These two confusion matrices are trivially comparable. Confusion matrix 2 is better than confusion matrix 1, which implies that the targeting system underlying confusion matrix 2 is better than the one underlying confusion matrix 1.
More confusion (pun intended) arises when two confusion matrices are not trivially comparable. What are we to do in that case? First, let us give the easy case a name and a definition.
Definition 1: Two confusion matrices C1 and C2 are trivially comparable if and only if:
(FP(C1) <= FP(C2) and FN(C1) <= FN(C2)) or (FP(C2) <= FP(C1) and FN(C2) <= FN(C1)).
The matrix that has no more false positives and no more false negatives than the other can then be called the safely better confusion matrix.
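As a minimal sketch, Definition 1 can be written in a few lines of Python. The ConfusionMatrix type and the function names here are hypothetical, introduced only for illustration. Returning None when both matrices have identical error counts is also my assumption, since the definition does not say what to do with ties.

```python
from typing import NamedTuple, Optional


class ConfusionMatrix(NamedTuple):
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def trivially_comparable(c1: ConfusionMatrix, c2: ConfusionMatrix) -> bool:
    """Definition 1: one matrix is no worse than the other on both FP and FN."""
    c1_no_worse = c1.fp <= c2.fp and c1.fn <= c2.fn
    c2_no_worse = c2.fp <= c1.fp and c2.fn <= c1.fn
    return c1_no_worse or c2_no_worse


def safely_better(c1: ConfusionMatrix, c2: ConfusionMatrix) -> Optional[ConfusionMatrix]:
    """Return the safely better matrix, or None if there is no clear winner."""
    if (c1.fp, c1.fn) == (c2.fp, c2.fn):
        return None  # assumption: identical error counts count as a tie
    if c1.fp <= c2.fp and c1.fn <= c2.fn:
        return c1
    if c2.fp <= c1.fp and c2.fn <= c1.fn:
        return c2
    return None  # not trivially comparable


# Made-up counts, purely for illustration.
cm1 = ConfusionMatrix(tp=80, fp=20, fn=20, tn=80)
cm2 = ConfusionMatrix(tp=90, fp=10, fn=10, tn=90)
cm3 = ConfusionMatrix(tp=95, fp=30, fn=5, tn=70)

print(trivially_comparable(cm1, cm2))  # cm2 has fewer FP and fewer FN
print(trivially_comparable(cm1, cm3))  # cm3 trades more FP for fewer FN
```

With these made-up counts, cm2 is safely better than cm1, while cm1 and cm3 are not trivially comparable: cm3 has fewer false negatives but more false positives.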
So, back to the question: what can we do if two confusion matrices are not trivially comparable? How can we compare them then?
F-measure, the harmonic mean of recall and precision, is a good example of a single-value measure, but it is woefully inadequate in the specific vertical that I operate in. I have had many other “home-grown” measures, which are home-grown for a reason: they haven’t had a full academic review yet. More recently I came across the Matthews correlation coefficient, and I am sometimes amazed at how nicely it represents confusion matrices.
No single metric works in all situations, but this one comes pretty darn close.
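For reference, here is a rough sketch of how both measures can be computed from the four cells of a binary confusion matrix. It uses the standard formulas, F1 = 2TP / (2TP + FP + FN) and MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names are hypothetical, and returning 0.0 when a denominator is zero is a common convention that I am assuming here, not something the formulas dictate.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall. Note that TN is ignored."""
    denom = 2 * tp + fp + fn
    # Assumption: report 0.0 when the score is undefined (no positives at all).
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1]. Uses all four cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when any row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Two made-up matrices that are NOT trivially comparable:
# the second has fewer false negatives but more false positives.
first = dict(tp=80, fp=20, fn=20, tn=80)
second = dict(tp=95, fp=30, fn=5, tn=70)

for name, cm in (("first", first), ("second", second)):
    f1 = f1_score(cm["tp"], cm["fp"], cm["fn"])
    m = mcc(cm["tp"], cm["fp"], cm["fn"], cm["tn"])
    print(f"{name}: F1={f1:.3f}  MCC={m:.3f}")
```

One visible difference is that F1 never looks at the true negatives, while MCC uses all four cells, so the two can rank a pair of matrices differently.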