Recently, I received a slide deck extolling the virtues of an exciting new classification system with a purported accuracy of 62.5%. That number is not very high to begin with, and it looks even less impressive once we examine what accuracy really represents. Accuracy is defined as
(number of items correctly classified) / (total number of items).
Suppose the classes are not equally represented, but instead occur in a ratio of 2 to 1. That is, class 1 is the correct classification for two-thirds of the items, and class 2 is the correct classification for the remaining one-third. Consider a degenerate classification system that simply assigns class 1 to every item. The accuracy of that degenerate system is about 67%, which is higher than the 62.5% claimed in the slide deck. And that system does not even do anything!
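To make the arithmetic concrete, here is a minimal Python sketch of that degenerate system. The data set is made up for illustration (200 items of class 1 and 100 of class 2, matching the 2-to-1 ratio), and the helper name `accuracy` is my own, not something from the slide deck.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical data: classes in a 2-to-1 ratio.
y_true = [1] * 200 + [2] * 100

# The degenerate system: assign class 1 to every item.
y_pred_majority = [1] * len(y_true)

majority_acc = accuracy(y_true, y_pred_majority)
print(f"{majority_acc:.1%}")  # 66.7%
```

The degenerate system never looks at the items, yet it scores above 62.5% on this data.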
This simple observation is the reason there are so many other evaluation metrics, for example the kappa statistic, the Matthews correlation coefficient, and the F1 measure, that are often considered more appropriate than plain “accuracy” when the classes are imbalanced. The kappa statistic, for example, compares the accuracy of the system to the accuracy that would be expected from chance agreement alone.
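As a rough illustration of how kappa corrects for chance, the sketch below computes Cohen's kappa, one common form of the kappa statistic, as (p_o - p_e) / (1 - p_e). Here p_o is the observed accuracy and p_e is the agreement expected by chance given how often each class appears in the true and predicted labels. The function name `cohen_kappa` and the 200/100 data set are assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    # Observed agreement, which is the same quantity as accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: for each class, the product of its share among
    # the true labels and its share among the predicted labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum((true_counts[k] / n) * (pred_counts[k] / n) for k in labels)
    if p_e == 1.0:
        # Only one class appears anywhere; returning 0.0 is a convention
        # chosen for this sketch, since the ratio is undefined.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical data: classes in a 2-to-1 ratio.
labels_true = [1] * 200 + [2] * 100

kappa_majority = cohen_kappa(labels_true, [1] * len(labels_true))
kappa_perfect = cohen_kappa(labels_true, list(labels_true))
print(kappa_majority, kappa_perfect)
```

On this data the always-class-1 system gets a kappa of zero despite its 67% accuracy, because its observed agreement equals its chance agreement, while a perfect classifier gets a kappa of one.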