Category Archives: Analytics & BI

business intelligence, data mining, enterprise/operational decision making and data analytics

My Review of “Centralized Allocation of Human Resources: An Application to Public Schools”

My Review of “Centralized Allocation of Human Resources: An Application to Public Schools” for Computing Reviews is now online here (requires Computing Reviews membership).

It covers very interesting work by Laura López-Torres and Diego Prior of Universitat Autònoma de Barcelona on workforce planning for public schools in Catalonia, in northeast Spain.

Workforce allocation using artificial intelligence is one of the core strengths of BizMerlinHR, so this paper was a natural fit for me to review.

My Review of the Data Characterization Paper by Wang et al.

My review of the paper “An improved data characterization method and its application in classification algorithm recommendation” by Guangtao Wang, Qinbao Song and Xiaoyan Zhu is now available on Computing Reviews here.

Classification is an active research problem, and numerous classification algorithms have been proposed over the past few years. Which algorithm performs best depends on the data set. The “No Silver Bullet” idea, or “No Free Lunch Theorem”, informally states that no single classification algorithm outperforms all other classification algorithms on all data sets. This observation is essentially what keeps many data scientists in business: each data set has its own idiosyncrasies, and different classification algorithms need to be explored to find the one that best meets the needs of the problem at hand. Continue reading the full review.
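To make the point concrete, here is a small Java sketch of choosing a classifier per data set by hold-out accuracy. It is a generic model-selection loop, not the data characterization method from the paper; the two toy learners (a majority-class baseline and 1-nearest-neighbour on a single numeric feature) and the tiny data sets are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A labeled example with a single numeric feature (hypothetical toy data model).
record Example(double x, String label) {}

// A learner turns training data into a prediction function.
interface Learner {
    Function<Double, String> train(List<Example> train);
}

class AlgorithmSelection {
    // Baseline: always predict the most frequent label seen in training.
    static final Learner MAJORITY = train -> {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : train) counts.merge(e.label(), 1, Integer::sum);
        String best = null;
        for (Map.Entry<String, Integer> c : counts.entrySet()) {
            if (best == null || c.getValue() > counts.get(best)) best = c.getKey();
        }
        final String majority = best;
        return x -> majority;
    };

    // 1-nearest neighbour: copy the label of the closest training point.
    static final Learner ONE_NN = train -> x -> {
        Example nearest = train.get(0);
        for (Example e : train) {
            if (Math.abs(e.x() - x) < Math.abs(nearest.x() - x)) nearest = e;
        }
        return nearest.label();
    };

    // Accuracy = correctly classified items / total items.
    static double accuracy(Function<Double, String> model, List<Example> test) {
        int correct = 0;
        for (Example e : test) {
            if (model.apply(e.x()).equals(e.label())) correct++;
        }
        return (double) correct / test.size();
    }

    // Hold-out selection: train every candidate and keep the one with the best test
    // accuracy (ties go to the first candidate in the map's iteration order).
    static String selectBest(Map<String, Learner> candidates, List<Example> train, List<Example> test) {
        String bestName = null;
        double bestAccuracy = -1.0;
        for (Map.Entry<String, Learner> c : candidates.entrySet()) {
            double acc = accuracy(c.getValue().train(train), test);
            if (acc > bestAccuracy) {
                bestAccuracy = acc;
                bestName = c.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        Map<String, Learner> candidates = new LinkedHashMap<>();
        candidates.put("majority", MAJORITY);
        candidates.put("1-NN", ONE_NN);

        // Data set A: classes are well separated.
        List<Example> trainA = List.of(new Example(0.0, "a"), new Example(0.2, "a"),
                new Example(1.0, "b"), new Example(1.1, "b"), new Example(1.2, "b"));
        List<Example> testA = List.of(new Example(0.1, "a"), new Example(1.05, "b"));

        // Data set B: one noisy label sits right next to the test point.
        List<Example> trainB = List.of(new Example(0.0, "a"), new Example(0.1, "b"),
                new Example(0.2, "a"), new Example(0.3, "a"), new Example(0.4, "a"));
        List<Example> testB = List.of(new Example(0.11, "a"));

        System.out.println(selectBest(candidates, trainA, testA)); // prints: 1-NN
        System.out.println(selectBest(candidates, trainB, testB)); // prints: majority
    }
}
```

On the first data set 1-NN wins outright; on the second, a single noisy label next to the test point lets the trivial majority baseline win, so neither algorithm is best on both.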

Mahout and Third-Party “Similarity” Services

Mahout is an extremely popular platform for recommendations these days. It is only a matter of time before people start selling “similarity” services: given two SSNs, find the similarity between the users; given two ISBNs, the similarity between the books; given two SKUs, the similarity between the products. Once such services are available, building a facade on top of them is straightforward as well.

Consider a facade with the following signature:

double getSimilarity(String reference, String id1, String id2);

For example:
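One way such a facade could look is sketched below in Java. It is a minimal sketch under stated assumptions, not Mahout's API: apart from the getSimilarity signature, everything here is hypothetical, including the SimilarityService interface, the registry keyed by a reference type such as "ISBN", and the JaccardSimilarityService that stands in for a remote vendor call by scoring two items on the overlap of the users associated with them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical contract that a third-party similarity vendor would implement.
interface SimilarityService {
    double similarity(String id1, String id2);
}

// Hypothetical in-memory stand-in for a vendor: Jaccard similarity between
// the sets of users associated with each item.
class JaccardSimilarityService implements SimilarityService {
    private final Map<String, Set<String>> usersByItem;

    JaccardSimilarityService(Map<String, Set<String>> usersByItem) {
        this.usersByItem = usersByItem;
    }

    @Override
    public double similarity(String id1, String id2) {
        Set<String> a = usersByItem.getOrDefault(id1, Set.of());
        Set<String> b = usersByItem.getOrDefault(id2, Set.of());
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // nothing known about either item
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}

// The facade: routes each request to the service registered for its reference type.
class SimilarityFacade {
    private final Map<String, SimilarityService> services = new HashMap<>();

    void register(String reference, SimilarityService service) {
        services.put(reference, service);
    }

    double getSimilarity(String reference, String id1, String id2) {
        SimilarityService service = services.get(reference);
        if (service == null) {
            throw new IllegalArgumentException("No similarity service for reference: " + reference);
        }
        return service.similarity(id1, id2);
    }
}

class SimilarityFacadeDemo {
    public static void main(String[] args) {
        // Hypothetical buyers of two books.
        Map<String, Set<String>> buyersByIsbn = new HashMap<>();
        buyersByIsbn.put("isbn-1", Set.of("u1", "u2", "u3"));
        buyersByIsbn.put("isbn-2", Set.of("u2", "u3", "u4"));

        SimilarityFacade facade = new SimilarityFacade();
        facade.register("ISBN", new JaccardSimilarityService(buyersByIsbn));
        System.out.println(facade.getSimilarity("ISBN", "isbn-1", "isbn-2")); // prints: 0.5
    }
}
```

The two hypothetical books share two of four distinct buyers, so the facade returns 0.5. Swapping vendors for a given reference type only means registering a different SimilarityService; callers of getSimilarity are unaffected.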

Why the “Accuracy” of a Classification System May Be a Useless Metric

Recently, I received a slide deck extolling the virtues of an exciting new classification system with a purported accuracy of 62.5%. The number is not very high to begin with, and its value diminishes further once we consider what accuracy actually measures. Accuracy is defined as (number of items correctly classified) / (total number of items).

Suppose the classes are not equally represented but occur in a ratio of 2 to 1: class 1 is the correct classification for two-thirds of the items, and class 2 for the remaining third. Consider a degenerate classification system that simply assigns class 1 to every item. Its accuracy is 67%, already higher than the advertised 62.5%. And that system does not even do anything!
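The arithmetic is easy to check with a short Java sketch. The 300 labels below (200 of class 1, 100 of class 2) are hypothetical and serve only to fix the 2:1 ratio.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class DegenerateClassifierDemo {
    // Accuracy = (number of items correctly classified) / (total number of items).
    static double accuracy(List<Integer> actual, List<Integer> predicted) {
        if (actual.isEmpty() || actual.size() != predicted.size()) {
            throw new IllegalArgumentException("need non-empty label lists of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) correct++;
        }
        return (double) correct / actual.size();
    }

    public static void main(String[] args) {
        // Hypothetical labels in a 2:1 ratio.
        List<Integer> actual = new ArrayList<>(Collections.nCopies(200, 1));
        actual.addAll(Collections.nCopies(100, 2));
        // Degenerate "classifier": predict class 1 for every item.
        List<Integer> predicted = Collections.nCopies(actual.size(), 1);
        System.out.printf(Locale.ROOT, "accuracy = %.1f%%%n", 100 * accuracy(actual, predicted));
        // prints: accuracy = 66.7%
    }
}
```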

This simple observation is the reason there are so many other metrics, such as the kappa statistic (Cohen's kappa), the Matthews correlation coefficient and the F1 measure, that are considered much more appropriate than “accuracy”. The kappa statistic, for example, compares the system's observed accuracy with the accuracy expected by chance, that is, of a random system that follows the same class frequencies.
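A small Java sketch of these measures for the two-class case, computed from confusion-matrix counts, is below. It treats class 2 (the minority class) as the "positive" class, an assumption that F1 and MCC require, and follows the common convention of reporting MCC and F1 as 0 when their denominators vanish. Applied to the degenerate all-class-1 system on a hypothetical 200/100 split, accuracy stays at about 0.667 while kappa, MCC and F1 all drop to 0.

```java
import java.util.Locale;

// Binary confusion-matrix counts; the caller decides which class counts as "positive".
class ClassificationMetrics {
    final long tp, fp, fn, tn;

    ClassificationMetrics(long tp, long fp, long fn, long tn) {
        this.tp = tp;
        this.fp = fp;
        this.fn = fn;
        this.tn = tn;
    }

    long total() {
        return tp + fp + fn + tn;
    }

    double accuracy() {
        return (double) (tp + tn) / total();
    }

    // Cohen's kappa = (p_o - p_e) / (1 - p_e): observed agreement p_o versus the agreement p_e
    // expected by chance from the actual and predicted class frequencies.
    // Undefined (0/0, giving NaN) when p_e == 1, i.e. a single class everywhere.
    double kappa() {
        double n = total();
        double po = (tp + tn) / n;
        double pe = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n);
        return (po - pe) / (1 - pe);
    }

    // Matthews correlation coefficient; reported as 0 by convention when a marginal is empty.
    double mcc() {
        double denom = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denom == 0 ? 0.0 : (tp * tn - fp * fn) / denom;
    }

    // F1 for the positive class: 2TP / (2TP + FP + FN), the harmonic mean of precision and
    // recall; reported as 0 by convention when the positive class never occurs at all.
    double f1() {
        long denom = 2 * tp + fp + fn;
        return denom == 0 ? 0.0 : 2.0 * tp / denom;
    }

    public static void main(String[] args) {
        // Degenerate system, class 2 as "positive": every item is predicted as class 1,
        // so TP = 0, FP = 0, FN = 100, TN = 200.
        ClassificationMetrics m = new ClassificationMetrics(0, 0, 100, 200);
        System.out.printf(Locale.ROOT, "accuracy=%.3f kappa=%.3f mcc=%.3f f1=%.3f%n",
                m.accuracy(), m.kappa(), m.mcc(), m.f1());
        // prints: accuracy=0.667 kappa=0.000 mcc=0.000 f1=0.000
    }
}
```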