Archive for ‘Analytics & BI’

August 12th, 2012

Job scheduling mechanisms in Clouds

by Amrinder Arora

As more organizations move their computing to the cloud (I have been using Amazon Web Services, in particular EC2, for more than two years now), a relatively new set of challenges has arisen. Cloud providers sell computing resources, while organizations care about getting their analytical and computing jobs completed, irrespective of the resources those jobs require. Organizations would like to place a value on the job rather than on the resource, and there is currently a missing link between the two.

As a regular reviewer for Computing Reviews, I have just finished a formal review of "Near-optimal Scheduling Mechanisms for Deadline-Sensitive Jobs in Large Computing Clusters". That work specifically tries to address the new kinds of challenges we are all observing as the move to cloud computing gathers steam.


May 30th, 2012

Distributed tuning of machine learning algorithms using MapReduce clusters

by Amrinder Arora

My review of Ganjisaffar et al.’s 2011 LDMTA paper is available at:

February 27th, 2012

Artificial Intelligence

by Amrinder Arora

View the entire Rum Raisin Toon book.

January 30th, 2012

Comparing two confusion matrices

by Amrinder Arora

Comparing two confusion matrices is a standard approach for comparing the respective targeting systems, but it is by no means the only one. As we will discuss in the coming days, you can also compare two score-based targeting systems by comparing the lists they produce. For now, let us focus on comparing targeting systems through their confusion matrices.

The standard approach is to use a single-value metric to reduce each matrix to one number, and then to compare the two numbers. In other words, to compare M1 and M2, we simply compare f(M1) and f(M2), where the function f is the single-value metric.

Here are some single-value metrics that can be considered as candidates:

  1. Kappa Statistic
  2. F1 measure
  3. Matthews Correlation Coefficient
  4. Reward/Cost based
  5. Sensitivity (Recall, or true positive rate)
  6. Specificity (true negative rate)
  7. Precision (positive predictive value)
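
To make the idea of comparing f(M1) with f(M2) concrete, here is a minimal sketch for the two-label case. It assumes each confusion matrix is given as four counts (true positives, false positives, false negatives, true negatives); the function name `metrics` and the example counts are hypothetical and chosen only for illustration. It does not guard against degenerate matrices in which a denominator is zero.

```python
import math

def metrics(tp, fp, fn, tn):
    """Reduce a 2x2 confusion matrix (given as four counts) to single-value metrics.

    Assumes every denominator below is non-zero.
    """
    n = tp + fp + fn + tn
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "kappa": kappa,
        "f1": f1,
        "mcc": mcc,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
    }

# Hypothetical counts for two targeting systems evaluated on the same 100 cases.
m1 = metrics(tp=40, fp=10, fn=20, tn=30)
m2 = metrics(tp=30, fp=5, fn=30, tn=35)

# Compare the two systems metric by metric.
for name in m1:
    better = "M1" if m1[name] > m2[name] else "M2"
    print(f"{name:12s} M1={m1[name]:.3f}  M2={m2[name]:.3f}  higher: {better}")
```

Note that different metrics can disagree about which system is better, which is why the choice of f matters.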

A related approach is to take the element-wise difference of the two confusion matrices and then take its dot product (scalar product) with a reward/cost matrix C. It is easy to see that this reduces to the reward/cost based metric, because the dot product is linear.  [A.C - B.C = (A-B).C etc.]
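
The identity A.C - B.C = (A-B).C can be checked numerically with a small sketch. The matrix layout ([[TP, FP], [FN, TN]]), the counts, and the reward/cost values in C are all hypothetical; the dot product here is taken to mean the sum of element-wise products.

```python
def dot(m, c):
    """Sum of element-wise products of two equally shaped matrices."""
    return sum(x * y for row_m, row_c in zip(m, c) for x, y in zip(row_m, row_c))

def diff(a, b):
    """Element-wise difference a - b of two equally shaped matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Hypothetical confusion matrices, laid out as [[TP, FP], [FN, TN]].
A = [[40, 10], [20, 30]]
B = [[30, 5], [30, 35]]

# Hypothetical reward/cost matrix: reward per true positive,
# cost per false positive and per false negative, nothing for a true negative.
C = [[5, -2], [-1, 0]]

score_difference = dot(A, C) - dot(B, C)   # compare the two reward/cost scores
difference_score = dot(diff(A, B), C)      # score the matrix difference directly

assert score_difference == difference_score
print(score_difference)
```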


January 16th, 2012

Targeting Systems vs. Classification Systems

by Amrinder Arora

It is generally said that a targeting system is a degenerate classification system with only two labels.  However, this is misleading.  For example, suppose you are trying to identify customers who are good prospects for an upselling opportunity.  You run the system and generate a “to call” list.  The system has then separated your universe of customers into “good candidate” and “not good candidate” sets.  By the traditional definition, the decision system is a targeting system.  However, consider the scenario in which you decide to separate the list into three parts based on how strong the prospects are: “very likely”, “somewhat likely” and “not likely”.  Has the same system suddenly stopped being a targeting system and become a broader classification system?  Of course not, and this example helps us refine the concept of targeting systems vis-à-vis classification systems.  A system can be a multi-level targeting system, which tries to assign one of possibly many levels of action.  It is even possible to have a targeting system with infinitely many levels, simply by requiring the system to output a “percent likelihood” instead of yes/no.
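
The relationship between a “percent likelihood” output and a two-level or three-level targeting system can be sketched as follows. The function name `target_level`, the cutoff values, and the example scores are hypothetical; the point is only that the same underlying score yields any number of action levels depending on where the cutoffs are placed.

```python
from bisect import bisect_right

def target_level(likelihood, cutoffs, labels):
    """Map a likelihood in [0, 1] to one of len(cutoffs) + 1 action levels.

    cutoffs must be sorted ascending; a score equal to a cutoff falls
    into the higher level.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, likelihood)]

# Hypothetical scores produced by the same targeting system.
scores = {"customer_a": 0.92, "customer_b": 0.45, "customer_c": 0.10}

# Two levels: the classic "to call" list.
two_level = {
    name: target_level(s, (0.5,), ("do not call", "call"))
    for name, s in scores.items()
}

# Three levels: same scores, finer-grained actions.
three_level = {
    name: target_level(
        s, (0.3, 0.7), ("not likely", "somewhat likely", "very likely")
    )
    for name, s in scores.items()
}

print(two_level)
print(three_level)
```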

On the flip side, a two-label classification system, which tries to separate a given list of objects into two labels, shouldn’t really be considered a targeting system.  For example, consider a classification system that is trying to classify a given fossil as belonging to one of the many known dinosaur species.  As the age of the fossil and its other characteristics are analyzed, the choice narrows down to either Struthiomimus or Ornithomimus.  Now that only two labels remain, does it suddenly become a targeting system?  We argue that it doesn’t, since the goal is still to assign one of the labels to the fossil.  The goal was never to target the fossil for some action based on that decision.

Thus, the real difference between a classification system and a targeting system is the intent. If you intend to target some objects out of a given set in order to take an action (investigate, upsell, give a free upgrade, stop from boarding the plane), then that is a targeting system.  If you intend to assign one of the labels to the given object, then that is a classification system.