The BBC is one of the leading news websites and is often ranked among the ten most visited websites. As a result, it wields an enormous amount of influence.
The BBC monitors the popularity of a news item on at least two counts: (i) the number of times the news item was read, and (ii) the number of times the news item was emailed. These correspond to the “Most Read” and “Most E-mailed” segments of the BBC website, respectively.
As we show next, any hacker can propel a selected news item to the top of the “Most E-mailed” list. The effect is significant and self-feeding: a story that stays in the most emailed section continues to get attention, and therefore continues to be emailed.
What is Statistics Hacking?
We define Statistics Hacking to be a process in which a malicious user manages to modify the system usage statistics. Statistics Hacking explicitly refers to the situation where the resource (website or system) is available to the malicious user for acceptable usage, but the user is able to modify the system usage statistics using some unacceptable methodology.
Basic Vulnerability in the BBC’s System
This section highlights in detail the vulnerability in the BBC’s “Email this to a friend” system, and how it can be exploited by Statistics Hackers.
As of now, the BBC does not employ any of the advanced methods to prevent statistics hacking. Instead, it uses only a small hash value that is hardcoded inside the HTML form. It is not clear whether this hash was intended as a security mechanism at all; in any case, it has no effect in this respect.
In the scenario below, we assume that the Statistics Hacker is a dedicated health services professional who wants to highlight the availability of an HIV home screening kit, a news story carried by the BBC (http://news.bbc.co.uk/2/hi/health/6212467.stm).
The hacker begins by opening that page manually in a web browser and clicking on the “Email this to a friend” link. When the smaller window with the email form opens, the hacker views the source of that page. The source reveals most of the information that the hacker needs to submit the form.
The hacker’s code is a very basic Java program that opens a URL connection to the “Email this to a friend” page. Using the hidden variables and their values obtained from the form source, the hacker builds the form content, which is then written to the output stream of the URLConnection object.
[Specific code for this story can be found in the PDF version cited below.]
Methods of Protection Against Statistics Hacking
The following categories of methods are available to protect against Statistics Hacking:
- Use a computerized Turing test (for example, a CAPTCHA) to distinguish humans from computers
- Use clustering techniques or improved counting to ignore double counts when analyzing statistics
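As a rough illustration of the second category, the sketch below counts at most one "emailed" event per story and sender. It is a minimal example under stated assumptions, not the BBC's implementation or the method from the paper: the class name `DedupEmailCounter`, the story identifiers, and the choice of a "sender key" (here an IP address) are all hypothetical. A determined attacker who can rotate sender keys would still get through, so this only shows the basic idea of ignoring double counts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts at most one "email this story" event
// per (story, sender key) pair. All names here are illustrative.
class DedupEmailCounter {
    // story id -> set of sender keys already counted for that story
    private final Map<String, Set<String>> sendersByStory = new HashMap<>();

    // Records an event; returns true only if it increased the count.
    public boolean record(String storyId, String senderKey) {
        return sendersByStory
                .computeIfAbsent(storyId, id -> new HashSet<>())
                .add(senderKey);
    }

    // Number of distinct sender keys counted for the story.
    public int count(String storyId) {
        Set<String> senders = sendersByStory.get(storyId);
        return senders == null ? 0 : senders.size();
    }

    public static void main(String[] args) {
        DedupEmailCounter counter = new DedupEmailCounter();
        // A scripted client repeating the same submission 1000 times
        // (assumed sender key: an example IP address).
        for (int i = 0; i < 1000; i++) {
            counter.record("story-A", "198.51.100.7");
        }
        // One submission from a different sender.
        counter.record("story-A", "203.0.113.5");
        System.out.println(counter.count("story-A")); // prints 2
    }
}
```

With naive counting the scripted client would contribute 1000 events; here it contributes one. In practice the sender key would need to be harder to forge than an IP address alone, which is where the first category of methods comes in.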
This is a simplified version of an in-print journal paper. The full PDF version of the paper is cited below.
 “Statistics Hacking – Exploiting Vulnerabilities in News Websites”, Amrinder Arora, International Journal of Computer Science and Network Security, March 2007.