Apps  Contact  Seminars 

Google’s universal search gives non-deterministic answers? (Perhaps due to MapReduce?)

by Amrinder Arora
August 24th, 2010

One of the innovations at Google was the launch of the universal search a couple of years ago.  While it was considered a drastic change by outsiders (something that can fundamentally change the user experience), the search giant was able to roll it out without much fuss, and pretty much all users are very familiar with it by now.  You search for “Elvis” and you can see books about Elvis, blog posts, images, regular web pages, all interspersed using that magical ranking that made the search engine the king (no pun intended).

However, sometimes the universal search gives different results just a few second apart.  Here, consider the first try for RYN:

Google Search RYN - Try 1

Google Search RYN - Try 1

Now, let us try the same search again:

Google Search RYN - Try 2

Google Search RYN - Try 2

So, sometimes the stock results are shown, and sometimes not.  You can try this behavior yourself, by clicking on this search a few times: RYN.  I can’t say how many times you might have to try it, but chances are, you will be able to replicate this behavior easily.

Now, the universal search likely uses the MapReduce paradigm (I am entering purely speculative mode here, so be forewarned.)  Say the map function of a search term returns a list of search categories (which are then say farmed out to worker machines to process).  Some worker machines may or may not return to the master in time, and in the reduce phase, the master may be only putting together the results that it received in time (and ordering it using the search results rank and such).

So, in case the worker does not return the results for the search term from the “finance” search category in time, the master is not able to include those results.

Also, by repeatedly trying out the search, I observe that that this non-deterministic behavior manifests mostly for finance and image search categories.  One would imagine that Google has a check in place that if the “core webpage” search category worker has not returned, the results are not considered valid, and master must wait for that.

That is enough speculation for a day, so I guess I will just wait until someone with REAL knowledge of how Google’s universal search can enlighten everyone on this matter.

Oh, and btw, the original MapReduce paper does address non-determinism, but only as:

When the map and/or reduce operators are nondeterministic, we provide weaker but still reasonable semantics. In the presence of non-deterministic operators, the output of a particular reduce task R1 is equivalent to the output for R1 produced by a sequential execution of the non-deterministic program.

Clearly, this non-determinism is not directly related to the end users non-determinism that we refer to here.

Enter Comments