Tuesday, February 9, 2010

The long tail of search results...

The long tail: "20% of all searchable items account for 80% of all search results."

(For those who don't know what the long tail is, have a look at: http://en.wikipedia.org/wiki/Long_Tail)

Problem: "I have specific needs that correspond to specific items far out on the right-hand side of the tail. What can I do?"

Well, digitize the world.
Thanks to the digitization of the world, consumers can now easily reach the remaining 80% of the items in the collection.
This is the case on Amazon.com. But it's not the case at Walmart: a store cannot stock everything because of its physical limitations.

So by relying on a Web site, people can now reach the items they are interested in.
Well, on one condition: they have to be able to search for those items.

As you may know, there are two ways to search for things on the Internet:

1) search bar
2) recommendation engine.

Let's see the pros and cons of each one.

The search bar is a wonderful tool if you know the exact name or ID of the item you are looking for. In other words, you use the search bar like a dog playing fetch. Let's say that you wanna buy "2001: A Space Odyssey" on Amazon.com. You just have to type the title into the search bar and it will immediately display the movie if it's in the database.

But now let's say that you don't know the name of the thing you are looking for.
In that case, you rely on the probability ranking principle of information retrieval: you type your criteria into the search bar and the system works out which items are the most relevant to those criteria.
The ranking can be based on popularity, item recency, customer ratings, etc.

In the same way, Google outputs popular results based on PageRank. It uses, in fact, the same principle, applied to a linked database.

You see the problem here: the system will display only popular results, that is, the 80% of all results that correspond to 20% of all items. You will therefore not be able to reach the specific items, the ones that exactly fit your personality and your criteria, the ones in the remaining 80% of the long tail. In other words, you cannot do a customized search with a search bar.
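To make this concrete, here is a minimal Python sketch with a made-up three-item catalog. The item titles, tags, popularity numbers and both function names are assumptions for illustration only; this is not how Amazon or Google actually rank. It compares ordering the same matches by popularity with ordering them by how many of the searcher's criteria they satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    tags: frozenset
    popularity: int  # hypothetical sales count, invented for this example


CATALOG = [
    Item("Blockbuster Space Movie", frozenset({"space", "sci-fi"}), 9000),
    Item("Popular Robot Movie", frozenset({"robots", "sci-fi"}), 7000),
    Item("Obscure Slow Space Drama",
         frozenset({"space", "sci-fi", "slow", "philosophical"}), 12),
]


def rank_by_popularity(query, catalog):
    """Keep items sharing at least one tag with the query, most popular first."""
    matches = [item for item in catalog if item.tags & query]
    return sorted(matches, key=lambda item: item.popularity, reverse=True)


def rank_by_criteria_overlap(query, catalog):
    """Keep the same matches, ordered by how many query criteria each one meets."""
    matches = [item for item in catalog if item.tags & query]
    return sorted(matches, key=lambda item: len(item.tags & query), reverse=True)


query = frozenset({"sci-fi", "slow", "philosophical"})
by_popularity = [item.title for item in rank_by_popularity(query, CATALOG)]
by_overlap = [item.title for item in rank_by_criteria_overlap(query, CATALOG)]

print(by_popularity)
print(by_overlap)
```

With popularity ranking, the one item that meets all three criteria ends up last; with overlap ranking, it comes first. In a real catalog with millions of items, "last" means a page nobody ever visits.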

Conversely, the recommendation engine is good in the sense that the system can suggest items that are far out on the right-hand side of the long tail without asking you to do anything: you only have to look (using your wonderful visual pattern recognition ability) and choose what you like. But the suggestions are displayed without any proper justification: you don't know why the system has chosen those particular suggestions. In other words, you don't have control over them. This can sometimes be very frustrating, since the recommendations may seem random.

Let's recap: "I wanna discover new items that correspond to my taste and my criteria but I don't know the name of those items."

In the search bar paradigm, you have full control over the criteria but the system outputs only popular items that you are likely to already know or that are likely to be boring.

In the recommendation engine paradigm, the system outputs items that you don't know yet but you don't have full control over the criteria.

I personally think that it's high time we built an information retrieval system where you not only have full control over the criteria but can also discover things you don't know yet.

A solution: enable searchers to enter more keywords, in other words much more information and many more criteria.

Well, there are two big problems with this.

a) Current search engines are not able to handle many keywords: the Boolean paradigm is stuck at 2.1 keywords (see the previous post on that). Enter more keywords and you get worse results.

b) It's just troublesome for the user to enter many keywords: the human brain is good at recognizing things but not at remembering them.
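Problem a) is easy to reproduce with a toy example. The sketch below assumes a strict Boolean AND over a tiny inverted index; the four documents and the helper names `build_index` and `boolean_and` are made up for illustration, and real search engines are of course more sophisticated than this.

```python
DOCS = {
    1: "slow philosophical science fiction film about space travel",
    2: "fast action science fiction film with robots",
    3: "philosophical drama about memory and loss",
    4: "documentary about space travel and astronauts",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_and(index, keywords):
    """Return the ids of documents that contain every keyword."""
    result = None
    for keyword in keywords:
        postings = index.get(keyword, set())
        result = set(postings) if result is None else result & postings
    return result if result is not None else set()


INDEX = build_index(DOCS)

two_keywords = boolean_and(INDEX, ["science", "fiction"])
four_keywords = boolean_and(INDEX, ["science", "fiction", "philosophical", "space"])
six_keywords = boolean_and(
    INDEX, ["science", "fiction", "philosophical", "space", "slow", "astronauts"]
)

print(two_keywords, four_keywords, six_keywords)
```

Each extra keyword can only shrink the result set. By the sixth keyword nothing matches at all, even though document 1 satisfies five of the six criteria.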

With the ascot project, we believe we have found an elegant solution to both problems. In fact, we recently applied for a patent in the field of information retrieval.

Two important ideas in that patent: "concepts-based search", "suggested terms".


a) can be solved by creating a search engine that is able to handle concepts.
b) can be solved by creating a search engine that is able to suggest related terms and related concepts.
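To give a feel for what "suggested terms" could look like, here is a minimal sketch based on simple term co-occurrence. To be clear, this is an assumption for illustration and not the method described in our patent application; the tagged documents and the `suggest_terms` function are hypothetical.

```python
from collections import Counter

# Hypothetical tagged documents, invented for this example.
DOCS = [
    {"space", "sci-fi", "philosophical", "slow"},
    {"space", "sci-fi", "action"},
    {"space", "documentary", "astronauts"},
    {"philosophical", "drama", "slow"},
]


def suggest_terms(selected, docs, limit=3):
    """Suggest terms that co-occur with all of the already selected terms."""
    counts = Counter()
    for terms in docs:
        if selected <= terms:  # document contains every selected term
            counts.update(terms - selected)
    # Most frequent first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:limit]]


first_round = suggest_terms({"space"}, DOCS)
second_round = suggest_terms({"space", "sci-fi"}, DOCS)

print(first_round)
print(second_round)
```

The point of the design is that the user never has to recall a keyword: each round they recognize one in the suggestions, click it, and the criteria get richer.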

Stay tuned. We are in contact with a VC right now...


  1. This is very exciting. I'm working on a project that addresses the same problem. I, too, identify keywords as the core problem, but judging from what you've hinted at at the end of the post, your solution takes a different approach than mine (content mapping).

    I'm really curious how the ascot project will progress. Keep me updated; I think we already follow each other on Twitter.


  2. Hi Dan.
    Thanks for your comment.
    You also write a blog, which I now follow.
    I'll take a look and tell you what I think.
    Maybe we can collaborate in the future, who knows.