Friday, January 8, 2010

Help struggling search engines solve the free text query problem!





Currently, this is what a full search system looks like.
It is rather ugly, or at least not sexy at all.
Isn't it?


It's based on clumsy ideas, such as the free text query parser that tries to guess which Boolean operators link the keywords together.
But why try to guess? Because the searcher doesn't explicitly enter any operators in the query. The input interface that the search bar offers the user is too narrow.
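To make the guessing concrete, here is a minimal sketch of what such a parser has to do. The toy documents, the `build_index` and `search` helpers, and the single `operator` argument are all assumptions made for illustration, not how any real engine is implemented. The point is only that the same two keywords give very different results depending on which operator the system guesses.

```python
from collections import defaultdict

# Toy corpus: hypothetical documents, invented for this illustration.
DOCS = {
    1: "cheap flights to paris",
    2: "paris hotels and cheap hostels",
    3: "flights to london",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query, operator="AND"):
    """Link every keyword of a free text query with one guessed operator.

    The user typed no operator, so the caller has to pick one:
    "AND" intersects the posting sets, anything else unions them.
    """
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


INDEX = build_index(DOCS)

print(search(INDEX, "cheap flights"))        # guess AND: only document 1
print(search(INDEX, "cheap flights", "OR"))  # guess OR: documents 1, 2 and 3
```

With the AND guess the searcher gets one document, with the OR guess all three. Nothing in the typed query says which of the two was meant.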


The fact of the matter is that current search interfaces are still basic search bars and therefore don't get you excited about using "AND", "OR" or "NOT".
Right?


"Web search engines such as Google have popularized the notion that a set of terms typed into the query box carries the semantic of a conjunctive query that only retrieves documents containing all or most query terms".
This is a quotation from "Introduction to Information Retrieval", co-written by Christopher Manning, the associate professor at Stanford who helped Larry and Sergey do some cool stuff in 1997...


Again, let's stop systematically trying to guess for the user, for a minute. Let's enable searchers to provide more information. Let's try to process that additional information in an elegant and efficient fashion. And let's try to collect substantial data from it, data that can then be used to build the algorithms behind systems whose purpose is to guess for the user.


I think that technological innovation is based on a very tight collaboration between "humans" and "machines", a "human" being an electrochemical computer and a "machine" being an electronic computer. I think that it relies on a constant alternation between contributions from human brains and contributions from machines.


And search is no exception.

3 comments:

  1. My impression is that you want to point out that search-box based IR is struggling because it under-exploits the abilities specific to the human brain (perhaps because it underestimates users' brains).
    And I think it's a good point;
    I think that before writing software, we should first determine clearly what the user (with her brain) is able to do in the system, and then complete it (i.e. help her with software).
    But for that, you have to acknowledge that the human brain is no more and no less than a computational device... maybe that's the main issue.

    About the quote from C. Manning:
    "Web search engines such as Google have popularized the notion that a set of terms typed into the query box carries the semantic of a conjunctive query that only retrieves documents containing all or most query terms".

    What exactly did he mean?

  2. exactly, the main problem is philosophical.
    the limits are just set by ourselves.
    it's like aging.
    you cannot solve aging because from the beginning you think it's not solvable.

    about the quote, well, "conjunctive" just means "AND" between 2 arguments.
    so the point is that with the google search bar, people just do not need to worry about the operators in between.
    it's just AND, never OR
