benad (benad) wrote,

Semantic Search Engines

The big news last year was the launch of the oddly-named Bing search engine by Microsoft, followed by Yahoo!'s move to Bing as its search engine backend. With the amount of money Microsoft just threw at it, Bing became the "other" search engine.

There used to be more competition in the search engine world, at least before the concept of "search engine" became closely tied with online advertising. I still remember the days of AltaVista and Lycos. Over time, Google won because of the sheer size of its index, its speed, and the quality of its results, thanks to its "page ranking" algorithm. Competitors such as Ask and A9.com tried to compete on "features", but those were quite gimmicky, restricted to a relatively small set of topical queries and a "Web 2.0" user interface.

So, do I like Bing? Not really. It's just too much like Google already is. The same kind of features, but with a page index that's still not as big as Google's. Bing's spider "MSNBOT" kind of sucks (blocked by CPAN and reddit). Still, it's not just about how large the index is, but also the quality of the search results. That's something that Cuil learned the hard way.

At the end of the day, I use search engines for two reasons. 70% of the time, I'm searching for some obscure error message text I got in a log or on the command-line. The rest of the time, I search about a topic, and I usually land on either some Wikipedia page or the official site of whatever I'm trying to learn about. "Normal" people, on the other hand, use Google 95% of the time for "topical" searches, to the point where some people use the search box as a replacement for the address bar. You could make the case that a "keyword search" would be better for them, but since Google became "good enough" for keyword searches, it killed other services like "AOL Keywords" and Go.com.

And here's the problem: Google is still not very good at semantic search. If I look for something like Hudson, the continuous integration server, by searching for "hudson", then while it's there on the first page of the results, it's buried in a linear list among tons of other things that I'm not looking for. Other search engines tried to have more "friendly" search results by hand-crafting certain result pages. In the end, if you want to learn about something but aren't sure of what to look for, Google is just as bad as it was in the last decade.

So, while I was looking for alternate search engines, I found Duck Duck Go. OK, once you've stopped laughing at the name, try searching for "hudson". What you see are incredible semantic results, i.e. a "did you mean?" but for meaning, not just typos. Of course, "hudson" is so common that to find the software tool you have to scroll down to "Others" (linked right at the beginning of the results), and click on "More Topics..." to find "Hudson (software)".

You'll then notice that Duck Duck Go displays, right at the top of the "proper" results page (here, "Hudson (software)"), a box called "Zero-click info" that shows one-line descriptions extracted from Wikipedia. You can hover the mouse over some of the links in that box to expand more descriptions. This whole user experience is quite amazing. For those "30% topical searches", Duck Duck Go is now my preferred search engine. Why? Because it's the only search engine out there that helps me extract meaning out of the few keywords I can remember, and that meaning directly affects the search result pages. In a way, it's a combination of Google-style searches with Yahoo!-style categories based on Wikipedia information.

I truly believe that the next step in UI and search is going to be based on self-organizing semantic classification. At first, we had to classify things manually in a "file manager" of some kind. Now we can search for keywords or phrases, but there's still no semantic clustering of the results, which just turns a mess of information into a more "linear" mess. Tagging seems to be a preview of semantic classification, but it is still missing from most information retrieval interfaces, and it still requires you to manually tag everything.
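To make the "linear" versus "clustered" distinction concrete, here is a toy sketch in Python. It is not how Duck Duck Go or any real search engine works: the result titles and the topic labels are made up by hand for illustration, and a truly self-organizing system would have to infer those labels on its own. It only shows how the same ranked list reads once it is grouped by meaning.

```python
# Hypothetical ranked results for the query "hudson", as (title, topic) pairs.
# The topic labels are hand-assigned here purely for illustration.
RESULTS = [
    ("Hudson River", "Places"),
    ("Hudson's Bay Company", "Companies"),
    ("Hudson, New York", "Places"),
    ("Hudson (software)", "Software"),
    ("Henry Hudson", "People"),
]


def group_by_topic(results):
    """Group a ranked list by topic, preserving rank order within each group.

    Topics appear in the order of their best-ranked result.
    """
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for title, topic in results:
        groups.setdefault(topic, []).append(title)
    return groups


if __name__ == "__main__":
    # Print each topic as a heading followed by its results.
    for topic, titles in group_by_topic(RESULTS).items():
        print(topic)
        for title in titles:
            print("  " + title)
```

With the flat list, "Hudson (software)" is just the fourth line among unrelated things. Grouped, it sits under its own "Software" heading, which is the kind of page I'd like to get back from a few vague keywords.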

Anyway, I'm now rooting for another "underdog" in this new wave of search engine wars. Google is getting stagnant and gimmicky, Microsoft just threw a lot of money at it, and Yahoo! and Ask dropped the ball years ago. Yet this tiny Duck Duck Go thing manages to get better topical search result pages than all of these for a tiny fraction of the budget. They must be on something...
