A non-Web-2.0 (TM O'Reilly, Inc.) way of looking at it.
First there was the Internet, and then came the possibility of putting up web pages. People started making pages containing lists of other pages that they liked, and that's how you found those pages. Essentially, by word of mouth.
Pretty soon there were far too many pages for people to find by word of mouth. We needed a way to search for things on the web. All the users knew this long before the first search engines appeared, and when those showed up they were pretty sucky. Understandably, the companies creating these search engines were concerned about how they were going to make money doing it, and this may have been more of a concern than creating a good search engine.
Finally, Google came along with clever ways to do search and a way to monetize: targeted advertising. The jury is still out on whether this advertising is all that effective, but it seems obviously better to give people links to what they might be interested in rather than slamming an ad for your thing in front of absolutely everyone.
I remember when Google first appeared; I found out by someone telling me, in person. Other companies eventually realized that people wanted to actually search for what they wanted and so fixed their competing search engines (in particular, those engines that gave higher priority to paying advertisers without telling you stopped, or went out of business).
In the meantime, the number of pages has continued to grow exponentially or geometrically or by some other power function. And the effectiveness of Google searches has been diminishing for a while now. If you're looking for something, you have to put more and more work into finding what you really want.
(Web 2.0, which seems primarily to be "a way for users to put information back into the web," has only had the effect of adding more pages to the Internet, so it's part of the problem).
At the same time, newspapers, once the gatekeepers of information, have not been able to cope and are rapidly going out of business (albeit with thrashing along the way, but without any hope that they will reinvent themselves to fit into the Web). First to go are the people who've been paid to collect news and give perspective on events; last to go are the advertising folks.
Apparently even television viewing has taken a big hit because of the Web. It's even possible that people have started to trust the opinions of random people on the Web more than they do those of television news -- as if they suspect that advertisers are influencing the news.
So people are going to the web for information, and the information-seeking machinery has already gotten creaky and is failing. What do we need next so that we can continue to solve the problem?
Not just searching, but filtering.
I'm not sure how this will work, but we need it. We have already seen some attempted solutions for the problem. The most obvious one is people who make lists of things they have reviewed. If you trust that person, you look at their list. For example, Kevin Kelly's Cool Tools is full of interesting things he and his group have researched; I've bought a number of items he has recommended. I have also found the Best Free Software List to be invaluable. Once you find reliable lists your life becomes vastly easier.
An essential aspect of recommendations is that you trust the person who is doing it. The problem here is that if money changes hands, trust is reduced or eliminated, so how does the person make money for the time they invest? Dedicated amateurs are great but they can only do so much. Google ads might become a sort of disinterested intermediary, but I have my doubts.
We'll still use search engines, but those engines need the additional option of only searching trusted resources, things that can't be gamed. Trust levels might also appear, with a human at the top who has acquired some large number of "trust points" from users.
I think for any of this to work, we will need to solve the problem of digital identity. As long as people can remain anonymous, everything can be gamed. And it's holding us back. (I don't advocate that you be identifiable, just that you have a unique identity so no one can pretend to be you, and you can't pretend to be thousands of people). If nothing else, it could eliminate spam overnight.
Will Google solve the filtering problem? I wonder if they can, or whether they've gotten stuck in the mire of their one trick. Perhaps companies only ever get one trick, and then the success that ensues bogs them down with collective stupidity. I've heard hints of a lot of interesting ideas coming out of Google, but those ideas never seem to materialize.
For example, I'm quite sure I once heard they were thinking of ads that only cost the advertiser when a customer purchases. Would you have any trouble buying such an ad? It would be a no brainer, since you'd be taking no risk at all. This approach might be prevented by the lack of digital identity.
And Google is spending lots of time and effort preventing its search engine from being gamed, and dealing with AJAX issues -- things that don't move it forward. They seem to be making inroads against PayPal with their much-more-attractive Google Checkout, but I haven't seen many big improvements in that for a while. In general, the kinds of new things I've seen from Google seem like those that could come from much smaller, less-well-funded companies.
From the user's standpoint, the next big thing will be filtered, trusted information. Although the existing players like Google and Yahoo would seem to be the ones that would pull this off (oops, I forgot to mention Microsoft. Why is that?), it appears more likely that we'll see it come from a new player with a fresh start on the problem. (I shall continue to be vexed by the fact that the big, successful companies aren't going to create the next big innovation).
Let's follow the dots (in the global socio-economic sense).
Google is an advertising entity, nothing more. They have invented (perhaps) the first trackable advertising mechanism. Whether or not they really have, they've managed to convince many.
Gresham's Law: bad money drives out good. In this case, this advertising mechanism absorbs ALL advertising $$. While there may be some commerce not seduced, let's not assume so.
So what do we lose? Trackable news sources. "Free" TV. Billboards. Magazines. Anything, in fact, which depends on advertising $$.
Since the USofA has a significantly smaller public broadcasting venture than Europe, and we already see commercial TV descending rapidly into a horror show, there is little to no compensatory "quality" broadcasting from non-commercial.
The division of the Haves from the Have Nots gets wider. As internet advertising becomes an internal competition, with the other venues collapsing, less of the Web becomes available as a practical matter to those with non-BroadBand.
The demands of more sophisticated (glitzy) ads will drive low-bandwidth sites out of business, since they will not lure the high value click-throughs. More of the bandwidth will be devoted to ever more greedy ads, thus lowering both quality and quantity of content. More bandwidth, on both ends of the wire, gets devoted to non-content.
In TV, there is the FCC (barely, under the Right Wing) to at least keep something of a level playing field. There is no such on the Web. Not coincidentally, the AT&T (nee: SouthWest Bell) web over phone has effectively removed local content.
At some point, the energy equation will kick in (sooner rather than later): the cost of supporting all those electrons (at both ends of the wire) reduces the numbers who can afford it.
And let's not forget Nicholas Carr's astute article dealing with Web Stoopidity. Rastling on TV, and factoid bits on the Web. A planet of ADHD on the one hand, and reactionary religious zealots on the other.
All brought to you by a couple of kids who claim to not be evil. And people believed this?
But I have huge doubts about single "internet identity" that cannot be switched off. Now, it usually takes some effort to find out the identity of any person with significant online presence. Adequate measures can cover you from being identified as a physical person pretty reasonably — in those cases when it's really needed. Lose this, and you lose many important things, including the feeling of the Internet as a place for more or less free speech. (Care to become a Chinese blogger?)
But the ability to have multiple identities rarely undermines the ability to gain reputation using a certain well-known identity. This reputation is valuable, and would do for most practical cases of collaborative filtering, imho.
Such sites are built by the collective effort of people who have access to information and are genuinely interested in informing other people, not by economically interested entities. As such, they are more likely to publish quality news than TV stations, which have to pay specialized and expensive people to gather news and then pay heavily again to distribute it.
Do you think these two will go out of business because of broadband advertising? They don't publish advertising at all. Fact is, keeping a narrowband site up is cheap, whereas keeping a free TV station up is expensive, so I expect more quality info on the web than on TV for those who actually want it.
I also don't think filtering is what's going to be next. The web is an open, deregulated system for exchanging all sorts of information. If you don't like what one search engine provides, and there are enough other people like you, a new search engine will evolve, searching the way _you_ like it. But this won't be filtering, and it won't drive other search engines out of business, as long as they still have their users/supporters.
I also think you don't mean filtering. You mean something like human-performed indexing. At least that's what the sites you mention are about. In my understanding, filtering would be some app sitting on top of google (or yahoo, for that matter) displaying a subset of all search results, filtered by some rules I at least cannot imagine - after all, if the rules could be implemented in an app, why would google not do so themselves?
I wonder about this proposal. Sites like Artima, Lambda The Ultimate, programming.reddit and others already do some filtering of news and events around programming. So I have a hard time finding anything that has to be supplemented: we have search engines (a technical process) that make the web accessible and we have communities who filter according to their interests.
Maybe I don't share enough the liberal idea of an individual that knows exactly what it wants and endlessly tunes artificial parameters/filters in order to live in a closed bubble of its own self.
Information overflow was a widely debated problem among feuilleton authors in the '90s, and it turned out not to be one. Maybe there is a pattern: whenever someone suggests solving a problem and the solution requires AI, it turns out not to be a problem at all. The personalized newspaper that contains only the news I'm interested in is such a solution.
Something struck me. People are turning to strangers on blogs as 'trusted' sources of information rather than the news. How could that be?
I think America is extremely cynical about the corporate media circus and is turning away. The web provides a much more efficient way of acquiring information including news. The quality of that information, however, is highly suspect.
These same people seem willing to trust in those who have little or no reputation - what you refer to as "trust points."
What will be the implosion point that creates the sea change to vetted information? Google has Knol that seems to be headed in this direction for knowledge. Could a similar model be profitably applied to the media?
I actually had some insight on this over the weekend. I also came to the conclusion that some sort of intelligent filtering of search results is the only way forward. I thought long and hard about other software systems I'd seen do some pretty amazing tasks along the same lines and came up with a parallel that I haven't seen drawn before. Now, I'm confident that I can not only tell you how it's going to look, but how it could work underneath...
My idea is that you can still use regular keyword search to start. And what does Google currently do if you ask for something (most likely misspelled) with 10 results, but a very similar query would give you 10 million? It pops up a friendly "did you mean...?" link. And most of the time, it helps you get "better" (more targeted) results to click it.
Have you ever seen the web version of 20 questions? What if we applied this concept to the "did you mean...?" links and took it all a step farther? If you ask for "web design", maybe you show some pages about HTML/CSS/etc and 10 pages down there are a few on different spider web patterns that you're never going to find manually. Now say we've got a little "helper questions" section on top; maybe it asks you "did you mean 'spider web design' or 'HTML web design'?" and everyone who really wanted more targeted results for the most popular variations would be able to get to them quickly. This is the first step toward getting a keyword based filter running.
The way I understand 20q to work is by a kind of "binary search". You take your total potential answer set (in the above case, all web pages returned for a search of "web design") and try to find a question/feature/keyword that will most evenly divide the answer set in half. In our example, we're assuming that half of the pages contained an additional keyword of "HTML" (and no mention of "spider") and half contained only "spider", so those are the two search term enhancements our filter suggests.
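To make the "divide the answer set in half" idea concrete, here is a minimal sketch in Python. It assumes each page has already been reduced to a set of keywords; the names `split_imbalance` and `best_question`, and the toy pages, are made up for illustration and say nothing about how 20q or Google actually work.

```python
def split_imbalance(pages, keyword):
    """How far a keyword's split is from 50/50: 0.0 is a perfect half, 0.5 splits nothing."""
    if not pages:
        raise ValueError("need at least one page")
    with_keyword = sum(1 for words in pages if keyword in words)
    return abs(with_keyword / len(pages) - 0.5)


def best_question(pages, candidates):
    """Pick the candidate keyword whose presence divides the pages most evenly."""
    return min(candidates, key=lambda kw: split_imbalance(pages, kw))


# Toy result set for "web design": each page is just a set of keywords (assumed data).
pages = [
    {"web", "design", "html", "css"},
    {"web", "design", "html"},
    {"web", "design", "spider", "silk"},
    {"web", "design", "spider"},
]

# "web" is on every page, so asking about it would tell us nothing.
print(best_question(pages, ["web", "html", "css"]))  # html
```

A keyword that appears on every page (or on none) scores worst, which is why the suggested refinement here is "html" rather than "web".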
The eventual goal would be to get the filter system to think not only in keywords, but also in more abstract features. You could start with easy ones that might be generated by a spider: "pages crawled before or after [date]", "blog posts or forum posts", "pages with a dark or light background", on and on. The questions would be chained together such that when you're done filtering, you just stop answering the "did you mean..." questions and start using the search results given; at each step though, you're given better and more meaningful results. The hard AI part would be picking the next question to ask - for all I know, this is impossible with a data set as large as Google's. But it could start as simply as an extension of "did you mean...?" and follow an established successful AI model (20q).
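The chaining of questions could look something like the loop below. This is only a sketch under my own assumptions: pages are dicts with a `url` and a set of `words`, the features are plain keywords, and `answer` is a stand-in for the user, returning True, False, or None when they stop answering. All of these names are hypothetical.

```python
def unevenness(pages, keyword):
    """0 means the keyword splits the pages exactly in half; len(pages) means no split."""
    hits = sum(1 for page in pages if keyword in page["words"])
    return abs(2 * hits - len(pages))


def narrow(pages, candidates, answer):
    """Ask the most even question, filter on the reply, and repeat until the user stops."""
    remaining = list(candidates)
    while remaining and len(pages) > 1:
        keyword = min(remaining, key=lambda kw: unevenness(pages, kw))
        if unevenness(pages, keyword) == len(pages):
            break  # no remaining question separates the pages
        reply = answer(keyword)  # True: must contain it, False: must not, None: stop
        if reply is None:
            break
        pages = [page for page in pages if (keyword in page["words"]) == reply]
        remaining.remove(keyword)
    return pages


# Assumed toy data; the URLs are placeholders.
pages = [
    {"url": "a.example/html-css", "words": {"html", "css"}},
    {"url": "b.example/html-blog", "words": {"html", "blog"}},
    {"url": "c.example/spider-silk", "words": {"spider", "silk"}},
    {"url": "d.example/spider-blog", "words": {"spider", "blog"}},
]
candidates = ["html", "css", "blog", "silk"]

# A scripted "user": wants HTML pages, but not ones about CSS; stops on anything else.
answers = {"html": True, "css": False}
result = narrow(pages, candidates, answers.get)
print([page["url"] for page in result])  # ['b.example/html-blog']
```

Because the loop returns whatever is left whenever the user stops, every answered question improves the list and none of them is mandatory.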
So when did you want to hire me to start on it, exactly? :)
Well, that is really a better way of generating the search itself. What I want is not so much better search -- and although your suggestion will certainly help, the ultimate problem of "too much unqualified stuff" will still trump -- but better quality of results. Meaning that somehow you know that the list of possibilities you're looking at has been vetted by someone or something so that you aren't wasting time hunting through things that do you no good.
There are some trillion pages now, and it's growing. Rough search is not going to do the job for much longer.
Bruce said:
> So people are going to the web for information, and the information-seeking machinery has already gotten creaky and is failing. What do we need next so that we can continue to solve the problem?
>
> Not just searching, but filtering.
>
> I'm not sure how this will work, but we need it. We have already seen some attempted solutions for the problem. The most obvious one is people who make lists of things they have reviewed. If you trust that person, you look at their list. For example, Kevin Kelly's Cool Tools is full of interesting things he and his group have researched; I've bought a number of items he has recommended. I have also found the Best Free Software List to be invaluable. Once you find reliable lists your life becomes vastly easier.
>
> An essential aspect of recommendations is that you trust the person who is doing it. The problem here is that if money changes hands, trust is reduced or eliminated, so how does the person make money for the time they invest? Dedicated amateurs are great but they can only do so much. Google ads might become a sort of disinterested intermediary, but I have my doubts.
I have difficulty reading what you describe as something other than, "Better Search".
The key component of search is filtering. You're faced with an index of billions of pages, you search, and... ten results. That's filtering.
What about trust and recommendations? That's what Google's algorithm *does*; that's what set Google apart from the rest. "This link is trusted more than that link, this page recommends these other pages..." By this definition, Google isn't really a search company, but a filtering one. It filters a network of related pages that reference one another - that (usually) don't link to sites they don't know or trust.
Sure, their automated trust model and filtering techniques may grow creaky with age and smart attackers, but that doesn't change the nature of what they do.
Google doesn't rely on a rigorous trust model, but it doesn't have to be perfect. It just has to be good enough. You could hook it up to a social network style human trust model ("Search on only pages my friends have visited"), but I'm not sure that would ever achieve the desired effect.
Moving the evaluation of trust from web-pages (is this web-page trusted?) to people (is this person trusted?) doesn't solve anything, it just moves the problem. Of course, web-pages are much easier to evaluate, because they already have a strong digital identity.
Web-pages are already excellent reflections of their authors (company site X => company X, forum X => community X). The one exception to this is when someone is gaming the system, but then they'd do that with digital profiles, too.
d potter said: "The hard AI part would be picking the next question to ask [...]"
Have the newfangled search users themselves invent--and answer--the narrowing questions, as in http://www.rubyquiz.com/quiz15.html. However, just like the old trick of gaming a web page's "keywords," this runs the risk of abuse by SEOs and spammers.
>list of possibilities you're looking at has been vetted by someone or something so that you aren't wasting time hunting through things that do you no good.
i see what you mean, and in thinking about "personalized search", that was always my assumption on how it would have to be done. my opinion now is that unless you've explained to such a filtering system very thoroughly exactly what subcategories and features differentiate results that would "do you no good" from those you want, your filtering is going to be no better than keyword search. and any system optimized to "know you" (by passing all your web traffic through bayesian analyzers?) well enough to auto-categorize what you "probably meant" still isn't going to handle novel input very well.
honestly, a combination of "personalized" search and "20q" might be best: where the filtering questions are prioritized to reflect more of the features/keywords you've used to filter on in the past. maybe a "social" filtered search could also prioritize features submitted/used most by your friends.
but i don't see a model where you try to describe everything up front and click the equivalent of "i'm feeling lucky" working as well as an iterative question/answer model that eventually leads you down a narrowing path to the right site, however those questions might be generated.
initially, thinking about how to secure that made my brain start hurting. you can't even make a vote-up model un-riggable (e.g.: digg.com). then i realized that to make your question usable in general across the entire google index, you'd essentially have to have the question submitter classify each page google's ever indexed in terms of their question!! (this is another way to move the idea toward "personalized" search). i'm not sure this is possible for a human to do. so i simplified my mental model to just add keywords automatically based on the statistics alone and it started seeming feasible.
now that i've had to explain the model here, i'm wondering if it might be necessary to tie a question to a certain keyword or search term ("web design" for example is the only term that will ever care to ask "did you mean 'web page design' or 'spider web design'?") it would give the question-author fewer pages to have to classify...
in any event, another thought experiment: how could a spammer try to use a user generated question to their advantage? they'd want a generic "web design" search to go straight to their "web design" ad-farm page, right? so their question is going to take all the results for "web design" and break them into two categories, one with 99.9% of the results and one with just the single page they want to send you to. if you've done your binary search correctly, though, that question/feature will never be used because it splits the pages so unevenly. i imagine people with large link farms might find a way to game this eventually, and griefers/spammers can try to use it to push brand names/penny stocks, but Google already does a pretty good job of filtering those results from their indexes and email boxes. maybe they can make it work here too.
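here's a rough python sketch of that thought experiment, just to show why the lopsided question loses. everything in it is assumed for illustration: the 1000 toy pages, the function names, and especially the 20% cutoff, which is an arbitrary number i picked, not something any real engine uses.

```python
def minority_share(pages, matches):
    """Fraction of pages on the smaller side of the split (0.5 is a perfect half)."""
    hits = sum(1 for page in pages if matches(page))
    return min(hits, len(pages) - hits) / len(pages)


def usable_questions(pages, questions, min_share=0.2):
    """Keep only questions whose smaller side holds at least min_share of the pages."""
    return [
        name
        for name, matches in questions.items()
        if minority_share(pages, matches) >= min_share
    ]


# Assumed result set for "web design": 500 HTML pages, 499 spider pages, 1 ad-farm page.
pages = (
    [{"web", "design", "html"}] * 500
    + [{"web", "design", "spider"}] * 499
    + [{"web", "design", "buy-now"}]
)

questions = {
    "html": lambda page: "html" in page,          # an honest 50/50 question
    "spam-farm": lambda page: "buy-now" in page,  # the spammer's 99.9% / 0.1% question
}

print(usable_questions(pages, questions))  # ['html']
```

the spammer's question isolates one page out of a thousand, so its minority share is a tenth of a percent and it never gets asked. a link farm big enough to reach the cutoff would get through, which is the "might find a way to game this eventually" part.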