This post originated from an RSS feed registered with Java Buzz
by Michael Cote.
Original Post: Cheap Search
Feed Title: Cote's Weblog: Coding, Austin, etc.
Feed URL: https://cote.io/feed/
Feed Description: Using Java to get to the ideal state.
Somehow, I got to thinking about how you could get really cheap search for intranets. Good search feels like a sort of "web right," which is to say, any web (intranet or otherwise) deserves it. Large companies certainly have the dough to pay for something fancy like a Google Mini or Search Appliance, though they can take forever to get through the approval process. Smaller companies could quickly "approve" such a buy, but they often have more important things to spend $3,000-50,000 on than searching their intranet.
So, I've been kicking around some models that work within those constraints.
The Appliance
We have a Google Mini at work, and it's worth its weight in gold. Like most big companies, we've got piles of data sitting around on our intranet. Until the Mini, there wasn't a quick and easy way to search all of it. Now, everyone loves the thing, and it makes finding data in that unstructured pile easy.
The price tag for the Mini is around $3,000. So, if you could put together an appliance for $500-1,000, it seems like you could cater to a smaller market: one where there are only 10,000-20,000 documents that need indexing instead of the Mini's 100,000.
The boxes would be cheap grey-box hardware (or racks?), with an install of Nutch. Last I used Nutch, it was pretty damn nice and looked like it needed just a few additions (like a friendly UI) to make it something anyone could use and configure. I have no idea about the hardware: you'd just want some large disks and something that wouldn't burn up. I don't really know hardware, though, so maybe the idea of getting the right grade of hardware cheap enough is ludicrous.
Zero Install Search
Instead of selling hardware, you could take another angle and offer search as a service. Rather than installing a whole appliance on their network, companies could install a small software agent that pumps data up to a centralized server outside their firewall, hosted by the search service.
This would, of course, (1) eat up a lot of bandwidth transferring all that data, and (2) freak out people who worry about sending data across their firewalls. But, again, for smaller businesses and organizations, the drawbacks may be acceptable for the price. You would, of course, secure everything as much as possible.
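To make the agent idea a little more concrete, here's a minimal sketch of one way it could keep the bandwidth bill down: hash every file and only queue up the ones that are new or changed since the last run. Everything here is hypothetical (the function names, the idea of tracking digests in a dictionary), and the actual upload to the hosted service is left out entirely.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_documents(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files under root that are new or modified since the last run.

    `seen` maps relative path -> digest from the previous run. It is
    updated in place so the caller can persist it between runs.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if seen.get(key) != digest:
            # New or modified file: remember its digest and queue it.
            seen[key] = digest
            changed.append(path)
    return changed
```

The first run would still ship everything, but after that the agent would only send what actually changed, which is probably the best you can hope for without getting fancier.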
Since there isn't hardware to deliver, I'd think the price would be cheaper, though there's plenty of server hardware that'd need care and feeding, along with the bandwidth charges. And, you could scale the price up as needed. You could start it at $20-50/month, and increase the price as their number of pages increased.
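As a rough illustration of that kind of scaling, here's a sketch of a tiered price. Only the $20 starting point comes from the range above; the included page count, the block size, and the per-block charge are numbers I made up for the example.

```python
import math

BASE_PRICE = 20          # dollars per month, low end of the $20-50 range
INCLUDED_PAGES = 10_000  # assumed: pages covered by the base price
BLOCK_SIZE = 5_000       # assumed: size of each extra block of pages
BLOCK_PRICE = 10         # assumed: dollars per month per extra block


def monthly_price(pages: int) -> int:
    """Return the monthly price in dollars for an index of `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    extra = max(0, pages - INCLUDED_PAGES)
    # Any partial block is billed as a full block.
    blocks = math.ceil(extra / BLOCK_SIZE)
    return BASE_PRICE + blocks * BLOCK_PRICE
```

With these made-up numbers, a 20,000-page intranet would come to $40/month, which is still a long way from the price of an appliance.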