Over in the UK, they are trying to filter out illegal content at the ISP level. The problem? The blacklist filters cast way, way too wide a net:
According to multiple customers of Demon Internet - now owned by Brit telecom Thus - the London-based ISP is blocking access to all sites stored in the archive. When they query the Wayback Machine, hoping to retrieve archived pages, customers are met with generic "not found" error pages. But judging from their urls, these pages are generated by a web filter based on the blacklist compiled by the Internet Watch Foundation, a government-backed organization charged with policing online pornography.
This is where well-intentioned - but too simple - schemes go awry. I've seen this kind of thing myself. I have a simple-minded filter for comments on the blogs here, and it's been known to block legitimate comments based on accidental matches against poorly chosen keywords. Basically, when you decide to filter, you have to decide what level of false positives you're willing to put up with. Sure, Bayesian filters do a better job - but heck, even there, I have to continually go in and check the junk folder. For a while, my mail client had decided that everything our company President sent was spam. None of these systems are perfect.
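To make the accidental-match problem concrete, here's a minimal sketch in Python. It is not the filter I actually run on the blogs; the keyword list and function names are made up for illustration. It shows how a plain substring check flags an innocent comment, and how matching on whole words removes that particular false positive.

```python
import re

# Hypothetical blocklist, chosen only to illustrate the problem.
BLOCKED_KEYWORDS = ["cialis", "casino"]


def naive_is_spam(comment):
    """Flag a comment if any keyword appears anywhere, even inside another word."""
    text = comment.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)


# One pattern that matches any keyword only as a whole word.
_WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in BLOCKED_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def whole_word_is_spam(comment):
    """Flag a comment only if a keyword appears as a standalone word."""
    return _WHOLE_WORD.search(comment) is not None


legit = "Our specialist fixed the build yesterday."
spam = "Cheap CIALIS here, click now"

# "specialist" contains "cialis", so the naive check blocks a legitimate comment.
print(naive_is_spam(legit), whole_word_is_spam(legit))
print(naive_is_spam(spam), whole_word_is_spam(spam))
```

The whole-word version still isn't perfect: a legitimate comment that happens to discuss casinos gets blocked either way. You've moved the false positive rate, not eliminated it.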