More Tim Bray "On Search" articles
I'm a fan of Tim Bray's On Search series of articles, in which he explores various topics in building search engines, primarily in the web context. He's added several new articles to the series since I last looked, and I particularly like the ones on interfaces and XML; they're great reading. The next time I teach a course on searching, I'll use these articles as the text.
From the interfaces article:
"I think Lucene’s API is well thought out, but there are not one but two elephants in the room that it’s trying hard to ignore. The first is the fact that the world contains many who are not members of the Church of Java, who will be left cool by the notion of, for example, “converting from text from a java.io.Reader into a TokenStream.” The second elephant is the Web. Suppose I don’t want to write Java code, I just want to tell my website to index this directory. A pure Web interface would solve both these problems..."
From the XML article:
"people don’t want to compose queries and do flexible, powerful structure-sensitive searches. As I’ve written here previously, people in general want to type the minimal number of keystrokes into a search window and say Go, and have the system figure it out for them. Secondly, descriptive markup is a form of metadata, and there is no cheap metadata, and XML is no exception. If your text inventory is in Word or HTML, XMLifying it in any useful way is going to be very, very expensive. Which is to say, XML may not be cost-effective strictly in terms of making search run better."
